Hello Support,
Since yesterday morning our Nagios infrastructure has been particularly unstable: Nagios becomes unresponsive (it is not possible to submit commands, and it is very slow) and the web interface crashes with an error message asking us to repair the MySQL tables.
We cannot figure out whether the database problem is the root cause or an effect. The Nagios and MySQL logs and /var/log/messages show no error messages.
Could you tell us which logs to check, and possibly enable, to continue troubleshooting?
Embedded Perl is disabled, NSCA is at version 2.9, and used RAM does not exceed 3 GB of the 16 GB available.
We are running Nagios XI 2014 R.2.7 on CentOS 6.6, with Nagios Core version 4.0.8.
Regards,
Francesco
nagios instability
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: nagios instability
@mon-team, how many hosts and services are you monitoring with this XI? I'd start by checking the disk space with df -h and making sure your partitions are not maxed out. Next, run the following command to repair MySQL:
mysqlcheck -r -f -uroot -pnagiosxi --all-databases --use_frm
If this doesn't fix the issue, please send in your system profile.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and send it to me in a private message.
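For the disk-space check mentioned above, here is a minimal sketch that flags partitions at or above a usage threshold. The function name full_partitions and the default threshold of 90 are assumptions made for illustration, not part of Nagios XI. It reads df -P output from standard input and assumes mount points contain no spaces.

```shell
# Read `df -P` output on stdin and print each mount point whose usage
# is at or above the given percentage (hypothetical default: 90).
full_partitions() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the df header line
            use = $5                  # capacity column, e.g. "95%"
            sub(/%/, "", use)         # strip the percent sign
            if (use + 0 >= t) print $6, use "%"
        }'
}
```

It could be run as df -P | full_partitions 90; no output would suggest that no partition has reached the threshold.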
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: nagios instability
Dear npolovenko,
Naturally, one of the first things I checked was disk space and all the other system parameters.
Everything is working fine and the server is doing very little (the load average over 1, 5 and 15 minutes is around 5, and we have 8 cores).
We are monitoring roughly 1,500 servers and 15,000 services, with 5 workers managed by Mod-Gearman.
I've attached the requested system profile.
Regards
Francesco
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: nagios instability
@mon-team, it looks like your max Apache connections limit was reached. Please show me the output of this config:
cat /etc/httpd/conf/httpd.conf
Also, please run this command and show me the output:
grep -iRl "MaxClients" /etc/
*It's going to take a while to run because it'll search the whole etc folder for configurations with MaxClients.
Also, it looks like you have lots of MRTG configs in the following folder (5830):
/var/lib/mrtg/
This is not related to the current problem, but I'm sure it contributes to the system load.
I suggest going over the configs in this folder at some point and seeing if any of the devices are no longer monitored. If you find some, feel free to delete the corresponding .cfg configs. That will stop Nagios from constantly polling SNMP info.
Re: nagios instability
Thanks for your suggestion. I've attached the httpd.conf so you can read it easily.
The command grep -iRl "MaxClients" /etc/ produced no output.
As for MRTG, I'm deleting all the rrd files that are no longer used.
Regards,
Francesco
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: nagios instability
@mon-team, let's increase MaxClients to 500 in your httpd.conf file. I can see that it's set in two places, so let's change both. Ideally, you'd reboot the server after this with:
shutdown -r now
But otherwise an Apache restart should be sufficient:
service httpd restart
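To confirm the change took effect, a minimal sketch along these lines could list the active MaxClients values and count the running httpd processes. The function names are hypothetical, and the default path is the CentOS 6 location used earlier in this thread. One caveat worth checking: with the prefork MPM in Apache 2.2, MaxClients is capped by ServerLimit (256 by default), so ServerLimit may need to be raised as well for a value of 500 to apply.

```shell
# Print every active (non-commented) MaxClients value in an Apache config.
# The directive name is matched case-insensitively, as Apache does.
max_clients_values() {
    conf="${1:-/etc/httpd/conf/httpd.conf}"
    awk 'tolower($1) == "maxclients" { print $2 }' "$conf"
}

# Count running httpd processes; under prefork each child process
# serves one connection at a time.
httpd_process_count() {
    ps -C httpd -o pid= | wc -l
}
```

Running max_clients_values with no argument reads the default httpd.conf; comparing its output with httpd_process_count during a slow period gives a rough idea of how close Apache is to the limit.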