
nagios instability

Posted: Fri Feb 15, 2019 3:15 am
by mon-team
Hello Support,
since yesterday morning our Nagios infrastructure has been particularly unstable: Nagios becomes unresponsive (commands cannot be submitted and it is very slow) and the web interface crashes with an error message asking us to repair the MySQL tables.
We cannot figure out whether the database problem is the root cause or an effect. The Nagios and MySQL logs and /var/log/messages show no error messages.
Could you tell us which logs to check, and possibly enable, so we can continue troubleshooting?

We have embedded Perl disabled, NSCA is at version 2.9, and used RAM does not exceed 3 GB of the 16 GB available.
We are running Nagios XI 2014 R.2.7 on CentOS 6.6, Nagios Core Version 4.0.8.

Regards,
Francesco

Re: nagios instability

Posted: Fri Feb 15, 2019 11:43 am
by npolovenko
@mon-team, How many hosts and services are you monitoring with this XI? I'd start by checking the disk space with df -h and making sure your partitions are not maxed out. Next, run the following command to repair mysql:
mysqlcheck -r -f -uroot -pnagiosxi --all-databases --use_frm
If this doesn't fix the issue, please send in your system profile.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and send it to me in a private message.
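
The disk space check suggested above can be scripted so that only nearly full partitions are listed. This is a minimal sketch, not an official Nagios XI tool: the function name full_partitions and the 90% default threshold are assumptions made for illustration. It reads `df -P` style text on stdin and assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print "mountpoint usage%" for every partition whose
# usage is at or above the given threshold (default 90, an assumed value).
# Input is `df -P` formatted text on stdin; the header line is skipped.
full_partitions() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5              # Capacity column, e.g. "95%"
            sub(/%/, "", use)     # strip the percent sign
            if (use + 0 >= limit) print $6 " " $5
        }'
}

# Example on a live system:
#   df -P | full_partitions 90
```

No output means no partition has reached the threshold.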

Re: nagios instability

Posted: Wed Feb 20, 2019 9:53 am
by mon-team
Dear npolovenko,
naturally, one of the first things I checked was disk space and all the other system parameters.
Everything is working fine and the server is barely loaded (the load average over 1, 5 and 15 minutes is around 5, and we have 8 cores).
We are monitoring roughly 1,500 servers and 15,000 services, with 5 workers managed by mod-gearmand.
I've attached the required system profile.
Regards
Francesco

Re: nagios instability

Posted: Wed Feb 20, 2019 2:04 pm
by npolovenko
@mon-team, It looks like your Apache maximum connections limit was reached. Please show me the contents of this config file:
cat /etc/httpd/conf/httpd.conf
Also, please run this command and show me the output:
grep -iRl "MaxClients" /etc/
*It's going to take a while to run because it'll search the whole etc folder for configurations with MaxClients.
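
If searching all of /etc/ is too slow, the same search can be limited to the Apache configuration directory, which on CentOS 6 is /etc/httpd. A minimal sketch, where the function name find_maxclients is an assumption for illustration; it adds -n so each match is shown with its file and line number:

```shell
#!/bin/sh
# Hypothetical helper: list every MaxClients line under a directory
# (default /etc/httpd), printed as file:line:content.
find_maxclients() {
    grep -iRn "MaxClients" "${1:-/etc/httpd}"
}

# Example on a live system:
#   find_maxclients
```

An empty result from this narrower search would suggest the directive lives in a file outside /etc/httpd or is not set at all.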

Also, it looks like you have lots of MRTG configs in the following folder (5830):
/var/lib/mrtg/
This is not related to the current problem, but I'm sure it contributes to the system load.
I suggest going over the configs in this folder at some point and checking whether any of the devices are no longer monitored. If you find some, feel free to delete the corresponding .cfg files. That will stop MRTG from constantly polling SNMP data for those devices.

Re: nagios instability

Posted: Thu Feb 21, 2019 6:05 am
by mon-team
Thanks for your suggestion. I've attached the httpd.conf so you can read it easily.
The command "grep -iRl "MaxClients" /etc/" shows no output.

As for MRTG, I'm deleting all the rrd files that are no longer used.
Regards,
Francesco

Re: nagios instability

Posted: Thu Feb 21, 2019 2:05 pm
by npolovenko
@mon-team, Let's increase the number for MaxClients to 500 in your httpd.conf file. I can see that it's mentioned in two places so let's change both. Ideally, you'd reboot the server after this with:
shutdown -r now
But otherwise the apache restart should be sufficient:
service httpd restart
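
The MaxClients change can also be applied to every occurrence in one pass instead of editing both places by hand. This is a rough sketch: the function name set_maxclients is an assumption for illustration, and it only rewrites lines that start with MaxClients followed by a number. Note that with Apache's prefork MPM a MaxClients value above 256 only takes effect if ServerLimit is raised as well, which this sketch does not do.

```shell
#!/bin/sh
# Hypothetical helper: set every MaxClients directive in a config file
# to a new value, keeping a backup copy next to the original.
set_maxclients() {
    conf="$1"
    value="$2"
    # Keep the original as <file>.bak so the change can be reverted.
    cp "$conf" "$conf.bak" || return 1
    # Rewrite only the number, preserving indentation and spacing.
    sed -E "s/^([[:space:]]*MaxClients[[:space:]]+)[0-9]+/\1$value/" "$conf.bak" > "$conf"
}

# Example on a live system (run as root):
#   set_maxclients /etc/httpd/conf/httpd.conf 500
```

Running apachectl configtest before restarting Apache checks the edited file for syntax errors.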