nagios instability

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
mon-team
Posts: 171
Joined: Thu Jun 28, 2012 9:22 am

nagios instability

Post by mon-team »

Hello Support,
Since yesterday morning our Nagios infrastructure has been particularly unstable. Nagios becomes unresponsive (it is very slow and it is not possible to submit commands), and the web interface crashes with an error message asking us to repair the MySQL tables.
We cannot figure out whether the database problem is the root cause or an effect. The Nagios and MySQL logs and /var/log/messages do not show any error messages.
Could you tell us which logs to check and, if necessary, enable in order to continue troubleshooting?

We have embedded Perl disabled, NSCA is at version 2.9, and used RAM does not exceed 3 GB of the 16 GB available.
We are running Nagios XI 2014 R.2.7 on CentOS 6.6, with Nagios Core version 4.0.8.

Regards,
Francesco
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: nagios instability

Post by npolovenko »

@mon-team, how many hosts and services are you monitoring with this XI instance? I'd start by checking the disk space with df -h and making sure your partitions are not full. Next, run the following command to repair MySQL:
mysqlcheck -r -f -uroot -pnagiosxi --all-databases --use_frm
If this doesn't fix the issue, please send in your system profile.
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu
Click the "Download Profile" button
Save the profile.zip file and send it to me in a private message.
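
The disk space check suggested above can also be scripted. The following is only a minimal sketch: the function name full_partitions and the 90 percent threshold are illustrative assumptions, not part of Nagios XI. It reads df -P style output on standard input and assumes mount points contain no spaces.

```shell
#!/bin/sh
# Print every mount point whose usage is at or above a threshold (percent).
# Reads `df -P` style output on stdin, so it can be fed canned input as well.
# full_partitions is a hypothetical helper name used for illustration only.
full_partitions() {
    threshold="$1"
    awk -v t="$threshold" 'NR > 1 {
        use = $5                 # "Capacity" column, e.g. "95%"
        sub(/%/, "", use)        # strip the percent sign
        if (use + 0 >= t) print $6 " " use "%"
    }'
}

# Typical use on the live system (90 is an arbitrary example threshold):
# df -P | full_partitions 90
```

If the function prints nothing, no partition has reached the threshold.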
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
mon-team
Posts: 171
Joined: Thu Jun 28, 2012 9:22 am

Re: nagios instability

Post by mon-team »

Dear npolovenko,
Naturally, one of the first things I checked was disk space, along with all the other system parameters.
Everything looks fine and the server is not under heavy load (the load average over 1, 5, and 15 minutes is around 5, and we have 8 cores).
We are monitoring roughly 1,500 servers and 15,000 services, with 5 workers managed by mod-gearmand.
I've attached the requested system profile.
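
To put the load figures quoted above in perspective, the load average can be divided by the number of cores. This is a rough sketch only; load_per_core is a hypothetical helper name, and a value below 1 is just a common rule of thumb for "not CPU saturated", not a guarantee.

```shell
#!/bin/sh
# Divide a load average by the number of cores and print the result.
# load_per_core is a hypothetical helper name used for illustration only.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.3f\n", load / cores }'
}

# Figures from this thread: a load of about 5 on 8 cores.
load_per_core 5 8

# On a live Linux system (assumes /proc/loadavg and nproc are available):
# load_per_core "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```

With the numbers from this post the result is 0.625, which is consistent with the server not being CPU bound.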
Regards
Francesco
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: nagios instability

Post by npolovenko »

@mon-team, it looks like your maximum Apache connections limit was reached. Please show me the contents of this config file:
cat /etc/httpd/conf/httpd.conf
Also, please run this command and show me the output:
grep -iRl "MaxClients" /etc/
*It's going to take a while to run because it'll search the whole /etc folder for configuration files containing MaxClients.
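
As a narrower alternative to searching all of /etc, the active MaxClients values can be pulled straight out of a single config file. This is a sketch under assumptions: maxclients_values is a hypothetical helper name, and it only looks at the file it is given, so directives in included files are not covered.

```shell
#!/bin/sh
# Print the value of each active (non-comment) MaxClients directive
# found in an Apache config read on stdin. Directive names are
# case-insensitive in Apache, hence the tolower().
# maxclients_values is a hypothetical helper name used for illustration only.
maxclients_values() {
    awk 'tolower($1) == "maxclients" { print $2 }'
}

# Typical use against the file mentioned in this thread:
# maxclients_values < /etc/httpd/conf/httpd.conf
#
# To compare against the number of running Apache processes
# (assumes the procps version of ps, as shipped with CentOS):
# ps -C httpd --no-headers | wc -l
```

If the process count sits at or near one of the printed values, the limit is likely being hit.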

Also, it looks like you have lots of MRTG configs (5830) in the following folder:
/var/lib/mrtg/
This is not related to the current problem, but I'm sure it contributes to the system load.
I suggest going over the configs in this folder at some point to see whether any of the devices are no longer monitored. If you find some, feel free to delete the corresponding .cfg files. That will stop Nagios from constantly polling SNMP info for them.
mon-team
Posts: 171
Joined: Thu Jun 28, 2012 9:22 am

Re: nagios instability

Post by mon-team »

Thanks for your suggestion. I've attached the httpd.conf so you can read it easily.
The command grep -iRl "MaxClients" /etc/ shows no output.

As for MRTG, I'm deleting all RRD files that are no longer used.
Regards,
Francesco
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: nagios instability

Post by npolovenko »

@mon-team, let's increase MaxClients to 500 in your httpd.conf file. I can see that it's set in two places, so let's change both. Ideally, you'd reboot the server afterwards with:
shutdown -r now
Otherwise, an Apache restart should be sufficient:
service httpd restart
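
The MaxClients change described above could be applied along the following lines. Treat this as a hedged sketch rather than the official procedure: set_maxclients is a hypothetical helper name, it only touches active (uncommented) MaxClients lines in the file it is given, and it keeps a .bak copy next to the file. Note also that with the prefork MPM in Apache 2.2, a MaxClients value above 256 may also require raising ServerLimit, which this sketch does not do.

```shell
#!/bin/sh
# Set every active MaxClients directive in a config file to a new value.
# A backup of the original is kept as <file>.bak.
# set_maxclients is a hypothetical helper name used for illustration only.
set_maxclients() {
    conf="$1"
    value="$2"
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite from the backup into the original path so the original
    # file keeps its ownership and permissions.
    sed "s/^\([[:space:]]*MaxClients[[:space:]][[:space:]]*\)[0-9][0-9]*/\1$value/" \
        "$conf.bak" > "$conf"
}

# Typical use, run as root (path taken from this thread):
# set_maxclients /etc/httpd/conf/httpd.conf 500
# apachectl configtest && service httpd restart
```

Running the syntax check before the restart avoids taking the web interface down with a broken config.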