
nagios xi gets unusable after an apply configuration

Posted: Wed Oct 10, 2018 10:37 am
by mon-team
Hello Support,
For the past week we have been experiencing serious slowness problems on our Nagios XI platform.
CPU usage is often high (near 100%), but the major problems start when we run an 'Apply Configuration'. During this operation the number of services in an unknown state increases quickly, which has never happened before.
Passive checks are no longer received (I suppose that during an 'Apply Configuration' they are queued and processed once the nagios process is up again), which causes the freshness threshold to be exceeded. Some active checks also go into an unknown state.
Sometimes, after the 'Apply Configuration', the nagios process stays up for a while and then stops.
The 'Apply Configuration' itself takes a long time (about 2 minutes).
The large number of unknown services triggers a series of operations that make Nagios XI unusable.

We noticed that the 'mysqld' process takes a lot of CPU.
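
A quick way to confirm which processes dominate the CPU while an apply is running is to sort the process list by CPU usage. This is only a minimal sketch: the function name top_cpu is made up for illustration, and it assumes the procps version of ps (as shipped on CentOS), which supports --sort. Note that the %CPU column of ps is averaged over each process's lifetime, so top gives a better live view.

```shell
# Hypothetical helper: print the N processes with the highest %CPU.
# Assumes procps ps (supports --sort); N defaults to 10.
top_cpu() {
    count="${1:-10}"
    # The first output line is the column header, so keep count + 1 lines.
    ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n "$((count + 1))"
}

# Example: show the 5 busiest processes.
top_cpu 5
```

Running it once before and once during the 'Apply Configuration' should show whether mysqld is the only process that spikes.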

Here is some information about our setup:
* Nagios XI 2014R2.7 on CentOS 6.6
* 4 workers with gearmand 0.33
* ~15000 services on ~1500 hosts

What we have done so far:
* checked the most recently added configurations: no issues detected
* services: no service checks take a particularly long time to run
* checked the MySQL tables: no errors or corrupted indexes found
* nagios log: increased the verbosity, no errors found
* restarted the Nagios XI server and all workers, but nothing changed

What would you suggest?
Regards
Francesco

Re: nagios xi gets unusable after an apply configuration

Posted: Wed Oct 10, 2018 1:25 pm
by cdienger
Was the machine near 100% already before last week? Were a lot of hosts or services added last week when this started, or were there other configuration changes at the time? Applying the configuration requires a lot of reading and writing from the database and file system, and it isn't uncommon for it to take a bit of time to apply with larger configs. The apply is only going to spike things more if the machine is already consistently almost at 100%.

Increasing the default values in php.ini can sometimes help with the apply process - see https://support.nagios.com/kb/article/n ... e-611.html and I would also suggest reviewing https://assets.nagios.com/downloads/nag ... ios-XI.pdf. Adjusting the reaper settings as described can have a positive impact overall, as can installing a ramdisk.

Re: nagios xi gets unusable after an apply configuration

Posted: Wed Oct 10, 2018 3:05 pm
by mon-team
Thanks cdienger for your reply.
The server was not close to 100% last week, and we didn't add a significant number of hosts or services. We also tried rolling back to last week's state, without success.
For this reason it is very hard for us to find the root cause of this issue.
Tomorrow morning (it is almost night here now) I'll read the suggested articles and try to tune the configuration.

About the database I have two questions:
* we have a very old version of MySQL Server (5.1): is there an article on how to upgrade only the MySQL database on a Nagios XI installation?
* do you think the PostgreSQL database used by Nagios could affect performance in some way? (it is one of the few points we have not yet investigated)

And finally: is it normal that passive check results get lost while performing an 'Apply Configuration'?

Thanks,
Francesco

Re: nagios xi gets unusable after an apply configuration

Posted: Thu Oct 11, 2018 10:29 am
by cdienger
It's not normal to miss passive results but with the other behaviors described it isn't surprising.

We don't have a guide for updating the mysql database. I would suggest taking the usual precaution of making backups of XI before upgrading the database if you go this route.

I don't have any reason to suspect postgres at the moment - do you see any errors in logs in /var/lib/pgsql/data/pg_log/postgresql/ or elsewhere?
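
To check those logs quickly, something like the following can count the lines that contain a PostgreSQL severity of ERROR, FATAL or PANIC in each log file. It is only a sketch: the function name scan_pg_logs is made up for illustration, the default directory is simply the one mentioned above and may differ on your system, and it assumes GNU grep.

```shell
# Hypothetical helper: per-file count of ERROR/FATAL/PANIC lines.
# Pass the log directory as the first argument; the default is the
# path mentioned above and is an assumption about your layout.
scan_pg_logs() {
    log_dir="${1:-/var/lib/pgsql/data/pg_log/postgresql}"
    # -H always prints the file name, -c prints the number of matching lines.
    # Subdirectories and unreadable files are skipped silently.
    grep -H -c -E 'ERROR|FATAL|PANIC' "$log_dir"/* 2>/dev/null
}
```

Calling scan_pg_logs with no argument scans the default directory; any file with a non-zero count is worth opening to read the actual messages.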

You can vacuum the postgres database:

psql nagiosxi postgres -c "vacuum;"
psql nagiosxi postgres -c "vacuum analyze;"
psql nagiosxi postgres -c "vacuum full;"