Issue with ndomod? Nagios core freezing
Posted: Thu Apr 26, 2018 12:27 pm
by sigmainformatique
Hello,
I am building a new XI platform and I am seeing strange behaviour: Nagios XI freezes with ndomod about 30 seconds after restarting.
Configuration:
Development platform
MariaDB offloaded
Gearman with 2 workers
About 2000 test hosts and 20000 services, created with the API by following the guidelines in your guide.
No messages in the log file; Nagios just freezes.
If I remove the ndomod broker call in nagios.cfg, everything works great (150 service checks/s).
When Nagios freezes, ndo2db continues to write to MariaDB for about 30-40 s, then stops because Nagios no longer generates calls. This is why I think there is an issue with ndomod.
I tried to extend the ndomod buffer, but there was no improvement.
Do you have any idea? My production environment will be about 4000 hosts/50000 services.
Regards
Re: Issue with ndomod? Nagios core freezing
Posted: Thu Apr 26, 2018 4:56 pm
by cdienger
How much memory is on each system, and what are the spawn-rate and max-worker options set to in the modgearman worker.conf? Run /usr/bin/gearman_top2 on the XI server and keep an eye on the values there to get an idea of what you may need to increase them to. Make sure to restart the gearman server and workers after making any changes to the config.
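A minimal sketch for reading the worker-pool settings mentioned above. The function name show_worker_limits is hypothetical, and the location of worker.conf is an assumption that varies by installation, so the path is passed in as an argument.

```shell
# Hypothetical helper: print the worker-pool settings from a mod_gearman
# worker config. The config path differs between installs, so it is an argument.
show_worker_limits() {
    conf="${1:?usage: show_worker_limits <worker.conf>}"
    # Keep only uncommented lines that set min-worker, max-worker or spawn-rate.
    grep -E '^[[:space:]]*(min-worker|max-worker|spawn-rate)[[:space:]]*=' "$conf"
}
```

Example call, assuming the config lives at /etc/mod_gearman2/worker.conf: show_worker_limits /etc/mod_gearman2/worker.conf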
Re: Issue with ndomod? Nagios core freezing
Posted: Fri Apr 27, 2018 9:21 am
by sigmainformatique
Hello,
Thank you for your reply.
I think we have found the cause of the issue. It was not due to Gearman, but to ndo/MariaDB.
The problem was that event handlers are enabled by default in the "generic host" template (!).
The issue occurs when a large number of services are CRITICAL. This could happen in production, for example if we lose a datacenter.
The large number of handlers causes a problem with /usr/local/nagiosxi/cron/event_handler.php, which drives the MariaDB CPU load above 100%.
Do you have best practices for partitioning MariaDB for XI, or something similar? Note that I have followed the instructions in
https://assets.nagios.com/downloads/nag ... zation.pdf
My MariaDB CPU load is about 30% under normal conditions, which I think is high.
Thank you in advance
Regards
Guillaume
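A rough way to see which object definitions have event handlers switched on, as a read-only check. The function name list_enabled_event_handlers is hypothetical, and the config directory is an assumption to adapt; event_handler_enabled is the standard Nagios object directive (1 = enabled, 0 = disabled).

```shell
# Hypothetical helper: list every object definition under a config directory
# that explicitly enables event handlers.
list_enabled_event_handlers() {
    dir="${1:?usage: list_enabled_event_handlers <config-dir>}"
    # -r: recurse into subdirectories, -n: show line numbers.
    grep -rnE '^[[:space:]]*event_handler_enabled[[:space:]]+1' "$dir"
}
```

Example call, assuming the object files are under /usr/local/nagios/etc: list_enabled_event_handlers /usr/local/nagios/etc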
Re: Issue with ndomod? Nagios core freezing
Posted: Fri Apr 27, 2018 10:05 am
by cdienger
Were jobs getting stuck in the eventhandler queue? You want to make sure that any scripts that the eventhandler calls are found on the workers as well as the XI machine. I would still recommend increasing the spawn rate and max workers as this will increase the number of workers for the eventhandler queue as well.
https://support.nagios.com/kb/article/n ... s-513.html also outlines how to increase the maximum number of connections to MariaDB (818 max).
Re: Issue with ndomod? Nagios core freezing
Posted: Fri May 04, 2018 7:43 am
by sigmainformatique
Thank you,
To solve the issue, I have:
- truncated some ndo and Nagios XI tables (meta)
- cleaned all Gearman queues
- run the nagiosxi script for repairing the database.
Now Nagios works well with event handlers enabled. I am at 80 checks/s and all seems good. I will continue my testing.
I have another issue: the message "Availability data is not available when monitoring engine is not running." appears in reports.
- My monitoring engine is running and displayed correctly in "Home"/"Process info" page
- I have checked all configurations described in
https://assets.nagios.com/downloads/nag ... giosXI.pdf (I use the ramdisk configuration). All configurations are ok and I can see files appearing/disappearing in /var/nagiosramdisk/spool/xidpe directory.
- I have followed the instructions in
https://support.nagios.com/forum/viewto ... 16&t=43595
What can I check to solve this issue? Any idea?
Thank you in advance
Re: Issue with ndomod? Nagios core freezing
Posted: Fri May 04, 2018 12:29 pm
by cdienger
The reports are built from data in /usr/local/nagios/var/nagios.log and /usr/local/nagios/var/archives - both locations are set in nagios.cfg. Can you review it, specifically the log_archive_path and log_file lines, and restart with:
service nagios restart
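A minimal sketch for pulling those two lines out of the config before restarting. The function name show_log_paths is hypothetical, and the nagios.cfg location in the example call is an assumption.

```shell
# Hypothetical helper: print the log_file and log_archive_path settings
# from a nagios.cfg so they can be compared with the paths above.
show_log_paths() {
    cfg="${1:?usage: show_log_paths <nagios.cfg>}"
    # Match only uncommented assignments of the two directives.
    grep -E '^[[:space:]]*(log_file|log_archive_path)[[:space:]]*=' "$cfg"
}
```

Example call, assuming the file is at /usr/local/nagios/etc/nagios.cfg: show_log_paths /usr/local/nagios/etc/nagios.cfg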
If this continues to be a problem, please open a new thread.