Issue with ndomod? Nagios core freezing

Locked
sigmainformatique
Posts: 74
Joined: Mon Apr 23, 2018 8:11 am

Issue with ndomod? Nagios core freezing

Post by sigmainformatique »

Hello,

While building a new XI platform, I am seeing strange behaviour: Nagios XI freezes when ndomod is enabled, about 30 seconds after a restart.

Configuration:
Development platform
MariaDB offloaded
Gearman with 2 workers

About 2000 test hosts and 20000 services, created with the API by following the guidelines in your guide.

There are no messages in the log file; Nagios just freezes.
If I remove the ndomod broker call from nagios.cfg, everything works great (150 service checks/s).
When Nagios freezes, ndo2db continues to write to MariaDB for about 30-40 s, then stops because Nagios no longer generates calls. This is why I think there is an issue with ndomod.
I tried increasing the ndomod buffer, but saw no improvement.

Do you have any ideas? My production environment will have about 4000 hosts and 50000 services.
Regards
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Issue with ndomod? Nagios core freezing

Post by cdienger »

How much memory is on each system, and what are the spawn-rate and max-worker options set to in the modgearman worker.conf? Run /usr/bin/gearman_top2 on the XI server and keep an eye on the values there to get an idea of what you may need to increase them to. Make sure to restart the gearman server and workers after making any changes to the config.
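
To see the current worker limits before changing them, something like the following may help. It is only a minimal sketch: the function name show_worker_limits is made up for illustration, the default path /etc/mod_gearman2/worker.conf is an assumption that varies by install, and min-worker is included alongside the two options named above because it is normally set in the same file.

```shell
# Print the worker-pool settings from a Mod-Gearman worker config.
# The default path is an assumption; pass your own file as the first argument.
show_worker_limits() {
    conf="${1:-/etc/mod_gearman2/worker.conf}"
    # Match only active (uncommented) option lines.
    grep -E '^[[:space:]]*(min-worker|max-worker|spawn-rate)[[:space:]]*=' "$conf"
}
```

Call it as `show_worker_limits /path/to/worker.conf`, then compare the numbers with what /usr/bin/gearman_top2 reports while checks are running.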
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
sigmainformatique
Posts: 74
Joined: Mon Apr 23, 2018 8:11 am

Re: Issue with ndomod? Nagios core freezing

Post by sigmainformatique »

Hello,

Thank you for your reply.

I think we have found the cause of the issue. It was not Gearman, but ndo/MariaDB.
The problem was that the event handlers of the "generic host" template are enabled by default (!).
The issue occurs when a large number of services are CRITICAL. This could happen in production, for example if we lose a datacenter.

The large number of handlers causes a problem with /usr/local/nagiosxi/cron/event_handler.php, which drives the MariaDB CPU load above 100%.
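
For anyone wanting to check where event handlers are switched on, a rough sketch follows. The function name find_enabled_event_handlers is hypothetical and the default directory /usr/local/nagios/etc is an assumption about where the object configuration lives; event_handler_enabled is the standard Nagios object directive. It only lists matches and changes nothing.

```shell
# List object-definition lines that enable event handlers.
# The default directory is an assumption; pass your config directory as the first argument.
find_enabled_event_handlers() {
    dir="${1:-/usr/local/nagios/etc}"
    # -r: recurse into subdirectories, -n: print line numbers
    grep -rnE '^[[:space:]]*event_handler_enabled[[:space:]]+1' "$dir"
}
```

Each output line has the form file:line:directive, so a template that enables handlers for every host that inherits from it is easy to spot.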

Do you have any best practices for partitioning MariaDB for XI, or something similar? Note that I have followed the instructions in https://assets.nagios.com/downloads/nag ... zation.pdf

My MariaDB CPU load is about 30% under normal conditions, which seems high to me.

Thank you in advance
Regards
Guillaume
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Issue with ndomod? Nagios core freezing

Post by cdienger »

Were jobs getting stuck in the eventhandler queue? Make sure that any scripts the event handler calls are present on the workers as well as on the XI machine. I would still recommend increasing the spawn rate and max workers, as this will also increase the number of workers for the eventhandler queue. https://support.nagios.com/kb/article/n ... s-513.html also outlines how to increase the maximum number of connections to MariaDB (818 max).
sigmainformatique
Posts: 74
Joined: Mon Apr 23, 2018 8:11 am

Re: Issue with ndomod? Nagios core freezing

Post by sigmainformatique »

Thank you,

To solve the issue, I have:
- truncated some ndo and Nagios XI tables (meta)
- cleared all Gearman queues
- run the Nagios XI script for repairing the database.

Now Nagios works well with event handlers enabled. I am at 80 checks/s and all seems good. I will continue my testing.

I have another issue: the message "Availability data is not available when monitoring engine is not running." appears in reports.

- My monitoring engine is running and is displayed correctly on the "Home"/"Process info" page
- I have checked all the settings described in https://assets.nagios.com/downloads/nag ... giosXI.pdf (I use the ramdisk configuration). They are all correct, and I can see files appearing and disappearing in the /var/nagiosramdisk/spool/xidpe directory.
- I have followed the instructions in https://support.nagios.com/forum/viewto ... 16&t=43595

What can I check to solve this issue? Any ideas?

Thank you in advance
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Issue with ndomod? Nagios core freezing

Post by cdienger »

The reports are generated from the data in /usr/local/nagios/var/nagios.log and /usr/local/nagios/var/archives; both locations are set in nagios.cfg. Can you review that file, specifically the log_file and log_archive_path lines, and then restart with:

service nagios restart
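
To confirm that both settings point at paths that actually exist, a small check along these lines could be used. This is a sketch under assumptions: the function name check_log_paths is hypothetical, and the default location /usr/local/nagios/etc/nagios.cfg is assumed rather than confirmed for this install.

```shell
# Report whether the paths named by log_file and log_archive_path exist.
# The default config location is an assumption; pass your nagios.cfg as the first argument.
check_log_paths() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    for key in log_file log_archive_path; do
        # Take the last active "key=value" line for this setting.
        value=$(sed -n "s/^${key}=//p" "$cfg" | tail -n 1)
        if [ -z "$value" ]; then
            echo "$key: not set in $cfg"
        elif [ -e "$value" ]; then
            echo "$key: $value (exists)"
        else
            echo "$key: $value (missing)"
        fi
    done
}
```

A "missing" or "not set" result for either line would be the first thing to fix before looking further at the reports.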

If this continues to be a problem, please open a new thread.