22k hosts, 134k services - 45 mins to apply config?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
DennisPR
Posts: 149
Joined: Mon May 07, 2012 10:34 am

Post by DennisPR »

Hi,

We have upgraded our server to 5.8.1, but that doesn't seem to have solved the issue.
These are the last lines of the upgrade log:

Code: Select all

Things look okay - No serious problems were detected during the pre-flight check
> Return Code: 0
--------------------------------------
PHP Deprecated:  __autoload() is deprecated, use spl_autoload_register() instead in /usr/local/nagiosxi/html/includes/phpmailer/PHPMailerAutoload.php on line 45
PHP Deprecated:  Function get_magic_quotes_gpc() is deprecated in /usr/local/nagiosxi/html/includes/utils.inc.php on line 256
PHP Deprecated:  Function get_magic_quotes_gpc() is deprecated in /usr/local/nagiosxi/html/includes/utils.inc.php on line 256
PHP Deprecated:  Array and string offset access syntax with curly braces is deprecated in /usr/local/nagiosxi/html/includes/components/ldap_ad_integration/adLDAP/src/classes/adLDAPUsers.php on line 520

Nagios XI Upgrade Complete!
---------------------------
It now seems that the monitoring engine even stops after 15 to 20 minutes every time we start it.
I have found the following error in the nagios.log. I don't know if it has anything to do with it:
[1611850535] wproc: Successfully registered manager as @wproc with query handler
[1611850535] wproc: Registry request: name=Core Worker 15864;pid=15864
[1611850535] wproc: Registry request: name=Core Worker 15867;pid=15867
[1611850535] wproc: Registry request: name=Core Worker 15868;pid=15868
[1611850535] wproc: Registry request: name=Core Worker 15866;pid=15866
[1611850535] wproc: Registry request: name=Core Worker 15872;pid=15872
[1611850535] wproc: Registry request: name=Core Worker 15871;pid=15871
[1611850535] wproc: Registry request: name=Core Worker 15869;pid=15869
[1611850535] wproc: Registry request: name=Core Worker 15865;pid=15865
[1611850535] wproc: Registry request: name=Core Worker 15875;pid=15875
[1611850535] wproc: Registry request: name=Core Worker 15870;pid=15870
[1611850535] wproc: Registry request: name=Core Worker 15874;pid=15874
[1611850535] wproc: Registry request: name=Core Worker 15873;pid=15873
[1611850535] NDO-3: NDO 3.0.5 (c) Copyright 2009-2020 Nagios - Nagios Core Development Team
[1611850535] NDO-3: Callbacks registered
[1611850535] NDO-3: Callbacks registered
[1611850535] Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
[1611850535] mod_gearman: initialized version 3.3.0 (libgearman 1.1.19.1)
[1611850535] Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' initialized successfully.
[1611850538] WARNING: RLIMIT_NPROC is 127954, total max estimated processes is 335072! You should increase your limits (ulimit -u, or limits.conf)
[1611850539] NDO-3: Started timed_event thread
[1611850539] NDO-3: Started event_handler thread
[1611850539] NDO-3: Started service_check thread
[1611850539] NDO-3: Started host_check thread
[1611850539] NDO-3: Started comment thread
[1611850539] NDO-3: Started downtime thread
[1611850539] NDO-3: Started flapping thread
[1611850539] NDO-3: Started service_status thread
[1611850539] NDO-3: Started host_status thread
[1611850539] NDO-3: Started contact_status thread
[1611850539] NDO-3: Started acknowledgement thread
[1611850539] NDO-3: Started statechange thread
[1611850539] NDO-3: Started notification thread
[1611850539] NDO-3: Ended contact_status thread

When the process stops, I see the following in the log:
[1611851302] NDO-3: Ended downtime thread
[1611851302] NDO-3: Ended comment thread
[1611851302] NDO-3: Ended flapping thread
[1611851302] NDO-3: Ended acknowledgement thread
[1611851302] NDO-3: Ended notification thread
[1611851302] NDO-3: Ended event_handler thread
[1611851303] HOST ALERT: 4094.sw01.fr.action;DOWN;SOFT;1;CRITICAL - 10.40.94.2 rta 1074.336ms >= 1000.000ms
[1611851303] NDO-3: Ended statechange thread
[1611851312] Caught SIGSEGV, shutting down...
[1611851312] Caught SIGTERM, shutting down...
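
The bracketed values in nagios.log are Unix epoch seconds, so the time between the engine starting and the SIGSEGV can be worked out directly from the two excerpts above. This is a minimal sketch, assuming bash; the function name `elapsed` is only illustrative and not part of Nagios.

```shell
#!/usr/bin/env bash
# Timestamps copied from the nagios.log excerpts in this post.
start=1611850535   # first wproc line after the engine started
crash=1611851312   # "Caught SIGSEGV, shutting down..."

# Print the difference between two epoch timestamps as "<minutes>m <seconds>s".
elapsed() {
  local secs=$(( $2 - $1 ))
  echo "$(( secs / 60 ))m $(( secs % 60 ))s"
}

elapsed "$start" "$crash"   # prints: 12m 57s
```

On systems with GNU date, `date -d @1611851312` turns a single timestamp into a readable date, which makes it easier to line the crash up with other logs.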

I will send you a new profile.zip by PM.
I hope you can help us out.
Dennis
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Post by benjaminsmith »

Hi Dennis,

If you continue to experience the monitoring engine stopping like that, I would recommend opening a support ticket and referencing this forum post.

https://support.nagios.com/tickets/

Since the DB is offloaded, the database log was not in the profile, but you may have a corrupted table that is causing NDO 3 to kill the Nagios process. Run the repair script as root:

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh
If the issue persists, I would recommend downgrading NDO to the previous version. Here are the instructions for downgrading with an offloaded database.

### DOWNGRADE NDO WITH OFFLOADED DB

Code: Select all

service nagios stop
cd /tmp
rm -rf /tmp/nagiosxi
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.6.14.tar.gz
tar zxf xi-5.6.14.tar.gz
cd /tmp/nagiosxi
# START OFFLOADED DB SECTION - If you have an offloaded DB, make these edits by hand before continuing:
#   Edit /tmp/nagiosxi/xi-sys.cfg and update the 'mysqlpass' value.
#   Edit /tmp/nagiosxi/subcomponents/ndoutils/mods/cfg/ndo2db.cfg and update the 'db_host', 'db_user', and 'db_pass' values.
#   Edit /tmp/nagiosxi/subcomponents/ndoutils/install and /tmp/nagiosxi/subcomponents/ndoutils/post-install so that every call to mysql includes -h <db_ip>.
# END OFFLOADED DB SECTION
cd /tmp/nagiosxi/subcomponents/ndoutils
./install
chkconfig ndo2db on
service ndo2db start
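
For the offloaded DB step, it can help to list every line in the two install scripts that calls the mysql client before editing them. This is a minimal sketch, assuming bash and GNU grep; the function name `list_mysql_calls` is hypothetical, and the script only reads the files, it does not modify them.

```shell
#!/usr/bin/env bash
# List every line that invokes the mysql client so each one can be reviewed
# and given "-h <db_ip>" by hand.
list_mysql_calls() {
  # -H: always print the file name, -n: print line numbers,
  # -w: match "mysql" as a whole word (so "mysqladmin" is not listed)
  grep -Hnw 'mysql' "$@"
}

# Paths taken from the steps above; run this after extracting the tarball.
for f in /tmp/nagiosxi/subcomponents/ndoutils/install \
         /tmp/nagiosxi/subcomponents/ndoutils/post-install; do
  if [ -f "$f" ]; then
    list_mysql_calls "$f" || echo "no mysql calls found in $f"
  fi
done
```

Each reported line can then be edited so the mysql call carries the `-h <db_ip>` option.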
A couple of other items: since this is a large system, please follow the steps in the article below to increase the max connections for the database.
https://support.nagios.com/kb/article.php?id=513

To increase the limits on the number of processes, edit the /etc/security/limits.conf file and add the following to the bottom of the file.

Code: Select all

*          soft     nproc          262144
*          hard     nproc          262144
Save the change and reboot the server for the change to take effect.
See: https://www.thegeekdiary.com/how-to-set ... -rhel-567/
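
After the reboot, the new limit can be compared against the estimate from the RLIMIT_NPROC warning in nagios.log. This is a minimal sketch, assuming bash; `check_nproc` is an illustrative helper, and the 335072 figure is copied from the warning line posted earlier in the thread. Note that 262144 is still below 335072, so the warning may keep appearing unless a higher value is used.

```shell
#!/usr/bin/env bash
# "total max estimated processes" from the RLIMIT_NPROC warning in nagios.log.
required=335072

# Report whether a process limit ($1, a number or "unlimited") covers the
# required value ($2).
check_nproc() {
  if [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]; then
    echo "ok"
  else
    echo "too low"
  fi
}

# ulimit -u prints the soft limit of the current shell; treat it as 0 if unavailable.
current=$(ulimit -u 2>/dev/null)
echo "nproc limit: ${current:-unknown} -> $(check_nproc "${current:-0}" "$required")"
```

The limit that matters is the one seen by the running nagios process, so run the check as the user that starts the engine, or look at the "Max processes" row in /proc/<pid>/limits for the nagios PID.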

One minor point: the PHP notice and warning messages are advisory and do not indicate a serious error. You may want to turn them off to avoid spending system resources on logging them.

For more details: https://www.php.net/manual/en/function. ... orting.php

Lastly, I noticed that the last profile has fewer services (79007 vs. 134708). Is it from a different server?

Best Regards,
Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
DennisPR
Posts: 149
Joined: Mon May 07, 2012 10:34 am

Post by DennisPR »

I had a session with ssax and he changed some things on the server.
He also asked me to send you 2 log files.
You should have received them in a pm.

Regards,

Dennis
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Post by benjaminsmith »

Hi Dennis,

Sean Sax responded in a PM with some recommendations (e.g. moving the db back to the localhost and kernel message queue settings). Please update the thread once you have a chance to read his response and make those changes.
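
The values recommended in the PM are not shown in this thread, but the current kernel message queue settings can be inspected before changing anything. This is a minimal sketch, assuming bash on Linux; `show_msgq_limits` is an illustrative name, and the three keys are the standard System V message queue sysctls.

```shell
#!/usr/bin/env bash
# Print the current System V message queue limits from /proc (Linux only).
show_msgq_limits() {
  local key
  for key in msgmax msgmnb msgmni; do
    if [ -r "/proc/sys/kernel/$key" ]; then
      echo "kernel.$key = $(cat "/proc/sys/kernel/$key")"
    else
      echo "kernel.$key = n/a"
    fi
  done
}

show_msgq_limits
```

Recording the output before and after the change makes it easier to confirm that the new settings took effect.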

Best Regards,
Benjamin