Nagios XI 5.7 Reconfiguration temporary hang

Posted: Wed Jun 10, 2020 10:38 am
by GldRush98
Has anyone seen a reconfiguration temporarily hang? Sometimes a reconfiguration goes through super quick like it should, and sometimes it hangs for 90+ seconds.
Here is the difference I have seen in nagios.log:

Good reconfigure, the shutdown happens instantly (entire reload time: 5 seconds, normal):

Code:

[1591802585] Caught SIGTERM, shutting down...
[1591802585] Caught SIGTERM, shutting down...
[1591802585] Caught SIGTERM, shutting down...
[1591802585] Successfully shutdown... (PID=29307)
[1591802585] NDO-3: Callbacks deregistered
[1591802590] NDO-3: NDO - Shutdown complete
[1591802590] Event broker module '/usr/local/nagios/bin/ndo.so' deinitialized successfully.
[1591802590] Nagios 4.4.6 starting... (PID=30446)
[1591802590] Local time is Wed Jun 10 10:23:10 CDT 2020
[1591802590] LOG VERSION: 2.0
[1591802590] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1591802590] qh: core query handler registered
[1591802590] qh: echo service query handler registered
[1591802590] qh: help for the query handler registered
[1591802590] wproc: Successfully registered manager as @wproc with query handler
[1591802590] wproc: Registry request: name=Core Worker 30449;pid=30449
[1591802590] wproc: Registry request: name=Core Worker 30450;pid=30450
[1591802590] wproc: Registry request: name=Core Worker 30451;pid=30451
[1591802590] wproc: Registry request: name=Core Worker 30452;pid=30452
[1591802590] NDO-3: NDO 3.0.0 (c) Copyright 2009-2020 Nagios - Nagios Core Development Team
[1591802590] NDO-3: Database initialized
[1591802590] NDO-3: Database initialized
[1591802590] NDO-3: Callbacks registered
[1591802590] NDO-3: Callbacks registered
[1591802590] Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
[1591802590] Successfully launched command file worker with pid 30463
[1591802590] NDO-3: Database initialized
And on one that hangs, you can see it waiting 91 seconds before restarting Nagios. It's weird that there is never a "Successfully shutdown" message; it just goes from shutting down to starting:

Code:

[1591802609] Caught SIGTERM, shutting down...
[1591802609] Caught SIGTERM, shutting down...
[1591802609] Caught SIGTERM, shutting down...
[1591802700] Nagios 4.4.6 starting... (PID=31045)
[1591802700] Local time is Wed Jun 10 10:25:00 CDT 2020
[1591802700] LOG VERSION: 2.0
[1591802700] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1591802700] qh: core query handler registered
[1591802700] qh: echo service query handler registered
[1591802700] qh: help for the query handler registered
[1591802700] wproc: Successfully registered manager as @wproc with query handler
[1591802700] wproc: Registry request: name=Core Worker 31050;pid=31050
[1591802700] wproc: Registry request: name=Core Worker 31051;pid=31051
[1591802700] wproc: Registry request: name=Core Worker 31052;pid=31052
[1591802700] wproc: Registry request: name=Core Worker 31049;pid=31049
[1591802700] NDO-3: NDO 3.0.0 (c) Copyright 2009-2020 Nagios - Nagios Core Development Team
[1591802700] NDO-3: Database initialized
[1591802700] NDO-3: Database initialized
[1591802700] NDO-3: Callbacks registered
[1591802700] NDO-3: Callbacks registered
[1591802700] Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
[1591802700] NDO-3: Database initialized
It's not a huge deal in the grand scheme of things, since reconfigurations are still successfully applying, but it does seem like something that shouldn't be happening.
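
To compare reloads without reading timestamps by eye, the gap between the first "Caught SIGTERM" line and the next "starting..." line can be computed from the log. The following is only a sketch: the function name `restart_gaps` is made up for illustration, and it assumes the log keeps the `[epoch] message` layout shown in the excerpts above.

```shell
# restart_gaps: hypothetical helper, not part of Nagios or Nagios XI.
# Reads Nagios log lines from the given file(s), or from stdin, and prints
# one line per restart with the seconds between the first SIGTERM and the
# next startup banner.
restart_gaps() {
    awk '
        # Remember the timestamp of the first SIGTERM of a shutdown only;
        # repeated SIGTERM lines for the same shutdown are skipped.
        /Caught SIGTERM/ && !stop {
            stop = substr($1, 2, length($1) - 2) + 0
        }
        # The startup banner closes the interval.
        / starting\.\.\. \(PID=/ && stop {
            start = substr($1, 2, length($1) - 2) + 0
            printf "stop=%d start=%d gap=%ds\n", stop, start, start - stop
            stop = 0
        }
    ' "$@"
}
```

It could be run as `restart_gaps /usr/local/nagios/var/nagios.log`, where the path is an assumption based on a default install and may differ on your system. For the two excerpts above it reports gaps of 5 and 91 seconds.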

Re: Nagios XI 5.7 Reconfiguration temporary hang

Posted: Wed Jun 10, 2020 4:58 pm
by ssax
It tries to wait for the current checks to finish; sometimes that slows the reload down because it's trying to stop gracefully.

Is this EL7? If so, please send the output of this command as you may be using the old /etc/init.d/nagios file:

Code:

systemctl status nagios
Otherwise you can increase it following this:

https://support.nagios.com/kb/article/n ... r-172.html

Thank you

Re: Nagios XI 5.7 Reconfiguration temporary hang

Posted: Thu Jun 11, 2020 8:28 am
by GldRush98
It is CentOS 7. I don't think I should have any checks taking more than a few seconds to run.

Code:

[root@nagios ~]# systemctl status nagios
● nagios.service - Nagios Core 4.4.6
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-06-10 10:40:39 CDT; 21h ago
     Docs: https://www.nagios.org/documentation
  Process: 5204 ExecStopPost=/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 5199 ExecStop=/bin/kill -s TERM ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 5207 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 5206 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 5210 (nagios)
   CGroup: /system.slice/nagios.service
           ├─5210 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
           ├─5213 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─5214 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─5215 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─5216 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           └─5236 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Code:

[root@nagios ~]# cat /etc/init.d/nagios
cat: /etc/init.d/nagios: No such file or directory
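
One way to test the assumption that no check runs for more than a few seconds is to list the largest `check_execution_time` values recorded in the Nagios status file. This is a rough sketch, not an official tool: the function name `slowest_checks` is hypothetical, and it assumes the usual `hoststatus { ... }` / `servicestatus { ... }` block layout of status.dat.

```shell
# slowest_checks: hypothetical helper, not part of Nagios or Nagios XI.
#   $1  path to status.dat
#   $2  number of rows to print (default 10)
# Output columns (tab-separated): execution time in seconds, host, service.
slowest_checks() {
    awk '
        # A new host or service block starts; reset the collected fields.
        /^(host|service)status [{]/ { inblock = 1; host = ""; svc = "(host check)"; t = "" }
        inblock && /^[ \t]*host_name=/            { sub(/^[ \t]*host_name=/, "");            host = $0 }
        inblock && /^[ \t]*service_description=/  { sub(/^[ \t]*service_description=/, "");  svc = $0 }
        inblock && /^[ \t]*check_execution_time=/ { sub(/^[ \t]*check_execution_time=/, ""); t = $0 }
        # End of block: emit one row if an execution time was found.
        inblock && /^[ \t]*[}]/ {
            if (t != "") printf "%s\t%s\t%s\n", t, host, svc
            inblock = 0
        }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr | head -n "${2:-10}"
}
```

For example, `slowest_checks /usr/local/nagios/var/status.dat 5` would print the five longest-running checks; the path is an assumption based on a default install. Note that status.dat only records each check's most recent execution time, so it cannot show what was actually running at the moment of a slow reload.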

Re: Nagios XI 5.7 Reconfiguration temporary hang

Posted: Thu Jun 11, 2020 5:16 pm
by ssax
That output looks correct. It's normal for a reload to take a little longer sometimes if a bunch of checks are queued up.

You can try making these changes (bugs in XI 5.7):

https://support.nagios.com/forum/viewto ... 61#p311261

Before you apply configuration, run this command so we can see what's occurring:

Code:

tail -Fn0 /usr/local/nagiosxi/var/cmdsubsys.log
That should at least give an indication if there is anything else going on.

Please PM me a copy of your profile as well so I can look through the logs. You can download it from Admin > System Profile > Download Profile.

Additionally, please send the output of these commands (as root):
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and the -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root MySQL password.

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table