Nagios stopping after applyconfig
Posted: Wed Aug 26, 2020 10:02 am
Hi
I have some daily automation that updates Nagios and applies the configuration using the Nagios XI REST API.
Periodically after that job runs (not always), the nagios daemon stops shortly afterwards. I'm looking for where I might troubleshoot this. It's possible this started happening after my last upgrade...
Nagios XI version 5.7.2, with 3 mod_gearman workers
Red Hat Enterprise Linux Server release 7.6
The server seems to have more than enough resources.
Excerpt from /var/log/messages:
Aug 26 08:24:50 vnl1654 nagios: SERVICE ALERT: VNL1351;System Load;WARNING;HARD;3;Load : 1.72 3.49 3.88 : 3.88 > 3.0 : WARNING
Aug 26 08:24:50 vnl1654 nagios: SERVICE ALERT: VNL978;CPU Usage;WARNING;SOFT;1;CPU used 89.0% (>80) : WARNING
Aug 26 08:24:51 vnl1654 systemd[1]: Stopping Nagios Core 4.4.6...
Aug 26 08:24:51 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:24:51 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:24:51 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:24:51 vnl1654 nagios: Successfully shutdown... (PID=12362)
Aug 26 08:24:56 vnl1654 nagios: Event broker module '/usr/local/nagios/bin/ndo.so' deinitialized successfully.
Aug 26 08:24:56 vnl1654 nagios: Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' deinitialized successfully.
Aug 26 08:24:56 vnl1654 systemd[1]: Stopped Nagios Core 4.4.6.
Aug 26 08:24:56 vnl1654 systemd[1]: Starting Nagios Core 4.4.6...
Aug 26 08:24:56 vnl1654 nagios[25906]: Nagios Core 4.4.6
Aug 26 08:24:56 vnl1654 nagios[25906]: Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Aug 26 08:24:56 vnl1654 nagios[25906]: Copyright (c) 1999-2009 Ethan Galstad
Aug 26 08:24:56 vnl1654 nagios[25906]: Last Modified: 2020-04-28
Aug 26 08:24:56 vnl1654 nagios[25906]: License: GPL
Aug 26 08:24:56 vnl1654 nagios[25906]: Website: https://www.nagios.org
Aug 26 08:24:56 vnl1654 nagios[25906]: Reading configuration data...
Aug 26 08:24:56 vnl1654 nagios[25906]: Read main config file okay...
Aug 26 08:24:56 vnl1654 nagios[25906]: Read object config files okay...
Aug 26 08:24:56 vnl1654 nagios[25906]: Running pre-flight check on configuration data...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking objects...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 13677 services.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 1523 hosts.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 7 host groups.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 2 service groups.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 128 contacts.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 19 contact groups.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 206 commands.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 8 time periods.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 host escalations.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 service escalations.
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking for circular paths...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 1523 hosts
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 service dependencies
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 0 host dependencies
Aug 26 08:24:56 vnl1654 nagios[25906]: Checked 8 timeperiods
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking global event handlers...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking obsessive compulsive processor commands...
Aug 26 08:24:56 vnl1654 nagios[25906]: Checking misc settings...
Aug 26 08:24:56 vnl1654 nagios[25906]: Total Warnings: 0
Aug 26 08:24:56 vnl1654 nagios[25906]: Total Errors: 0
Aug 26 08:24:56 vnl1654 nagios[25906]: Things look okay - No serious problems were detected during the pre-flight check
Aug 26 08:24:56 vnl1654 systemd[1]: Started Nagios Core 4.4.6.
Aug 26 08:24:56 vnl1654 nagios: Nagios 4.4.6 starting... (PID=25910)
Aug 26 08:24:56 vnl1654 nagios: Local time is Wed Aug 26 08:24:56 MDT 2020
Aug 26 08:24:56 vnl1654 nagios: LOG VERSION: 2.0
Aug 26 08:24:56 vnl1654 nagios: qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
Aug 26 08:24:56 vnl1654 nagios: qh: core query handler registered
Aug 26 08:24:56 vnl1654 nagios: qh: echo service query handler registered
Aug 26 08:24:56 vnl1654 nagios: qh: help for the query handler registered
Aug 26 08:24:56 vnl1654 nagios: wproc: Successfully registered manager as @wproc with query handler
Aug 26 08:24:56 vnl1654 nagios: wproc: Registry request: name=Core Worker 25912;pid=25912
Aug 26 08:24:56 vnl1654 nagios: wproc: Registry request: name=Core Worker 25914;pid=25914
Aug 26 08:24:56 vnl1654 nagios: wproc: Registry request: name=Core Worker 25911;pid=25911
Aug 26 08:24:57 vnl1654 nagios: wproc: Registry request: name=Core Worker 25913;pid=25913
Aug 26 08:24:57 vnl1654 nagios: Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
Aug 26 08:24:57 vnl1654 nagios: mod_gearman: initialized version 3.0.7 (libgearman 0.33)
Aug 26 08:24:57 vnl1654 nagios: Event broker module '/usr/lib64/mod_gearman/mod_gearman_nagios4.o' initialized successfully.
Aug 26 08:24:57 vnl1654 nagios: WARNING: RLIMIT_NPROC is 63450, total max estimated processes is 73496! You should increase your limits (ulimit -u, or limits.conf)
Aug 26 08:25:01 vnl1654 systemd[1]: Created slice User Slice of root.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260615 of user root.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260616 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Created slice User Slice of pcp.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260614 of user pcp.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260619 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260617 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260618 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260620 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260621 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260622 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260623 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260624 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260625 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260626 of user nagios.
Aug 26 08:25:01 vnl1654 systemd[1]: Created slice User Slice of questusr.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260628 of user questusr.
Aug 26 08:25:01 vnl1654 systemd[1]: Started Session 260627 of user nagios.
Aug 26 08:25:01 vnl1654 chgfmon[26049]: Failed to stat() file /etc/opt/quest/vas/users.deny, skipping
Aug 26 08:25:01 vnl1654 chgfmon[26049]: Failed to build file records for class user-deny
Aug 26 08:25:01 vnl1654 chgfmon[26049]: Failed to build file records for class user-maps
Aug 26 08:25:02 vnl1654 systemd[1]: Removed slice User Slice of questusr.
Aug 26 08:25:02 vnl1654 systemd[1]: Removed slice User Slice of root.
Aug 26 08:25:03 vnl1654 systemd[1]: Removed slice User Slice of pcp.
Aug 26 08:25:05 vnl1654 nagios: Successfully launched command file worker with pid 26277
Aug 26 08:25:07 vnl1654 snmpd[1715]: Connection from UDP: [199.215.83.194]:47757->[10.40.0.16]:161
Aug 26 08:25:19 vnl1654 nagios: SERVICE ALERT: VNL1285;System Load;OK;HARD;3;Load : 0.00 0.08 2.55 : OK
Aug 26 08:25:31 vnl1654 dbus[1443]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Aug 26 08:25:31 vnl1654 dbus[1443]: [system] Successfully activated service 'org.freedesktop.problems'
Aug 26 08:25:33 vnl1654 systemd[1]: Created slice User Slice of root.
Aug 26 08:25:33 vnl1654 systemd[1]: Started Session c17177 of user root.
Aug 26 08:25:33 vnl1654 systemd[1]: Removed slice User Slice of root.
Aug 26 08:25:45 vnl1654 systemd[1]: Created slice User Slice of root.
Aug 26 08:25:45 vnl1654 systemd[1]: Started Session c17178 of user root.
Aug 26 08:25:45 vnl1654 systemd[1]: Removed slice User Slice of root.
Aug 26 08:25:59 vnl1654 nagios: SERVICE ALERT: VNL978;CPU Usage;WARNING;SOFT;2;CPU used 86.0% (>80) : WARNING
Aug 26 08:25:59 vnl1654 nagios: SERVICE ALERT: VNL539;netsnmp Memory Usage;OK;HARD;5;Memory Utilization OK - %used_real is 97.93%, total_real is 7821.5 MB, avail_real is 162.1 MB, cached is 6537.7 MB, buffer is 0.0 MB, %user_real is 19.82%, %cached_real is 83.59%, %buffer_real is 0.00%
Aug 26 08:25:59 vnl1654 nagios: Caught SIGSEGV, shutting down...
Aug 26 08:25:59 vnl1654 systemd[1]: nagios.service: main process exited, code=exited, status=254/n/a
Aug 26 08:25:59 vnl1654 kill[26568]: kill: cannot find process ""
Aug 26 08:25:59 vnl1654 systemd[1]: nagios.service: control process exited, code=exited status=1
Aug 26 08:25:59 vnl1654 nagios: Caught SIGTERM, shutting down...
Aug 26 08:25:59 vnl1654 systemd[1]: Unit nagios.service entered failed state.
Aug 26 08:25:59 vnl1654 systemd[1]: nagios.service failed.
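To make it easier to see how long the daemon stays up after each restart before it receives a signal, here is a minimal sketch that scans syslog lines like the ones above. It only assumes the line format shown in this excerpt; the function names (`parse_ts`, `summarize`) and the `year` parameter are hypothetical helpers of mine (syslog timestamps carry no year), not part of Nagios.

```python
import re
from datetime import datetime

# Common syslog prefix: "Aug 26 08:25:59 vnl1654 nagios: " or "... nagios[25906]: "
PREFIX = r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+nagios(?:\[\d+\])?:\s+"

# e.g. "Nagios 4.4.6 starting... (PID=25910)"
START_RE = re.compile(PREFIX + r"Nagios \S+ starting\.\.\. \(PID=(?P<pid>\d+)\)")
# e.g. "Caught SIGSEGV, shutting down..."
SIGNAL_RE = re.compile(PREFIX + r"Caught (?P<sig>SIG[A-Z0-9]+)")


def parse_ts(ts, year=2020):
    """Parse a syslog timestamp; the year is an assumption supplied by the caller."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def summarize(lines, year=2020):
    """Return (event, timestamp, pid, seconds_since_last_start) tuples.

    event is "start" or a signal name; the last field is None when no
    daemon start has been seen earlier in the input.
    """
    events = []
    last_start = None
    for line in lines:
        m = START_RE.match(line)
        if m:
            last_start = parse_ts(m["ts"], year)
            events.append(("start", last_start, m["pid"], None))
            continue
        m = SIGNAL_RE.match(line)
        if m:
            when = parse_ts(m["ts"], year)
            uptime = (when - last_start).total_seconds() if last_start else None
            events.append((m["sig"], when, None, uptime))
    return events


if __name__ == "__main__":
    # The path is the one from this post; reading it usually requires root.
    with open("/var/log/messages", errors="replace") as fh:
        for event, when, pid, uptime in summarize(fh):
            print(when, event, pid or "", uptime if uptime is not None else "")
```

Run against the excerpt above, this would list the SIGTERM events from the restart, the start of PID 25910, and then the SIGSEGV roughly a minute later, which is the event worth investigating (for example with a core dump or by disabling broker modules one at a time).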