nagios.service restart breaks checks
Posted: Tue Nov 14, 2023 7:28 am
I'm trying to track down a few issues with a Nagios Core implementation. The first problem, whose fix may also resolve the second, is this: when I make a configuration change, validate it with /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg, and restart nagios.service, the application "breaks" and stops running checks. Rebooting the entire server clears the errors, and the application runs checks again. I'm looking for someone to steer me in the right direction. Here is my nagios.log following a nagios.service restart.
[root@nagioshost etc]# systemctl restart nagios.service
[root@nagioshost etc]# ls
cgi.cfg cgi.cfg.bak htpasswd.users nagios.cfg objects resource.cfg
[root@nagioshost etc]# cd ..
[root@nagioshost nagios]# tail -f var/nagios.log
[1699945778] Caught SIGTERM, shutting down...
[1699945778] Successfully shutdown... (PID=1620)
[1699945778] Nagios 4.4.13 starting... (PID=4069163)
[1699945778] Local time is Tue Nov 14 08:09:38 CET 2023
[1699945778] LOG VERSION: 2.0
[1699945778] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1699945778] qh: core query handler registered
[1699945778] qh: echo service query handler registered
[1699945778] qh: help for the query handler registered
[1699945778] wproc: Successfully registered manager as @wproc with query handler
[1699945796] Successfully launched command file worker with pid 4069913
[1699945796] Unable to send check for host 'epo website' to worker (ret=-2)
[1699945796] Unable to run check for service 'System Uptime' on host 'exchange mail'
[1699945796] Unable to run check for service 'Process Count' on host 'tenable sc'
[1699945796] Unable to send check for host 'backup 1' to worker (ret=-2)
[1699945806] Unable to send check for host 'backup 2' to worker (ret=-2)
[1699945807] Unable to run check for service 'System Uptime' on host 'sql server'
[1699945807] Unable to run check for service 'Memory Usage' on host 'tenable sc'
[1699945807] Unable to send check for host 'owa website' to worker (ret=-2)
[1699945807] Unable to send check for host 'sql server' to worker (ret=-2)
[1699945808] Unable to run check for service 'Disk Space C:' on host 'chat'
[1699945811] Unable to run check for service 'CPU Usage' on host 'file server'
[1699945812] Unable to run check for service 'Chat Client' on host 'chat'
[1699945816] Unable to run check for service 'System Uptime' on host 'chat'
[1699945817] Unable to run check for service 'Memory Usage' on host 'backup 2'
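To make it easier to compare restarts, here is a rough sketch of how the failures in a log like the one above could be tallied. It counts the two message types ("Unable to send check for host" and "Unable to run check for service") that appear after the most recent "starting..." line. The function name summarize_check_failures is my own placeholder, not anything shipped with Nagios, and the patterns are based only on the log lines shown here.

```shell
#!/bin/sh
# Tally failed check dispatches logged since the most recent Nagios start.
# summarize_check_failures is a hypothetical helper name, not part of Nagios.
summarize_check_failures() {
    log_file="$1"
    awk '
        # Reset both counters whenever the daemon logs a fresh start.
        / starting\.\.\. \(PID=/           { send = 0; run = 0 }
        # Host checks that could not be handed to a worker.
        /Unable to send check for host/   { send++ }
        # Service checks that could not be run.
        /Unable to run check for service/ { run++ }
        END { printf "host_send_failures=%d service_run_failures=%d\n", send, run }
    ' "$log_file"
}

# Example call (path assumes the default source install layout used above):
#   summarize_check_failures /usr/local/nagios/var/nagios.log
```

Counting only from the last start line keeps failures from earlier restarts out of the total, so the numbers before and after a reboot can be compared directly.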