Nagios XI - Crashed
Posted: Tue Sep 06, 2016 4:51 am
Hi,
Could you help me figure out why the Nagios services stopped for about 2 hours in the early hours of this morning?
I saw no issues or relevant entries in mysqld.log that could account for this.
In nagios.log I could see the following:
[1473123760] SERVICE ALERT: JW8F5Z1.mgmt;RT-OWN0-01 Networking;CRITICAL;SOFT;1;ESX3 CRITICAL - HOST-VM NET Unknown error
[1473123767] wproc: Core Worker 19304: job 346 (pid=30380) timed out. Killing it
[1473123767] wproc: GLOBAL SERVICE EVENTHANDLER job 346 from worker Core Worker 19304 timed out after 30.01s
[1473123767] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1473123767] wproc: stderr line 01: PHP Deprecated: Comments starting with '#' are deprecated in /etc/php.ini on line 946 in Unknown on line 0
[1473123767] Warning: Global service event handler command '/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_event.php --handler-type=service --host="GWDSS22.mgmt" --service="SB00B Memory" --hostaddress="172.17.4.9" --hoststate=UP --hoststateid=0 --hosteventid=0 --hostproblemid=0 --servicestate=OK --servicestateid=0 --lastservicestate=CRITICAL --lastservicestateid=2 --servicestatetype=SOFT --currentattempt=2 --maxattempts=5 --serviceeventid=191407 --serviceproblemid=0 --serviceoutput="ESX3 OK - SB00B mem usage=2.99 %" --longserviceoutput="" --servicedowntime=0' timed out after 0.00 seconds
[1473123767] wproc: Core Worker 19304: job 346 (pid=30380): Dormant child reaped
[1473123790] wproc: Core Worker 19303: job 403 (pid=31869) timed out. Killing it
[1473123790] wproc: GLOBAL SERVICE EVENTHANDLER job 403 from worker Core Worker 19303 timed out after 30.01s
[1473123790] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1473132376] Warning: A system time change of 1712 seconds (0d 0h 28m 32s forwards in time) has been detected. Compensating...
[1473132459] SERVICE ALERT: localhost;Current Load;OK;HARD;4;OK - load average: 2.58, 1.46, 1.16
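To pin down the exact window where logging stopped, something like the following might help. It is only a minimal sketch, assuming every nagios.log line starts with a 10-digit Unix epoch stamp in square brackets as in the excerpt above. The names `find_gaps` and `fmt` and the 600-second threshold are illustrative choices, not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Matches the leading "[epoch]" stamp seen at the start of each log line above.
STAMP = re.compile(r"^\[(\d{10})\]")

def find_gaps(lines, min_gap=600):
    """Return (start, end, seconds) for each silence longer than min_gap seconds.

    min_gap is an assumed threshold; tune it to your normal check interval.
    """
    gaps = []
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a clean leading stamp
        ts = int(m.group(1))
        if prev is not None and ts - prev > min_gap:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps

def fmt(ts):
    """Render an epoch stamp as a readable UTC time."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

# Two consecutive stamped lines from the excerpt above.
sample = [
    "[1473123790] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;",
    "[1473132376] Warning: A system time change of 1712 seconds (0d 0h 28m 32s forwards in time) has been detected. Compensating...",
]

for start, end, secs in find_gaps(sample):
    print(f"no log entries from {fmt(start)} to {fmt(end)} ({secs} s)")
```

On the two lines above this reports a silence of 8586 seconds, which lines up with the roughly 2-hour outage. To run it on the whole file, pass an open file handle instead of `sample`, for example `find_gaps(open(path))` with `path` pointing at your nagios.log.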