How event handlers work and why they time out during execution
Posted: Tue Oct 24, 2017 2:38 am
I have an issue with Nagios:
-> The event handler call for the 3rd attempt does not happen before the service reaches the CRITICAL HARD state.
This is what happens:
1. Event handler call with CRITICAL SOFT 1 - the call succeeds but the service is still down
2. Event handler call with CRITICAL SOFT 2 - the call succeeds but the app is still down
3. Event handler call with CRITICAL SOFT 3 does not happen. The error from nagios.log is as follows:
[1508751396] SERVICE ALERT: localhost;xxx-service;CRITICAL;SOFT;3;CRITICAL - Socket timeout after 10 seconds
[1508751396] SERVICE EVENT HANDLER: localhost;xxx-service;CRITICAL;SOFT;3;recover-service!xx!xx!prod!xx!1.0.246!/TOMEE/instances/xx_1.0.246.pid
[1508751426] wproc: Core Worker 41603: job 239 (pid=6855) timed out. Killing it
[1508751426] wproc: SERVICE EVENTHANDLER job 239 from worker Core Worker 41603 timed out after 30.03s
[1508751426] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1508751426] Warning: Service event handler command '/u/gls/Monitoring/nagios/scripts/serviceRecovery.sh CRITICAL SOFT 3 ' timed out after 0.00 seconds
[1508751426] wproc: Core Worker 41603: job 239 (pid=6855): Dormant child reaped
The configuration file sets the event handler timeout to 30 seconds.
I would like to know why this event handler is timing out, and how to fix it.
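For reference, here is a minimal sketch of a handler that returns well inside the 30-second limit. It assumes (the contents of serviceRecovery.sh are not shown here, so this is only a guess) that the timeout comes from the script waiting on a long-running restart, or from a child process that keeps the handler's output open. The names handle_event and RECOVERY_CMD are hypothetical placeholders, not part of Nagios or of the real script.

```shell
#!/bin/sh
# Hypothetical sketch, not the real serviceRecovery.sh.
# RECOVERY_CMD is a placeholder for whatever actually restarts the service.
RECOVERY_CMD="${RECOVERY_CMD:-true}"

handle_event() {
    state="$1"       # service state, e.g. CRITICAL
    state_type="$2"  # SOFT or HARD
    attempt="$3"     # current check attempt number

    case "$state" in
        CRITICAL)
            # Run the recovery in the background and detach its output,
            # so the handler itself can return immediately instead of
            # waiting for the restart to finish.
            nohup sh -c "$RECOVERY_CMD" >/dev/null 2>&1 &
            echo "recovery started ($state_type attempt $attempt)"
            ;;
        *)
            echo "no action for state $state"
            ;;
    esac
}

# In a real handler script the last line would be:
#   handle_event "$@"
```

With this shape, calling handle_event CRITICAL SOFT 3 prints a status line and returns while the recovery command keeps running on its own. If the recovery really needs more than 30 seconds in the foreground, the other option is raising event_handler_timeout in nagios.cfg.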