
Timeout without warning?

Posted: Wed Mar 12, 2014 5:16 am
by idefixgallier
Hi!

Code:

[12-03-2014 09:50:07] wproc: Core Worker 4124: job 95209 (pid=18245): Dormant child reaped
[12-03-2014 09:50:02] wproc: Core Worker 4124: Failed to reap child with pid 18245. Next attempt @ 1394614207.572255
[12-03-2014 09:50:02] wproc: Core Worker 4124: tv.tv_sec is currently 1394614202
[12-03-2014 09:50:02] Warning: Check of service 'Scalix-Webmail' on host 'sxalumni.fhstp.ac.at' timed out after 90.004s!
[12-03-2014 09:50:02] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[12-03-2014 09:50:02] wproc: host=sxalumni.fhstp.ac.at; service=Scalix-Webmail;
[12-03-2014 09:50:02] wproc: command: /opt/nagios/libexec/check_http -H sxalumni.fhstp.ac.at -u /webmail/ -I 195.202.144.4 -t 99
[12-03-2014 09:50:02] wproc: CHECK job 95209 from worker Core Worker 4124 timed out after 90.00s
[12-03-2014 09:50:02] wproc: Core Worker 4124: job 95209 (pid=18245) timed out. Killing it
This timeout was never reported (and of course no event handler was started). Can somebody give me a hint where to look to find out why?

Martin

Re: Timeout without warning?

Posted: Wed Mar 12, 2014 2:02 pm
by sreinhardt
Please run the following command and post the output.

Code:

grep -i 'timeout' /usr/local/nagios/etc/nagios.cfg

Re: Timeout without warning?

Posted: Wed Mar 12, 2014 2:46 pm
by idefixgallier
Here you go :)

Code:

root@nagios:/opt/nagios/etc# grep -i timeout nagios.cfg
# TIMEOUT VALUES
service_check_timeout=90
host_check_timeout=30
event_handler_timeout=240
notification_timeout=120
ocsp_timeout=5
perfdata_timeout=5
Regards,
Martin

Re: Timeout without warning?

Posted: Wed Mar 12, 2014 3:19 pm
by sreinhardt
I'm not sure whether this is expected behavior in Core. However, the worker killed your check before it could return any output or status: the maximum service check timeout in Core (service_check_timeout) is set to 90 seconds, while your plugin was given a timeout of 99 seconds (-t 99), so the plugin never got the chance to report its own timeout. Raising service_check_timeout in Core above 99, or lowering the plugin's -t value below 90, should resolve this issue. I will check what behavior is expected when Core kills a check, as I would expect at least an UNKNOWN, if not a CRITICAL.
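
To illustrate the mismatch, here is a minimal Python sketch, not part of Nagios. It reads service_check_timeout from the text of nagios.cfg and compares it with the -t value in a check command. The function names (core_timeout, plugin_timeout, plugin_outlives_core) are made up for this example, and it assumes the -t value is a plain integer number of seconds.

```python
import shlex


def core_timeout(cfg_text, key="service_check_timeout"):
    """Return the integer value of `key` in nagios.cfg text, or None if absent."""
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value directive.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return int(value.strip())
    return None


def plugin_timeout(command):
    """Return the value given with -t/--timeout on a plugin command line, or None.

    Assumption: the value is a plain integer number of seconds.
    """
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg in ("-t", "--timeout") and i + 1 < len(args):
            return int(args[i + 1])
        if arg.startswith("--timeout="):
            return int(arg.split("=", 1)[1])
    return None


def plugin_outlives_core(cfg_text, command):
    """True if the plugin's own timeout is not shorter than Core's limit,
    meaning Core may kill the check before the plugin can report anything."""
    core = core_timeout(cfg_text)
    plugin = plugin_timeout(command)
    if core is None or plugin is None:
        return False
    return plugin >= core


if __name__ == "__main__":
    cfg = "# TIMEOUT VALUES\nservice_check_timeout=90\nhost_check_timeout=30\n"
    cmd = ("/opt/nagios/libexec/check_http -H sxalumni.fhstp.ac.at "
           "-u /webmail/ -I 195.202.144.4 -t 99")
    print(plugin_outlives_core(cfg, cmd))
```

With the values from this thread (90 in nagios.cfg, -t 99 on the command line) the check is flagged. Lowering -t below 90 clears it.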

Re: Timeout without warning?

Posted: Thu Mar 13, 2014 1:30 am
by idefixgallier
As soon as I had written out the grep -i timeout output, it made sense to me:

service_check_timeout=90 and ... -t 99 could never work well together.

(And now the "but" :) )

Until 4.0.3 this caused no problems. Still, I will lower the -t value; that is more logical given the way 4.0.3 handles it.

Thank you!

Re: Timeout without warning?

Posted: Thu Mar 13, 2014 10:45 am
by lmiltchev
Let us know if we can mark this topic as "resolved".

Re: Timeout without warning?

Posted: Thu Mar 13, 2014 11:46 am
by idefixgallier
I think you can - thank you for your help!!!