Service check timed out
Posted: Mon Mar 06, 2017 10:20 am
by suresh.ramasamy
Hi
I am running Nagios Core and have written my own plugin as a shell script to monitor a few services.
We are getting alerts with the error "Service check Timeout", yet the same script responds in less than 1 second when run from the CLI. Can you help me fix this issue?
Re: Service check timed out
Posted: Mon Mar 06, 2017 1:33 pm
by dwhitfield
What user are you running the check as on the CLI? What are the permissions of the script?
I'm not sure what would be taking it so long if it's the proper user with proper permissions, but the default service check timeout in the nagios.cfg file is 60 seconds.
You could try changing it by editing that section of /usr/local/nagios/etc/nagios.cfg from
service_check_timeout=60
to
service_check_timeout=120
Save the file and restart nagios by running
service nagios restart
Try that and let us know if it works.
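Before raising the limit further, it can help to confirm which value is actually present in the file Nagios reads. This is only a minimal sketch, assuming the source-install path mentioned above; `cfg_value` is a hypothetical helper name, not part of Nagios.

```shell
#!/bin/sh
# cfg_value: print every uncommented value of a directive in a
# key=value style config file (hypothetical helper, not a Nagios tool).
# Usage: cfg_value <directive> <file>
cfg_value() {
    # Match lines that begin with "<directive>=" (optionally indented)
    # and print only the part after the "=". Commented lines are skipped
    # because they start with "#".
    sed -n "s/^[[:space:]]*$1=//p" "$2"
}

# Example, using the path from the post above:
#   cfg_value service_check_timeout /usr/local/nagios/etc/nagios.cfg
```

If this prints more than one line, the directive is defined more than once in the file and is worth cleaning up before restarting.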
Re: Service check timed out
Posted: Tue Mar 07, 2017 6:01 am
by suresh.ramasamy
I am running all plugins as the nagios user, and I used the same user on the CLI.
Yes, I read about service_check_timeout and tried increasing it to 120, but I got the same service check timeout error, now after 120 seconds.
Here is the error before extending service_check_timeout
(Service check timed out after 60.03 seconds)
After extending
Service check timed out after 120.06 seconds
Here is the error captured in nagios.log
[1488873168] wproc: Core Worker 27536: job 2 (pid=28234): Dormant child reaped
[1488873173] wproc: Core Worker 27538: job 2 (pid=28270) timed out. Killing it
[1488873173] wproc: CHECK job 2 from worker Core Worker 27538 timed out after 120.06s
[1488873173] wproc: host=axcbqa01; service=cuna_eapp_dev ALL Nodes Memory usage;
[1488873173] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1488873173] Warning: Check of service 'cuna_eapp_dev ALL Nodes Memory usage' on host 'axcbqa01' timed out after 120.062s!
[1488873173] SERVICE ALERT: axcbqa01;cuna_eapp_dev ALL Nodes Memory usage;CRITICAL;SOFT;2;(Service check timed out after 120.06 seconds)
[1488873173] wproc: Core Worker 27538: job 2 (pid=28270): Dormant child reaped
Do we need to tune max_concurrent_checks to fix this issue?
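To see how many checks are affected, the timeout warnings can be pulled out of nagios.log. This is a rough sketch that assumes the "Warning: Check of service ... timed out after ..." line format shown in the excerpt above; `timed_out_checks` is a hypothetical helper name.

```shell
#!/bin/sh
# timed_out_checks: list service checks that hit the timeout.
# Output format: <epoch>;<host>;<service>;<seconds>
# Usage: timed_out_checks <path-to-nagios.log>
timed_out_checks() {
    # Capture the timestamp, service name, host name and elapsed time
    # from the warning lines, and print nothing for all other lines.
    sed -n "s/^\[\([0-9]*\)\] Warning: Check of service '\(.*\)' on host '\(.*\)' timed out after \([0-9.]*\)s!\$/\1;\3;\2;\4/p" "$1"
}

# Example:
#   timed_out_checks /usr/local/nagios/var/nagios.log | cut -d';' -f2,3 | sort | uniq -c
```

With GNU date, an epoch value from the first column can be made readable with `date -d @1488873173`.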
Re: Service check timed out
Posted: Tue Mar 07, 2017 1:49 pm
by dwhitfield
Could you post the script for review? Also, please post your nagios.cfg.
These last few questions may be irrelevant, but they will help us check for known bugs. What version of Core are you using? Was it compiled from source or installed from distro repos? On what OS/version is nagios running? cat /etc/*-release may be of use.
Re: Service check timed out
Posted: Fri Apr 28, 2017 6:33 am
by suresh.ramasamy
Sorry for the delay.
CentOS release 6.6 (Final)
Nagios® Core™ Version 4.0.8
Yes, Nagios was compiled from the source.
We are seeing more and more of these errors as we add monitors. We would appreciate your expertise on this ASAP.
Please find the attached nagios.log and script.
Re: Service check timed out
Posted: Fri Apr 28, 2017 7:26 am
by tacolover101
i can help if you post the service configuration, as well as the example of you running it over the CLI with a response.
also, what are the permissions on the script? ll /path/to/your/file will show us.
Re: Service check timed out
Posted: Fri Apr 28, 2017 12:07 pm
by tgriep
If we can see the service config and how the check command is defined, that would help out a lot.
Also, can you run the check in a command prompt in the Nagios system while logged in as the nagios user and post that as well?
Thanks.
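A plugin that is fast in a login shell can still hang under the daemon if it depends on something from the interactive environment (PATH entries, proxy variables and so on). One way to test for that is to run the check with an emptied environment and a hard time limit. This is a sketch under that assumption, not a confirmed diagnosis; `run_bare` is a hypothetical helper, and `timeout` here is the GNU coreutils command.

```shell
#!/bin/sh
# run_bare: run a command with an almost empty environment and a time limit.
# Usage: run_bare <seconds> <command> [args...]
# Exit status is 124 if the command was killed by the time limit.
run_bare() {
    limit=$1
    shift
    # env -i drops every inherited variable; only a basic PATH is kept
    # so that standard tools can still be found.
    env -i PATH=/usr/bin:/bin timeout "$limit" "$@"
}

# Example, as the nagios user, from the plugin directory:
#   run_bare 60 ./check_couch <same arguments as in the service definition>
#   echo "exit status: $?"
```

If the check only hangs when run this way, the difference between the two environments is the place to look.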
Re: Service check timed out
Posted: Tue May 30, 2017 3:12 am
by suresh.ramasamy
Here is the output when I run it on the CLI as the nagios user.
[nagios@nagios libexec]$ ./check_couch -H IP -P 8091 -b Bucketname -c mem_used -n node1 -W 75% -C 85%
Memory Usage OK - 6 % | Usage-Node1=104128808;;; Allocated-Node1=1610612736;;; Usage-AllNodes=312795064;;;
Here is the Service configuration.
define command{
    command_name check_couch
    command_line $USER1$/check_couch -H $HOSTADDRESS$ -p $ARG1$ -b $ARG2$ -c $ARG3$ -n $ARG4$ -W $ARG5$ -C $ARG6$
}
Re: Service check timed out
Posted: Tue May 30, 2017 1:59 pm
by tgriep
Can you post the service checks for this host / service?
axcbqa01;cuna_eapp_dev ALL Nodes Memory usage
Or can you post or PM me the following file from the Nagios server?
/usr/local/nagios/var/objects.cache
The objects file should have the service check definition. You may need to zip it up if it is too large.
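If the whole file is too large to post, the single service definition can be cut out of it instead. This is a sketch that assumes objects.cache uses `define service {` blocks with one `key value` pair per line and a closing `}` on its own line; `service_block` is a hypothetical helper name.

```shell
#!/bin/sh
# service_block: print the "define service" block for one host/service pair.
# Usage: service_block <objects.cache> <host_name> <service_description>
service_block() {
    awk -v host="$2" -v svc="$3" '
        # Start collecting at the opening line of each service block.
        /^define service[ \t]*[{]/ { inblock = 1; buf = ""; h = ""; s = "" }
        inblock {
            buf = buf $0 "\n"
            line = $0
            sub(/^[ \t]+/, "", line)   # ignore leading indentation
            if (line ~ /^host_name[ \t]/)           { sub(/^host_name[ \t]+/, "", line); h = line }
            if (line ~ /^service_description[ \t]/) { sub(/^service_description[ \t]+/, "", line); s = line }
            # At the closing brace, print the block only if both names match.
            if (line ~ /^[}]/) {
                if (h == host && s == svc) printf "%s", buf
                inblock = 0
            }
        }
    ' "$1"
}

# Example:
#   service_block /usr/local/nagios/var/objects.cache axcbqa01 "cuna_eapp_dev ALL Nodes Memory usage"
```

The check_command line in the printed block shows the arguments that are passed to the plugin for this service.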