Service check timed out
-
- Posts: 6
- Joined: Mon Mar 06, 2017 10:12 am
Service check timed out
Hi
I am running Nagios Core and wrote my own plugin as a shell script to monitor a few services.
We are getting alerts with the error "Service check Timeout", yet the same script responds in less than 1 second when run from the CLI. Can you help me fix this issue?
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Service check timed out
What user are you running the check as on the CLI? What are the permissions of the script?
I'm not sure what would make it take so long if it's running as the proper user with proper permissions, but the default service check timeout in the nagios.cfg file is 60 seconds.
You could try raising it by editing that line in /usr/local/nagios/etc/nagios.cfg from
service_check_timeout=60
to
service_check_timeout=120
Save the file and restart Nagios by running
service nagios restart
Try that and let us know if it works.
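The edit described above can be scripted. The following is only a sketch: the function name set_service_check_timeout is made up for illustration, and it assumes the directive sits at the start of a line in the form service_check_timeout=N, as in the lines quoted above.

```shell
# Sketch: change service_check_timeout in a nagios.cfg-style file.
# The function name is hypothetical; it is not part of Nagios.
# Usage: set_service_check_timeout /usr/local/nagios/etc/nagios.cfg 120
set_service_check_timeout() {
    cfg_file=$1
    new_timeout=$2
    # Refuse anything that is not a whole number of seconds.
    case $new_timeout in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    # Keep a copy of the original as <file>.bak, then rewrite the directive in place.
    sed -i.bak "s/^service_check_timeout=.*/service_check_timeout=${new_timeout}/" "$cfg_file" || return 1
    # Print the resulting line so the change can be confirmed by eye.
    grep '^service_check_timeout=' "$cfg_file"
}
```

Nagios still has to be restarted afterwards (service nagios restart, as above) for the new value to take effect.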
-
- Posts: 6
- Joined: Mon Mar 06, 2017 10:12 am
Re: Service check timed out
I am running all plugins as the nagios user, and I used the same user when testing on the CLI.
Yes, I read about service_check_timeout and tried increasing it to 120, but I got the same service check timeout error, now after 120 seconds.
Here is the error before extending service_check_timeout:
(Service check timed out after 60.03 seconds)
After extending
Service check timed out after 120.06 seconds
Here is the error captured in nagios.log
[1488873168] wproc: Core Worker 27536: job 2 (pid=28234): Dormant child reaped
[1488873173] wproc: Core Worker 27538: job 2 (pid=28270) timed out. Killing it
[1488873173] wproc: CHECK job 2 from worker Core Worker 27538 timed out after 120.06s
[1488873173] wproc: host=axcbqa01; service=cuna_eapp_dev ALL Nodes Memory usage;
[1488873173] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1488873173] Warning: Check of service 'cuna_eapp_dev ALL Nodes Memory usage' on host 'axcbqa01' timed out after 120.062s!
[1488873173] SERVICE ALERT: axcbqa01;cuna_eapp_dev ALL Nodes Memory usage;CRITICAL;SOFT;2;(Service check timed out after 120.06 seconds)
[1488873173] wproc: Core Worker 27538: job 2 (pid=28270): Dormant child reaped
Do we need to tune max_concurrent_checks to fix this issue?
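Before tuning settings such as max_concurrent_checks, it may help to see whether the timeouts are confined to one service or spread across many. The sketch below tallies the "Warning: Check of service ... timed out" lines from a log like the excerpt above. The function name count_service_timeouts is hypothetical, and the pattern assumes the log format shown in this post.

```shell
# Sketch: count timed-out service checks per host;service in a nagios.log.
# The function name is hypothetical; the pattern assumes lines such as:
# [1488873173] Warning: Check of service 'NAME' on host 'HOST' timed out after 120.062s!
# Usage: count_service_timeouts /path/to/nagios.log
count_service_timeouts() {
    grep "Warning: Check of service '.*' on host '.*' timed out" "$1" \
        | sed "s/.*Check of service '\(.*\)' on host '\(.*\)' timed out.*/\2;\1/" \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Each output line is a count followed by host;service, with the most frequent offender first.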
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Service check timed out
Could you post the script for review? Also, please post your nagios.cfg.
These last few questions may be irrelevant, but they will help us check for known bugs. What version of Core are you using? Was it compiled from source or installed from distro repos? On what OS/version is nagios running? cat /etc/*-release may be of use.
-
- Posts: 6
- Joined: Mon Mar 06, 2017 10:12 am
Re: Service check timed out
Sorry for the delay.
CentOS release 6.6 (Final)
Nagios® Core™ Version 4.0.8
Yes, Nagios was compiled from the source.
We are seeing more and more of these errors as we add monitors, and we would appreciate your expertise on this as soon as possible.
Please find the attached nagios.log and script.
- Attachments
-
- check_couch.txt
- script
- (31.19 KiB) Downloaded 441 times
-
- nagios.cfg
- (46.03 KiB) Downloaded 453 times
- tacolover101
- Posts: 432
- Joined: Mon Apr 10, 2017 11:55 am
Re: Service check timed out
I can help if you post the service configuration, as well as an example of you running the check on the CLI along with its response.
Also, what are the permissions on the script? ll /path/to/your/file will show us.
Re: Service check timed out
If we can see the service config and how the check command is defined, that would help out a lot.
Also, can you run the check from a command prompt on the Nagios system while logged in as the nagios user and post that as well?
Thanks.
Be sure to check out our Knowledgebase for helpful articles and solutions!
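A check that is quick in a login shell can still behave differently when a daemon launches it, because the daemon does not provide the interactive environment. The sketch below runs a command with a cleared environment and with stdin attached to /dev/null, then reports the exit code and elapsed time. The function name run_without_login_env and the PATH value are assumptions for illustration; this only approximates how Nagios starts a plugin.

```shell
# Sketch: run a check command without the login environment or a terminal.
# The function name and the PATH value are assumptions, not Nagios settings.
# Usage (as the nagios user): run_without_login_env ./check_couch <arguments>
run_without_login_env() {
    rl_start=$(date +%s)
    # env -i clears inherited variables; stdin comes from /dev/null so a
    # plugin that reads from stdin gets end-of-file immediately.
    rl_output=$(env -i PATH=/usr/bin:/bin "$@" 2>&1 </dev/null)
    rl_status=$?
    rl_end=$(date +%s)
    printf '%s\n' "$rl_output"
    # Plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    printf 'exit=%s elapsed=%ss\n' "$rl_status" "$((rl_end - rl_start))"
    return "$rl_status"
}
```

If the script is fast in a normal shell but slow or failing here, a missing environment variable or an unexpected read from stdin would be one thing to look into.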
-
- Posts: 6
- Joined: Mon Mar 06, 2017 10:12 am
Re: Service check timed out
Here is the output when I run it on the CLI as the nagios user.
[nagios@nagios libexec]$ ./check_couch -H IP -P 8091 -b Bucketname -c mem_used -n node1 -W 75% -C 85%
Memory Usage OK - 6 % | Usage-Node1=104128808;;; Allocated-Node1=1610612736;;; Usage-AllNodes=312795064;;;
Here is the Service configuration.
define command{
command_name check_couch
command_line $USER1$/check_couch -H $HOSTADDRESS$ -p $ARG1$ -b $ARG2$ -c $ARG3$ -n $ARG4$ -W $ARG5$ -C $ARG6$
}
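To compare the manual run with what Nagios actually executes, the macros in command_line can be expanded by hand. The sketch below does that for $USER1$, $HOSTADDRESS$ and $ARGn$ only. The function name expand_command_line is hypothetical, and the plugin directory and the "!"-separated argument string in the usage comment are assumptions taken from the manual run above, since the service definition has not been posted.

```shell
# Sketch: expand $USER1$, $HOSTADDRESS$ and $ARGn$ in a command_line template.
# The function name is hypothetical; this is not how Nagios itself is invoked.
# Arguments: template, host address, plugin directory, "!"-separated check arguments.
# Usage (values are assumptions based on the manual run):
#   expand_command_line '$USER1$/check_couch -H $HOSTADDRESS$ -p $ARG1$ -b $ARG2$ -c $ARG3$ -n $ARG4$ -W $ARG5$ -C $ARG6$' \
#       IP /usr/local/nagios/libexec '8091!Bucketname!mem_used!node1!75%!85%'
expand_command_line() {
    awk -v line="$1" -v host="$2" -v dir="$3" -v args="$4" '
    # Replace every literal occurrence of macro in s with value.
    function subst(s, macro, value,    out, i) {
        out = ""
        while ((i = index(s, macro)) > 0) {
            out = out substr(s, 1, i - 1) value
            s = substr(s, i + length(macro))
        }
        return out s
    }
    BEGIN {
        line = subst(line, "$USER1$", dir)
        line = subst(line, "$HOSTADDRESS$", host)
        n = split(args, a, "!")
        for (k = 1; k <= n; k++)
            line = subst(line, "$ARG" k "$", a[k])
        print line
    }'
}
```

With the posted definition the expanded line passes the port as lowercase -p, while the manual run above used uppercase -P, so it may be worth checking which option the script expects.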
Re: Service check timed out
Can you post the service checks for this host / service?
axcbqa01;cuna_eapp_dev ALL Nodes Memory usage
Or can you post or PM me the following file from the Nagios server?
/usr/local/nagios/var/objects.cache
The objects file should have the service check definition. You may need to zip it up if it is too large.
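If the whole objects file is too large to post, the single service definition can be pulled out of it. The sketch below assumes the file contains "define service {" blocks with one directive per line and a closing brace on its own line. The function name find_service_definition is hypothetical.

```shell
# Sketch: print the "define service" block for one host and service description.
# The function name is hypothetical; the block layout is an assumption.
# Usage: find_service_definition /usr/local/nagios/var/objects.cache axcbqa01 'cuna_eapp_dev ALL Nodes Memory usage'
find_service_definition() {
    awk -v host="$2" -v desc="$3" '
    # A new service block starts: reset what has been collected so far.
    /^define service[ \t]*[{]/ { in_block = 1; block = ""; h = ""; d = "" }
    in_block {
        block = block $0 "\n"
        line = $0
        sub(/^[ \t]+/, "", line)
        if (line ~ /^host_name[ \t]/) {
            sub(/^host_name[ \t]+/, "", line); h = line
        } else if (line ~ /^service_description[ \t]/) {
            sub(/^service_description[ \t]+/, "", line); d = line
        } else if (line ~ /^[}]/) {
            # End of block: print it only when both values match.
            if (h == host && d == desc) printf "%s", block
            in_block = 0
        }
    }' "$1"
}
```

The check_command line in the printed block shows the arguments Nagios passes to the plugin.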