Service check timed out

Locked
suresh.ramasamy
Posts: 6
Joined: Mon Mar 06, 2017 10:12 am

Service check timed out

Post by suresh.ramasamy »

Hi

I am running Nagios Core and have written my own plugin as a shell script to monitor a few services.

We are getting alerts with the error "Service check timed out", yet when I run the same script from the CLI it responds in less than 1 second. Can you help me fix this issue?
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Service check timed out

Post by dwhitfield »

What user are you running the check as on the CLI? What are the permissions of the script?

I'm not sure what would be taking it so long if it's the proper user with proper permissions, but the default service check timeout in the nagios.cfg file is 60 seconds.
You could try changing it by editing that section of /usr/local/nagios/etc/nagios.cfg from
service_check_timeout=60
to
service_check_timeout=120

Save the file and restart nagios by running
service nagios restart

Try that and let us know if it works.
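
Before restarting, it can help to confirm which value Nagios will actually read from the file. This is a minimal sketch, assuming a source install under /usr/local/nagios. The helper name get_service_check_timeout is made up for illustration, and it assumes that the last uncommented assignment in the file is the one that counts.

```shell
#!/bin/sh
# Print the service_check_timeout value found in a Nagios main config file.
# Hypothetical helper; the default path assumes a source install.
get_service_check_timeout() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Ignore commented-out lines; keep only the numeric value.
    # Assumption: if the directive appears more than once, the last one wins.
    sed -n 's/^[[:space:]]*service_check_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1
}
```

Calling get_service_check_timeout with no argument reads the default path. Running /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg before the restart verifies the configuration; the binary path is again an assumption for a source install.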
suresh.ramasamy
Posts: 6
Joined: Mon Mar 06, 2017 10:12 am

Re: Service check timed out

Post by suresh.ramasamy »

I am running all plugins as the nagios user, and I used the same user on the CLI.

Yes, I read about service_check_timeout and tried increasing it to 120, but I got the same service check timeout error, now after 120 seconds.

Here is the error before extending service_check_timeout
(Service check timed out after 60.03 seconds)
After extending
Service check timed out after 120.06 seconds

Here is the error captured in nagios.log

[1488873168] wproc: Core Worker 27536: job 2 (pid=28234): Dormant child reaped
[1488873173] wproc: Core Worker 27538: job 2 (pid=28270) timed out. Killing it
[1488873173] wproc: CHECK job 2 from worker Core Worker 27538 timed out after 120.06s
[1488873173] wproc: host=axcbqa01; service=cuna_eapp_dev ALL Nodes Memory usage;
[1488873173] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1488873173] Warning: Check of service 'cuna_eapp_dev ALL Nodes Memory usage' on host 'axcbqa01' timed out after 120.062s!
[1488873173] SERVICE ALERT: axcbqa01;cuna_eapp_dev ALL Nodes Memory usage;CRITICAL;SOFT;2;(Service check timed out after 120.06 seconds)
[1488873173] wproc: Core Worker 27538: job 2 (pid=28270): Dormant child reaped
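
To see whether the timeouts are limited to one service or spread across many, the warning lines in the format shown above can be pulled out of the log. This is a rough sketch: the function name list_check_timeouts is hypothetical, the default log path assumes a source install, and the pattern only matches the "Warning: Check of service ... timed out after ...s!" line format from this excerpt.

```shell
#!/bin/sh
# List service check timeouts from a Nagios log.
# Output columns: epoch-timestamp host seconds service-description
# Hypothetical helper; the default path assumes a source install.
list_check_timeouts() {
    log="${1:-/usr/local/nagios/var/nagios.log}"
    # Capture timestamp (1), service (2), host (3), duration (4),
    # then print them with the free-text service name last.
    sed -n "s/^\[\([0-9]*\)\] Warning: Check of service '\(.*\)' on host '\(.*\)' timed out after \([0-9.]*\)s!\$/\1 \3 \4 \2/p" "$log"
}
```

Piping the result through cut -d' ' -f2,4- | sort | uniq -c gives a count per host and service.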


Do we need to tune max_concurrent_checks to fix this issue?
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Service check timed out

Post by dwhitfield »

Could you post the script for review? Also, please post your nagios.cfg.

These last few questions may be irrelevant, but they will help us check for known bugs. What version of Core are you using? Was it compiled from source or installed from distro repos? On what OS/version is nagios running? cat /etc/*-release may be of use.
suresh.ramasamy
Posts: 6
Joined: Mon Mar 06, 2017 10:12 am

Re: Service check timed out

Post by suresh.ramasamy »

Sorry for the delay.

CentOS release 6.6 (Final)
Nagios® Core™ Version 4.0.8
Yes, Nagios was compiled from the source.

We are seeing more and more of these errors as we add monitors. We would appreciate your expertise on this as soon as possible.

Please find the attached nagios.log and script.
Attachments
check_couch.txt (script, 31.19 KiB)
nagios.cfg (46.03 KiB)
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Service check timed out

Post by tacolover101 »

I can help if you post the service configuration, as well as an example of running it from the CLI with its response.

Also, what are the permissions on the script? ll /path/to/your/file will show us.
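
The permission check can be wrapped so that it reports the usual failure cases explicitly. This is a small sketch and the function name check_plugin_perms is invented for illustration. Note that the test reflects the user running it, so it is only meaningful when run as the nagios user.

```shell
#!/bin/sh
# Report whether a plugin file exists and is executable by the current user.
# Returns 0 if executable, 1 if not executable, 2 if missing.
# Hypothetical helper; run it as the same user Nagios runs checks as.
check_plugin_perms() {
    plugin="$1"
    if [ ! -f "$plugin" ]; then
        echo "missing: $plugin"
        return 2
    fi
    if [ ! -x "$plugin" ]; then
        echo "not executable by $(id -un): $plugin"
        return 1
    fi
    # Show owner, group and mode for posting to the thread
    ls -l "$plugin"
}
```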
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Service check timed out

Post by tgriep »

If we can see the service config and how the check command is defined, that would help a lot.
Also, can you run the check from a command prompt on the Nagios system while logged in as the nagios user and post that as well?
Thanks.
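
A script that answers quickly in a login shell can still hang when started by a daemon, for example if it relies on environment variables from the login session or waits on terminal input. The sketch below approximates that situation; it is not a reproduction of what the Nagios worker does. The function name run_like_nagios is hypothetical, the PATH value is an assumption, and it relies on the coreutils timeout command being installed.

```shell
#!/bin/sh
# Run a command with a stripped-down environment, no terminal on stdin,
# and a hard time limit, then report exit status and elapsed time.
# Usage: run_like_nagios SECONDS COMMAND [ARGS...]
# Hypothetical helper; only an approximation of a daemon context.
run_like_nagios() {
    limit="$1"
    shift
    start=$(date +%s)
    # env -i clears the environment; PATH is set to an assumed minimal value.
    # timeout exits with status 124 if the command runs past the limit.
    env -i PATH=/usr/local/bin:/usr/bin:/bin timeout "$limit" "$@" </dev/null
    rc=$?
    end=$(date +%s)
    echo "exit=$rc elapsed=$((end - start))s" >&2
    return "$rc"
}
```

Run as the nagios user, something like run_like_nagios 60 /usr/local/nagios/libexec/check_couch followed by the usual arguments would show whether the plugin still finishes in under a second without the interactive environment.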
Be sure to check out our Knowledgebase for helpful articles and solutions!
suresh.ramasamy
Posts: 6
Joined: Mon Mar 06, 2017 10:12 am

Re: Service check timed out

Post by suresh.ramasamy »

Here is the output when I run it from the CLI as the nagios user.

[nagios@nagios libexec]$ ./check_couch -H IP -P 8091 -b Bucketname -c mem_used -n node1 -W 75% -C 85%
Memory Usage OK - 6 % | Usage-Node1=104128808;;; Allocated-Node1=1610612736;;; Usage-AllNodes=312795064;;;

Here is the Service configuration.

define command{
    command_name check_couch
    command_line $USER1$/check_couch -H $HOSTADDRESS$ -p $ARG1$ -b $ARG2$ -c $ARG3$ -n $ARG4$ -W $ARG5$ -C $ARG6$
}
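
To compare the manual run with what Nagios executes, the command_line can be expanded by hand using the check_command string from the service definition. This is a simplified sketch of the macro substitution, not Nagios's own implementation: the function name expand_command_line is hypothetical, it handles only $USER1$, $HOSTADDRESS$ and $ARGn$, and it assumes the arguments contain no backslashes or & characters. The comparison is worth making here because the manual run above passes the port with -P while the command definition uses -p; whether that matters depends on how check_couch parses its options.

```shell
#!/bin/sh
# Expand a Nagios command_line template by hand.
# Usage: expand_command_line TEMPLATE USER1_DIR HOST_ADDRESS CHECK_COMMAND
#   CHECK_COMMAND has the form "name!arg1!arg2!..."
# Hypothetical helper covering only $USER1$, $HOSTADDRESS$ and $ARGn$.
expand_command_line() {
    awk -v tpl="$1" -v user1="$2" -v addr="$3" -v cc="$4" 'BEGIN {
        # Field 1 is the command name; fields 2..n are ARG1..ARG(n-1)
        n = split(cc, a, "!")
        gsub(/\$USER1\$/, user1, tpl)
        gsub(/\$HOSTADDRESS\$/, addr, tpl)
        for (i = 2; i <= n; i++)
            gsub("\\$ARG" (i - 1) "\\$", a[i], tpl)
        print tpl
    }'
}
```

The printed line can then be run as the nagios user exactly as shown, instead of retyping the arguments by hand.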
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Service check timed out

Post by tgriep »

Can you post the service checks for this host / service?


axcbqa01;cuna_eapp_dev ALL Nodes Memory usage
Or can you post or PM me the following file from the Nagios server?


/usr/local/nagios/var/objects.cache
The objects file should have the service check definition. You may need to zip it up if it is too large.
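
If the whole file is too large to post, the single service block can be extracted instead. This is a sketch under stated assumptions: it expects objects.cache to contain "define service {" blocks with one directive per line, the name and value separated by whitespace, and a closing brace on its own line. The function name show_service_def is made up for illustration.

```shell
#!/bin/sh
# Print the "define service" block for one host/service pair.
# Usage: show_service_def OBJECTS_CACHE HOST_NAME SERVICE_DESCRIPTION
# Hypothetical helper; assumes one directive per line inside each block.
show_service_def() {
    awk -v host="$2" -v svc="$3" '
        # Start of a service block: reset the collected state
        /^[ \t]*define service[ \t]*[{]/ { inblk = 1; blk = ""; h = ""; s = "" }
        inblk {
            blk = blk $0 "\n"
            line = $0
            sub(/^[ \t]+/, "", line)
            sub(/[ \t]+$/, "", line)
            if (line ~ /^host_name[ \t]/) {
                sub(/^host_name[ \t]+/, "", line); h = line
            } else if (line ~ /^service_description[ \t]/) {
                sub(/^service_description[ \t]+/, "", line); s = line
            }
        }
        # End of block: print it only if both fields match
        inblk && /^[ \t]*[}]/ {
            if (h == host && s == svc) printf "%s", blk
            inblk = 0
        }
    ' "$1"
}
```

For this thread the call would be show_service_def /usr/local/nagios/var/objects.cache axcbqa01 'cuna_eapp_dev ALL Nodes Memory usage'.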