I have been fighting with a custom Perl script that sends a test transaction through a credit card processing API. We run it every minute with the retry interval set to 30 seconds. I know this is pretty frequent, but it is necessary for our business to know ASAP about a failure. About once or twice an evening we get a false positive with the result "(Service Check Timed Out)", but based on the logging in our script, the script is never even invoked. You can see in the logs below that the script runs at 2:40 and 2:42 but not at 2:41.
[Tue Jul 24 02:40:37 2012] Script started
[Tue Jul 24 02:40:37 2012] Using config file /usr/local/nagios/libexec/config-autoresp-10.xml
[Tue Jul 24 02:40:37 2012] ServiceKey: C37875FC7F91300C
[Tue Jul 24 02:40:37 2012] IdentToken found
[Tue Jul 24 02:40:37 2012] BaseURL: https://cws-01.ipcommerce.com
[Tue Jul 24 02:40:37 2012] SignOn SOAP message found
[Tue Jul 24 02:40:37 2012] Authorize SOAP message found
[Tue Jul 24 02:40:37 2012] URL Postfix: /2.0
[Tue Jul 24 02:40:37 2012] using BaseURL: https://cws-01.ipcommerce.com/2.0
[Tue Jul 24 02:40:37 2012] SAK found in ident token: C37875FC7F91300C
[Tue Jul 24 02:40:37 2012] signon duration 0.422683
[Tue Jul 24 02:40:37 2012] signon response code:200
[Tue Jul 24 02:40:37 2012] Session token found in SignOn response:C37875FC7F91300C
[Tue Jul 24 02:40:38 2012] txn duration: 0.803918
[Tue Jul 24 02:40:38 2012] txn response code 200 response msg: <REMOVED XML>
[Tue Jul 24 02:40:38 2012] Script ended
[Tue Jul 24 02:42:37 2012] Script started
[Tue Jul 24 02:42:37 2012] Using config file /usr/local/nagios/libexec/config-autoresp-10.xml
[Tue Jul 24 02:42:37 2012] ServiceKey: C37875FC7F91300C
[Tue Jul 24 02:42:37 2012] IdentToken found
[Tue Jul 24 02:42:37 2012] BaseURL: https://cws-01.ipcommerce.com
[Tue Jul 24 02:42:37 2012] SignOn SOAP message found
[Tue Jul 24 02:42:37 2012] Authorize SOAP message found
[Tue Jul 24 02:42:37 2012] URL Postfix: /2.0
[Tue Jul 24 02:42:37 2012] using BaseURL: https://cws-01.ipcommerce.com/2.0
[Tue Jul 24 02:42:37 2012] SAK found in ident token: C37875FC7F91300C
[Tue Jul 24 02:42:37 2012] signon duration 0.372244
[Tue Jul 24 02:42:37 2012] signon response code:200
[Tue Jul 24 02:42:37 2012] Session token found in SignOn response:C37875FC7F91300C
[Tue Jul 24 02:42:38 2012] txn duration: 0.756385
[Tue Jul 24 02:42:38 2012] txn response code 200 response msg: <REMOVED XML>
[Tue Jul 24 02:42:38 2012] Script ended
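To spot skipped runs like the missing 2:41 without reading the log by eye, something along these lines could scan for gaps between "Script started" lines. This is only a sketch: the function name `find_missed_runs`, the one-minute expected interval, and the 15-second slack are assumptions for illustration, and the timestamp format is taken from the log lines above.

```python
from datetime import datetime, timedelta


def find_missed_runs(log_lines, expected=timedelta(minutes=1),
                     slack=timedelta(seconds=15)):
    """Return (previous_start, next_start) pairs spaced further apart than expected + slack."""
    starts = []
    for line in log_lines:
        if line.rstrip().endswith("Script started"):
            # The timestamp sits between the first '[' and the first ']'.
            stamp = line[line.index("[") + 1:line.index("]")]
            starts.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return [(a, b) for a, b in zip(starts, starts[1:]) if b - a > expected + slack]


# Three lines copied from the script log above.
sample = [
    "[Tue Jul 24 02:40:37 2012] Script started",
    "[Tue Jul 24 02:40:38 2012] Script ended",
    "[Tue Jul 24 02:42:37 2012] Script started",
]
gaps = find_missed_runs(sample)
for prev, nxt in gaps:
    print(f"no run between {prev:%H:%M:%S} and {nxt:%H:%M:%S}")
```

Matching each reported gap against the timestamps of the Nagios timeout alerts would show whether every timeout lines up with a run that never started.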
This is the output from nagios:
[07-24-2012 02:42:47] SERVICE ALERT: cws-01.ipcommerce.com;Check CWS PSP v10+;OK;HARD;1;OK: At https://cws-01.ipcommerce.com/2.0/Txn/C37875FC7F91300C
[07-24-2012 02:42:37] SERVICE ALERT: cws-01.ipcommerce.com;Check CWS PSP v10+;CRITICAL;HARD;1;(Service Check Timed Out)
I'm just looking for any advice on things to check or try; waking a bunch of people up at random times in the night isn't ideal. It doesn't happen at a consistent time, but the weird thing is that it never happens during the day when we are in the office. We even moved the check to a fresh Nagios install, and the issue happens there as well. Thanks in advance for any help.
(Service Check Timed Out) with a custom perl script
Re: (Service Check Timed Out) with a custom perl script
Do you have multiple instances of Nagios running? I think that would be more apparent than a check timing out every once in a while, but it's possible, and it doesn't really make a lot of sense that it wouldn't be logging the event. Is there some passive check being sent that's causing that error? That might not show up in the logs.
Nicholas Scott
Former Nagios employee
Re: (Service Check Timed Out) with a custom perl script
Only one instance of Nagios per server. We are running this check once a minute on two separate servers in the same environment.
No passive checks.
Re: (Service Check Timed Out) with a custom perl script
Did you check for multiple instances? Because if that was the case it would be inadvertent, not intentional (sorry if you actually did check, but the way you worded your post made it unclear). Try running 'pgrep -l nagios' and see if there are multiple nagios processes. Also, are you using the embedded perl interpreter?
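Since 'pgrep -l nagios' prints only PIDs and names, it cannot by itself tell a second daemon from a forked child. A rough way to separate the two, assuming a Linux procps `ps` and a listing captured with something like `ps -o pid,ppid,comm -C nagios`, is to keep only the processes whose parent is not also in the listing. The helper name `top_level_pids` and the PIDs in the sample are made up for illustration.

```python
def top_level_pids(ps_output):
    """Return the PIDs whose parent PID does not appear in the same listing."""
    rows = [line.split() for line in ps_output.strip().splitlines()[1:]]  # skip header row
    pids = {int(row[0]) for row in rows}
    return sorted(int(row[0]) for row in rows if int(row[1]) not in pids)


# Hypothetical listing; these PIDs are invented, not taken from a real server.
sample = """\
  PID  PPID COMMAND
 2101     1 nagios
 2187  2101 nagios
"""
roots = top_level_pids(sample)
print(roots)
```

A single top-level PID would suggest one daemon plus its children; two or more would point to separate Nagios instances.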
Re: (Service Check Timed Out) with a custom perl script
There were two results from that command, but one was a child process of the other (based on the analysis of a more Linux-savvy co-worker), so I don't think that's it.
How could I tell if I'm using the embedded interpreter or not?
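One place to look, assuming Nagios Core 3.x, is the main config file, where the `enable_embedded_perl` and `use_embedded_perl_implicitly` directives control the embedded interpreter. The sketch below just pulls those two settings out of config text; the helper name `embedded_perl_settings` and the sample values are illustrative, and the path mentioned afterwards is a guess based on the /usr/local/nagios prefix in the logs above. It reflects configuration only: the binary also has to have been built with embedded Perl support for the settings to matter.

```python
def embedded_perl_settings(cfg_text):
    """Collect the embedded Perl directives from nagios.cfg-style text."""
    wanted = ("enable_embedded_perl", "use_embedded_perl_implicitly")
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in wanted:
            found[key] = value
    return found


# Hypothetical excerpt; real values come from your own nagios.cfg.
sample_cfg = """
# embedded perl
enable_embedded_perl=1
use_embedded_perl_implicitly=1
log_file=/usr/local/nagios/var/nagios.log
"""
settings = embedded_perl_settings(sample_cfg)
print(settings)
```

To run it against a real install, pass in the contents of the config file, for example `open("/usr/local/nagios/etc/nagios.cfg").read()` if that is where yours lives.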