(Service Check Timed Out) with a custom Perl script
Posted: Tue Jul 24, 2012 12:05 pm
I have been fighting with a custom Perl script that sends a test transaction through a credit card processing API. We run it every minute with the retry interval set to 30 seconds. I know this is pretty frequent, but our business needs to know about a failure ASAP. Once or twice an evening we get a false positive with the result "(Service Check Timed Out)", but based on the logging in our script, the script is never even invoked. You can see in the logs below that the script runs at 2:40 and 2:42 but not at 2:41.
[Tue Jul 24 02:40:37 2012] Script started
[Tue Jul 24 02:40:37 2012] Using config file /usr/local/nagios/libexec/config-autoresp-10.xml
[Tue Jul 24 02:40:37 2012] ServiceKey: C37875FC7F91300C
[Tue Jul 24 02:40:37 2012] IdentToken found
[Tue Jul 24 02:40:37 2012] BaseURL: https://cws-01.ipcommerce.com
[Tue Jul 24 02:40:37 2012] SignOn SOAP message found
[Tue Jul 24 02:40:37 2012] Authorize SOAP message found
[Tue Jul 24 02:40:37 2012] URL Postfix: /2.0
[Tue Jul 24 02:40:37 2012] using BaseURL: https://cws-01.ipcommerce.com/2.0
[Tue Jul 24 02:40:37 2012] SAK found in ident token: C37875FC7F91300C
[Tue Jul 24 02:40:37 2012] signon duration 0.422683
[Tue Jul 24 02:40:37 2012] signon response code:200
[Tue Jul 24 02:40:37 2012] Session token found in SignOn response:C37875FC7F91300C
[Tue Jul 24 02:40:38 2012] txn duration: 0.803918
[Tue Jul 24 02:40:38 2012] txn response code 200 response msg: <REMOVED XML>
[Tue Jul 24 02:40:38 2012] Script ended
[Tue Jul 24 02:42:37 2012] Script started
[Tue Jul 24 02:42:37 2012] Using config file /usr/local/nagios/libexec/config-autoresp-10.xml
[Tue Jul 24 02:42:37 2012] ServiceKey: C37875FC7F91300C
[Tue Jul 24 02:42:37 2012] IdentToken found
[Tue Jul 24 02:42:37 2012] BaseURL: https://cws-01.ipcommerce.com
[Tue Jul 24 02:42:37 2012] SignOn SOAP message found
[Tue Jul 24 02:42:37 2012] Authorize SOAP message found
[Tue Jul 24 02:42:37 2012] URL Postfix: /2.0
[Tue Jul 24 02:42:37 2012] using BaseURL: https://cws-01.ipcommerce.com/2.0
[Tue Jul 24 02:42:37 2012] SAK found in ident token: C37875FC7F91300C
[Tue Jul 24 02:42:37 2012] signon duration 0.372244
[Tue Jul 24 02:42:37 2012] signon response code:200
[Tue Jul 24 02:42:37 2012] Session token found in SignOn response:C37875FC7F91300C
[Tue Jul 24 02:42:38 2012] txn duration: 0.756385
[Tue Jul 24 02:42:38 2012] txn response code 200 response msg: <REMOVED XML>
[Tue Jul 24 02:42:38 2012] Script ended
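For anyone who wants to find these gaps without reading the log by eye, here is a rough sketch that scans the script log for "Script started" lines and reports consecutive starts that are further apart than expected. The function name `find_missed_runs`, the one-minute interval, and the 30-second slack are my own assumptions for illustration; only the timestamp format is taken from the log above.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[Tue Jul 24 02:40:37 2012] Script started"
START_RE = re.compile(r"^\[(.+?)\] Script started\s*$")


def find_missed_runs(log_lines,
                     interval=timedelta(minutes=1),
                     slack=timedelta(seconds=30)):
    """Return (previous_start, next_start) pairs where the gap between two
    consecutive "Script started" entries exceeds interval + slack.

    The interval and slack defaults are assumptions, not Nagios settings.
    """
    starts = []
    for line in log_lines:
        match = START_RE.match(line.strip())
        if match:
            starts.append(
                datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y"))

    gaps = []
    for prev, cur in zip(starts, starts[1:]):
        if cur - prev > interval + slack:
            gaps.append((prev, cur))
    return gaps


sample = [
    "[Tue Jul 24 02:40:37 2012] Script started",
    "[Tue Jul 24 02:40:38 2012] Script ended",
    "[Tue Jul 24 02:42:37 2012] Script started",
    "[Tue Jul 24 02:42:38 2012] Script ended",
]
for prev, cur in find_missed_runs(sample):
    print(f"no run between {prev} and {cur}")
```

Running it over the whole log file instead of the sample should list every minute in which the script was never started, which can then be lined up against the Nagios alerts.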
This is the output from Nagios (newest entry first):
[07-24-2012 02:42:47] SERVICE ALERT: cws-01.ipcommerce.com;Check CWS PSP v10+;OK;HARD;1;OK: At https://cws-01.ipcommerce.com/2.0/Txn/C37875FC7F91300C
[07-24-2012 02:42:37] SERVICE ALERT: cws-01.ipcommerce.com;Check CWS PSP v10+;CRITICAL;HARD;1;(Service Check Timed Out)
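To pull the timed-out alerts out of a longer listing, a small parser along these lines might help. It is only a sketch: it assumes the alert lines look exactly like the two above (a bracketed timestamp, then "SERVICE ALERT:", then six semicolon-separated fields), and the names `ServiceAlert` and `parse_service_alert` are made up for this example.

```python
from typing import NamedTuple, Optional


class ServiceAlert(NamedTuple):
    timestamp: str   # kept as the raw string shown in the listing
    host: str
    service: str
    state: str       # e.g. OK, CRITICAL
    state_type: str  # e.g. HARD
    attempt: int
    output: str


def parse_service_alert(line: str) -> Optional[ServiceAlert]:
    """Parse one SERVICE ALERT line; return None if it does not match."""
    stamp, sep, rest = line.strip().partition("] SERVICE ALERT: ")
    if not sep or not stamp.startswith("["):
        return None
    # Split into at most six fields so semicolons in the plugin output survive.
    parts = rest.split(";", 5)
    if len(parts) != 6 or not parts[4].isdigit():
        return None
    host, service, state, state_type, attempt, output = parts
    return ServiceAlert(stamp[1:], host, service, state, state_type,
                        int(attempt), output)


lines = [
    "[07-24-2012 02:42:47] SERVICE ALERT: cws-01.ipcommerce.com;"
    "Check CWS PSP v10+;OK;HARD;1;"
    "OK: At https://cws-01.ipcommerce.com/2.0/Txn/C37875FC7F91300C",
    "[07-24-2012 02:42:37] SERVICE ALERT: cws-01.ipcommerce.com;"
    "Check CWS PSP v10+;CRITICAL;HARD;1;(Service Check Timed Out)",
]
alerts = [a for a in map(parse_service_alert, lines) if a is not None]
timed_out = [a for a in alerts if a.output == "(Service Check Timed Out)"]
for alert in timed_out:
    print(alert.timestamp, alert.service, alert.state)
```

The timestamps of the timed-out alerts can then be compared with the start times in the script log to confirm that no run was logged in the preceding minute.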
I'm just looking for any advice on what to check or try; waking a bunch of people up at random times in the night isn't ideal. It doesn't happen at a consistent time, but the weird thing is that it never happens during the day when we are in the office. We even moved the check to a fresh Nagios install, and the issue happens there as well. Thanks in advance for any help.