Faced issue with check_http
Posted: Wed Apr 22, 2015 1:06 pm
by ankukreja
We recently faced an issue where our payment provider's service was returning HTTP 502 errors, but Nagios did not detect or report any problem.
Here are the application logs for that time:
2015-04-12 13:44:02.410 INFO [main] [com.digi.ecommerce.cronjob.fulfillment.logic.finance.CheckPaymentStatus] (CheckPaymentStatus.java:257) - Error Number = 50
Error Category = null
Exception =>
java.io.IOException: Server returned HTTP response code: 502 for URL: https://www.xxxxx.com/epayment/webservi ... ryCardInfo
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:234)
The service is defined as:
define service{
use generic-service
host_name localhost
service_description xxxxx.com
servicegroups connectivity
check_command check_custom_http!-H www.xxxxx.com -u "/epayment/webservice/TxInquiryCardDetails/TxDetailsInquiry.asmx/TxDetailsInquiryCardInfo?MerchantCode=yyy&ReferenceNo=zzzz&Amount=2165.00&Version=2" -S -w3 -c5
}
# 'check_custom_http' command definition
define command{
command_name check_custom_http
command_line $USER1$/check_http $ARG1$
}
Can you please suggest why the alert might not have fired? Are there any debugging steps we can take to identify the problem?
Re: Faced issue with check_http
Posted: Wed Apr 22, 2015 5:08 pm
by jdalrymple
I assume it's not broken now. Can you recreate the problem somehow, and if so can you curl the URL so we can check out the output?
Re: Faced issue with check_http
Posted: Wed Apr 22, 2015 5:13 pm
by jolson
According to:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
The server, while acting as a gateway or proxy, received an invalid response from the upstream server it accessed in attempting to fulfill the request.
check_http should resolve 502 codes as CRITICAL:
Code: Select all
./check_http -H www.getstatuscode.com -u "/502"
HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 37518 bytes in 1.619 second response time |time=1.619153s;;;0.000000 size=37518B;;;0
Is it possible that the 502 error was resolved by the time Nagios checked it next? If you run the command from the CLI, does it return appropriate results?
Code: Select all
/usr/local/nagios/libexec/check_http -H www.xxxxx.com -u "/epayment/webservice/TxInquiryCardDetails/TxDetailsInquiry.asmx/TxDetailsInquiryCardInfo?MerchantCode=yyy&ReferenceNo=zzzz&Amount=2165.00&Version=2" -S -w3 -c5
Is a proper certificate in place for the '-S' to succeed?
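For reference, here is a rough sketch of how check_http appears to map response codes to Nagios states when no extra options are given. It is an approximation of the plugin's default behaviour (5xx is CRITICAL, 4xx is WARNING, redirects are OK unless '-f' says otherwise), not a copy of the plugin's logic, and the function name nagios_state_for_status is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: approximate the state check_http reports for a
# given HTTP status code when run with default options.
nagios_state_for_status() {
    code=$1
    # Anything that is not a plain number is treated as an invalid status.
    case $code in
        ''|*[!0-9]*) echo "CRITICAL"; return 0 ;;
    esac
    if [ "$code" -lt 100 ] || [ "$code" -ge 600 ]; then
        echo "CRITICAL"   # outside the valid HTTP status range
    elif [ "$code" -ge 500 ]; then
        echo "CRITICAL"   # server errors such as 502 and 503
    elif [ "$code" -ge 400 ]; then
        echo "WARNING"    # client errors such as 404
    else
        echo "OK"         # 1xx, 2xx, and redirects (3xx) by default
    fi
}
```

With this sketch, a 502 maps to CRITICAL while a 301 maps to OK, so a redirect in front of a failing backend would not alert on its own.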
Re: Faced issue with check_http
Posted: Wed Apr 22, 2015 7:03 pm
by ankukreja
jdalrymple wrote:I assume it's not broken now. Can you recreate the problem somehow, and if so can you curl the URL so we can check out the output?
Was it broken at some point in the past? How can I verify whether I am using the latest version of the check_http plugin or an older one?
Re: Faced issue with check_http
Posted: Wed Apr 22, 2015 7:13 pm
by ankukreja
jolson wrote:Is it possible that the 502 error was resolved by the time Nagios checked it next? If you run the command from the CLI, does it return appropriate results?
Is a proper certificate in place for the '-S' to succeed?
The issue persisted for a long time. Yes, the proper certificates are in place, and this is the response I get when I run the command now:
HTTP OK: HTTP/1.1 200 OK - 1150 bytes in 1.382 second response time |time=1.381566s;3.000000;5.000000;0.000000 size=1150B;;;0
Re: Faced issue with check_http
Posted: Thu Apr 23, 2015 10:14 am
by jdalrymple
ankukreja wrote:Was it broken at some point in the past? How can I verify whether I am using the latest version of the check_http plugin or an older one?
You can verify the version of check_http at the command line as I indicated:
Code: Select all
[jdalrymple@localhost ~]$ /usr/local/nagios/libexec/check_http -V
check_http v2.0.3 (nagios-plugins 2.0.3)
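If you want to script that comparison, something like the following might do. It is only a sketch: extract_version and version_older are hypothetical helper names, the sed pattern assumes the '-V' output looks like the line above, and 'sort -V' assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: pull the bare version number out of the text
# printed by `check_http -V`, e.g. "check_http v2.0.3 (nagios-plugins 2.0.3)".
extract_version() {
    printf '%s\n' "$1" | sed -n 's/^check_http v\([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed (exit 0) when version $1 sorts strictly
# before version $2. Relies on GNU sort's -V (version sort) option.
version_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

It could be used along these lines, with 2.0.3 taken from the output above:

```shell
installed=$(extract_version "$(/usr/local/nagios/libexec/check_http -V)")
version_older "$installed" 2.0.3 && echo "installed plugin ($installed) is older than 2.0.3"
```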
As jolson indicated, the check_http plugin works properly with regard to returning a CRITICAL result when it encounters a 502 error:
Code: Select all
[jdalrymple@localhost ~]$ /usr/local/nagios/libexec/check_http -H www.example.com -u /
HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 323 bytes in 0.297 second response time |time=0.296953s;;;0.000000 size=323B;;;0
[jdalrymple@localhost ~]$ echo $?
2
Unless you can recreate the problem I'm not sure how we'll be able to help. Ideally we'd need you to curl the URL while the error is occurring.
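One way to catch it in the act is to wrap the check so that the raw response is saved whenever the result is not OK. The following is a minimal sketch, assuming curl is installed; run_and_capture and the log path are made-up names, so adjust them to your environment.

```shell
#!/bin/sh
# Assumption: a writable log location; override LOG as needed.
LOG="${LOG:-/tmp/check_http_capture.log}"

# Usage: run_and_capture URL CHECK_COMMAND [ARGS...]
# Runs the check command; if it exits non-zero, appends the plugin output
# and the raw response for URL (headers included) to $LOG.
run_and_capture() {
    url=$1
    shift
    rc=0
    output=$("$@" 2>&1) || rc=$?
    if [ "$rc" -ne 0 ]; then
        {
            printf '=== %s exit=%s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rc"
            printf '%s\n' "$output"
            # -i includes the response headers, -s hides the progress meter,
            # --max-time keeps curl from hanging on a dead upstream.
            curl -s -i --max-time 10 "$url" || echo "curl exited with status $?"
        } >> "$LOG" 2>&1
    fi
    # Pass the plugin's output and exit code through unchanged.
    printf '%s\n' "$output"
    return "$rc"
}
```

It would be called with the full URL first and the usual check command after it, for example `run_and_capture "$URL" /usr/local/nagios/libexec/check_http -H www.xxxxx.com -u "$URI" -S -w3 -c5`, where URL and URI stand for the address and path from the service definition.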
Re: Faced issue with check_http
Posted: Fri Apr 24, 2015 1:26 am
by ankukreja
Here is the version we are using:
check_http v1.4.15 (nagios-plugins 1.4.15)
Could this be the issue? How do we update?
Re: Faced issue with check_http
Posted: Fri Apr 24, 2015 9:10 am
by jolson
ankukreja,
Could you test your version of check_http against this URL to see if it alerts properly?
Code: Select all
/usr/local/nagios/libexec/check_http -H www.getstatuscode.com -u "/502"
What is the result of the above? If it resolves as OK, an upgrade may fix the problems that you're having.
Re: Faced issue with check_http
Posted: Fri Apr 24, 2015 1:16 pm
by ankukreja
That seems to work correctly. Thanks for all your help so far.
[oracle@xxxxx ~]$ /usr/local/nagios/libexec/check_http -H www.getstatuscode.com -u "/502"
HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 37371 bytes in 2.580 second response time |time=2.580020s;;;0.000000 size=37371B;;;0
[oracle@xxxxx ~]$ /usr/local/nagios/libexec/check_http -H www.getstatuscode.com -u "/503"
HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 37426 bytes in 2.088 second response time |time=2.088210s;;;0.000000 size=37426B;;;0
Re: Faced issue with check_http
Posted: Fri Apr 24, 2015 1:59 pm
by jolson
I had a hunch that this might have to do with the '-S' flag in your check_http command definition, so jdalrymple and I tested it.
He set up an HTTPS site and made it return a 502, but check_http did not pick up on this:
Code: Select all
/usr/local/nagios/libexec/check_http -H x.x.x.x -S
HTTP OK: HTTP/1.1 301 Moved Permanently - 391 bytes in 0.121 second response time |time=0.120680s;;;0.000000 size=391B;;;0
We added '-s /' to the end of the check:
Code: Select all
/usr/local/nagios/libexec/check_http -H x.x.x.x -S -s /
HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 323 bytes in 0.052 second response time |time=0.052211s;;;0.000000 size=323B;;;0
After that, it began working properly.
We think it may have something to do with nginx, but of course there's no way to test this without a reproduction on your end. The next time you see the site hit a 502 state, you might try tacking '-s /' onto the end of your check.
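To make that comparison quick the next time the 502 shows up, a small helper along these lines could run both variants back to back and print their exit codes. Treat it as a sketch: compare_variants is a hypothetical name, and the plugin path is the one used earlier in this thread.

```shell
#!/bin/sh
# Assumption: default plugin location from this thread; override if yours differs.
CHECK_HTTP="${CHECK_HTTP:-/usr/local/nagios/libexec/check_http}"

# Usage: compare_variants HOST
# Runs the SSL check with and without '-s /' and prints both exit codes
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
# Returns 0 when the two variants agree, non-zero when they differ.
compare_variants() {
    host=$1
    plain_rc=0
    "$CHECK_HTTP" -H "$host" -S >/dev/null 2>&1 || plain_rc=$?
    string_rc=0
    # '-s' is the short form of '--string': a string to expect in the content.
    "$CHECK_HTTP" -H "$host" -S -s / >/dev/null 2>&1 || string_rc=$?
    printf 'plain=%s with_string=%s\n' "$plain_rc" "$string_rc"
    [ "$plain_rc" -eq "$string_rc" ]
}
```

If the two exit codes differ while the site is in the 502 state, that would point at the same behaviour we saw in our test.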