check_http - Services constantly flapping

Posted: Fri Mar 06, 2015 11:21 am
by gshergill
Hi,

Been a while since I've been on Nagios, have had some major projects taking up too much time =(

Setup has been running fine for quite a while now, but yesterday we needed to add http checks to 4 of our websites. This sounded simple enough, but for some reason the services are constantly flapping.

Below is the command definition (I've removed the -w and -c to keep it simple) and the service definitions:

Code:

define command {
       command_name     check_http_hosts
       command_line     /usr/local/nagios/libexec/check_http -H $ARG1
}

define service {
        use                             http-service
        host_name                       localhost
        service_description             HTTP - www.globility.co.uk
        check_command                   check_http_hosts!www.globility.co.uk
}
.... (x3 more)

From the command line this works every time, non stop (as both root and the nagios user):

Code:

nagios@nagios:/usr/local/nagios/libexec# ./check_http -H www.globility.co.uk
HTTP OK: HTTP/1.0 200 OK - 10020 bytes in 0.034 second response time |time=0.033887s;;;0.000000 size=10020B;;;0

From the admin interface, though, the check occasionally fails with the following message:

Name or service not known
HTTP CRITICAL - Unable to open TCP socket

In the past we had one http check running non stop which never had trouble. We removed that one and added these 4 new ones and are seeing this behaviour on them all. We are unable to add the old one back to check as that domain is no longer in service.
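
The "Name or service not known" text is a hostname resolution error rather than an HTTP failure, so one way to narrow this down is to check whether lookups fail intermittently on the Nagios host. The following is only a minimal sketch, not part of the original setup: count_dns_failures and the RESOLVE_CMD override are hypothetical names, and it assumes getent is available (as on glibc-based Linux systems).

```shell
#!/bin/sh
# Count how many of N lookups of a hostname fail.
# RESOLVE_CMD is a hypothetical override hook for testing; the default,
# `getent ahosts`, resolves names through the system resolver (assumption:
# a glibc-based system where getent is installed).
count_dns_failures() {
    host=$1
    attempts=${2:-20}   # number of lookups to perform
    delay=${3:-1}       # seconds to wait between lookups
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Word splitting of the default is intentional: "getent" "ahosts".
        if ! ${RESOLVE_CMD:-getent ahosts} "$host" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "$failures"
}

# Example (run as the nagios user on the Nagios host):
#   count_dns_failures www.globility.co.uk 50 1
```

The function prints the number of failed lookups. A non-zero count would suggest looking at name resolution or the network before looking at check_http itself.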

Thanks for any help, and I'm happy to provide more information if needed.

Kind Regards,

Gary Shergill

Re: check_http - Services constantly flapping

Posted: Fri Mar 06, 2015 4:26 pm
by scottwilkerson
Looking at the command you posted, I see a couple of problems:

Code:

define command {
       command_name     check_http_hosts
       command_line     /usr/local/nagios/libexec/check_http -H $ARG1
}

The $ARG1 macro seems to be missing its ending $ (it should be $ARG1$).

You may also want to add a timeout, so that the whole command looks something like this:

Code:

define command {
       command_name     check_http_hosts
       command_line     /usr/local/nagios/libexec/check_http -H $ARG1$ -t 20
}
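
To compare what the plugin does on the command line with what Nagios reports, it may help to run it many times in a row and tally the results instead of checking by hand. This is a rough sketch under stated assumptions: tally_plugin_states is a hypothetical helper name, and it relies only on the standard plugin exit code convention (0 OK, 1 WARNING, 2 CRITICAL, anything else treated as UNKNOWN).

```shell
#!/bin/sh
# Run a command repeatedly and tally its exit codes using the standard
# Nagios plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, other=UNKNOWN.
# Usage: tally_plugin_states ATTEMPTS COMMAND [ARGS...]
tally_plugin_states() {
    attempts=$1
    shift
    ok=0
    warn=0
    crit=0
    unknown=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        rc=0
        # Discard plugin output; only the exit code is counted.
        "$@" >/dev/null 2>&1 || rc=$?
        case $rc in
            0) ok=$((ok + 1)) ;;
            1) warn=$((warn + 1)) ;;
            2) crit=$((crit + 1)) ;;
            *) unknown=$((unknown + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "OK=$ok WARNING=$warn CRITICAL=$crit UNKNOWN=$unknown"
}

# Example (run as the nagios user):
#   tally_plugin_states 50 /usr/local/nagios/libexec/check_http -H www.globility.co.uk -t 20
```

If every run comes back OK here while the scheduled checks still flap, the difference is more likely in the environment or timing of the checks than in the command definition.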

Re: check_http - Services constantly flapping

Posted: Mon Mar 09, 2015 6:29 am
by gshergill
Hi,

Sorry, that was my mistake when manually copying over from the console (the missing '$').

I've added the timeout as suggested and the issue continues.

Is there an issue with running check_http on external websites from the host "localhost"?

Thank you.

Kind Regards,

Gary Shergill

Re: check_http - Services constantly flapping

Posted: Mon Mar 09, 2015 11:34 am
by gshergill
Hi,

I'll continue monitoring this over the next few days to confirm, but after some more detailed investigation I found the issue was on the network.

I'm unsure why it wasn't showing itself on old services, but after adding a new service (a simple remote desktop port monitor) I noticed the same behaviour there.

It turned out that the network Nagios runs on was randomly losing packets. Forcing a check of an old service, or adding a new service, exposed the issue. I'm unsure why checks running on schedule weren't seeing this packet loss though.

For now, everything is okay, but I'll post again if the issue comes back while the network is 100% fine. I'm still unsure why old checks only saw this after a forced recheck though...
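
For anyone wanting to confirm packet loss like this from the Nagios host, one option is to pull the loss percentage out of the summary line that ping prints. This is just a sketch: packet_loss_pct is a hypothetical helper name, and it assumes the common "N% packet loss" wording in the ping summary.

```shell
#!/bin/sh
# Read ping output on stdin and print the packet loss percentage
# (without the % sign). Assumes the summary line contains text such as
# "10% packet loss"; prints nothing if no such line is found.
packet_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example (sends 20 echo requests, then prints the loss percentage):
#   ping -c 20 www.globility.co.uk | packet_loss_pct
```

Running this a few times from the monitoring server and from another machine on a different network segment can help show whether the loss is local to the Nagios host's network.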

Thank you.

Kind Regards,

Gary Shergill

Re: check_http - Services constantly flapping

Posted: Mon Mar 09, 2015 4:24 pm
by jolson
Sounds good - please keep us in the loop. Thanks!

Re: check_http - Services constantly flapping

Posted: Fri Mar 27, 2015 7:41 am
by gshergill
Hi,

Just to let you know that this is resolved; we have not seen any trouble since the network changes.

Kind Regards,

Gary Shergill