Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Tue Jan 17, 2017 5:56 am
by kernow5000
I've cheated and set service_check_timeout=65 just to test. Will report back.

Passive checks are an option; I don't currently have any configured.

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Tue Jan 17, 2017 11:13 am
by rkennedy
Sounds good - let us know how the testing goes.

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Wed Jan 18, 2017 5:50 am
by kernow5000
I'm not sure if it's relevant, but I've changed all the HTTP checks that were returning OK while being redirected (via 302, by various rewrite/config rules on the web servers) so that they request a specific file instead.
Just to cut out the redirection.

Mainly because I noticed that these were the checks occasionally failing with 'connection refused'.

I've also set retry_check_interval values for some of the hosts, as well as max_check_attempts > 1. (Until now I was alerting on the first failed check every time.)

Didn't get as many SMSes from the same old hosts last night, at least.

Hopefully this'll smooth a few things out, but we'll see.
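
For anyone following along, here is a rough sketch of what those two changes might look like in the object config. The host name web01, the path /healthcheck.html, the command name check_http_file and the generic-service template are all made-up placeholders, not taken from this setup. It uses retry_interval, which as far as I know is the current spelling of retry_check_interval, and the intervals are in minutes only if interval_length is left at its default of 60.

```cfg
# Hypothetical command: request one specific file so the check
# does not depend on any 302 rewrite rules on the web server.
define command {
    command_name    check_http_file
    command_line    $USER1$/check_http -H $HOSTNAME$ -I $HOSTADDRESS$ -u $ARG1$
}

# Hypothetical service: retry a failed check before it goes HARD
# and notifies, instead of alerting on the first failure.
define service {
    use                     generic-service
    host_name               web01
    service_description     HTTP file check
    check_command           check_http_file!/healthcheck.html
    check_interval          5
    retry_interval          1
    max_check_attempts      3
}
```

With values like these, a single refused connection would be rechecked twice, one minute apart, and only notify if all three attempts fail.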

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Wed Jan 18, 2017 3:23 pm
by rkennedy
You could also increase notification_interval to lengthen the time between repeat notifications.

Let us know if you have any further questions.
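
As a rough illustration only (host name, service description and the generic-service template are placeholders, and check_http is assumed to be the command from the sample commands.cfg), notification_interval sits in the service definition. It controls how often a notification is repeated while the problem persists; if the goal is to wait before the first notification, first_notification_delay is the separate directive for that. Both are in minutes if interval_length is left at its default of 60.

```cfg
# Hypothetical service definition; the values are examples, not recommendations.
define service {
    use                         generic-service
    host_name                   web01
    service_description         HTTPS
    check_command               check_http
    # Repeat the notification every 120 time units while the problem persists.
    notification_interval       120
    # Wait 10 time units after the problem starts before the first notification.
    first_notification_delay    10
}
```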

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Wed Jan 18, 2017 3:30 pm
by dwhitfield
kernow5000 wrote: I might just remove the check for that host ... ha


So, I notice a couple of things going back through this thread. One, would it be possible to spin up a Core server just for this one check? If you have a server doing no other checks, the other checks can't be getting in the way (well, unless they are sitting on the same physical host, or there is network latency, but realistically...)

You mention several times that these have come through at 4am, although I see they are not only at 4am. Have you spoken with your network team, or with the admins of the servers showing these strange errors, to see whether they do anything at 4am?

Also, one thing that may be less drastic than removing the check is scheduling downtime. I don't know how critical it is for that server to be running at night, but you could just schedule it as down for the most annoying hours. I know that's not ideal, but on the face of it that seems better than removing the check altogether.
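
Scheduled downtime itself is normally set through the web interface or an external command rather than in the config files. A related alternative that can be sketched in config is a notification_period that skips the noisy hours. This is a different mechanism from downtime (checks still run and states are still logged, only the notifications are suppressed), and the timeperiod name, the 03:00-05:00 window, the host name and the generic-service template below are all assumptions for illustration.

```cfg
# Hypothetical timeperiod covering every hour except 03:00-05:00.
define timeperiod {
    timeperiod_name     quiet-nights
    alias               All hours except 03:00-05:00
    sunday              00:00-03:00,05:00-24:00
    monday              00:00-03:00,05:00-24:00
    tuesday             00:00-03:00,05:00-24:00
    wednesday           00:00-03:00,05:00-24:00
    thursday            00:00-03:00,05:00-24:00
    friday              00:00-03:00,05:00-24:00
    saturday            00:00-03:00,05:00-24:00
}

# Hypothetical service that only notifies inside that timeperiod.
define service {
    use                     generic-service
    host_name               web01
    service_description     HTTPS
    check_command           check_http
    notification_period     quiet-nights
}
```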

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Thu Jan 19, 2017 4:35 am
by kernow5000
Thanks for your suggestions, guys. Another Nagios server is certainly possible. That's one option for testing.

I'll look into notification_interval also.

Sadly the boxes are at different providers, but I could ask if they have any of these issues at certain times.

Actually, since tweaking retry_interval and max_check_attempts, things seem to have smoothed out.

Will keep you posted.

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Thu Jan 19, 2017 5:19 pm
by rkennedy
Let us know if you have any further questions!

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Wed Jan 25, 2017 4:11 am
by kernow5000
Ugh, a good 60 'connection refused' SMS alerts last night, on the same few hosts. I'm going to take these down to email only for now. But at this point I have to think about sacking this off and looking at other availability monitoring, sadly :(

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Wed Jan 25, 2017 4:54 am
by kernow5000
...and pretty much all of those were port 443 / SSL connections.

Hmmmmmmm

Re: Sporadic 'Connection refused' errors in 4.2.4

Posted: Wed Jan 25, 2017 6:00 am
by kernow5000
Haven't seen this one before.

CRITICAL - Plugin timed out while executing system call
On a DNS check this time.