Sporadic 'Connection refused' errors in 4.2.4

kernow5000
Posts: 58
Joined: Mon Jan 09, 2017 9:06 am

Post by kernow5000 »

Yesterday I raised the timeouts on a few checks.

I got a single failed -> OK pair of email-only notifications for one host, which shares a service check with two other hosts. The other two hosts checked out fine.

Same old 'Connection refused' error, same error 11s in syslog.

Jan 10 04:21:38 REDACTED nagios: job 836 (pid=3439): read() returned error 11
Jan 10 04:26:38 REDACTED nagios: job 841 (pid=5283): read() returned error 11

Email alerts:

State: CRITICAL
Date/Time: Tue Jan 10 04:21:38 GMT 2017
Additional Info:
connect to address REDACTED and port 443: Connection refused

State: OK
Date/Time: Tue Jan 10 04:26:38 GMT 2017
Additional Info:
HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time


As you can see, these match up with the syslog entries.
As many developers have said, these error 11s are possibly just informational, but I'd love to get rid of these 'Connection refused' false positives.
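
For reference, on typical Linux systems errno 11 is EAGAIN ("Resource temporarily unavailable"), which a non-blocking read() returns when no data is ready yet. The sketch below is not Nagios code and says nothing about why the connections are refused; it only reproduces that errno with an empty non-blocking pipe. The function name is made up for illustration.

```python
import errno
import os


def read_nonblocking_empty_pipe():
    """Read from an empty non-blocking pipe and return the errno raised."""
    read_fd, write_fd = os.pipe()
    try:
        # With blocking disabled, read() fails immediately instead of waiting.
        os.set_blocking(read_fd, False)
        try:
            os.read(read_fd, 1024)
        except BlockingIOError as exc:
            return exc.errno
        return None
    finally:
        os.close(read_fd)
        os.close(write_fd)


err = read_nonblocking_empty_pipe()
# Print the number, its symbolic name, and the system's description.
print(err, errno.errorcode.get(err), os.strerror(err))
```

The returned value equals errno.EAGAIN on whatever platform runs it; the numeric value 11 is specific to Linux and some other systems.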

Shaun

Post by kernow5000 »

The same host just got the same 'Connection refused' error, with the same error 11 in syslog too.

check_http works fine from the command line, as does telnet. Funny how it was this one at 4AM this morning too.

Jan 10 16:31:38 backupserver nagios: job 1675 (pid=4218): read() returned error 11

From nagios.log:
[1484022098] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;connect to address REDACTED and port 443: Connection refused
[1484022098] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;CRITICAL;notify-service-by-email;connect to address REDACTED and port 443: Connection refused
[1484022398] SERVICE ALERT: REDACTED;HTTPS check;OK;HARD;1;HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time
[1484022398] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;OK;notify-service-by-email;HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time
[1484065298] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;connect to address REDACTED and port 443: Connection refused
[1484065298] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;CRITICAL;notify-service-by-email;connect to address REDACTED and port 443: Connection refused
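
The bracketed numbers in nagios.log are Unix epoch seconds, which makes them awkward to line up against syslog and the email alerts by eye. A minimal Python sketch for converting them to UTC is below; the `humanize` helper is a made-up name, not part of Nagios.

```python
import re
from datetime import datetime, timezone

# A nagios.log entry starts with "[<epoch seconds>] " followed by the message.
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def humanize(line):
    """Replace a leading [epoch] with a UTC timestamp; return other lines unchanged."""
    match = LOG_LINE.match(line)
    if not match:
        return line
    stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return f"[{stamp:%Y-%m-%d %H:%M:%S}] {match.group(2)}"


sample = (
    "[1484022098] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;"
    "connect to address REDACTED and port 443: Connection refused"
)
print(humanize(sample))
```

For the first two timestamps above this gives 2017-01-10 04:21:38 and 04:26:38 UTC, which agrees with the email alerts from that morning.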

Post by kernow5000 »

I might just remove the check for that host ... ha
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

It looks like the way those are logged changed in 4.2.3. It might just be a matter of changing your log level.
nagios: job XX (pid=YY): read() returned error 11 (changed from LOG_ERR to LOG_NOTICE)
https://github.com/NagiosEnterprises/na ... /Changelog

I'd be happy to do a bit more digging, but if removing the check is ok for you, that works for me too. :)

Post by kernow5000 »

Now the check has failed, gone to CRITICAL, and eventually sent an SMS instead of an email for that host.

Host is fine and completely accessible.


connect to address REDACTED and port 443: Connection refused


Does it matter that I'm using host_name instead of host_address in host blocks?

I don't want this to turn into a nublet-nagios-config-101 thread as I think I can manage that by myself. But this one host ... bah!
Not to mention I don't know if the others are fixed now or just being rather quiet.

Post by kernow5000 »

Info: CRITICAL - Socket timeout


Socket timeout, hmm - same host

Different host: connect to address REDACTED and port 25: Connection refused



I really don't understand how it's fine 99% of the time and then has these little blips. At least I know it's working, I guess.
Weird how it's always 'Connection refused' errors, when nothing changes on the host side and the platform is completely fine and operational.
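
One way to catch blips like these outside of Nagios is to run a plain TCP connect loop from the monitoring server and record only the failures. This is just a rough Python sketch under that assumption; `probe` and `watch` are made-up names, and the host in the usage comment is a placeholder.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt one TCP connect; return 'ok' or a short error description."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def watch(host, port, attempts=10, interval=1.0):
    """Probe repeatedly and return (timestamp, result) pairs for failed attempts."""
    failures = []
    for _ in range(attempts):
        result = probe(host, port)
        if result != "ok":
            failures.append((time.strftime("%Y-%m-%d %H:%M:%S"), result))
        time.sleep(interval)
    return failures


# Example (placeholder host): probe port 443 every 5 seconds for an hour.
# print(watch("monitored-host.example", 443, attempts=720, interval=5.0))
```

If this loop also sees refusals at the same times as the Nagios alerts, the problem is likely somewhere on the network path or the remote end rather than in the check scheduling itself; if it never does, that points back at the monitoring server.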

Post by dwhitfield »

What's the output of ulimit -a on the servers that are returning connection refused?

Post by kernow5000 »

Hi,

[ec2-user@redacted ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15734
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15734
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Post by kernow5000 »

Another one, but this wasn't really a false positive, as the platform had technically failed. Still, I wasn't expecting a 'Connection refused'.

Jan 11 15:33:03 REDACTED nagios: job 1555 (pid=25022): read() returned error 11
Jan 11 15:38:03 REDACTED nagios: job 1562 (pid=26781): read() returned error 11


***** Nagios *****
Notification Type: PROBLEM
Service: HTTPS check text
Host: REDACTED
Address: REDACTED
State: CRITICAL
Date/Time: Wed Jan 11 15:33:03 GMT 2017
Additional Info:
connect to address REDACTED and port 443: Connection refused



***** Nagios *****
Notification Type: RECOVERY
Service: HTTPS check text
Host: REDACTED
Address: REDACTED
State: OK
Date/Time: Wed Jan 11 15:38:03 GMT 2017
Additional Info:
HTTP OK: HTTP/1.1 200 OK - 253 bytes in 0.009 second response time




Weird

Post by dwhitfield »

FWIW, here's the block that looks different on mine.

open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
The open files limit is the only one that really jumps out at me.

https://access.redhat.com/solutions/61334 should be of use.

Please let us know if you see any changes after increasing the limits.
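
Keep in mind that ulimit -a reports the limits of the shell it runs in, and a running daemon can have different ones. As a rough sketch, the Python below reads the open files limit for the current process and, on Linux, counts the descriptors in use through /proc. The function names and the `pid` parameter are made up for illustration, and reading another user's process normally needs root.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) RLIMIT_NOFILE values for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors via /proc (Linux only); None if unavailable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


soft, hard = open_file_limits()
print(f"open files: soft={soft} hard={hard} in use={open_fd_count()}")
```

Comparing the "in use" count for the nagios process against its soft limit around the time of a failure would show whether the 1024 limit is actually being approached.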