Sporadic 'Connection refused' errors in 4.2.4
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
Yesterday I tweaked a few timeouts on checks to be higher.
I got a single failed->OK notification pair (email only) for one host, which shares a service check with two other hosts. The two other hosts checked out fine.
Same old connection refused error, same Error 11's in syslog.
Jan 10 04:21:38 REDACTED nagios: job 836 (pid=3439): read() returned error 11
Jan 10 04:26:38 REDACTED nagios: job 841 (pid=5283): read() returned error 11
email alerts:
State: CRITICAL
Date/Time: Tue Jan 10 04:21:38 GMT 2017
Additional Info:
connect to address REDACTED and port 443: Connection refused
State: OK
Date/Time: Tue Jan 10 04:26:38 GMT 2017
Additional Info:
HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time
As you can see these match up to the syslog notifications.
As many developers have said, these error 11's are possibly just informational, but I'd love to get rid of these 'connection refused' false positives.
Shaun
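For reference, on Linux errno 11 is normally EAGAIN ("Resource temporarily unavailable"), which typically just means a non-blocking read had nothing to return yet. A minimal sketch for decoding a numeric errno on whatever box you run it on; the helper name `describe_errno` is made up for illustration and is not part of Nagios:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for a numeric errno on the current platform."""
    # errno numbering is platform specific, so look the name up locally.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # 11 is the value from the syslog lines above; run this on the Nagios host.
    print(describe_errno(11))
```

Run it on the Nagios server itself, since other operating systems number their errno values differently.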
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
Same host just got the same connection refused error - same error 11 in syslog too
check_http works fine from the command line, as does telnet. Funny how it was this one at 4AM this morning too.
Jan 10 16:31:38 backupserver nagios: job 1675 (pid=4218): read() returned error 11
from nagios.log
[1484022098] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;connect to address REDACTED and port 443: Connection refused
[1484022098] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;CRITICAL;notify-service-by-email;connect to address REDACTED and port 443: Connection refused
[1484022398] SERVICE ALERT: REDACTED;HTTPS check;OK;HARD;1;HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time
[1484022398] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;OK;notify-service-by-email;HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time
[1484065298] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;connect to address REDACTED and port 443: Connection refuse
[1484065298] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;CRITICAL;notify-service-by-email;connect to address REDACTED and port 443: Connection refused
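The bracketed numbers in nagios.log are Unix epoch seconds, which makes them awkward to line up against syslog by eye. A small sketch that rewrites the leading stamp as a UTC time; it assumes the server clock is on GMT (as the email alerts suggest), and `humanize` is just an illustrative name:

```python
import re
from datetime import datetime, timezone

# nagios.log lines start with "[<epoch seconds>] "
_STAMP = re.compile(r"^\[(\d+)\]\s*")


def humanize(line: str) -> str:
    """Replace a leading [epoch] stamp with a UTC timestamp; other lines pass through."""
    match = _STAMP.match(line)
    if match is None:
        return line
    when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return f"{when:%Y-%m-%d %H:%M:%S} UTC {line[match.end():]}"


if __name__ == "__main__":
    print(humanize("[1484022098] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;..."))
```

With this, 1484022098 comes out as 2017-01-10 04:21:38 UTC, the same second as the first syslog entry.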
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
I might just remove the check for that host ... ha
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
Re: Sporadic 'Connection refused' errors in 4.2.4
Looks like how those are stored changed in 4.2.3. It might just be a matter of changing your log level.
https://github.com/NagiosEnterprises/na ... /Changelog
nagios: job XX (pid=YY): read() returned error 11 (changed from LOG_ERR to LOG_NOTICE)
I'd be happy to do a bit more digging, but if removing the check is ok for you, that works for me too.
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
Now it's failed and gone to critical and eventually sent an SMS instead of an email for that host.
Host is fine and completely accessible.
connect to address REDACTED and port 443: Connection refused
Does it matter I'm using host_name instead of host_address in host blocks?
I don't want this to turn into a nublet-nagios-config-101 thread as I think I can manage that by myself. But this one host ... bah!
Not to mention I don't know if the others are fixed now or just being rather quiet.
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
Info: CRITICAL - Socket timeout
Socket timeout, hmm - same host
Different host: connect to address REDACTED and port 25: Connection refused
I really don't understand how 99% of the time it's fine and then has these little blips. However at least I know it's working I guess.
Weird how it's always connection refused errors, when nothing changes on the host side and the platform is completely fine and operational.
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
Re: Sporadic 'Connection refused' errors in 4.2.4
What's the output of ulimit -a on the servers that are returning connection refused?
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
Hi,
[ec2-user@redacted ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15734
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15734
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Last edited by dwhitfield on Wed Jan 11, 2017 12:25 pm, edited 1 time in total.
Reason: code blocks FTW
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
Another - but this wasn't really a false positive as the platform had technically failed. Still.. wasn't expecting a connection refused.
Jan 11 15:33:03 REDACTED nagios: job 1555 (pid=25022): read() returned error 11
Jan 11 15:38:03 REDACTED nagios: job 1562 (pid=26781): read() returned error 11
***** Nagios *****
Notification Type: PROBLEM
Service: HTTPS check text
Host: REDACTED
Address: REDACTED
State: CRITICAL
Date/Time: Wed Jan 11 15:33:03 GMT 2017
Additional Info:
connect to address REDACTED and port 443: Connection refused
***** Nagios *****
Notification Type: RECOVERY
Service: HTTPS check text
Host: REDACTED
Address: REDACTED
State: OK
Date/Time: Wed Jan 11 15:38:03 GMT 2017
Additional Info:
HTTP OK: HTTP/1.1 200 OK - 253 bytes in 0.009 second response time
Weird
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
Re: Sporadic 'Connection refused' errors in 4.2.4
FWIW, here's the block that looks different on mine.
The open files limit is the only one that really jumps out at me.
https://access.redhat.com/solutions/61334 should be of use.
Please let us know if you see any changes after increasing the limits.
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
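One caveat when comparing these numbers: `ulimit -a` reports the limits of the shell it is run in, which are not necessarily the limits the running nagios daemon started with. On Linux the per-process values can be read from `/proc/<pid>/limits`. A minimal sketch that pulls the open files line out of that file's contents; `open_file_limits` is an illustrative name, and the pid has to be looked up separately:

```python
def open_file_limits(limits_text: str) -> tuple[str, str]:
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits content."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' line found")


if __name__ == "__main__":
    from pathlib import Path

    # "self" inspects this script's own process; substitute the nagios pid instead.
    print(open_file_limits(Path("/proc/self/limits").read_text()))
```

The values are returned as strings because a limit can also be reported as `unlimited`.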