Sporadic 'Connection refused' errors in 4.2.4

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby kernow5000 » Tue Jan 10, 2017 5:13 am

Yesterday I tweaked a few timeouts on checks to be higher.

I got a single failed->OK notification pair (email only) for one host, which shares a service check with two other hosts. The two other hosts checked out fine.

Same old 'Connection refused' error, same error 11s in syslog.

Jan 10 04:21:38 REDACTED nagios: job 836 (pid=3439): read() returned error 11
Jan 10 04:26:38 REDACTED nagios: job 841 (pid=5283): read() returned error 11

email alerts:

State: CRITICAL
Date/Time: Tue Jan 10 04:21:38 GMT 2017
Additional Info:
connect to address REDACTED and port 443: Connection refused

State: OK
Date/Time: Tue Jan 10 04:26:38 GMT 2017
Additional Info:
HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time


As you can see, these match up with the syslog entries.
As many developers have said, these error 11s may just be informational, but I'd love to get rid of these 'Connection refused' false positives.
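
For readers wondering what "error 11" refers to: the number in the syslog line is an errno value, and on Linux errno 11 is EAGAIN ("Resource temporarily unavailable"). The sketch below only looks up the number on the local system. The helper name `describe_errno` is made up for illustration, and errno numbering differs between operating systems, so run it on the Nagios host itself.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value on this system."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name} ({os.strerror(code)})"


# The value reported in the syslog lines above.
print(describe_errno(11))
```

On a non-blocking descriptor, EAGAIN from read() simply means no data was available at that moment, which fits the view that the message is informational.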

Shaun
kernow5000
 
Posts: 22
Joined: Mon Jan 09, 2017 9:06 am

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby kernow5000 » Tue Jan 10, 2017 11:39 am

The same host just got the same 'Connection refused' error, with the same error 11 in syslog too.

check_http works fine from the command line, as does telnet. Funny how it was this host at 4 AM this morning too.
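
Since a one-off manual test rarely catches an intermittent failure, one option is to probe the port repeatedly from the Nagios server and record only the failures. This is a minimal sketch, not part of Nagios or its plugins: the function names `probe` and `watch`, the attempt count, and the interval are all assumptions chosen for illustration.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection and return (ok, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"connected in {time.monotonic() - start:.3f}s"
    except OSError as exc:
        # Covers ConnectionRefusedError, timeouts and DNS failures.
        return False, f"{type(exc).__name__}: {exc}"


def watch(host, port, attempts=10, interval=1.0, timeout=5.0):
    """Probe repeatedly and return a list of (timestamp, detail) for failures."""
    failures = []
    for _ in range(attempts):
        ok, detail = probe(host, port, timeout=timeout)
        if not ok:
            failures.append((time.strftime("%Y-%m-%d %H:%M:%S"), detail))
        time.sleep(interval)
    return failures


# Example (hypothetical host): watch("www.example.com", 443, attempts=600, interval=30)
```

If failures show up here at the same times as the Nagios alerts, the refusals are real at the network level and not an artifact of the check scheduler.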

Jan 10 16:31:38 backupserver nagios: job 1675 (pid=4218): read() returned error 11

from nagios.log
[1484022098] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;connect to address REDACTED and port 443: Connection refused
[1484022098] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;CRITICAL;notify-service-by-email;connect to address REDACTED and port 443: Connection refused
[1484022398] SERVICE ALERT: REDACTED;HTTPS check;OK;HARD;1;HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time
[1484022398] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;OK;notify-service-by-email;HTTP OK: HTTP/1.1 301 Moved Permanently - 472 bytes in 0.079 second response time
[1484065298] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;connect to address REDACTED and port 443: Connection refused
[1484065298] SERVICE NOTIFICATION: external;REDACTED;HTTPS check;CRITICAL;notify-service-by-email;connect to address REDACTED and port 443: Connection refused
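
The bracketed numbers in nagios.log are Unix epoch timestamps, and the fields after `SERVICE ALERT:` are host, service, state, state type, attempt number, and plugin output, separated by semicolons. The sketch below splits one of the lines above into those parts; `parse_service_alert` is a made-up helper name, not a Nagios API.

```python
from datetime import datetime, timezone


def parse_service_alert(line):
    """Split a nagios.log SERVICE ALERT line into its fields."""
    stamp, rest = line.split("] ", 1)
    kind, payload = rest.split(": ", 1)
    if kind != "SERVICE ALERT":
        raise ValueError(f"not a SERVICE ALERT line: {kind}")
    # The plugin output may itself contain ';' or ':', so split at most 5 times.
    host, service, state, state_type, attempt, output = payload.split(";", 5)
    return {
        "time": datetime.fromtimestamp(int(stamp.lstrip("[")), tz=timezone.utc),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


sample = ("[1484022098] SERVICE ALERT: REDACTED;HTTPS check;CRITICAL;HARD;1;"
          "connect to address REDACTED and port 443: Connection refused")
alert = parse_service_alert(sample)
print(alert["time"].isoformat(), alert["state"], alert["state_type"], alert["attempt"])
```

For the sample line the timestamp decodes to 2017-01-10 04:21:38 UTC, matching the email alert. Note that the state is already HARD at attempt 1. That would be consistent with `max_check_attempts` being 1 for this service, which is only an assumption since the service definition is not shown in the thread; if so, a single failed check is enough to trigger a notification with no retries.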

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby kernow5000 » Tue Jan 10, 2017 11:54 am

I might just remove the check for that host ... ha

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby dwhitfield » Tue Jan 10, 2017 12:10 pm

It looks like the way those messages are logged changed in 4.2.3. It might just be a matter of changing your log level.

nagios: job XX (pid=YY): read() returned error 11 (changed from LOG_ERR to LOG_NOTICE)

https://github.com/NagiosEnterprises/na ... /Changelog

I'd be happy to do a bit more digging, but if removing the check is ok for you, that works for me too. :)
dwhitfield
Support Tech
 
Posts: 1451
Joined: Wed Sep 21, 2016 10:29 am
Location: Nagios Enterprises, LLC

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby kernow5000 » Tue Jan 10, 2017 12:12 pm

Now it's failed and gone to critical and eventually sent an SMS instead of an email for that host.

Host is fine and completely accessible.


connect to address REDACTED and port 443: Connection refused


Does it matter that I'm using host_name instead of host_address in host blocks?

I don't want this to turn into a nublet-nagios-config-101 thread as I think I can manage that by myself. But this one host ... bah!
Not to mention I don't know if the others are fixed now or just being rather quiet.

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby kernow5000 » Tue Jan 10, 2017 12:24 pm

Info: CRITICAL - Socket timeout


Socket timeout, hmm - same host

Different host: connect to address REDACTED and port 25: Connection refused



I really don't understand how it's fine 99% of the time and then has these little blips. At least I know it's working, I guess.
Weird how it's always 'Connection refused' errors, when nothing changes on the host side and the platform is completely fine and operational.

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby dwhitfield » Tue Jan 10, 2017 1:03 pm

What's the output of ulimit -a on the servers that are returning connection refused?

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby kernow5000 » Wed Jan 11, 2017 4:18 am

Hi,

Code: Select all
[ec2-user@redacted ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15734
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 15734
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Last edited by dwhitfield on Wed Jan 11, 2017 12:25 pm, edited 1 time in total.
Reason: code blocks FTW

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby kernow5000 » Wed Jan 11, 2017 10:48 am

Another one, but this wasn't really a false positive as the platform had technically failed. Still, I wasn't expecting a 'Connection refused'.

Jan 11 15:33:03 REDACTED nagios: job 1555 (pid=25022): read() returned error 11
Jan 11 15:38:03 REDACTED nagios: job 1562 (pid=26781): read() returned error 11


***** Nagios *****
Notification Type: PROBLEM
Service: HTTPS check text
Host: REDACTED
Address: REDACTED
State: CRITICAL
Date/Time: Wed Jan 11 15:33:03 GMT 2017
Additional Info:
connect to address REDACTED and port 443: Connection refused



***** Nagios *****
Notification Type: RECOVERY
Service: HTTPS check text
Host: REDACTED
Address: REDACTED
State: OK
Date/Time: Wed Jan 11 15:38:03 GMT 2017
Additional Info:
HTTP OK: HTTP/1.1 200 OK - 253 bytes in 0.009 second response time




Weird

Re: Sporadic 'Connection refused' errors in 4.2.4

Postby dwhitfield » Wed Jan 11, 2017 12:31 pm

FWIW, here's the block that looks different on mine.

Code: Select all
open files                      (-n) 10000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240


The open files limit is the only one that really jumps out at me.

https://access.redhat.com/solutions/61334 should be of use.

Please let us know if you see any changes after increasing the limits.
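
One caveat when comparing these numbers: `ulimit -a` reports the limits of the login shell, which can differ from the limits of an already running daemon. A rough sketch for checking both follows. The function names are made up for illustration, and reading `/proc/<pid>/limits` is Linux-specific; the PID of the nagios process has to be supplied by the reader.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the text of a Linux /proc/<pid>/limits file."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None


soft, hard = nofile_limits()
print("current process open files:", soft, hard)

# For a running daemon (Linux only), with the PID filled in by the reader:
#   with open("/proc/<pid>/limits") as fh:
#       print(parse_max_open_files(fh.read()))
```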
