Sporadic 'Connection refused' errors in 4.2.4
-
- Posts: 58
- Joined: Mon Jan 09, 2017 9:06 am
Re: Sporadic 'Connection refused' errors in 4.2.4
And lots more connection refused errors today. The only options I can see are silencing those checks or turning off SMS for them... which defeats the whole point.
I just don't understand anymore!
Re: Sporadic 'Connection refused' errors in 4.2.4
Simple 'check_smtp' check failing against a host on TCP 25.
Run it manually 1000 times, checks out fine each and every time. Sat in nagios as red / critical.
Doesn't compute.
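For anyone trying to catch this kind of intermittent failure, here is a minimal Python sketch (not a Nagios plugin, and not something posted in this thread) that repeatedly opens a TCP connection and records only the attempts that fail, each with a timestamp. It separates an actively refused connection (the OS reports ECONNREFUSED) from a timeout, which may help when comparing against logs on the target host or on a firewall in between. The names `probe_tcp` and `watch`, the interval, and the attempt count are all illustrative assumptions.

```python
import socket
import time
from datetime import datetime, timezone
from typing import List, Tuple


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try one TCP connect and classify the result.

    Returns "ok", "refused", "timeout", or "error: <detail>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def watch(
    host: str,
    port: int,
    interval: float = 10.0,
    attempts: int = 360,
    timeout: float = 5.0,
) -> List[Tuple[str, str]]:
    """Probe repeatedly; return (UTC timestamp, result) for each failed attempt."""
    failures = []
    for _ in range(attempts):
        result = probe_tcp(host, port, timeout=timeout)
        if result != "ok":
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            failures.append((stamp, result))
            print(stamp, host, port, result)
        time.sleep(interval)
    return failures
```

Calling something like `watch("mail.example.com", 25)` from the Nagios server (the hostname is a placeholder) and leaving it running across the period when the alerts usually arrive would show whether plain TCP connects are refused at those times too, independently of the check_smtp plugin.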
Re: Sporadic 'Connection refused' errors in 4.2.4
kernow5000 wrote: Ugh, a good 60 'connection refused' SMS errors last night, on the same few hosts. I'm going to take these down to email only for now. But at this point I have to think about sacking this off and looking at other availability monitoring, sadly.
kernow5000 wrote: It's running bacula as well which runs in the evening. But only for about 480GB of data.
480GB is quite a bit of data, and depending on how it's prioritizing your traffic / IO, this could definitely affect the Nagios checks.
I would like to take a step back.
- How many resources do you have allocated to this machine? (RAM + CPU + disks)
- How many checks do you have running on this machine? (host and service)
- Do you have local checks running against the Nagios machine to see where the resources are peaking?
- What is the full output of ps -eo pcpu,args --sort=-%cpu and of this command:
Code: Select all
ps axo rss,comm,pid | awk '{ proc_list[$2]++; proc_list[$2 "," 1] += $1; } \
END { for (proc in proc_list) { printf("%d\t%s\n", \
proc_list[proc "," 1],proc); }}' | sort -n | sort -rn | awk '{$1/=1024;printf "%.0fMB\t",$1}{print $2}'
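As a rough alternative, the same per-command RSS totals can be computed in Python. This is only a sketch of what the one-liner appears to be aiming for, not an official Nagios tool, and the function names are made up for illustration. It keeps a single total per command name, so it does not emit the extra `name,1` rows that the awk version prints (the awk loop walks both the counter keys and the `name,1` sum keys), and it leaves out entries with zero RSS such as kernel threads.

```python
import subprocess
from collections import defaultdict
from typing import Dict, List, Tuple


def rss_by_command(ps_output: str) -> List[Tuple[str, int]]:
    """Sum resident memory (KiB) per command name from `ps axo rss,comm` text.

    Returns (command, total_kib) pairs, largest total first.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # RSS first, the rest is the command name
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the "RSS COMMAND" header and blank lines
        rss_kib, command = parts
        totals[command.strip()] += int(rss_kib)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def format_report(totals: List[Tuple[str, int]]) -> str:
    """Render totals as 'NMB<TAB>command' lines, omitting zero-RSS entries."""
    return "\n".join(
        f"{kib / 1024:.0f}MB\t{command}" for command, kib in totals if kib > 0
    )


def main() -> None:
    # Same columns as the one-liner, minus the pid, which it never used.
    output = subprocess.run(
        ["ps", "axo", "rss,comm"], capture_output=True, text=True, check=True
    ).stdout
    print(format_report(rss_by_command(output)))
```

Calling `main()` on the Nagios host prints one line per command name, largest memory consumer first.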
Former Nagios Employee
Re: Sporadic 'Connection refused' errors in 4.2.4
Sorry, it's not 480GB of data *per* night. Bacula on that box stores a vault for three or four boxes which totals 480GB. The nightly jobs are next to nothing, a few gigabytes at the absolute most.
Specs:
Quad Core Xeon, 32GB RAM, mechanical RAID 5.
I'm doing about 54 service checks I think? Hardly anything.
Code: Select all
%CPU COMMAND
0.1 /usr/libexec/mysqld --basedir=/usr --datadir=/home/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/home/mysql/mysql.sock
0.0 /sbin/init
0.0 [kthreadd]
0.0 [migration/0]
0.0 [ksoftirqd/0]
0.0 [stopper/0]
0.0 [watchdog/0]
0.0 [migration/1]
0.0 [stopper/1]
0.0 [ksoftirqd/1]
0.0 [watchdog/1]
0.0 [migration/2]
0.0 [stopper/2]
0.0 [ksoftirqd/2]
0.0 [watchdog/2]
0.0 [migration/3]
0.0 [stopper/3]
0.0 [ksoftirqd/3]
0.0 [watchdog/3]
0.0 [migration/4]
0.0 [stopper/4]
0.0 [ksoftirqd/4]
0.0 [watchdog/4]
0.0 [migration/5]
0.0 [stopper/5]
0.0 [ksoftirqd/5]
0.0 [watchdog/5]
0.0 [migration/6]
0.0 [stopper/6]
0.0 [ksoftirqd/6]
0.0 [watchdog/6]
0.0 [migration/7]
0.0 [stopper/7]
0.0 [ksoftirqd/7]
0.0 [watchdog/7]
0.0 [events/0]
0.0 [events/1]
0.0 [events/2]
0.0 [events/3]
0.0 [events/4]
0.0 [events/5]
0.0 [events/6]
0.0 [events/7]
0.0 [events/0]
0.0 [events/1]
0.0 [events/2]
0.0 [events/3]
0.0 [events/4]
0.0 [events/5]
0.0 [events/6]
0.0 [events/7]
0.0 [events_long/0]
0.0 [events_long/1]
0.0 [events_long/2]
0.0 [events_long/3]
0.0 [events_long/4]
0.0 [events_long/5]
0.0 [events_long/6]
0.0 [events_long/7]
0.0 [events_power_ef]
0.0 [events_power_ef]
0.0 [events_power_ef]
0.0 [events_power_ef]
0.0 [events_power_ef]
0.0 [events_power_ef]
0.0 [events_power_ef]
0.0 [events_power_ef]
0.0 [cgroup]
0.0 [khelper]
0.0 [netns]
0.0 [async/mgr]
0.0 [pm]
0.0 [sync_supers]
0.0 [bdi-default]
0.0 [kintegrityd/0]
0.0 [kintegrityd/1]
0.0 [kintegrityd/2]
0.0 [kintegrityd/3]
0.0 [kintegrityd/4]
0.0 [kintegrityd/5]
0.0 [kintegrityd/6]
0.0 [kintegrityd/7]
0.0 [kblockd/0]
0.0 [kblockd/1]
0.0 [kblockd/2]
0.0 [kblockd/3]
0.0 [kblockd/4]
0.0 [kblockd/5]
0.0 [kblockd/6]
0.0 [kblockd/7]
0.0 [kacpid]
0.0 [kacpi_notify]
0.0 [kacpi_hotplug]
0.0 [ata_aux]
0.0 [ata_sff/0]
0.0 [ata_sff/1]
0.0 [ata_sff/2]
0.0 [ata_sff/3]
0.0 [ata_sff/4]
0.0 [ata_sff/5]
0.0 [ata_sff/6]
0.0 [ata_sff/7]
0.0 [ksuspend_usbd]
0.0 [khubd]
0.0 [kseriod]
0.0 [md/0]
0.0 [md/1]
0.0 [md/2]
0.0 [md/3]
0.0 [md/4]
0.0 [md/5]
0.0 [md/6]
0.0 [md/7]
0.0 [md_misc/0]
0.0 [md_misc/1]
0.0 [md_misc/2]
0.0 [md_misc/3]
0.0 [md_misc/4]
0.0 [md_misc/5]
0.0 [md_misc/6]
0.0 [md_misc/7]
0.0 [linkwatch]
0.0 [khungtaskd]
0.0 [kswapd0]
0.0 [ksmd]
0.0 [khugepaged]
0.0 [aio/0]
0.0 [aio/1]
0.0 [aio/2]
0.0 [aio/3]
0.0 [aio/4]
0.0 [aio/5]
0.0 [aio/6]
0.0 [aio/7]
0.0 [crypto/0]
0.0 [crypto/1]
0.0 [crypto/2]
0.0 [crypto/3]
0.0 [crypto/4]
0.0 [crypto/5]
0.0 [crypto/6]
0.0 [crypto/7]
0.0 [kthrotld/0]
0.0 [kthrotld/1]
0.0 [kthrotld/2]
0.0 [kthrotld/3]
0.0 [kthrotld/4]
0.0 [kthrotld/5]
0.0 [kthrotld/6]
0.0 [kthrotld/7]
0.0 [kpsmoused]
0.0 [usbhid_resumer]
0.0 [deferwq]
0.0 [kdmremove]
0.0 [kstriped]
0.0 [scsi_eh_0]
0.0 [fw_event_mpt2sa]
0.0 [poll_mpt2sas0_s]
0.0 [kdmflush]
0.0 [kdmflush]
0.0 [jbd2/dm-1-8]
0.0 [ext4-dio-unwrit]
0.0 /sbin/udevd -d
0.0 [kipmi0]
0.0 [kdmflush]
0.0 [jbd2/sda1-8]
0.0 [ext4-dio-unwrit]
0.0 [jbd2/dm-2-8]
0.0 [ext4-dio-unwrit]
0.0 [kauditd]
0.0 [flush-253:1]
0.0 auditd
0.0 [flush-253:2]
0.0 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
0.0 [kondemand/0]
0.0 [kondemand/1]
0.0 [kondemand/2]
0.0 [kondemand/3]
0.0 [kondemand/4]
0.0 [kondemand/5]
0.0 [kondemand/6]
0.0 [kondemand/7]
0.0 irqbalance --pid=/var/run/irqbalance.pid
0.0 dbus-daemon --system
0.0 /usr/sbin/acpid
0.0 hald
0.0 hald-runner
0.0 hald-addon-input: Listening on /dev/input/event0 /dev/input/event2
0.0 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
0.0 /usr/sbin/httpd
0.0 pickup -l -t fifo -u
0.0 xinetd -stayalive -pidfile /var/run/xinetd.pid
0.0 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
0.0 /bin/sh /usr/bin/mysqld_safe --datadir=/home/mysql --socket=/home/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
0.0 bacula-dir -c /etc/bacula/bacula-dir.conf
0.0 bacula-fd -c /etc/bacula/bacula-fd.conf
0.0 bacula-sd -c /etc/bacula/bacula-sd.conf
0.0 abrt-dump-oops -d /var/spool/abrt -rwx /var/log/messages
0.0 crond
0.0 /usr/sbin/atd
0.0 /usr/sbin/certmonger -S -p /var/run/certmonger.pid
0.0 /sbin/mingetty /dev/tty1
0.0 /sbin/mingetty /dev/tty2
0.0 /sbin/mingetty /dev/tty3
0.0 /sbin/mingetty /dev/tty4
0.0 /sbin/mingetty /dev/tty5
0.0 /sbin/mingetty /dev/tty6
0.0 /usr/libexec/postfix/master
0.0 qmgr -l -t fifo -u
0.0 /usr/sbin/dnsmasq
0.0 /usr/sbin/sshd
0.0 /usr/sbin/abrtd
0.0 /usr/libexec/nss_pcache 7143427 off /etc/httpd/alias
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/bin/crlhelper 7176211 10131 /etc/httpd/alias
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/sbin/httpd
0.0 /usr/sbin/snmpd -Lf /dev/null -p /var/run/snmpd.pid
0.0 /usr/sbin/httpd
0.0 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
0.0 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
0.0 sshd: shaun [priv]
0.0 sshd: shaun@pts/0
0.0 -bash
0.0 sudo ps -eo pcpu,args --sort=-%cpu
0.0 ps -eo pcpu,args --sort=-%cpu
0.0 /sbin/udevd -d
0.0 /sbin/udevd -d
Re: Sporadic 'Connection refused' errors in 4.2.4
Code: Select all
313MB httpd
60MB mysqld
26MB php
23MB nagios
16MB python
7MB sshd
6MB snmpd
5MB rsyslogd
4MB hald
4MB crlhelper
3MB qmgr
3MB mingetty
3MB udevd
3MB bacula-dir
3MB master
3MB showq
3MB pickup
3MB nss_pcache
3MB bacula-fd
3MB abrt-dump-oops
3MB crond
3MB bacula-sd
3MB abrtd
2MB sh
2MB ntpd
2MB rrdtool
2MB awk
2MB bash
2MB sort
2MB mysqld_safe
2MB init
1MB hald-addon-inpu
1MB certmonger
1MB hald-runner
1MB dbus-daemon
1MB hald-addon-acpi
1MB ps
1MB xinetd
1MB auditd
1MB less
1MB dnsmasq
1MB irqbalance
1MB acpid
0MB atd
0MB xinetd,1
0MB watchdog/7,1
0MB watchdog/7
0MB watchdog/6,1
0MB watchdog/6
0MB watchdog/5,1
0MB watchdog/5
0MB watchdog/4,1
0MB watchdog/4
0MB watchdog/3,1
0MB watchdog/3
0MB watchdog/2,1
0MB watchdog/2
0MB watchdog/1,1
0MB watchdog/1
0MB watchdog/0,1
0MB watchdog/0
0MB usbhid_resumer,1
0MB usbhid_resumer
0MB udevd,1
0MB sync_supers,1
0MB sync_supers
0MB stopper/7,1
0MB stopper/7
0MB stopper/6,1
0MB stopper/6
0MB stopper/5,1
0MB stopper/5
0MB stopper/4,1
0MB stopper/4
0MB stopper/3,1
0MB stopper/3
0MB stopper/2,1
0MB stopper/2
0MB stopper/1,1
0MB stopper/1
0MB stopper/0,1
0MB stopper/0
0MB sshd,1
0MB sort,1
0MB snmpd,1
0MB showq,1
0MB sh,1
0MB scsi_eh_0,1
0MB scsi_eh_0
0MB rsyslogd,1
0MB rrdtool,1
0MB qmgr,1
0MB python,1
0MB ps,1
0MB poll_mpt2sas0_s,1
0MB poll_mpt2sas0_s
0MB pm,1
0MB pm
0MB pickup,1
0MB php,1
0MB ntpd,1
0MB nss_pcache,1
0MB netns,1
0MB netns
0MB nagios,1
0MB mysqld_safe,1
0MB mysqld,1
0MB mingetty,1
0MB migration/7,1
0MB migration/7
0MB migration/6,1
0MB migration/6
0MB migration/5,1
0MB migration/5
0MB migration/4,1
0MB migration/4
0MB migration/3,1
0MB migration/3
0MB migration/2,1
0MB migration/2
0MB migration/1,1
0MB migration/1
0MB migration/0,1
0MB migration/0
0MB md_misc/7,1
0MB md_misc/7
0MB md_misc/6,1
0MB md_misc/6
0MB md_misc/5,1
0MB md_misc/5
0MB md_misc/4,1
0MB md_misc/4
0MB md_misc/3,1
0MB md_misc/3
0MB md_misc/2,1
0MB md_misc/2
0MB md_misc/1,1
0MB md_misc/1
0MB md_misc/0,1
0MB md_misc/0
0MB md/7,1
0MB md/7
0MB md/6,1
0MB md/6
0MB md/5,1
0MB md/5
0MB md/4,1
0MB md/4
0MB md/3,1
0MB md/3
0MB md/2,1
0MB md/2
0MB md/1,1
0MB md/1
0MB md/0,1
0MB md/0
0MB master,1
0MB linkwatch,1
0MB linkwatch
0MB less,1
0MB kthrotld/7,1
0MB kthrotld/7
0MB kthrotld/6,1
0MB kthrotld/6
0MB kthrotld/5,1
0MB kthrotld/5
0MB kthrotld/4,1
0MB kthrotld/4
0MB kthrotld/3,1
0MB kthrotld/3
0MB kthrotld/2,1
0MB kthrotld/2
0MB kthrotld/1,1
0MB kthrotld/1
0MB kthrotld/0,1
0MB kthrotld/0
0MB kthreadd,1
0MB kthreadd
0MB kswapd0,1
0MB kswapd0
0MB ksuspend_usbd,1
0MB ksuspend_usbd
0MB kstriped,1
0MB kstriped
0MB ksoftirqd/7,1
0MB ksoftirqd/7
0MB ksoftirqd/6,1
0MB ksoftirqd/6
0MB ksoftirqd/5,1
0MB ksoftirqd/5
0MB ksoftirqd/4,1
0MB ksoftirqd/4
0MB ksoftirqd/3,1
0MB ksoftirqd/3
0MB ksoftirqd/2,1
0MB ksoftirqd/2
0MB ksoftirqd/1,1
0MB ksoftirqd/1
0MB ksoftirqd/0,1
0MB ksoftirqd/0
0MB ksmd,1
0MB ksmd
0MB kseriod,1
0MB kseriod
0MB kpsmoused,1
0MB kpsmoused
0MB kondemand/7,1
0MB kondemand/7
0MB kondemand/6,1
0MB kondemand/6
0MB kondemand/5,1
0MB kondemand/5
0MB kondemand/4,1
0MB kondemand/4
0MB kondemand/3,1
0MB kondemand/3
0MB kondemand/2,1
0MB kondemand/2
0MB kondemand/1,1
0MB kondemand/1
0MB kondemand/0,1
0MB kondemand/0
0MB kipmi0,1
0MB kipmi0
0MB kintegrityd/7,1
0MB kintegrityd/7
0MB kintegrityd/6,1
0MB kintegrityd/6
0MB kintegrityd/5,1
0MB kintegrityd/5
0MB kintegrityd/4,1
0MB kintegrityd/4
0MB kintegrityd/3,1
0MB kintegrityd/3
0MB kintegrityd/2,1
0MB kintegrityd/2
0MB kintegrityd/1,1
0MB kintegrityd/1
0MB kintegrityd/0,1
0MB kintegrityd/0
0MB khungtaskd,1
0MB khungtaskd
0MB khugepaged,1
0MB khugepaged
0MB khubd,1
0MB khubd
0MB khelper,1
0MB khelper
0MB kdmremove,1
0MB kdmremove
0MB kdmflush,1
0MB kdmflush
0MB kblockd/7,1
0MB kblockd/7
0MB kblockd/6,1
0MB kblockd/6
0MB kblockd/5,1
0MB kblockd/5
0MB kblockd/4,1
0MB kblockd/4
0MB kblockd/3,1
0MB kblockd/3
0MB kblockd/2,1
0MB kblockd/2
0MB kblockd/1,1
0MB kblockd/1
0MB kblockd/0,1
0MB kblockd/0
0MB kauditd,1
0MB kauditd
0MB kacpi_notify,1
0MB kacpi_notify
0MB kacpi_hotplug,1
0MB kacpi_hotplug
0MB kacpid,1
0MB kacpid
0MB jbd2/sda1-8,1
0MB jbd2/sda1-8
0MB jbd2/dm-2-8,1
0MB jbd2/dm-2-8
0MB jbd2/dm-1-8,1
0MB jbd2/dm-1-8
0MB irqbalance,1
0MB init,1
0MB httpd,1
0MB hald-runner,1
0MB hald-addon-inpu,1
0MB hald-addon-acpi,1
0MB hald,1
0MB fw_event_mpt2sa,1
0MB fw_event_mpt2sa
0MB flush-253:2,1
0MB flush-253:2
0MB flush-253:1,1
0MB flush-253:1
0MB ext4-dio-unwrit,1
0MB ext4-dio-unwrit
0MB events_power_ef,1
0MB events_power_ef
0MB events_long/7,1
0MB events_long/7
0MB events_long/6,1
0MB events_long/6
0MB events_long/5,1
0MB events_long/5
0MB events_long/4,1
0MB events_long/4
0MB events_long/3,1
0MB events_long/3
0MB events_long/2,1
0MB events_long/2
0MB events_long/1,1
0MB events_long/1
0MB events_long/0,1
0MB events_long/0
0MB events/7,1
0MB events/7
0MB events/6,1
0MB events/6
0MB events/5,1
0MB events/5
0MB events/4,1
0MB events/4
0MB events/3,1
0MB events/3
0MB events/2,1
0MB events/2
0MB events/1,1
0MB events/1
0MB events/0,1
0MB events/0
0MB dnsmasq,1
0MB deferwq,1
0MB deferwq
0MB dbus-daemon,1
0MB crypto/7,1
0MB crypto/7
0MB crypto/6,1
0MB crypto/6
0MB crypto/5,1
0MB crypto/5
0MB crypto/4,1
0MB crypto/4
0MB crypto/3,1
0MB crypto/3
0MB crypto/2,1
0MB crypto/2
0MB crypto/1,1
0MB crypto/1
0MB crypto/0,1
0MB crypto/0
0MB crond,1
0MB crlhelper,1
0MB COMMAND,1
0MB COMMAND
0MB cgroup,1
0MB cgroup
0MB certmonger,1
0MB bdi-default,1
0MB bdi-default
0MB bash,1
0MB bacula-sd,1
0MB bacula-fd,1
0MB bacula-dir,1
0MB awk,1
0MB auditd,1
0MB atd,1
0MB ata_sff/7,1
0MB ata_sff/7
0MB ata_sff/6,1
0MB ata_sff/6
0MB ata_sff/5,1
0MB ata_sff/5
0MB ata_sff/4,1
0MB ata_sff/4
0MB ata_sff/3,1
0MB ata_sff/3
0MB ata_sff/2,1
0MB ata_sff/2
0MB ata_sff/1,1
0MB ata_sff/1
0MB ata_sff/0,1
0MB ata_sff/0
0MB ata_aux,1
0MB ata_aux
0MB async/mgr,1
0MB async/mgr
0MB aio/7,1
0MB aio/7
0MB aio/6,1
0MB aio/6
0MB aio/5,1
0MB aio/5
0MB aio/4,1
0MB aio/4
0MB aio/3,1
0MB aio/3
0MB aio/2,1
0MB aio/2
0MB aio/1,1
0MB aio/1
0MB aio/0,1
0MB aio/0
0MB acpid,1
0MB abrt-dump-oops,1
0MB abrtd,1
Re: Sporadic 'Connection refused' errors in 4.2.4
Can you post the full /var/log/messages and nagios.log files from when the timeout happened, so we can view them and get better details on what is happening?
If you don't want to post them, can you PM them to me?
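When lining the two logs up, it may help to know that nagios.log entries start with a Unix epoch timestamp in square brackets, while /var/log/messages typically carries a readable local time. The Python sketch below pulls out the nagios.log entries that mention "Connection refused" and prints them with a readable time. The log path is an assumption based on the /usr/local/nagios prefix visible in the process list earlier in the thread, the function names are illustrative, and the times are rendered in UTC, so they may need shifting to the server's timezone before comparing.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, Iterator, Tuple

# nagios.log entries look like "[<epoch seconds>] <message>".
LOG_LINE = re.compile(r"^\[(\d+)\] (.*)$")


def readable_matches(
    lines: Iterable[str], needle: str = "Connection refused"
) -> Iterator[Tuple[str, str]]:
    """Yield (UTC time string, message) for log entries containing `needle`."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match or needle not in match.group(2):
            continue
        when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
        yield when.strftime("%Y-%m-%d %H:%M:%S"), match.group(2)


def main(path: str = "/usr/local/nagios/var/nagios.log") -> None:
    # The default path is an assumption; adjust it to the local install.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for when, message in readable_matches(handle):
            print(when, message)
```

Calling `main()` on the Nagios server lists the matching entries in time order, which makes it easier to pick out the window to send along.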
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Sporadic 'Connection refused' errors in 4.2.4
Ok, next time this happens I will forward these to you via PM.
Thanks.
Re: Sporadic 'Connection refused' errors in 4.2.4
The only thing I saw in the nagios.log file at the same time the error was in the messages log is the following.
Code: Select all
SERVICE NOTIFICATION: external;www.xxx.xxx;HTTPS check;CRITICAL;notify-service-by-email;connect to address www.xxx.xxx and port 443: Connection refused
Take a look at the notification settings and verify that they are correct.
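For reference, a SERVICE NOTIFICATION entry in nagios.log is a semicolon-separated record: contact, host, service description, state, notification command, then the plugin output. Here is a small Python sketch (the names are illustrative and not part of Nagios) that splits the entry quoted above into those fields, which makes it easier to see which contact and which notification command were involved.

```python
from typing import NamedTuple, Optional


class ServiceNotification(NamedTuple):
    contact: str
    host: str
    service: str
    state: str
    command: str
    output: str


PREFIX = "SERVICE NOTIFICATION: "


def parse_service_notification(line: str) -> Optional[ServiceNotification]:
    """Split a SERVICE NOTIFICATION log entry into its six fields.

    Returns None when the line is not such an entry.
    """
    start = line.find(PREFIX)
    if start == -1:
        return None
    # Only the first five semicolons are separators; the plugin output
    # comes last and may itself contain semicolons.
    fields = line[start + len(PREFIX):].strip().split(";", 5)
    if len(fields) != 6:
        return None
    return ServiceNotification(*fields)


# The entry exactly as quoted in this thread (hostname already redacted).
entry = parse_service_notification(
    "SERVICE NOTIFICATION: external;www.xxx.xxx;HTTPS check;CRITICAL;"
    "notify-service-by-email;connect to address www.xxx.xxx and port 443: "
    "Connection refused"
)
print(entry.contact, entry.command, entry.state)
```

Read this way, the entry says the contact `external` was notified through the `notify-service-by-email` command about a CRITICAL state on the HTTPS check.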
Re: Sporadic 'Connection refused' errors in 4.2.4
Can we redact hostnames from that please.
Re: Sporadic 'Connection refused' errors in 4.2.4
Also, what do you mean by checking the notification settings? The emails and SMS that are sent seem to have the correct information.
Thanks.