Ack en downtime not working sometimes
Posted: Thu Jun 29, 2017 4:23 am
by WillemDH
Hello,
I have had this several times last week and this week, and twice today. All of a sudden Nagios is no longer able to acknowledge problems or schedule downtime; even a forced check does not seem to work. The workaround is to apply the configuration, but as that takes 2 minutes, it is really problematic.
What can I do to troubleshoot this?
Grtz
Willem
Re: Ack en downtime not working sometimes
Posted: Thu Jun 29, 2017 11:04 am
by SteveBeauchemin
Willem,
While in this state, look at the "ipcs -q" output; I bet it shows a big number.
The apply config drops all the pending DB data and starts over.
I saw this myself yesterday. Just had to be patient. Until I couldn't be.
Not a solution, but it is a possible explanation.
As far as troubleshooting - if this is ipcs - while this is happening, use systemctl stop httpd and see if ipcs starts to decrease versus increase.
If it does start to go down, then you know it is a Web based activity that is a possible issue. I have seen this too very recently
and have been driven to explore the ajax refresh rates in the Nagios GUI. I have made some changes, and my stuff is actually
running pretty good now. Of course, by saying this I am now jinxing it.
Once again - if this is ipcs...
In one SSH window, in the OS, as the nagios user, I keep watching the ipcs -q output to see what is up.
Then do the systemctl stop httpd
and watch for the number to start decreasing. Then you know it is GUI related.
If it is httpd related, PM me and I'll tell you all my secrets.
Good Luck.
Steve B
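The ipcs check described in this post can be scripted. This is only a minimal sketch: it assumes the util-linux `ipcs -q` layout, where the sixth column of each queue row is the message count, and the function name queue_total is made up for illustration.

```shell
# Sum the "messages" column (6th field) over all queue rows of `ipcs -q`.
# Queue rows are recognised by the hexadecimal key (0x...) in the first field;
# banner and header lines are skipped.
queue_total() {
  awk '$1 ~ /^0x/ { total += $6 } END { print total + 0 }'
}

# Typical use while the problem is occurring (not executed here):
#   ipcs -q | queue_total
#   watch -n 5 'ipcs -q'
```

If the total keeps climbing, stop httpd and run it again to see whether it starts to fall. On init-based systems without systemctl, service httpd stop is the equivalent.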
Re: Ack en downtime not working sometimes
Posted: Thu Jun 29, 2017 11:58 am
by tgriep
Thanks SteveBeauchemin for the help.
Another possible cause is that there are multiple nagios processes running and when you try and ACK or force a check, it is trying to run the command on the wrong one.
How many host and service checks is the system running?
If it happens again, please run the following as root and post the output so we can see what is running on the server.
Code:
ps -ef --cols=300
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
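One quick way to look for duplicate daemons in that ps output is to count the main nagios processes. This is a rough sketch: it assumes the daemon is started as /usr/local/nagios/bin/nagios -d followed by the config file and that workers carry --worker instead. count_nagios_daemons is a hypothetical helper name.

```shell
# Count lines of `ps -ef` output (read from stdin) that look like a main
# nagios daemon, i.e. ".../bin/nagios -d <config>". Worker processes are
# started with --worker and are therefore not counted.
count_nagios_daemons() {
  awk '/\/bin\/nagios -d / { n++ } END { print n + 0 }'
}

# Typical use (not executed here):
#   ps -ef --cols=300 | count_nagios_daemons
```

A count that is higher than usual for this server would point at the duplicate-process theory.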
Re: Ack en downtime not working sometimes
Posted: Thu Jun 29, 2017 12:34 pm
by WillemDH
Thanks a lot for the tips guys. I'll definitely investigate more following your instructions when I have the issue the next time. So I had it two times this morning, then I rebooted the Nagios server and it kept working during the day. Let's see how this goes.
The system has 1200 hosts and 22000 services.
Grtz
Re: Ack en downtime not working sometimes
Posted: Thu Jun 29, 2017 4:27 pm
by tgriep
OK, let us know what you find out the next time the issue happens.
Re: Ack en downtime not working sometimes
Posted: Thu Nov 09, 2017 6:20 am
by WillemDH
Ok, I had this issue two times today. After the first time I restarted the nagios service, after which it was solved for a few hours. But now it's back again.
So I checked the ipcs queue, which does not seem to be very large and drops to 0 regularly.
Code:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL
Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Warning: Service 'SHP_Health' on host 'X' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'SHP_Sitecollections' on host 'X' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Checked 25281 services.
Checked 1238 hosts.
Checked 226 host groups.
Checked 98 service groups.
Checked 110 contacts.
Checked 22 contact groups.
Checked 356 commands.
Checked 123 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 1238 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 123 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 2
Total Errors: 0
Code:
ps -ef --cols=300
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Oct26 ? 00:00:01 /sbin/init
root 2 0 0 Oct26 ? 00:00:00 [kthreadd]
root 3 2 0 Oct26 ? 00:02:49 [migration/0]
root 4 2 0 Oct26 ? 00:19:25 [ksoftirqd/0]
root 5 2 0 Oct26 ? 00:00:00 [stopper/0]
root 6 2 0 Oct26 ? 00:00:04 [watchdog/0]
root 7 2 0 Oct26 ? 00:02:24 [migration/1]
root 8 2 0 Oct26 ? 00:00:00 [stopper/1]
root 9 2 0 Oct26 ? 00:00:46 [ksoftirqd/1]
root 10 2 0 Oct26 ? 00:00:01 [watchdog/1]
root 11 2 0 Oct26 ? 00:02:03 [migration/2]
root 12 2 0 Oct26 ? 00:00:00 [stopper/2]
root 13 2 0 Oct26 ? 00:00:37 [ksoftirqd/2]
root 14 2 0 Oct26 ? 00:00:00 [watchdog/2]
root 15 2 0 Oct26 ? 00:04:38 [migration/3]
root 16 2 0 Oct26 ? 00:00:00 [stopper/3]
root 17 2 0 Oct26 ? 00:18:07 [ksoftirqd/3]
root 18 2 0 Oct26 ? 00:00:03 [watchdog/3]
root 19 2 0 Oct26 ? 00:02:22 [migration/4]
root 20 2 0 Oct26 ? 00:00:00 [stopper/4]
root 21 2 0 Oct26 ? 00:01:03 [ksoftirqd/4]
root 22 2 0 Oct26 ? 00:00:00 [watchdog/4]
root 23 2 0 Oct26 ? 00:01:50 [migration/5]
root 24 2 0 Oct26 ? 00:00:00 [stopper/5]
root 25 2 0 Oct26 ? 00:00:45 [ksoftirqd/5]
root 26 2 0 Oct26 ? 00:00:00 [watchdog/5]
root 27 2 0 Oct26 ? 00:01:28 [events/0]
root 28 2 0 Oct26 ? 00:00:29 [events/1]
root 29 2 0 Oct26 ? 00:00:31 [events/2]
root 30 2 0 Oct26 ? 00:05:14 [events/3]
root 31 2 0 Oct26 ? 00:02:17 [events/4]
root 32 2 0 Oct26 ? 00:01:49 [events/5]
root 33 2 0 Oct26 ? 00:00:00 [events/0]
root 34 2 0 Oct26 ? 00:00:00 [events/1]
root 35 2 0 Oct26 ? 00:00:00 [events/2]
root 36 2 0 Oct26 ? 00:00:00 [events/3]
root 37 2 0 Oct26 ? 00:00:00 [events/4]
root 38 2 0 Oct26 ? 00:00:00 [events/5]
root 39 2 0 Oct26 ? 00:00:00 [events_long/0]
root 40 2 0 Oct26 ? 00:00:00 [events_long/1]
root 41 2 0 Oct26 ? 00:00:00 [events_long/2]
root 42 2 0 Oct26 ? 00:00:00 [events_long/3]
root 43 2 0 Oct26 ? 00:00:00 [events_long/4]
root 44 2 0 Oct26 ? 00:00:00 [events_long/5]
root 45 2 0 Oct26 ? 00:00:00 [events_power_ef]
root 46 2 0 Oct26 ? 00:00:00 [events_power_ef]
root 47 2 0 Oct26 ? 00:00:00 [events_power_ef]
root 48 2 0 Oct26 ? 00:00:00 [events_power_ef]
root 49 2 0 Oct26 ? 00:00:00 [events_power_ef]
root 50 2 0 Oct26 ? 00:00:00 [events_power_ef]
root 51 2 0 Oct26 ? 00:00:00 [cgroup]
root 52 2 0 Oct26 ? 00:00:00 [khelper]
root 53 2 0 Oct26 ? 00:00:00 [netns]
root 54 2 0 Oct26 ? 00:00:00 [async/mgr]
root 55 2 0 Oct26 ? 00:00:00 [pm]
root 56 2 0 Oct26 ? 00:00:03 [sync_supers]
root 57 2 0 Oct26 ? 00:00:00 [bdi-default]
root 58 2 0 Oct26 ? 00:00:00 [kintegrityd/0]
root 59 2 0 Oct26 ? 00:00:00 [kintegrityd/1]
root 60 2 0 Oct26 ? 00:00:00 [kintegrityd/2]
root 61 2 0 Oct26 ? 00:00:00 [kintegrityd/3]
root 62 2 0 Oct26 ? 00:00:00 [kintegrityd/4]
root 63 2 0 Oct26 ? 00:00:00 [kintegrityd/5]
root 64 2 0 Oct26 ? 00:11:35 [kblockd/0]
root 65 2 0 Oct26 ? 00:00:33 [kblockd/1]
root 66 2 0 Oct26 ? 00:00:17 [kblockd/2]
root 67 2 0 Oct26 ? 00:10:26 [kblockd/3]
root 68 2 0 Oct26 ? 00:00:34 [kblockd/4]
root 69 2 0 Oct26 ? 00:00:18 [kblockd/5]
root 70 2 0 Oct26 ? 00:00:00 [kacpid]
root 71 2 0 Oct26 ? 00:00:00 [kacpi_notify]
root 72 2 0 Oct26 ? 00:00:00 [kacpi_hotplug]
root 73 2 0 Oct26 ? 00:00:00 [ata_aux]
root 74 2 0 Oct26 ? 00:00:00 [ata_sff/0]
root 75 2 0 Oct26 ? 00:00:00 [ata_sff/1]
root 76 2 0 Oct26 ? 00:00:00 [ata_sff/2]
root 77 2 0 Oct26 ? 00:00:00 [ata_sff/3]
root 78 2 0 Oct26 ? 00:00:00 [ata_sff/4]
root 79 2 0 Oct26 ? 00:00:00 [ata_sff/5]
root 80 2 0 Oct26 ? 00:00:00 [ksuspend_usbd]
root 81 2 0 Oct26 ? 00:00:00 [khubd]
root 82 2 0 Oct26 ? 00:00:00 [kseriod]
root 83 2 0 Oct26 ? 00:00:00 [md/0]
root 84 2 0 Oct26 ? 00:00:00 [md/1]
root 85 2 0 Oct26 ? 00:00:00 [md/2]
root 86 2 0 Oct26 ? 00:00:00 [md/3]
root 87 2 0 Oct26 ? 00:00:00 [md/4]
root 88 2 0 Oct26 ? 00:00:00 [md/5]
root 89 2 0 Oct26 ? 00:00:00 [md_misc/0]
root 90 2 0 Oct26 ? 00:00:00 [md_misc/1]
root 91 2 0 Oct26 ? 00:00:00 [md_misc/2]
root 92 2 0 Oct26 ? 00:00:00 [md_misc/3]
root 93 2 0 Oct26 ? 00:00:00 [md_misc/4]
root 94 2 0 Oct26 ? 00:00:00 [md_misc/5]
root 95 2 0 Oct26 ? 00:00:00 [linkwatch]
root 98 2 0 Oct26 ? 00:00:00 [khungtaskd]
root 99 2 0 Oct26 ? 00:07:48 [kswapd0]
root 100 2 0 Oct26 ? 00:00:00 [ksmd]
root 101 2 0 Oct26 ? 00:01:25 [khugepaged]
root 102 2 0 Oct26 ? 00:00:00 [aio/0]
root 103 2 0 Oct26 ? 00:00:00 [aio/1]
root 104 2 0 Oct26 ? 00:00:00 [aio/2]
root 105 2 0 Oct26 ? 00:00:00 [aio/3]
root 106 2 0 Oct26 ? 00:00:00 [aio/4]
root 107 2 0 Oct26 ? 00:00:00 [aio/5]
root 108 2 0 Oct26 ? 00:00:00 [crypto/0]
root 109 2 0 Oct26 ? 00:00:00 [crypto/1]
root 110 2 0 Oct26 ? 00:00:00 [crypto/2]
root 111 2 0 Oct26 ? 00:00:00 [crypto/3]
root 112 2 0 Oct26 ? 00:00:00 [crypto/4]
root 113 2 0 Oct26 ? 00:00:00 [crypto/5]
root 120 2 0 Oct26 ? 00:00:00 [kthrotld/0]
root 121 2 0 Oct26 ? 00:00:00 [kthrotld/1]
root 122 2 0 Oct26 ? 00:00:00 [kthrotld/2]
root 123 2 0 Oct26 ? 00:00:00 [kthrotld/3]
root 124 2 0 Oct26 ? 00:00:00 [kthrotld/4]
root 125 2 0 Oct26 ? 00:00:00 [kthrotld/5]
root 126 2 0 Oct26 ? 00:00:00 [pciehpd]
root 128 2 0 Oct26 ? 00:00:00 [kpsmoused]
root 129 2 0 Oct26 ? 00:00:00 [usbhid_resumer]
root 130 2 0 Oct26 ? 00:00:00 [deferwq]
root 163 2 0 Oct26 ? 00:00:00 [kdmremove]
root 164 2 0 Oct26 ? 00:00:00 [kstriped]
root 194 2 0 Oct26 ? 00:00:00 [ttm_swap]
root 427 2 0 Oct26 ? 00:00:00 [scsi_eh_0]
root 428 2 0 Oct26 ? 00:00:00 [scsi_eh_1]
root 433 2 0 Oct26 ? 00:00:17 [mpt_poll_0]
root 434 2 0 Oct26 ? 00:00:00 [mpt/0]
root 435 2 0 Oct26 ? 00:00:00 [scsi_eh_2]
root 485 2 0 Oct26 ? 00:00:00 [kdmflush]
root 487 2 0 Oct26 ? 00:00:00 [kdmflush]
root 505 2 0 Oct26 ? 00:31:33 [jbd2/dm-1-8]
root 506 2 0 Oct26 ? 00:00:00 [ext4-dio-unwrit]
root 604 1 0 Oct26 ? 00:00:00 /sbin/udevd -d
root 802 2 0 Oct26 ? 00:00:13 [vmmemctl]
root 986 604 0 Oct26 ? 00:00:00 /sbin/udevd -d
root 996 604 0 Oct26 ? 00:00:00 /sbin/udevd -d
root 1062 2 0 Oct26 ? 00:00:13 [kauditd]
root 1266 2 0 Oct26 ? 01:59:31 [flush-253:1]
root 1332 1 0 Oct26 ? 00:00:26 auditd
root 1390 1 0 Oct26 ? 00:00:36 irqbalance --pid=/var/run/irqbalance.pid
rpc 1408 1 0 Oct26 ? 00:00:00 rpcbind
rpcuser 1430 1 0 Oct26 ? 00:00:00 rpc.statd
root 1465 2 0 Oct26 ? 00:00:00 [rpciod/0]
root 1466 2 0 Oct26 ? 00:00:00 [rpciod/1]
root 1467 2 0 Oct26 ? 00:00:00 [rpciod/2]
root 1468 2 0 Oct26 ? 00:00:00 [rpciod/3]
root 1469 2 0 Oct26 ? 00:00:00 [rpciod/4]
root 1470 2 0 Oct26 ? 00:00:00 [rpciod/5]
root 1475 1 0 Oct26 ? 00:00:00 rpc.idmapd
dbus 1501 1 0 Oct26 ? 00:00:00 dbus-daemon --system
avahi 1515 1 0 Oct26 ? 00:00:00 avahi-daemon: running [srvnagios01.local]
avahi 1516 1515 0 Oct26 ? 00:00:00 avahi-daemon: chroot helper
root 1546 2 0 Oct26 ? 00:00:00 [kslowd000]
root 1547 2 0 Oct26 ? 00:00:00 [kslowd001]
root 1548 2 0 Oct26 ? 00:00:00 [nfsiod]
root 1549 2 0 Oct26 ? 00:00:00 [lockd]
root 1561 1 0 Oct26 ? 00:00:00 /usr/sbin/acpid
68 1573 1 0 Oct26 ? 00:00:04 hald
root 1574 1573 0 Oct26 ? 00:00:00 hald-runner
root 1606 1574 0 Oct26 ? 00:00:00 hald-addon-input: Listening on /dev/input/event2 /dev/input/event0
68 1616 1574 0 Oct26 ? 00:00:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root 1662 1 0 Oct26 ? 00:00:00 rpc.rquotad
root 1667 1 0 Oct26 ? 00:00:00 rpc.mountd
root 1673 2 0 Oct26 ? 00:00:00 [nfsd4]
root 1674 2 0 Oct26 ? 00:00:00 [nfsd4_callbacks]
root 1675 2 0 Oct26 ? 00:00:00 [nfsd]
root 1676 2 0 Oct26 ? 00:00:00 [nfsd]
root 1677 2 0 Oct26 ? 00:00:00 [nfsd]
root 1678 2 0 Oct26 ? 00:00:00 [nfsd]
root 1679 2 0 Oct26 ? 00:00:00 [nfsd]
root 1680 2 0 Oct26 ? 00:00:00 [nfsd]
root 1681 2 0 Oct26 ? 00:00:00 [nfsd]
root 1682 2 0 Oct26 ? 00:00:00 [nfsd]
apache 3325 7809 2 09:52 ? 00:03:14 /usr/sbin/httpd
postgres 3732 7618 0 09:52 ? 00:00:05 postgres: nagiosxi nagiosxi IPADDRESS(43130) idle
postfix 6218 7709 0 11:50 ? 00:00:00 pickup -l -t fifo -u
root 7360 1 0 Oct26 ? 00:00:13 /usr/sbin/snmptrapd -Lsd -On -p /var/run/snmptrapd.pid
root 7390 1 0 Oct26 ? 00:00:00 /usr/sbin/sshd
root 7401 1 0 Oct26 ? 00:00:27 xinetd -stayalive -pidfile /var/run/xinetd.pid
ntp 7412 1 0 Oct26 ? 00:00:03 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
postgres 7618 1 0 Oct26 ? 00:02:02 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 7623 7618 0 Oct26 ? 00:00:19 postgres: logger process
postgres 7625 7618 0 Oct26 ? 00:02:26 postgres: writer process
postgres 7626 7618 0 Oct26 ? 00:01:35 postgres: wal writer process
postgres 7627 7618 0 Oct26 ? 00:00:34 postgres: autovacuum launcher process
postgres 7628 7618 0 Oct26 ? 00:02:23 postgres: stats collector process
root 7709 1 0 Oct26 ? 00:00:03 /usr/libexec/postfix/master
postfix 7721 7709 0 Oct26 ? 00:00:00 qmgr -l -t fifo -u
root 7727 1 0 Oct26 ? 00:09:18 /usr/bin/vmtoolsd
root 7753 1 0 Oct26 ? 00:00:00 /usr/sbin/abrtd
root 7778 1 0 Oct26 ? 00:00:13 abrt-dump-oops -d /var/spool/abrt -rwx /var/log/messages
gearmand 7790 1 3 Oct26 ? 13:22:16 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/tmp/gearmand.retention -q retention --log-file=/var/log/gearmand/gearmand.log
root 7809 1 0 Oct26 ? 00:00:26 /usr/sbin/httpd
root 7852 1 0 Oct26 ? 00:00:33 crond
nagios 7868 1 0 Oct26 ? 00:01:04 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
root 7882 1 0 Oct26 ? 00:00:00 /usr/sbin/atd
nagios 7897 1 0 Oct26 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root 7929 1 0 Oct26 ? 00:00:00 /usr/share/filebeat/bin/filebeat-god -r / -n -p /var/run/filebeat.pid -- /usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
root 7931 7929 0 Oct26 ? 00:26:22 /usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
ajaxterm 7976 1 0 Oct26 ? 00:03:49 python /usr/share/ajaxterm/ajaxterm.py --daemon --port=8022 --uid=ajaxterm
telegraf 8115 1 5 Oct26 ? 19:38:13 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
root 8125 1 0 Oct26 tty1 00:00:00 /sbin/mingetty /dev/tty1
root 8129 1 0 Oct26 tty2 00:00:00 /sbin/mingetty /dev/tty2
root 8132 1 0 Oct26 tty3 00:00:00 /sbin/mingetty /dev/tty3
root 8135 1 0 Oct26 tty4 00:00:00 /sbin/mingetty /dev/tty4
root 8137 1 0 Oct26 tty5 00:00:00 /sbin/mingetty /dev/tty5
root 8142 1 0 Oct26 tty6 00:00:00 /sbin/mingetty /dev/tty6
nagios 15610 7401 0 Oct26 ? 00:00:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
apache 16144 7809 3 11:52 ? 00:00:36 /usr/sbin/httpd
apache 16147 7809 3 11:52 ? 00:00:36 /usr/sbin/httpd
apache 16148 7809 3 11:52 ? 00:00:36 /usr/sbin/httpd
postgres 16369 7618 0 11:52 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(52710) idle
postgres 16400 7618 0 11:52 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(53048) idle
postgres 16403 7618 0 11:52 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(53052) idle
apache 16728 7809 3 11:39 ? 00:00:59 /usr/sbin/httpd
postgres 16904 7618 0 11:39 ? 00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(49974) idle
apache 19980 7809 2 10:34 ? 00:02:00 /usr/sbin/httpd
postgres 19984 7618 0 10:34 ? 00:00:03 postgres: nagiosxi nagiosxi IPADDRESS(37504) idle
apache 23927 7809 3 11:40 ? 00:00:52 /usr/sbin/httpd
postgres 23969 7618 0 11:40 ? 00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(56832) idle
nagios 26421 7401 0 10:23 ? 00:00:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios 29285 1 6 10:24 ? 00:07:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 29287 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29288 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29289 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29290 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29291 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29292 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29294 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29295 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29296 29285 0 10:24 ? 00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 29299 7897 0 10:24 ? 00:00:28 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 29300 29299 21 10:24 ? 00:23:08 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root 29334 1 0 10:24 ? 00:00:00 /usr/bin/perl /usr/sbin/snmptt --daemon
snmptt 29335 29334 0 10:24 ? 00:00:05 /usr/bin/perl /usr/sbin/snmptt --daemon
nagios 29492 29285 0 10:24 ? 00:00:00 [nagios] <defunct>
nagios 32550 29292 0 12:08 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_lin_service -a -o linux -s filebeat
apache 33283 7809 2 11:55 ? 00:00:21 /usr/sbin/httpd
postgres 33334 7618 0 11:55 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(40070) idle
apache 33354 7809 3 11:42 ? 00:00:53 /usr/sbin/httpd
postgres 33561 7618 0 11:42 ? 00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(37554) idle
nagios 34221 29287 0 12:08 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios 35250 29296 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_swap -a -w 50 -c 20
apache 35797 7809 2 11:56 ? 00:00:18 /usr/sbin/httpd
postgres 35881 7618 0 11:56 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(42602) idle
nagios 35903 29291 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_open_files -a -w 30 -c 50
nagios 36917 29294 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H gateway -p 5666 -t 300 -c check_fts_enclosure_power -a -E enclosure
nagios 37322 29291 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_mem -a -w 1 -c 0 -nocache
apache 37683 7809 2 10:12 ? 00:02:40 /usr/sbin/httpd
nagios 38009 29288 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_win_network_connections -a -H localhost
nagios 38024 29291 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H gateway -p 5666 -t 60 -c check_tel_trunkstat -a -trunknode "OXE" -telnetHostIP "IPADDRESS" -telnetlogin "mtcl" -telnetpwd "PASSWORD" -telnetprompt "ac-oxe-01-cpua" -telnetcommand "trkstat 20"
postgres 38026 7618 0 10:13 ? 00:00:04 postgres: nagiosxi nagiosxi IPADDRESS(49132) idle
nagios 38490 29289 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38492 29295 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38508 29294 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38596 29287 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl M
nagios 38635 29289 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl D
nagios 38664 29292 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios 38665 29291 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38694 29289 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios 38701 29291 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 1800 -c check_ms_win_updates -a -wd 120 -cd 150
nagios 38706 29289 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38707 29290 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38715 29289 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_cpu_stats -a -w 85 -c 95
nagios 38724 29291 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios 38727 29287 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38728 29288 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios 38731 29295 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38733 29291 0 12:09 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl L
nagios 38794 29287 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_ctx_loadevaluator
nagios 38801 29295 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl G
root 38814 7852 0 12:10 ? 00:00:00 CROND
root 38815 7852 0 12:10 ? 00:00:00 CROND
root 38816 7852 0 12:10 ? 00:00:00 CROND
root 38817 7852 0 12:10 ? 00:00:00 CROND
root 38818 7852 0 12:10 ? 00:00:00 CROND
root 38819 7852 0 12:10 ? 00:00:00 CROND
nagios 38823 38818 0 12:10 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php >> /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios 38827 38819 0 12:10 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php >> /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios 38828 38814 0 12:10 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php >> /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios 38829 38815 0 12:10 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php >> /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios 38831 38816 0 12:10 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handler.log 2>&1
nagios 38832 38817 0 12:10 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1
nagios 38833 38823 2 12:10 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios 38836 38828 3 12:10 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios 38837 38832 3 12:10 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios 38839 38831 2 12:10 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php
nagios 38841 38827 5 12:10 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios 38843 38829 3 12:10 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
postgres 38847 7618 0 12:10 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47068) idle
nagios 38859 29292 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
postgres 38866 7618 0 12:10 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47082) idle
postgres 38878 7618 0 12:10 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47086) idle
postgres 38901 7618 0 12:10 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47088) idle
postgres 38902 7618 0 12:10 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47092) idle
postgres 38908 7618 0 12:10 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47094) idle
nagios 38939 29287 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl M
nagios 38941 29289 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_clu_svc -p 5666
nagios 38942 29290 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_win_network_connections -a -H localhost
nagios 38943 29295 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl L
nagios 38944 29292 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios 38958 29291 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios 38962 29296 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_win_network_connections -a -H localhost
nagios 39008 38841 0 12:10 ? 00:00:00 sh -c /usr/bin/iostat -c 5 2 | tail --lines=2 | head --lines=1 | awk '{ print $1,$2,$3,$4,$5,$6 }'
nagios 39009 39008 0 12:10 ? 00:00:00 /usr/bin/iostat -c 5 2
nagios 39010 39008 0 12:10 ? 00:00:00 tail --lines=2
nagios 39011 39008 0 12:10 ? 00:00:00 head --lines=1
nagios 39012 39008 0 12:10 ? 00:00:00 awk { print $1,$2,$3,$4,$5,$6 }
nagios 39083 29288 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl K
nagios 39088 29291 0 12:10 ? 00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 30 -c CheckCounter -a Counter:Aggregate Delivery Queue Length=\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) ShowAll MaxWarn=250 MaxCrit=1000
root 39113 60674 1 12:10 pts/0 00:00:00 ps -ef --cols=300
apache 39789 7809 3 11:43 ? 00:00:52 /usr/sbin/httpd
postgres 40085 7618 0 11:43 ? 00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(42968) idle
apache 40783 7809 2 11:57 ? 00:00:16 /usr/sbin/httpd
apache 40805 7809 2 11:57 ? 00:00:16 /usr/sbin/httpd
apache 40806 7809 2 11:57 ? 00:00:17 /usr/sbin/httpd
postgres 40810 7618 0 11:57 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(46726) idle
postgres 40813 7618 0 11:57 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(46728) idle
postgres 40816 7618 0 11:57 ? 00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(46730) idle
postfix 44016 7709 0 10:52 ? 00:00:00 showq -t unix -u
root 44346 1 0 Oct26 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql 44469 44346 14 Oct26 ? 1-23:48:12 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
root 59919 1 0 Oct27 ? 00:00:45 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
root 60672 7390 0 09:12 ? 00:00:00 sshd: root@pts/0
root 60674 60672 0 09:12 pts/0 00:00:00 -bash
apache 62241 7809 2 10:30 ? 00:02:12 /usr/sbin/httpd
postgres 62752 7618 0 10:30 ? 00:00:03 postgres: nagiosxi nagiosxi IPADDRESS(45396) idle
I will reboot the server now in the hope that helps for a longer period. Feel free to make any other suggestions to
- monitor this so at least we know exactly when it happens
- prevent this from happening
Grtz
Willem
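For the "know when it happens" part, one option is to look at when Nagios last logged an external command. This is only a sketch: it assumes log_external_commands is enabled and that log lines have the form `[epoch] EXTERNAL COMMAND: ...`. The helper name and the log path in the usage comment are assumptions, not something confirmed in this thread.

```shell
# Print the epoch timestamp of the most recent "EXTERNAL COMMAND" entry in a
# Nagios log read from stdin, or 0 if there is none.
# The first field of a matching line looks like "[1510226000]"; substr()
# strips the surrounding brackets.
last_external_command_epoch() {
  awk '/^\[[0-9]+\] EXTERNAL COMMAND: / { ts = substr($1, 2, length($1) - 2) }
       END { print (ts == "" ? 0 : ts) }'
}

# Typical use (path is an assumption; not executed here):
#   last_external_command_epoch < /usr/local/nagios/var/nagios.log
#   date +%s    # current time, for comparison
```

If that timestamp stops advancing while acknowledgements or downtimes are being submitted, the commands are probably not reaching the daemon, and the time of the last entry gives a rough start for the outage.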
Re: Ack en downtime not working sometimes
Posted: Thu Nov 09, 2017 5:35 pm
by tgriep
With 25000 services on the server, you may want to look into splitting the server into two to lighten the load.
Without knowing what the error was when it failed, it is hard to give advice.
Try this for sure and see if it helps.
https://support.nagios.com/kb/article/n ... eeded.html
If the numbers have been increased, go larger.
Can you describe what the issue was in as much detail as you can?
If you go into the Core interface, does the server have the same issue?
Re: Ack en downtime not working sometimes
Posted: Fri Nov 10, 2017 6:10 am
by WillemDH
Tom,
There is not much to describe about the issue, except that setting acknowledgments and downtimes was not working anymore.
Neither disk I/O nor CPU load nor memory usage was very high on my Nagios XI production server. Still, we are running into weird performance-related issues. After multiple years of using Nagios, it seems to me that there is a design issue in Nagios that prevents larger setups from being workable. I see no reason why I would be limited to 20k objects per server if no relevant resources are above any warning thresholds.
Next time I'll test if setting downtime from the Core interface works. And I will try to configure the suggestions made in
https://support.nagios.com/kb/article/n ... eeded.html , hopefully that helps somehow.
Tx & grtz
Willem
Re: Ack en downtime not working sometimes
Posted: Fri Nov 10, 2017 2:35 pm
by dwasswa
Thanks @tgriep.
@WillemDH, please try that and let us know if it solves your issue.
Re: Ack en downtime not working sometimes
Posted: Tue Nov 28, 2017 2:40 pm
by kyang
Hey WillemDH, just checking in to see if your issue is resolved?
Let us know if you have any more questions!