
Ack and downtime not working sometimes

Posted: Thu Jun 29, 2017 4:23 am
by WillemDH
Hello,

I had this several times last week and this week, twice today. All of a sudden Nagios is no longer able to acknowledge problems or schedule downtime, and even a forced check does not seem to work. The workaround is to apply a configuration, but since that takes 2 minutes, it is really problematic.

What can I do to troubleshoot this?

Grtz

Willem

Re: Ack and downtime not working sometimes

Posted: Thu Jun 29, 2017 11:04 am
by SteveBeauchemin
Willem,

While in this state, look at the output of "ipcs -q"; I bet it is a big number.
Applying the config drops all the pending DB data and starts over.
I saw this myself yesterday. Just had to be patient. Until I couldn't be.

Not a solution, but is a possible explanation.

As far as troubleshooting goes, if this is ipcs: while it is happening, run systemctl stop httpd and see whether the ipcs count starts to decrease instead of increase.
If it does start to go down, then web-based activity is a likely cause. I have seen this very recently too,
and it drove me to explore the AJAX refresh rates in the Nagios GUI. I have made some changes, and my setup is actually
running pretty well now. Of course, by saying this I am now jinxing it.

Once again, if this is ipcs: in one SSH window, logged in to the OS as the nagios user, I run this to see what is going on.

watch -n 5 "ipcs -q"

Then run

systemctl stop httpd

and watch for the number to start decreasing. If it does, you know it is GUI related.
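To put numbers on the ipcs trend instead of eyeballing `watch`, a small helper could log a timestamped total of queued messages, so the before/after effect of stopping httpd ends up in a file. This is a sketch, assuming the util-linux `ipcs -q` layout where the sixth column is `messages`; the function names and the log path are hypothetical.

```shell
#!/bin/sh
# Sketch: total the "messages" column of `ipcs -q` output.
# Assumed util-linux layout: key msqid owner perms used-bytes messages.
# Reads the ipcs output on stdin so it can be tested without a live queue.
sum_queue_messages() {
    # Data rows start with a hex key such as 0x00000000; header lines do not.
    awk '$1 ~ /^0x/ { total += $6 } END { print total + 0 }'
}

# Hypothetical helper: print one timestamped sample per call.
log_queue_sample() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(ipcs -q | sum_queue_messages)"
}

# Example (as the nagios user, Ctrl-C to stop; log path is a placeholder):
#   while true; do log_queue_sample >> /tmp/ipcs_queue.log; sleep 5; done
```

Running the loop before and after `systemctl stop httpd` gives a log where a rising or falling total is visible at a glance.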

If it is httpd related, PM me and I'll tell you all my secrets.

Good Luck.

Steve B

Re: Ack and downtime not working sometimes

Posted: Thu Jun 29, 2017 11:58 am
by tgriep
Thanks SteveBeauchemin for the help.

Another possible cause is that multiple Nagios processes are running, so when you try to ACK or force a check, the command is sent to the wrong one.

How many host and service checks is the system running?
If it happens again, please run the following as root and post the output so we can see what is running on the server.

ps -ef --cols=300
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
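To test the multiple-process theory quickly while the problem is happening, the number of main Nagios daemons can be pulled out of the same `ps -ef` output. The sketch below assumes the daemon runs as `/usr/local/nagios/bin/nagios` with a `-d` argument, as in the listing posted later in this thread; `count_nagios_daemons` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: count Nagios Core main daemons in `ps -ef` output read on stdin.
# Assumes the daemon was started with "-d"; its "--worker" children and
# <defunct> entries are deliberately not counted.
count_nagios_daemons() {
    # In `ps -ef` the command (CMD column) starts at field 8; match the
    # binary name there and require a "-d" argument somewhere after it.
    awk '$8 ~ /\/bin\/nagios$/ { for (i = 9; i <= NF; i++) if ($i == "-d") { n++; break } } END { print n + 0 }'
}
```

Usage: `ps -ef | count_nagios_daemons`. A result other than 1 while acknowledgements are failing would support this theory; a result of 1 points elsewhere.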

Re: Ack and downtime not working sometimes

Posted: Thu Jun 29, 2017 12:34 pm
by WillemDH
Thanks a lot for the tips, guys. I'll definitely investigate further following your instructions the next time I have the issue. I had it twice this morning, then rebooted the Nagios server, and it kept working for the rest of the day. Let's see how this goes.

The system has 1200 hosts and 22000 services.

Grtz

Re: Ack and downtime not working sometimes

Posted: Thu Jun 29, 2017 4:27 pm
by tgriep
OK, let us know what you find out the next time the issue happens.

Re: Ack and downtime not working sometimes

Posted: Thu Nov 09, 2017 6:20 am
by WillemDH
OK, I had this issue twice today. After the first time I restarted the nagios service, which solved it for a few hours, but now it's back again.

So I checked the ipcs queue, which does not seem to be very large and drops to 0 regularly.

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Warning: Service 'SHP_Health' on host 'X'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'SHP_Sitecollections' on host 'X'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
        Checked 25281 services.
        Checked 1238 hosts.
        Checked 226 host groups.
        Checked 98 service groups.
        Checked 110 contacts.
        Checked 22 contact groups.
        Checked 356 commands.
        Checked 123 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 1238 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 123 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 2
Total Errors:   0

ps -ef --cols=300
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Oct26 ?        00:00:01 /sbin/init
root         2     0  0 Oct26 ?        00:00:00 [kthreadd]
root         3     2  0 Oct26 ?        00:02:49 [migration/0]
root         4     2  0 Oct26 ?        00:19:25 [ksoftirqd/0]
root         5     2  0 Oct26 ?        00:00:00 [stopper/0]
root         6     2  0 Oct26 ?        00:00:04 [watchdog/0]
root         7     2  0 Oct26 ?        00:02:24 [migration/1]
root         8     2  0 Oct26 ?        00:00:00 [stopper/1]
root         9     2  0 Oct26 ?        00:00:46 [ksoftirqd/1]
root        10     2  0 Oct26 ?        00:00:01 [watchdog/1]
root        11     2  0 Oct26 ?        00:02:03 [migration/2]
root        12     2  0 Oct26 ?        00:00:00 [stopper/2]
root        13     2  0 Oct26 ?        00:00:37 [ksoftirqd/2]
root        14     2  0 Oct26 ?        00:00:00 [watchdog/2]
root        15     2  0 Oct26 ?        00:04:38 [migration/3]
root        16     2  0 Oct26 ?        00:00:00 [stopper/3]
root        17     2  0 Oct26 ?        00:18:07 [ksoftirqd/3]
root        18     2  0 Oct26 ?        00:00:03 [watchdog/3]
root        19     2  0 Oct26 ?        00:02:22 [migration/4]
root        20     2  0 Oct26 ?        00:00:00 [stopper/4]
root        21     2  0 Oct26 ?        00:01:03 [ksoftirqd/4]
root        22     2  0 Oct26 ?        00:00:00 [watchdog/4]
root        23     2  0 Oct26 ?        00:01:50 [migration/5]
root        24     2  0 Oct26 ?        00:00:00 [stopper/5]
root        25     2  0 Oct26 ?        00:00:45 [ksoftirqd/5]
root        26     2  0 Oct26 ?        00:00:00 [watchdog/5]
root        27     2  0 Oct26 ?        00:01:28 [events/0]
root        28     2  0 Oct26 ?        00:00:29 [events/1]
root        29     2  0 Oct26 ?        00:00:31 [events/2]
root        30     2  0 Oct26 ?        00:05:14 [events/3]
root        31     2  0 Oct26 ?        00:02:17 [events/4]
root        32     2  0 Oct26 ?        00:01:49 [events/5]
root        33     2  0 Oct26 ?        00:00:00 [events/0]
root        34     2  0 Oct26 ?        00:00:00 [events/1]
root        35     2  0 Oct26 ?        00:00:00 [events/2]
root        36     2  0 Oct26 ?        00:00:00 [events/3]
root        37     2  0 Oct26 ?        00:00:00 [events/4]
root        38     2  0 Oct26 ?        00:00:00 [events/5]
root        39     2  0 Oct26 ?        00:00:00 [events_long/0]
root        40     2  0 Oct26 ?        00:00:00 [events_long/1]
root        41     2  0 Oct26 ?        00:00:00 [events_long/2]
root        42     2  0 Oct26 ?        00:00:00 [events_long/3]
root        43     2  0 Oct26 ?        00:00:00 [events_long/4]
root        44     2  0 Oct26 ?        00:00:00 [events_long/5]
root        45     2  0 Oct26 ?        00:00:00 [events_power_ef]
root        46     2  0 Oct26 ?        00:00:00 [events_power_ef]
root        47     2  0 Oct26 ?        00:00:00 [events_power_ef]
root        48     2  0 Oct26 ?        00:00:00 [events_power_ef]
root        49     2  0 Oct26 ?        00:00:00 [events_power_ef]
root        50     2  0 Oct26 ?        00:00:00 [events_power_ef]
root        51     2  0 Oct26 ?        00:00:00 [cgroup]
root        52     2  0 Oct26 ?        00:00:00 [khelper]
root        53     2  0 Oct26 ?        00:00:00 [netns]
root        54     2  0 Oct26 ?        00:00:00 [async/mgr]
root        55     2  0 Oct26 ?        00:00:00 [pm]
root        56     2  0 Oct26 ?        00:00:03 [sync_supers]
root        57     2  0 Oct26 ?        00:00:00 [bdi-default]
root        58     2  0 Oct26 ?        00:00:00 [kintegrityd/0]
root        59     2  0 Oct26 ?        00:00:00 [kintegrityd/1]
root        60     2  0 Oct26 ?        00:00:00 [kintegrityd/2]
root        61     2  0 Oct26 ?        00:00:00 [kintegrityd/3]
root        62     2  0 Oct26 ?        00:00:00 [kintegrityd/4]
root        63     2  0 Oct26 ?        00:00:00 [kintegrityd/5]
root        64     2  0 Oct26 ?        00:11:35 [kblockd/0]
root        65     2  0 Oct26 ?        00:00:33 [kblockd/1]
root        66     2  0 Oct26 ?        00:00:17 [kblockd/2]
root        67     2  0 Oct26 ?        00:10:26 [kblockd/3]
root        68     2  0 Oct26 ?        00:00:34 [kblockd/4]
root        69     2  0 Oct26 ?        00:00:18 [kblockd/5]
root        70     2  0 Oct26 ?        00:00:00 [kacpid]
root        71     2  0 Oct26 ?        00:00:00 [kacpi_notify]
root        72     2  0 Oct26 ?        00:00:00 [kacpi_hotplug]
root        73     2  0 Oct26 ?        00:00:00 [ata_aux]
root        74     2  0 Oct26 ?        00:00:00 [ata_sff/0]
root        75     2  0 Oct26 ?        00:00:00 [ata_sff/1]
root        76     2  0 Oct26 ?        00:00:00 [ata_sff/2]
root        77     2  0 Oct26 ?        00:00:00 [ata_sff/3]
root        78     2  0 Oct26 ?        00:00:00 [ata_sff/4]
root        79     2  0 Oct26 ?        00:00:00 [ata_sff/5]
root        80     2  0 Oct26 ?        00:00:00 [ksuspend_usbd]
root        81     2  0 Oct26 ?        00:00:00 [khubd]
root        82     2  0 Oct26 ?        00:00:00 [kseriod]
root        83     2  0 Oct26 ?        00:00:00 [md/0]
root        84     2  0 Oct26 ?        00:00:00 [md/1]
root        85     2  0 Oct26 ?        00:00:00 [md/2]
root        86     2  0 Oct26 ?        00:00:00 [md/3]
root        87     2  0 Oct26 ?        00:00:00 [md/4]
root        88     2  0 Oct26 ?        00:00:00 [md/5]
root        89     2  0 Oct26 ?        00:00:00 [md_misc/0]
root        90     2  0 Oct26 ?        00:00:00 [md_misc/1]
root        91     2  0 Oct26 ?        00:00:00 [md_misc/2]
root        92     2  0 Oct26 ?        00:00:00 [md_misc/3]
root        93     2  0 Oct26 ?        00:00:00 [md_misc/4]
root        94     2  0 Oct26 ?        00:00:00 [md_misc/5]
root        95     2  0 Oct26 ?        00:00:00 [linkwatch]
root        98     2  0 Oct26 ?        00:00:00 [khungtaskd]
root        99     2  0 Oct26 ?        00:07:48 [kswapd0]
root       100     2  0 Oct26 ?        00:00:00 [ksmd]
root       101     2  0 Oct26 ?        00:01:25 [khugepaged]
root       102     2  0 Oct26 ?        00:00:00 [aio/0]
root       103     2  0 Oct26 ?        00:00:00 [aio/1]
root       104     2  0 Oct26 ?        00:00:00 [aio/2]
root       105     2  0 Oct26 ?        00:00:00 [aio/3]
root       106     2  0 Oct26 ?        00:00:00 [aio/4]
root       107     2  0 Oct26 ?        00:00:00 [aio/5]
root       108     2  0 Oct26 ?        00:00:00 [crypto/0]
root       109     2  0 Oct26 ?        00:00:00 [crypto/1]
root       110     2  0 Oct26 ?        00:00:00 [crypto/2]
root       111     2  0 Oct26 ?        00:00:00 [crypto/3]
root       112     2  0 Oct26 ?        00:00:00 [crypto/4]
root       113     2  0 Oct26 ?        00:00:00 [crypto/5]
root       120     2  0 Oct26 ?        00:00:00 [kthrotld/0]
root       121     2  0 Oct26 ?        00:00:00 [kthrotld/1]
root       122     2  0 Oct26 ?        00:00:00 [kthrotld/2]
root       123     2  0 Oct26 ?        00:00:00 [kthrotld/3]
root       124     2  0 Oct26 ?        00:00:00 [kthrotld/4]
root       125     2  0 Oct26 ?        00:00:00 [kthrotld/5]
root       126     2  0 Oct26 ?        00:00:00 [pciehpd]
root       128     2  0 Oct26 ?        00:00:00 [kpsmoused]
root       129     2  0 Oct26 ?        00:00:00 [usbhid_resumer]
root       130     2  0 Oct26 ?        00:00:00 [deferwq]
root       163     2  0 Oct26 ?        00:00:00 [kdmremove]
root       164     2  0 Oct26 ?        00:00:00 [kstriped]
root       194     2  0 Oct26 ?        00:00:00 [ttm_swap]
root       427     2  0 Oct26 ?        00:00:00 [scsi_eh_0]
root       428     2  0 Oct26 ?        00:00:00 [scsi_eh_1]
root       433     2  0 Oct26 ?        00:00:17 [mpt_poll_0]
root       434     2  0 Oct26 ?        00:00:00 [mpt/0]
root       435     2  0 Oct26 ?        00:00:00 [scsi_eh_2]
root       485     2  0 Oct26 ?        00:00:00 [kdmflush]
root       487     2  0 Oct26 ?        00:00:00 [kdmflush]
root       505     2  0 Oct26 ?        00:31:33 [jbd2/dm-1-8]
root       506     2  0 Oct26 ?        00:00:00 [ext4-dio-unwrit]
root       604     1  0 Oct26 ?        00:00:00 /sbin/udevd -d
root       802     2  0 Oct26 ?        00:00:13 [vmmemctl]
root       986   604  0 Oct26 ?        00:00:00 /sbin/udevd -d
root       996   604  0 Oct26 ?        00:00:00 /sbin/udevd -d
root      1062     2  0 Oct26 ?        00:00:13 [kauditd]
root      1266     2  0 Oct26 ?        01:59:31 [flush-253:1]
root      1332     1  0 Oct26 ?        00:00:26 auditd
root      1390     1  0 Oct26 ?        00:00:36 irqbalance --pid=/var/run/irqbalance.pid
rpc       1408     1  0 Oct26 ?        00:00:00 rpcbind
rpcuser   1430     1  0 Oct26 ?        00:00:00 rpc.statd
root      1465     2  0 Oct26 ?        00:00:00 [rpciod/0]
root      1466     2  0 Oct26 ?        00:00:00 [rpciod/1]
root      1467     2  0 Oct26 ?        00:00:00 [rpciod/2]
root      1468     2  0 Oct26 ?        00:00:00 [rpciod/3]
root      1469     2  0 Oct26 ?        00:00:00 [rpciod/4]
root      1470     2  0 Oct26 ?        00:00:00 [rpciod/5]
root      1475     1  0 Oct26 ?        00:00:00 rpc.idmapd
dbus      1501     1  0 Oct26 ?        00:00:00 dbus-daemon --system
avahi     1515     1  0 Oct26 ?        00:00:00 avahi-daemon: running [srvnagios01.local]
avahi     1516  1515  0 Oct26 ?        00:00:00 avahi-daemon: chroot helper
root      1546     2  0 Oct26 ?        00:00:00 [kslowd000]
root      1547     2  0 Oct26 ?        00:00:00 [kslowd001]
root      1548     2  0 Oct26 ?        00:00:00 [nfsiod]
root      1549     2  0 Oct26 ?        00:00:00 [lockd]
root      1561     1  0 Oct26 ?        00:00:00 /usr/sbin/acpid
68        1573     1  0 Oct26 ?        00:00:04 hald
root      1574  1573  0 Oct26 ?        00:00:00 hald-runner
root      1606  1574  0 Oct26 ?        00:00:00 hald-addon-input: Listening on /dev/input/event2 /dev/input/event0
68        1616  1574  0 Oct26 ?        00:00:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root      1662     1  0 Oct26 ?        00:00:00 rpc.rquotad
root      1667     1  0 Oct26 ?        00:00:00 rpc.mountd
root      1673     2  0 Oct26 ?        00:00:00 [nfsd4]
root      1674     2  0 Oct26 ?        00:00:00 [nfsd4_callbacks]
root      1675     2  0 Oct26 ?        00:00:00 [nfsd]
root      1676     2  0 Oct26 ?        00:00:00 [nfsd]
root      1677     2  0 Oct26 ?        00:00:00 [nfsd]
root      1678     2  0 Oct26 ?        00:00:00 [nfsd]
root      1679     2  0 Oct26 ?        00:00:00 [nfsd]
root      1680     2  0 Oct26 ?        00:00:00 [nfsd]
root      1681     2  0 Oct26 ?        00:00:00 [nfsd]
root      1682     2  0 Oct26 ?        00:00:00 [nfsd]
apache    3325  7809  2 09:52 ?        00:03:14 /usr/sbin/httpd
postgres  3732  7618  0 09:52 ?        00:00:05 postgres: nagiosxi nagiosxi IPADDRESS(43130) idle
postfix   6218  7709  0 11:50 ?        00:00:00 pickup -l -t fifo -u
root      7360     1  0 Oct26 ?        00:00:13 /usr/sbin/snmptrapd -Lsd -On -p /var/run/snmptrapd.pid
root      7390     1  0 Oct26 ?        00:00:00 /usr/sbin/sshd
root      7401     1  0 Oct26 ?        00:00:27 xinetd -stayalive -pidfile /var/run/xinetd.pid
ntp       7412     1  0 Oct26 ?        00:00:03 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
postgres  7618     1  0 Oct26 ?        00:02:02 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres  7623  7618  0 Oct26 ?        00:00:19 postgres: logger process
postgres  7625  7618  0 Oct26 ?        00:02:26 postgres: writer process
postgres  7626  7618  0 Oct26 ?        00:01:35 postgres: wal writer process
postgres  7627  7618  0 Oct26 ?        00:00:34 postgres: autovacuum launcher process
postgres  7628  7618  0 Oct26 ?        00:02:23 postgres: stats collector process
root      7709     1  0 Oct26 ?        00:00:03 /usr/libexec/postfix/master
postfix   7721  7709  0 Oct26 ?        00:00:00 qmgr -l -t fifo -u
root      7727     1  0 Oct26 ?        00:09:18 /usr/bin/vmtoolsd
root      7753     1  0 Oct26 ?        00:00:00 /usr/sbin/abrtd
root      7778     1  0 Oct26 ?        00:00:13 abrt-dump-oops -d /var/spool/abrt -rwx /var/log/messages
gearmand  7790     1  3 Oct26 ?        13:22:16 /usr/sbin/gearmand -d --worker-wakeup=10 --retention-file=/tmp/gearmand.retention -q retention --log-file=/var/log/gearmand/gearmand.log
root      7809     1  0 Oct26 ?        00:00:26 /usr/sbin/httpd
root      7852     1  0 Oct26 ?        00:00:33 crond
nagios    7868     1  0 Oct26 ?        00:01:04 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
root      7882     1  0 Oct26 ?        00:00:00 /usr/sbin/atd
nagios    7897     1  0 Oct26 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root      7929     1  0 Oct26 ?        00:00:00 /usr/share/filebeat/bin/filebeat-god -r / -n -p /var/run/filebeat.pid -- /usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
root      7931  7929  0 Oct26 ?        00:26:22 /usr/share/filebeat/bin/filebeat -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
ajaxterm  7976     1  0 Oct26 ?        00:03:49 python /usr/share/ajaxterm/ajaxterm.py --daemon --port=8022 --uid=ajaxterm
telegraf  8115     1  5 Oct26 ?        19:38:13 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
root      8125     1  0 Oct26 tty1     00:00:00 /sbin/mingetty /dev/tty1
root      8129     1  0 Oct26 tty2     00:00:00 /sbin/mingetty /dev/tty2
root      8132     1  0 Oct26 tty3     00:00:00 /sbin/mingetty /dev/tty3
root      8135     1  0 Oct26 tty4     00:00:00 /sbin/mingetty /dev/tty4
root      8137     1  0 Oct26 tty5     00:00:00 /sbin/mingetty /dev/tty5
root      8142     1  0 Oct26 tty6     00:00:00 /sbin/mingetty /dev/tty6
nagios   15610  7401  0 Oct26 ?        00:00:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
apache   16144  7809  3 11:52 ?        00:00:36 /usr/sbin/httpd
apache   16147  7809  3 11:52 ?        00:00:36 /usr/sbin/httpd
apache   16148  7809  3 11:52 ?        00:00:36 /usr/sbin/httpd
postgres 16369  7618  0 11:52 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(52710) idle
postgres 16400  7618  0 11:52 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(53048) idle
postgres 16403  7618  0 11:52 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(53052) idle
apache   16728  7809  3 11:39 ?        00:00:59 /usr/sbin/httpd
postgres 16904  7618  0 11:39 ?        00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(49974) idle
apache   19980  7809  2 10:34 ?        00:02:00 /usr/sbin/httpd
postgres 19984  7618  0 10:34 ?        00:00:03 postgres: nagiosxi nagiosxi IPADDRESS(37504) idle
apache   23927  7809  3 11:40 ?        00:00:52 /usr/sbin/httpd
postgres 23969  7618  0 11:40 ?        00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(56832) idle
nagios   26421  7401  0 10:23 ?        00:00:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios   29285     1  6 10:24 ?        00:07:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   29287 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29288 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29289 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29290 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29291 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29292 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29294 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29295 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29296 29285  0 10:24 ?        00:00:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   29299  7897  0 10:24 ?        00:00:28 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   29300 29299 21 10:24 ?        00:23:08 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root     29334     1  0 10:24 ?        00:00:00 /usr/bin/perl /usr/sbin/snmptt --daemon
snmptt   29335 29334  0 10:24 ?        00:00:05 /usr/bin/perl /usr/sbin/snmptt --daemon
nagios   29492 29285  0 10:24 ?        00:00:00 [nagios] <defunct>
nagios   32550 29292  0 12:08 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_lin_service -a -o linux -s filebeat
apache   33283  7809  2 11:55 ?        00:00:21 /usr/sbin/httpd
postgres 33334  7618  0 11:55 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(40070) idle
apache   33354  7809  3 11:42 ?        00:00:53 /usr/sbin/httpd
postgres 33561  7618  0 11:42 ?        00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(37554) idle
nagios   34221 29287  0 12:08 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios   35250 29296  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_swap -a -w 50 -c 20
apache   35797  7809  2 11:56 ?        00:00:18 /usr/sbin/httpd
postgres 35881  7618  0 11:56 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(42602) idle
nagios   35903 29291  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_open_files -a -w 30 -c 50
nagios   36917 29294  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H gateway -p 5666 -t 300 -c check_fts_enclosure_power -a -E enclosure
nagios   37322 29291  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_mem -a -w 1 -c 0 -nocache
apache   37683  7809  2 10:12 ?        00:02:40 /usr/sbin/httpd
nagios   38009 29288  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_win_network_connections -a -H localhost
nagios   38024 29291  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H gateway -p 5666 -t 60 -c check_tel_trunkstat -a -trunknode "OXE" -telnetHostIP "IPADDRESS" -telnetlogin "mtcl" -telnetpwd "PASSWORD" -telnetprompt "ac-oxe-01-cpua" -telnetcommand "trkstat 20"
postgres 38026  7618  0 10:13 ?        00:00:04 postgres: nagiosxi nagiosxi IPADDRESS(49132) idle
nagios   38490 29289  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38492 29295  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38508 29294  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38596 29287  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl M
nagios   38635 29289  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl D
nagios   38664 29292  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios   38665 29291  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38694 29289  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios   38701 29291  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 1800 -c check_ms_win_updates -a -wd 120 -cd 150
nagios   38706 29289  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38707 29290  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38715 29289  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_cpu_stats -a -w 85 -c 95
nagios   38724 29291  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios   38727 29287  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38728 29288  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios   38731 29295  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38733 29291  0 12:09 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl L
nagios   38794 29287  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_ctx_loadevaluator
nagios   38801 29295  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl G
root     38814  7852  0 12:10 ?        00:00:00 CROND
root     38815  7852  0 12:10 ?        00:00:00 CROND
root     38816  7852  0 12:10 ?        00:00:00 CROND
root     38817  7852  0 12:10 ?        00:00:00 CROND
root     38818  7852  0 12:10 ?        00:00:00 CROND
root     38819  7852  0 12:10 ?        00:00:00 CROND
nagios   38823 38818  0 12:10 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php >> /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios   38827 38819  0 12:10 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php >> /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios   38828 38814  0 12:10 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php >> /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios   38829 38815  0 12:10 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php >> /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios   38831 38816  0 12:10 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handler.log 2>&1
nagios   38832 38817  0 12:10 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1
nagios   38833 38823  2 12:10 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios   38836 38828  3 12:10 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios   38837 38832  3 12:10 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios   38839 38831  2 12:10 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php
nagios   38841 38827  5 12:10 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios   38843 38829  3 12:10 ?        00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
postgres 38847  7618  0 12:10 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47068) idle
nagios   38859 29292  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
postgres 38866  7618  0 12:10 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47082) idle
postgres 38878  7618  0 12:10 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47086) idle
postgres 38901  7618  0 12:10 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47088) idle
postgres 38902  7618  0 12:10 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47092) idle
postgres 38908  7618  0 12:10 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(47094) idle
nagios   38939 29287  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl M
nagios   38941 29289  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 120 -c check_clu_svc -p 5666
nagios   38942 29290  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_win_network_connections -a -H localhost
nagios   38943 29295  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl L
nagios   38944 29292  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl C
nagios   38958 29291  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 180 -c check_ms_win_network_load -a -I IPADDRESS -t 2
nagios   38962 29296  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -p 5666 -t 120 -c check_ms_win_network_connections -a -H localhost
nagios   39008 38841  0 12:10 ?        00:00:00 sh -c /usr/bin/iostat -c 5 2 | tail --lines=2 | head --lines=1 | awk '{ print $1,$2,$3,$4,$5,$6 }'
nagios   39009 39008  0 12:10 ?        00:00:00 /usr/bin/iostat -c 5 2
nagios   39010 39008  0 12:10 ?        00:00:00 tail --lines=2
nagios   39011 39008  0 12:10 ?        00:00:00 head --lines=1
nagios   39012 39008  0 12:10 ?        00:00:00 awk { print $1,$2,$3,$4,$5,$6 }
nagios   39083 29288  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H IPADDRESS -p 5666 -t 60 -c check_ms_win_disk_load -a -ms 5 -dl K
nagios   39088 29291  0 12:10 ?        00:00:00 /usr/local/nagios/libexec/check_nrpe -H IPADDRESS -t 30 -c CheckCounter -a Counter:Aggregate Delivery Queue Length=\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) ShowAll MaxWarn=250 MaxCrit=1000
root     39113 60674  1 12:10 pts/0    00:00:00 ps -ef --cols=300
apache   39789  7809  3 11:43 ?        00:00:52 /usr/sbin/httpd
postgres 40085  7618  0 11:43 ?        00:00:01 postgres: nagiosxi nagiosxi IPADDRESS(42968) idle
apache   40783  7809  2 11:57 ?        00:00:16 /usr/sbin/httpd
apache   40805  7809  2 11:57 ?        00:00:16 /usr/sbin/httpd
apache   40806  7809  2 11:57 ?        00:00:17 /usr/sbin/httpd
postgres 40810  7618  0 11:57 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(46726) idle
postgres 40813  7618  0 11:57 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(46728) idle
postgres 40816  7618  0 11:57 ?        00:00:00 postgres: nagiosxi nagiosxi IPADDRESS(46730) idle
postfix  44016  7709  0 10:52 ?        00:00:00 showq -t unix -u
root     44346     1  0 Oct26 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql    44469 44346 14 Oct26 ?        1-23:48:12 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
root     59919     1  0 Oct27 ?        00:00:45 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
root     60672  7390  0 09:12 ?        00:00:00 sshd: root@pts/0
root     60674 60672  0 09:12 pts/0    00:00:00 -bash
apache   62241  7809  2 10:30 ?        00:02:12 /usr/sbin/httpd
postgres 62752  7618  0 10:30 ?        00:00:03 postgres: nagiosxi nagiosxi IPADDRESS(45396) idle

I will reboot the server now in the hope that it helps for a longer period. Feel free to make any other suggestions on how to:
- monitor this, so at least we know exactly when it happens
- prevent this from happening
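One way to get a timestamped alert when this happens is to wrap whichever symptom you suspect (for example the ipcs message total from Steve's suggestion) in a Nagios-style check. The sketch below only implements the standard plugin exit codes and a simplified "greater than or equal" threshold test rather than the full plugin range syntax; `check_value` is a hypothetical name and the thresholds in the example are placeholders, not recommendations.

```shell
#!/bin/sh
# Sketch of a Nagios-style check on a single integer.
# Exit codes follow the plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# Output is "STATUS - text | label=value;warn;crit" (text plus perfdata).
check_value() {
    label=$1 value=$2 warn=$3 crit=$4
    case $value in
        ''|*[!0-9]*) echo "UNKNOWN - $label: non-numeric value '$value'"; return 3 ;;
    esac
    perf="$label=$value;$warn;$crit"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - $label is $value (>= $crit) | $perf"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - $label is $value (>= $warn) | $perf"; return 1
    fi
    echo "OK - $label is $value | $perf"
    return 0
}

# Example plugin body (hypothetical; thresholds are placeholders):
#   total=$(ipcs -q | awk '$1 ~ /^0x/ { t += $6 } END { print t + 0 }')
#   check_value queue_messages "$total" 5000 20000; exit $?
```

Because the state changes are recorded by Nagios itself, the service history would show exactly when the value crossed a threshold.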

Grtz

Willem

Re: Ack and downtime not working sometimes

Posted: Thu Nov 09, 2017 5:35 pm
by tgriep
With 25000 services on the server, you may want to look into splitting the server into two to lighten the load.

Without knowing what the error was when it failed, it is hard to give specific advice.
Definitely try this and see if it helps.
https://support.nagios.com/kb/article/n ... eeded.html
If the numbers have already been increased, go larger.

Can you describe what the issue was in as much detail as you can?

If you go into the Core interface, does the server have the same issue?
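A step below the Core interface is writing the command straight into the Nagios external command file, which takes both web front ends out of the picture. The helpers below only format the lines; the field order follows the Nagios Core external command reference for ACKNOWLEDGE_SVC_PROBLEM and SCHEDULE_SVC_DOWNTIME. The function names are hypothetical, and the command file path in the example is an assumption based on the nagios.qh socket that sits in the same rw directory in the ps listing above.

```shell
#!/bin/sh
# Sketch: format Nagios Core external command lines. The timestamp is passed
# in so the output is reproducible.

# Args: ts host service author comment
# Flags: sticky=2 (sticky acknowledgement), notify=1, persistent=1
ack_svc_line() {
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;2;1;1;%s;%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Args: ts host service start end author comment
# Flags: fixed=1, trigger_id=0; duration (end - start) only matters for
# flexible downtime but is still a required field.
downtime_svc_line() {
    printf '[%s] SCHEDULE_SVC_DOWNTIME;%s;%s;%s;%s;1;0;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$(( $5 - $4 ))" "$6" "$7"
}

# Example (as the nagios user; host, service and path are placeholders):
#   now=$(date +%s)
#   ack_svc_line "$now" myhost myservice willem "test ack" > /usr/local/nagios/var/rw/nagios.cmd
```

If an acknowledgement written this way shows up while the web UI's does not, the problem is more likely in the web/command-submission path than in the core itself.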

Re: Ack and downtime not working sometimes

Posted: Fri Nov 10, 2017 6:10 am
by WillemDH
Tom,

There is not much to describe about the issue, except that setting acknowledgments and downtimes was no longer working.
Neither disk I/O, CPU load, nor memory usage was very high on my Nagios XI production server. Still, we are running into weird performance-related issues. After multiple years of using Nagios, it seems to me that there is a design issue in Nagios that prevents larger setups from being workable. I see no reason why I should be limited to 20k objects per server if no relevant resource is above any warning threshold.
Next time I'll test whether setting downtime from the Core interface works. I will also try to apply the suggestions in https://support.nagios.com/kb/article/n ... eeded.html; hopefully that helps somehow.

Tx & grtz

Willem

Re: Ack and downtime not working sometimes

Posted: Fri Nov 10, 2017 2:35 pm
by dwasswa
Thanks @tgriep,

@WillemDH, please try that and let us know if it solves your issue.

Re: Ack and downtime not working sometimes

Posted: Tue Nov 28, 2017 2:40 pm
by kyang
Hey WillemDH, just checking in: is your issue resolved?

Let us know if you have any more questions!