High load average with 5.3.3?
-
cbeattie-unitrends
- Posts: 84
- Joined: Mon Oct 10, 2016 2:51 pm
High load average with 5.3.3?
Has anyone noticed a high load average after upgrading to Nagios XI 5.3.3? I upgraded two Nagios XI instances on November 25th, and since then the load average on both has gone way up. I updated the OS around the same time, but I've already rebooted back to the previous Linux kernel with no effect.
Is it safe to try installing 5.3.2 over 5.3.3? I have a pre-5.3.3 snapshot I can revert to, but I'd rather not lose the graph data.
The Nagios hosts have 8 CPUs with 32GB of RAM, running CentOS 7. One has 700+ hosts and almost 16K services, and the other has 600+ hosts and 13.5K services.
Last edited by dwhitfield on Mon Dec 05, 2016 4:53 pm, edited 1 time in total.
Reason: marking with green check mark
Re: High load average with 5.3.3?
To the best of my knowledge, we have not had any reports of this behavior. Around the time of the spike, what is being recorded in the event log?
Home > Monitoring Process > Event Log
From a tech support perspective, it is not recommended to install an older version over a newer version, but I'm sure there are admins out there who have; perhaps they can chime in with some advice about it.
During the high load/CPU, can you post the output of:
Code: Select all
top
ps -aef
Be sure to check out the Knowledgebase for helpful articles and solutions!
-
cbeattie-unitrends
- Posts: 84
- Joined: Mon Oct 10, 2016 2:51 pm
Re: High load average with 5.3.3?
I see a lot of runtime warnings and errors in the event log:
This is what top looks like during high load. Almost all of our checks are SNMP-based, so I'd expect to see that in there a lot.
And here is the ps output:
I cloned a new VM from a snapshot I took of the Nagios host before I upgraded Nagios XI and updated the OS. Its load average is around 3 so far, but it hasn't been running for very long yet.
Code: Select all
Information 11/27/2016 22:09 wproc: Core Worker 20378: job 464598 (pid=17919) timed out. Killing it
Service Warning 11/27/2016 22:09 SERVICE ALERT: den-ltr-hrip-c5dbe61954e6;CPU Usage;WARNING;SOFT;1;2 CPU, average load 96.0% > 95% : WARNING
Service Warning 11/27/2016 22:09 SERVICE ALERT: den-ltr-iwu-388ae4da0b90;CPU Usage;WARNING;SOFT;1;2 CPU, average load 96.0% > 95% : WARNING
Information 11/27/2016 22:08 wproc: Core Worker 20376: job 464451 (pid=15837): Dormant child reaped
Service Critical 11/27/2016 22:08 SERVICE ALERT: den-ltr-mrmc-2707288fd0bf;sshd;CRITICAL;SOFT;1;(Service check timed out after 60.01 seconds)
Runtime Warning 11/27/2016 22:08 Warning: Check of service 'sshd' on host 'den-ltr-mrmc-2707288fd0bf' timed out after 60.005s!
Runtime Error 11/27/2016 22:08 wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Runtime Error 11/27/2016 22:08 wproc: host=den-ltr-mrmc-2707288fd0bf; service=sshd;
Runtime Error 11/27/2016 22:08 wproc: CHECK job 464451 from worker Core Worker 20376 timed out after 60.01s
Information 11/27/2016 22:08 wproc: Core Worker 20376: job 464451 (pid=15837) timed out. Killing it
Service Warning 11/27/2016 22:08 SERVICE ALERT: esxi106.unit.den3.loc;esx_CPU;WARNING;SOFT;1;32 CPU, average load 42.7% > 40% : WARNING
Service Warning 11/27/2016 22:08 SERVICE ALERT: den-ltr-lrmhmr-649241bbb91f;CPU Usage;WARNING;SOFT;1;2 CPU, average load 98.0% > 95% : WARNING
Process Information 11/27/2016 22:08 Auto-save of retention data completed successfully.
Information 11/27/2016 22:08 wproc: Core Worker 20367: job 464375 (pid=14770): Dormant child reaped
Runtime Warning 11/27/2016 22:08 Warning: Check of service 'xinetd' on host 'den-ltr91' timed out after 60.006s!
Runtime Error 11/27/2016 22:08 wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Runtime Error 11/27/2016 22:08 wproc: host=den-ltr91; service=xinetd;
Runtime Error 11/27/2016 22:08 wproc: CHECK job 464375 from worker Core Worker 20367 timed out after 60.01s
Information 11/27/2016 22:08 wproc: Core Worker 20367: job 464375 (pid=14770) timed out. Killing it
Information 11/27/2016 22:08 wproc: Core Worker 20368: job 464375 (pid=14769): Dormant child reaped
Code: Select all
[root@den-nagios ~]# top
top - 14:35:46 up 6:44, 1 user, load average: 95.84, 61.28, 46.69
Tasks: 554 total, 296 running, 258 sleeping, 0 stopped, 0 zombie
%Cpu(s): 60.6 us, 6.8 sy, 0.0 ni, 32.5 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 32947500 total, 28688868 free, 2793988 used, 1464644 buff/cache
KiB Swap: 16515068 total, 16515068 free, 0 used. 29798440 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7427 root 20 0 0 0 0 R 42.9 0.0 48:27.38 kworker/u16:0
1915 nagios 20 0 61408 41832 1328 R 41.9 0.1 20:38.50 nagios
1748 mysql 20 0 4719360 196528 9764 S 6.3 0.6 28:34.75 mysqld
1943 nagios 20 0 136088 6580 1276 S 5.6 0.0 9:42.63 ndo2db
637 nagios 20 0 434596 23416 9628 R 3.7 0.1 0:00.22 php
2081 nagios 20 0 159140 11660 2396 R 2.3 0.0 0:00.07 check_snmp_proc
2105 nagios 20 0 159272 11712 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2106 nagios 20 0 159272 11716 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2117 nagios 20 0 159272 11716 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2130 nagios 20 0 159272 11716 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2138 nagios 20 0 159272 11712 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2144 nagios 20 0 159272 11712 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2146 nagios 20 0 159272 11712 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2147 nagios 20 0 159272 11716 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2189 nagios 20 0 159140 11716 2452 R 2.3 0.0 0:00.07 check_snmp_proc
2077 nagios 20 0 159140 11656 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2086 nagios 20 0 159140 11664 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2087 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2098 nagios 20 0 159140 11660 2396 S 2.0 0.0 0:00.06 check_snmp_proc
2100 nagios 20 0 140656 9208 2116 D 2.0 0.0 0:00.06 check_snmp_proc
2101 nagios 20 0 154548 10920 2200 R 2.0 0.0 0:00.06 check_snmp_proc
2109 nagios 20 0 159140 11656 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2115 nagios 20 0 159140 11664 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2118 nagios 20 0 154548 10916 2200 R 2.0 0.0 0:00.06 check_snmp_proc
2126 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2132 nagios 20 0 159140 11656 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2139 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2142 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2148 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2153 nagios 20 0 159140 11656 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2154 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2168 nagios 20 0 159272 11716 2452 R 2.0 0.0 0:00.06 check_snmp_proc
2170 nagios 20 0 159140 11712 2452 R 2.0 0.0 0:00.06 check_snmp_proc
2172 nagios 20 0 154548 10920 2200 R 2.0 0.0 0:00.06 check_snmp_proc
2178 nagios 20 0 154548 10924 2200 R 2.0 0.0 0:00.06 check_snmp_proc
2180 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2192 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2195 nagios 20 0 159272 11716 2452 R 2.0 0.0 0:00.06 check_snmp_proc
2200 nagios 20 0 159140 11656 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2201 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2207 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2210 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2222 nagios 20 0 159140 11656 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2237 nagios 20 0 159140 11660 2396 R 2.0 0.0 0:00.06 check_snmp_proc
2238 nagios 20 0 159140 11712 2452 R 2.0 0.0 0:00.06 check_snmp_proc
2239 nagios 20 0 154548 10920 2200 R 2.0 0.0 0:00.06 check_snmp_proc
2241 nagios 20 0 154944 11544 2304 R 2.0 0.0 0:00.06 check_snmp_proc
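For context on the header line of the top output above: on Linux the load average counts tasks that are runnable or in uninterruptible sleep, so dividing it by the CPU count gives a rough sense of how oversubscribed the host is. A minimal sketch, assuming the 8 CPUs mentioned in the first post; `load_per_cpu` is a hypothetical helper name, not part of Nagios XI:

```shell
#!/bin/sh
# Hypothetical helper: print the 1-minute load average divided by the CPU count.
#   $1 = the header line printed by top
#   $2 = number of CPUs (8 is taken from the first post; adjust for your host)
load_per_cpu() {
    printf '%s\n' "$1" | awk -v cpus="$2" '
        {
            # Drop everything up to "load average:" and split the three values.
            sub(/.*load average: */, "")
            split($0, avg, /, */)
            printf "%.2f\n", avg[1] / cpus
        }'
}

# Header line copied from the top output in this thread.
load_per_cpu "top - 14:35:46 up 6:44, 1 user, load average: 95.84, 61.28, 46.69" 8
```

With the values from this thread it prints 11.98, that is, roughly twelve tasks competing for each CPU over the last minute.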
Code: Select all
[root@den-nagios ~]# ps -aef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 07:51 ? 00:00:13 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
root 2 0 0 07:51 ? 00:00:00 [kthreadd]
root 3 2 0 07:51 ? 00:01:52 [ksoftirqd/0]
root 5 2 0 07:51 ? 00:00:00 [kworker/0:0H]
root 7 2 0 07:51 ? 00:02:32 [migration/0]
root 8 2 0 07:51 ? 00:00:00 [rcu_bh]
root 9 2 0 07:51 ? 00:00:00 [rcuob/0]
root 10 2 0 07:51 ? 00:00:00 [rcuob/1]
root 11 2 0 07:51 ? 00:00:00 [rcuob/2]
root 12 2 0 07:51 ? 00:00:00 [rcuob/3]
root 13 2 0 07:51 ? 00:00:00 [rcuob/4]
root 14 2 0 07:51 ? 00:00:00 [rcuob/5]
root 15 2 0 07:51 ? 00:00:00 [rcuob/6]
root 16 2 0 07:51 ? 00:00:00 [rcuob/7]
root 17 2 0 07:51 ? 00:00:51 [rcu_sched]
root 18 2 0 07:51 ? 00:00:10 [rcuos/0]
root 19 2 0 07:51 ? 00:00:09 [rcuos/1]
root 20 2 0 07:51 ? 00:00:09 [rcuos/2]
root 21 2 0 07:51 ? 00:00:09 [rcuos/3]
root 22 2 0 07:51 ? 00:00:09 [rcuos/4]
root 23 2 0 07:51 ? 00:00:10 [rcuos/5]
root 24 2 0 07:51 ? 00:00:10 [rcuos/6]
root 25 2 0 07:51 ? 00:00:10 [rcuos/7]
root 26 2 0 07:51 ? 00:00:35 [watchdog/0]
root 27 2 0 07:51 ? 00:00:29 [watchdog/1]
root 28 2 0 07:51 ? 00:03:12 [migration/1]
root 29 2 0 07:51 ? 00:02:02 [ksoftirqd/1]
root 31 2 0 07:51 ? 00:00:00 [kworker/1:0H]
root 32 2 0 07:51 ? 00:00:33 [watchdog/2]
root 33 2 0 07:51 ? 00:02:37 [migration/2]
root 34 2 0 07:51 ? 00:01:35 [ksoftirqd/2]
root 37 2 0 07:51 ? 00:00:33 [watchdog/3]
root 38 2 0 07:51 ? 00:02:44 [migration/3]
root 39 2 0 07:51 ? 00:01:39 [ksoftirqd/3]
root 41 2 0 07:51 ? 00:00:00 [kworker/3:0H]
root 42 2 0 07:51 ? 00:00:32 [watchdog/4]
root 43 2 0 07:51 ? 00:02:32 [migration/4]
root 44 2 0 07:51 ? 00:01:33 [ksoftirqd/4]
root 46 2 0 07:51 ? 00:00:00 [kworker/4:0H]
root 47 2 0 07:51 ? 00:00:34 [watchdog/5]
root 48 2 0 07:51 ? 00:02:37 [migration/5]
root 49 2 0 07:51 ? 00:01:54 [ksoftirqd/5]
root 51 2 0 07:51 ? 00:00:00 [kworker/5:0H]
root 52 2 0 07:51 ? 00:00:34 [watchdog/6]
root 53 2 0 07:51 ? 00:02:31 [migration/6]
root 54 2 0 07:51 ? 00:01:39 [ksoftirqd/6]
root 57 2 0 07:51 ? 00:00:33 [watchdog/7]
root 58 2 0 07:51 ? 00:02:34 [migration/7]
root 59 2 0 07:51 ? 00:01:36 [ksoftirqd/7]
root 61 2 0 07:51 ? 00:00:00 [kworker/7:0H]
root 62 2 0 07:51 ? 00:00:00 [khelper]
root 63 2 0 07:51 ? 00:00:00 [kdevtmpfs]
root 64 2 0 07:51 ? 00:00:00 [netns]
root 65 2 0 07:51 ? 00:00:00 [perf]
root 66 2 0 07:51 ? 00:00:00 [writeback]
root 67 2 0 07:51 ? 00:00:00 [kintegrityd]
root 68 2 0 07:51 ? 00:00:00 [bioset]
root 69 2 0 07:51 ? 00:00:00 [kblockd]
root 70 2 0 07:51 ? 00:00:00 [md]
root 75 2 0 07:51 ? 00:00:00 [khungtaskd]
root 76 2 0 07:51 ? 00:00:00 [kswapd0]
root 77 2 0 07:51 ? 00:00:00 [ksmd]
root 78 2 0 07:51 ? 00:00:15 [khugepaged]
root 79 2 0 07:51 ? 00:00:00 [fsnotify_mark]
root 80 2 0 07:51 ? 00:00:00 [crypto]
root 88 2 0 07:51 ? 00:00:00 [kthrotld]
root 90 2 0 07:51 ? 00:00:00 [kmpath_rdacd]
root 91 2 0 07:51 ? 00:00:00 [kpsmoused]
root 93 2 0 07:51 ? 00:00:00 [ipv6_addrconf]
root 112 2 0 07:51 ? 00:00:00 [deferwq]
root 147 2 0 07:51 ? 00:00:00 [kauditd]
root 319 2 0 07:51 ? 00:00:00 [scsi_eh_0]
root 320 2 0 07:51 ? 00:00:00 [ata_sff]
root 321 2 0 07:51 ? 00:00:00 [scsi_tmf_0]
root 322 2 0 07:51 ? 00:00:00 [vmw_pvscsi_wq_0]
root 333 2 0 07:51 ? 00:00:00 [scsi_eh_1]
root 335 2 0 07:51 ? 00:00:00 [events_power_ef]
root 336 2 0 07:51 ? 00:00:00 [scsi_tmf_1]
root 338 2 0 07:51 ? 00:00:00 [scsi_eh_2]
root 339 2 0 07:51 ? 00:00:00 [scsi_tmf_2]
root 344 2 0 07:51 ? 00:00:00 [ttm_swap]
root 377 2 0 07:51 ? 00:00:10 [kworker/5:1H]
root 438 2 0 07:51 ? 00:00:00 [kdmflush]
root 439 2 0 07:51 ? 00:00:00 [bioset]
root 450 2 0 07:51 ? 00:00:00 [kdmflush]
root 451 2 0 07:51 ? 00:00:00 [bioset]
root 464 2 0 07:51 ? 00:00:00 [xfsalloc]
root 465 2 0 07:51 ? 00:00:00 [xfs_mru_cache]
root 466 2 0 07:51 ? 00:00:00 [xfs-buf/dm-0]
root 467 2 0 07:51 ? 00:00:00 [xfs-data/dm-0]
root 468 2 0 07:51 ? 00:00:00 [xfs-conv/dm-0]
root 469 2 0 07:51 ? 00:00:00 [xfs-cil/dm-0]
root 470 2 4 07:51 ? 00:17:50 [xfsaild/dm-0]
root 471 2 0 07:51 ? 00:00:06 [kworker/3:1H]
root 546 1 0 07:51 ? 00:00:07 /usr/lib/systemd/systemd-journald
root 562 1 0 07:51 ? 00:00:00 /usr/sbin/lvmetad -f
root 570 1 0 07:51 ? 00:00:00 /usr/lib/systemd/systemd-udevd
root 624 2 0 07:51 ? 00:00:11 [kworker/0:1H]
root 668 2 0 07:51 ? 00:00:00 [xfs-buf/sda2]
root 669 2 0 07:51 ? 00:00:00 [xfs-data/sda2]
root 670 2 0 07:51 ? 00:00:00 [xfs-conv/sda2]
root 671 2 0 07:51 ? 00:00:00 [xfs-cil/sda2]
root 672 2 0 07:51 ? 00:00:00 [xfsaild/sda2]
root 673 2 0 07:51 ? 00:00:00 [kdmflush]
root 674 2 0 07:51 ? 00:00:00 [bioset]
root 685 2 0 07:51 ? 00:00:00 [xfs-buf/dm-2]
root 686 2 0 07:51 ? 00:00:00 [xfs-data/dm-2]
root 687 2 0 07:51 ? 00:00:00 [xfs-conv/dm-2]
root 688 2 0 07:51 ? 00:00:00 [xfs-cil/dm-2]
root 689 2 0 07:51 ? 00:00:11 [xfsaild/dm-2]
root 700 1 0 07:51 ? 00:01:03 /sbin/auditd -n
root 724 1 0 07:51 ? 00:00:07 /usr/lib/systemd/systemd-logind
root 728 1 0 07:51 ? 00:00:03 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
root 729 1 0 07:51 ? 00:00:03 /usr/sbin/irqbalance --foreground
root 731 1 0 07:51 ? 00:00:22 /usr/bin/vmtoolsd
dbus 732 1 0 07:51 ? 00:00:17 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-a
chrony 734 1 0 07:51 ? 00:00:00 /usr/sbin/chronyd
root 739 1 0 07:51 ? 00:00:01 /usr/sbin/rsyslogd -n
root 745 1 0 07:51 ? 00:00:01 /usr/sbin/crond -n
root 751 1 0 07:51 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
root 768 2 0 07:51 ? 00:00:05 [kworker/7:1H]
root 771 2 0 14:35 ? 00:00:00 [kworker/4:2]
root 811 1 0 07:51 ? 00:00:06 /usr/sbin/NetworkManager --no-daemon
root 1138 1 0 07:51 ? 00:00:00 /usr/sbin/wpa_supplicant -u -f /var/log/wpa_supplicant.log -c /etc/wpa_suppli
polkitd 1139 1 0 07:51 ? 00:00:04 /usr/lib/polkit-1/polkitd --no-debug
root 1418 1 0 07:51 ? 00:00:00 /usr/sbin/sshd -D
root 1425 1 0 07:51 ? 00:00:17 /usr/sbin/httpd -DFOREGROUND
root 1426 1 0 07:51 ? 00:00:05 /usr/bin/python -Es /usr/sbin/tuned -l -P
root 1429 1 0 07:51 ? 00:00:00 /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid
nagios 1431 1 0 07:51 ? 00:00:03 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
mysql 1474 1 0 07:51 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
nagios 1494 1 0 07:51 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
mysql 1748 1474 7 07:51 ? 00:28:37 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr
ajaxterm 1797 1 0 07:51 ? 00:00:10 python /usr/share/ajaxterm/ajaxterm.py --daemon --port=8022 --uid=ajaxterm
root 1800 1 0 07:51 ? 00:00:00 /usr/libexec/postfix/master -w
postfix 1803 1800 0 07:51 ? 00:00:00 qmgr -l -t unix -u
root 1912 2 0 14:19 ? 00:00:00 [kworker/7:1]
nagios 1915 1 5 07:52 ? 00:20:40 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 1917 1915 0 07:52 ? 00:00:21 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1919 1915 0 07:52 ? 00:00:20 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1920 1915 0 07:52 ? 00:00:17 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1921 1915 0 07:52 ? 00:00:19 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1922 1915 0 07:52 ? 00:00:19 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1923 1915 0 07:52 ? 00:00:21 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1924 1915 0 07:52 ? 00:00:16 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1925 1915 0 07:52 ? 00:00:20 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1926 1915 0 07:52 ? 00:00:18 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1927 1915 0 07:52 ? 00:00:19 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1928 1915 0 07:52 ? 00:00:19 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1929 1915 0 07:52 ? 00:00:21 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 1942 1494 0 07:52 ? 00:00:17 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 1943 1942 2 07:52 ? 00:09:43 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root 2004 1418 0 07:52 ? 00:00:08 sshd: root@pts/0
apache 2017 1425 0 14:35 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
nagios 2023 1915 0 07:52 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 2116 2 0 14:20 ? 00:00:00 [kworker/3:0]
root 2177 2 0 07:52 ? 00:00:08 [kworker/4:1H]
root 2182 2 0 07:52 ? 00:00:06 [kworker/1:1H]
root 2184 2004 0 07:52 pts/0 00:00:00 -bash
root 2841 745 0 14:36 ? 00:00:00 /usr/sbin/CROND -n
root 2842 745 0 14:36 ? 00:00:00 /usr/sbin/CROND -n
root 2843 745 0 14:36 ? 00:00:00 /usr/sbin/CROND -n
root 2844 745 0 14:36 ? 00:00:00 /usr/sbin/CROND -n
root 2845 745 0 14:36 ? 00:00:00 /usr/sbin/CROND -n
root 2846 745 0 14:36 ? 00:00:00 /usr/sbin/CROND -n
nagios 2847 2841 0 14:36 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/l
nagios 2850 2843 0 14:36 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php > /usr/
nagios 2853 2850 0 14:36 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php
nagios 2856 2845 0 14:36 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/loca
nagios 2857 2846 0 14:36 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/
nagios 2859 2857 4 14:36 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios 2860 2844 0 14:36 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local
nagios 2861 2860 0 14:36 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios 2862 2842 0 14:36 ? 00:00:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local
nagios 2865 2862 0 14:36 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios 2867 2847 0 14:36 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios 2869 2856 0 14:36 ? 00:00:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
apache 3266 1425 0 14:36 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
apache 3267 1425 0 14:36 ? 00:00:00 /usr/sbin/httpd -DFOREGROUND
nagios 3388 1926 11 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_uptime.pl --perfparse -c --h
nagios 3469 1920 12 14:36 ? 00:00:00 [check_snmp_proc] <defunct>
nagios 3472 1920 9 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3517 1920 4 14:36 ? 00:00:00 [check_snmp_proc] <defunct>
nagios 3521 1926 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3522 1923 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3523 1921 17 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3524 1921 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3526 1928 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3527 1928 10 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3529 1921 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3530 1928 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_uptime.pl --perfparse -c --h
nagios 3533 1921 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3534 1928 5 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3535 1928 6 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3537 1921 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3542 1925 5 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3543 1925 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_load_wizard.pl -H den-l
nagios 3544 1925 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3545 1925 5 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3550 1924 5 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3551 1924 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3558 1927 5 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3559 1927 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3560 1927 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_uptime.pl --perfparse -c --h
nagios 3561 1927 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3563 1927 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3564 1927 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_load_wizard.pl -H den-l
nagios 3571 1923 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3577 1926 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3578 1926 5 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3580 1926 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3587 1929 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3588 1929 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3589 1929 7 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3590 1929 6 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3592 1929 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3593 1929 8 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3596 1922 7 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3597 1922 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3598 1922 7 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3599 1922 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3600 1922 7 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3601 1922 7 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3602 1922 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3608 1920 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3609 1920 1 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3610 1920 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3611 1920 6 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3612 1920 7 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3613 1920 6 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3614 1920 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3615 1920 0 14:36 ? 00:00:00 [check_icmp] <defunct>
nagios 3617 1920 0 14:36 ? 00:00:00 [check_icmp] <defunct>
nagios 3618 1919 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3619 1919 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3620 1919 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_load_wizard.pl -H den-l
nagios 3621 1919 9 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3622 1919 7 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3623 1919 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3624 1919 6 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3628 1917 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3629 1917 2 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3630 1917 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3631 1917 4 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3633 1917 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3634 1917 3 14:36 ? 00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_process_wizard.pl -H de
nagios 3656 1920 0 14:36 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 3661 1926 0 14:36 ? 00:00:00 /usr/local/nagios/libexec/check_icmp -H den-ltr-nukusa-4d57815a27a3.unitrends
root 3662 2184 0 14:36 pts/0 00:00:00 ps -aef
nagios 3663 1917 0 14:36 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
root 3760 2 0 14:21 ? 00:00:01 [kworker/2:2]
root 3762 2 0 14:21 ? 00:00:01 [kworker/6:0H]
root 7427 2 34 12:14 ? 00:48:46 [kworker/u16:0]
root 8035 2 0 14:06 ? 00:00:10 [kworker/1:0]
root 10930 2 0 13:53 ? 00:00:07 [kworker/4:0]
root 11174 2 0 13:53 ? 00:00:11 [kworker/5:2]
root 11509 2 0 09:01 ? 00:00:49 [kworker/2:2H]
apache 11613 1425 2 14:24 ? 00:00:19 /usr/sbin/httpd -DFOREGROUND
root 12507 2 0 14:25 ? 00:00:00 [kworker/0:3]
root 12513 2 0 14:25 ? 00:00:00 [kworker/1:1]
root 14124 2 0 12:02 ? 00:00:00 [kworker/2:0H]
root 15140 2 0 14:26 ? 00:00:00 [kworker/6:0]
root 15448 2 0 14:26 ? 00:00:00 [kworker/2:0]
root 17560 2 0 14:27 ? 00:00:01 [kworker/6:2H]
root 19099 2 0 14:28 ? 00:00:00 [kworker/5:0]
root 19162 2 0 14:28 ? 00:00:03 [kworker/7:2]
root 21099 2 0 13:10 ? 00:00:00 [kworker/u16:1]
root 21195 2 0 14:29 ? 00:00:00 [kworker/4:1]
root 23966 2 0 14:14 ? 00:00:07 [kworker/6:1]
root 24874 2 0 14:31 ? 00:00:02 [kworker/0:0]
root 25723 2 0 14:15 ? 00:00:00 [kworker/u16:3]
apache 26028 1425 1 14:31 ? 00:00:04 /usr/sbin/httpd -DFOREGROUND
root 26442 2 0 14:00 ? 00:00:10 [kworker/3:3]
root 26706 2 0 14:32 ? 00:00:00 [kworker/6:2]
root 26728 2 0 14:32 ? 00:00:00 [kworker/1:2]
root 26756 2 0 14:32 ? 00:00:00 [kworker/3:1]
postfix 27008 1800 0 14:32 ? 00:00:00 pickup -l -t unix -u
apache 27821 1425 1 14:32 ? 00:00:02 /usr/sbin/httpd -DFOREGROUND
root 27978 2 0 14:32 ? 00:00:00 [kworker/6:1H]
root 28037 2 0 14:16 ? 00:00:06 [kworker/2:1]
root 28172 2 0 14:32 ? 00:00:00 [kworker/2:3]
root 28369 2 0 14:33 ? 00:00:00 [kworker/5:1]
apache 29943 1425 2 14:33 ? 00:00:03 /usr/sbin/httpd -DFOREGROUND
apache 29944 1425 2 14:33 ? 00:00:04 /usr/sbin/httpd -DFOREGROUND
apache 30181 1425 1 14:33 ? 00:00:01 /usr/sbin/httpd -DFOREGROUND
root 30858 2 0 14:34 ? 00:00:00 [kworker/7:0]
root 31124 2 0 14:34 ? 00:00:00 [kworker/u16:2]
apache 31554 1425 1 14:34 ? 00:00:01 /usr/sbin/httpd -DFOREGROUND
apache 32532 1425 3 14:34 ? 00:00:03 /usr/sbin/httpd -DFOREGROUND
[root@den-nagios ~]#
These are the packages that would be updated if I let yum run. There are a couple PHP updates, so it's possible one of them is part of the problem:
Resolving Dependencies
--> Running transaction check
---> Package epel-release.noarch 0:7-2 will be updated
---> Package epel-release.noarch 0:7-8 will be an update
---> Package php-mcrypt.x86_64 0:5.4.16-5.el7 will be updated
---> Package php-mcrypt.x86_64 0:5.4.16-7.el7 will be an update
---> Package php-mssql.x86_64 0:5.4.16-5.el7 will be updated
---> Package php-mssql.x86_64 0:5.4.16-7.el7 will be an update
---> Package python-simplejson.x86_64 0:3.3.3-1.el7 will be updated
---> Package python-simplejson.x86_64 0:3.5.3-1.el7 will be an update
--> Finished Dependency Resolution
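When testing the updates one at a time, it can help to reduce the transaction check above to one line per package. A rough sketch, assuming the text has the same "will be updated" / "will be an update" layout as the output above; `pending_updates` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read yum's transaction-check text on stdin and print
# one "name: old -> new" line per package.
pending_updates() {
    awk '
        # "will be updated" lines carry the currently installed version.
        $2 == "Package" && /will be updated$/   { old[$3] = $4 }
        # "will be an update" lines carry the candidate version.
        $2 == "Package" && /will be an update$/ { printf "%s: %s -> %s\n", $3, old[$3], $4 }
    '
}

# Sample input copied from the output in this thread.
pending_updates <<'EOF'
---> Package php-mcrypt.x86_64 0:5.4.16-5.el7 will be updated
---> Package php-mcrypt.x86_64 0:5.4.16-7.el7 will be an update
---> Package php-mssql.x86_64 0:5.4.16-5.el7 will be updated
---> Package php-mssql.x86_64 0:5.4.16-7.el7 will be an update
EOF
```

Each output line names one candidate to install and test in isolation.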
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: High load average with 5.3.3?
Can you PM me or another tech your profile? Admin > System Config > System Profile
After you PM the profile, please make sure you update this thread. That's the only way it will show up on our dashboard. Thanks!
UPDATE: profile received and shared with techs
-
cbeattie-unitrends
- Posts: 84
- Joined: Mon Oct 10, 2016 2:51 pm
Re: High load average with 5.3.3?
I've PMed the profile, thanks. For comparison, I've also attached current load average graphs from the Nagios host and the clone I made of its snapshot from before I installed 5.3.3. Almost 10x!
I figure I'll install the OS update packages one at a time and see if any of them cause the load average to go haywire.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: High load average with 5.3.3?
Did you have base commands or wizard-generated commands in the pre-5.3.3 XI installs? Your info shows a lot of failures. Failed checks are expensive in much the same way an exception is expensive in programming. Such failures often occur after an upgrade when the user has customized commands or wizards, because the upgrade reverts those items to baseline. There is a warning about this before the upgrade runs.
Previous Nagios employee
-
cbeattie-unitrends
- Posts: 84
- Joined: Mon Oct 10, 2016 2:51 pm
Re: High load average with 5.3.3?
I don't think I understand your question correctly. Some of the service checks were initially created by running the autodiscover wizard, but I reconfigured them to be assigned by hostgroup membership instead. As I've added more service checks, I've either written them from scratch starting by adding a new command (check_snmp_uptime for example) or I've copied an existing service check (which may have used a check_xi command) and modified the copy. I did add a "--timeout=60" parameter to the existing check_xi_service_snmp_linux_process command, but that persisted through the 5.3.3 upgrade.
Can you point me at the failures you found? All the service and command objects should be the same between my 5.3.3 host and its clone still running 5.3.2, with the only other difference between the two being the OS updates listed above.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: High load average with 5.3.3?
The ones you posted from the event log:
Code: Select all
Service Critical 11/27/2016 22:08 SERVICE ALERT: den-ltr-mrmc-2707288fd0bf;sshd;CRITICAL;SOFT;1;(Service check timed out after 60.01 seconds)
Runtime Warning 11/27/2016 22:08 Warning: Check of service 'sshd' on host 'den-ltr-mrmc-2707288fd0bf' timed out after 60.005s!
Runtime Error 11/27/2016 22:08 wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Runtime Error 11/27/2016 22:08 wproc: host=den-ltr-mrmc-2707288fd0bf; service=sshd;
Runtime Error 11/27/2016 22:08 wproc: CHECK job 464451 from worker Core Worker 20376 timed out after 60.01s
Information 11/27/2016 22:08 wproc: Core Worker 20376: job 464451 (pid=15837) timed out. Killing it
Previous Nagios employee
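To see whether timeouts like these cluster on particular hosts or services, the "Check of service ... timed out" warnings can be tallied. A minimal sketch, assuming the event log has been saved as plain text in the format quoted above; `timeouts_by_service` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read event-log text on stdin and print a count of
# timed-out checks per "host;service" pair, most frequent first.
timeouts_by_service() {
    # Keep only the "Check of service ... timed out" warnings and reduce
    # each one to "host;service" before counting.
    sed -n "s/.*Check of service '\([^']*\)' on host '\([^']*\)' timed out.*/\2;\1/p" |
        sort | uniq -c | sort -rn
}

# Sample input copied from the event log in this thread.
timeouts_by_service <<'EOF'
Runtime Warning 11/27/2016 22:08 Warning: Check of service 'sshd' on host 'den-ltr-mrmc-2707288fd0bf' timed out after 60.005s!
Runtime Warning 11/27/2016 22:08 Warning: Check of service 'xinetd' on host 'den-ltr91' timed out after 60.006s!
EOF
```

A handful of pairs dominating the list would point at specific checks, while an even spread across many hosts would be more consistent with the Nagios server itself being overloaded.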
-
cbeattie-unitrends
- Posts: 84
- Joined: Mon Oct 10, 2016 2:51 pm
Re: High load average with 5.3.3? [SOLVED]
The culprit turned out to be php-mcrypt 5.4.16-7.el7 from CentOS 7's epel repository. After I ran 'yum downgrade php-mcrypt' and reverted to 5.4.16-4.el7 from the nagiosxi-deps repository, the CPU load average went back to normal.
After that, I also reverted php-mssql just to be on the safe side and keep its version numbers the same as php-mcrypt's.
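The fix described above can be sketched as a small wrapper. This is only an illustration: `run` and `revert_packages` are hypothetical names, and by default the script only prints the commands, so nothing is changed unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: for each package, show the installed version and
# then downgrade it with the command quoted in the post.
revert_packages() {
    for pkg in "$@"; do
        run rpm -q "$pkg"
        run yum downgrade "$pkg"
    done
}

# The two packages reverted in this thread.
revert_packages php-mcrypt php-mssql
```

Note that a later `yum update` would presumably offer the newer epel builds again, so it is worth rechecking the installed versions with `rpm -q` after any OS update.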
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: High load average with 5.3.3?
It looks like you marked your last post as [SOLVED]. Is it okay if we lock this thread? Thanks for choosing the Nagios forums!