Seriously High Load Average
Posted: Tue Jan 22, 2013 5:16 am
Hi,
We're seeing a seriously high load average on our Nagios server, which runs Nagios XI.
We're partway through migrating service checks from our old Nagios server to a new one. We're still miles away from having all of them ported over, yet the load on the new system already seems hugely high (right now it's load average: 34.89, 21.68, 20.98). That's bad, but this is worse:
[1358840273] SERVICE ALERT: nagios;Current Load;CRITICAL;HARD;4;CRITICAL - load average: 78.42, 33.55, 23.60
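For anyone reading the alert line above, here is a minimal Python sketch that splits it into fields and converts the leading epoch timestamp to UTC. It assumes the usual Nagios Core log layout for SERVICE ALERT entries (host;service;state;state type;attempt;plugin output); the function name `parse_service_alert` is just an illustrative helper, not part of Nagios.

```python
import re
from datetime import datetime, timezone

# Alert line copied verbatim from the post.
LINE = ("[1358840273] SERVICE ALERT: nagios;Current Load;CRITICAL;HARD;4;"
        "CRITICAL - load average: 78.42, 33.55, 23.60")


def parse_service_alert(line):
    """Split a Nagios SERVICE ALERT log line into named fields (hypothetical helper)."""
    match = re.match(r"\[(\d+)\] SERVICE ALERT: (.*)", line)
    if match is None:
        raise ValueError("not a SERVICE ALERT line")
    epoch, rest = match.groups()
    # Assumed field order: host;service;state;state type;attempt;plugin output
    host, service, state, state_type, attempt, output = rest.split(";", 5)
    return {
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        # 1-, 5- and 15-minute load averages reported by the check
        "loads": [float(v) for v in re.findall(r"\d+\.\d+", output)],
    }


alert = parse_service_alert(LINE)
print(alert["time"].isoformat(), alert["state"], alert["state_type"], alert["loads"])
```

The timestamp comes out in UTC, so it needs shifting to the server's local timezone before comparing it with the `top` clock times.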
Do you have any clues or advice for us, please? I've tried to capture a few occasions in "top" where the system looks busy, and have included the output here:
Old System Specs
Code:
[root@Nagios admin]# top -M
top - 10:03:04 up 17 days, 9:23, 3 users, load average: 33.77, 19.58, 18.93
Tasks: 299 total, 2 running, 287 sleeping, 1 stopped, 9 zombie
Cpu0 : 64.8%us, 8.1%sy, 0.0%ni, 26.4%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 62.3%us, 10.8%sy, 0.0%ni, 26.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 91.6%us, 7.8%sy, 0.0%ni, 0.0%id, 0.3%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 61.6%us, 11.7%sy, 0.0%ni, 26.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 15.571G total, 14.520G used, 1077.016M free, 129.570M buffers
Swap: 31.248G total, 24.719M used, 31.224G free, 7359.934M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18370 apache 20 0 359m 39m 7772 R 30.2 0.2 0:04.94 httpd
15410 mysql 20 0 2201m 60m 6140 S 4.4 0.4 7:52.21 mysqld
3276 nagios 20 0 221m 23m 9280 S 2.5 0.1 0:00.08 php
3288 nagios 20 0 214m 21m 7748 S 2.2 0.1 0:00.07 php
720 root 20 0 0 0 0 S 1.9 0.0 0:50.92 kswapd0
3275 nagios 20 0 214m 20m 7096 S 1.9 0.1 0:00.06 php
3282 nagios 20 0 215m 22m 7620 S 1.9 0.1 0:00.06 php
3284 nagios 20 0 215m 22m 7628 S 1.9 0.1 0:00.06 php
3285 nagios 20 0 214m 21m 7588 S 1.9 0.1 0:00.06 php
3289 nagios 20 0 214m 21m 7612 S 1.9 0.1 0:00.06 php
913 apache 20 0 349m 32m 7392 S 1.6 0.2 0:00.24 httpd
1025 apache 20 0 342m 28m 4268 S 1.6 0.2 0:00.16 httpd
1050 apache 20 0 335m 21m 4480 S 1.6 0.1 0:00.24 httpd
1206 nagios 20 0 50664 1968 940 S 1.2 0.0 0:36.97 ndo2db
2562 root 20 0 192m 1844 852 S 1.2 0.0 14:57.38 snmpd
3 root 20 0 0 0 0 S 0.9 0.0 1:07.44 ksoftirqd/0
1209 nagios 20 0 27340 4288 988 S 0.9 0.0 2:22.62 nagios
2510 named 20 0 369m 30m 2536 S 0.6 0.2 27:16.69 named
18541 apache 20 0 350m 33m 7668 S 0.6 0.2 0:02.22 httpd
13 root 20 0 0 0 0 S 0.3 0.0 1:11.91 ksoftirqd/2
16 root 20 0 0 0 0 S 0.3 0.0 1:07.64 ksoftirqd/3
Code:
[root@Nagios admin]# top -M
top - 10:05:01 up 17 days, 9:24, 3 users, load average: 12.39, 17.60, 18.40
Tasks: 214 total, 2 running, 211 sleeping, 1 stopped, 0 zombie
Cpu0 : 5.6%us, 2.3%sy, 0.0%ni, 91.4%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 42.7%us, 0.7%sy, 0.0%ni, 56.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 6.4%us, 0.7%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 7.3%us, 1.3%sy, 0.0%ni, 88.0%id, 3.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 15.571G total, 14.038G used, 1570.555M free, 129.672M buffers
Swap: 31.248G total, 24.715M used, 31.224G free, 7360.020M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
925 apache 20 0 388m 71m 7600 R 41.9 0.4 0:02.26 httpd
15410 mysql 20 0 2201m 60m 6140 S 5.0 0.4 7:53.51 mysqld
1054 apache 20 0 350m 32m 7568 S 2.7 0.2 0:01.05 httpd
18366 apache 20 0 350m 33m 7812 S 2.7 0.2 0:04.41 httpd
955 apache 20 0 350m 33m 7592 S 2.3 0.2 0:01.01 httpd
1019 apache 20 0 349m 32m 7712 S 2.3 0.2 0:01.07 httpd
1052 apache 20 0 349m 32m 7684 S 2.3 0.2 0:01.07 httpd
1053 apache 20 0 349m 32m 7596 S 2.3 0.2 0:01.15 httpd
2066 apache 20 0 350m 32m 7560 S 2.3 0.2 0:02.32 httpd
1209 nagios 20 0 27340 4284 988 S 1.0 0.0 2:23.14 nagios
1206 nagios 20 0 50664 1968 940 S 0.7 0.0 0:37.13 ndo2db
926 postgres 20 0 210m 5664 4016 S 0.3 0.0 0:00.03 postmaster
1949 postgres 20 0 210m 5656 4012 S 0.3 0.0 0:00.03 postmaster
2002 postgres 20 0 210m 5636 3992 S 0.3 0.0 0:00.03 postmaster
13614 root 20 0 15980 2232 996 S 0.3 0.0 1:39.44 top
18367 postgres 20 0 210m 5684 4040 S 0.3 0.0 0:00.11 postmaster
1 root 20 0 19272 1088 896 S 0.0 0.0 0:01.46 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 1:07.45 ksoftirqd/0
5 root 20 0 0 0 0 S 0.0 0.0 0:01.32 kworker/u:0
Code:
[root@Nagios admin]# top -M
top - 10:08:11 up 17 days, 9:28, 3 users, load average: 20.82, 15.36, 17.14
Tasks: 230 total, 1 running, 228 sleeping, 1 stopped, 0 zombie
Cpu0 : 3.0%us, 0.3%sy, 0.0%ni, 94.6%id, 2.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 11.0%us, 0.7%sy, 0.0%ni, 85.0%id, 3.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 6.3%us, 0.7%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 12.9%us, 0.7%sy, 0.0%ni, 86.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 15.571G total, 13.970G used, 1639.883M free, 129.621M buffers
Swap: 31.248G total, 25.055M used, 31.224G free, 7332.289M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1052 apache 20 0 349m 32m 7780 S 2.7 0.2 0:02.46 httpd
2066 apache 20 0 350m 32m 7600 S 2.7 0.2 0:03.88 httpd
14570 apache 20 0 349m 31m 7572 S 2.7 0.2 0:00.81 httpd
14603 apache 20 0 350m 33m 7692 S 2.7 0.2 0:01.04 httpd
14615 apache 20 0 350m 32m 7668 S 2.7 0.2 0:00.73 httpd
32421 apache 20 0 349m 32m 7620 S 2.7 0.2 0:02.35 httpd
32428 apache 20 0 350m 33m 7640 S 2.7 0.2 0:03.94 httpd
1019 apache 20 0 350m 33m 7744 S 2.3 0.2 0:02.26 httpd
1053 apache 20 0 349m 32m 7768 S 2.3 0.2 0:02.43 httpd
3681 apache 20 0 352m 35m 7864 S 2.3 0.2 0:10.23 httpd
14596 apache 20 0 350m 32m 7556 S 2.3 0.2 0:00.93 httpd
14597 apache 20 0 350m 32m 7556 S 2.3 0.2 0:00.70 httpd
14602 apache 20 0 350m 32m 7588 S 2.0 0.2 0:02.24 httpd
15410 mysql 20 0 2201m 60m 6140 S 1.0 0.4 7:55.23 mysqld
554 postgres 20 0 210m 5664 4024 S 0.3 0.0 0:00.09 postmaster
2003 postgres 20 0 210m 5676 4028 S 0.3 0.0 0:00.09 postmaster
2752 postgres 20 0 208m 4592 4356 S 0.3 0.0 1:20.68 postmaster
14887 postgres 20 0 210m 5656 4008 S 0.3 0.0 0:00.02 postmaster
1 root 20 0 19272 1088 896 S 0.0 0.0 0:01.46 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
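To put the three samples side by side, here is a rough Python sketch that pulls the load averages out of each `top` header line and divides them by the core count. The core count of 4 is taken from the Cpu0 to Cpu3 rows in the output; `load_per_core` is only an illustrative helper name. On Linux the load average counts tasks that are runnable or in uninterruptible sleep, so a per-core value well above 1 suggests tasks are queueing.

```python
import re

# Header lines copied from the three top samples in the post.
TOP_HEADERS = [
    "top - 10:03:04 up 17 days, 9:23, 3 users, load average: 33.77, 19.58, 18.93",
    "top - 10:05:01 up 17 days, 9:24, 3 users, load average: 12.39, 17.60, 18.40",
    "top - 10:08:11 up 17 days, 9:28, 3 users, load average: 20.82, 15.36, 17.14",
]
CORES = 4  # Cpu0..Cpu3 appear in every sample


def load_per_core(header, cores):
    """Return the 1-, 5- and 15-minute load averages divided by the core count."""
    match = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", header)
    if match is None:
        raise ValueError("no load average found in header")
    return [float(v) / cores for v in match.groups()]


for header in TOP_HEADERS:
    one, five, fifteen = load_per_core(header, CORES)
    clock = header[6:14]  # the HH:MM:SS part of the header
    print(f"{clock}  1m={one:.2f}  5m={five:.2f}  15m={fifteen:.2f}")
```

The same helper should work on any `top` or `uptime` header that contains a `load average:` field, with `CORES` adjusted to the machine.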
- Old system: onsite Dell PowerEdge (model unknown)
  Dual Core i386 CPU
  2GB RAM
  Hardware RAID1
  FreeBSD 4.8 RELEASE (i386)
  Nagios Core 1.2
- New system: remote dedicated server
  Modern Quad Core x86_64 CPU
  16GB RAM
  Software RAID1
  CentOS 6.3 x86_64
  Nagios Core 3.4.1
  Nagios XI 2012R1.4