Answers threaded in below...
Is the main problem that you're receiving constant email notifications, or that the load is actually high?
I don't receive constant emails, but I get a text message and an email almost every night around 4:00 AM alerting me that the system load is high.
What is the current load in the plugin output?
This is what was in the email from overnight: - load average: 3.81, 4.04, 3.44
This is current: - load average: 0.66, 1.21, 1.34
What is the output of the top command?
top - 09:33:19 up 31 days, 16:47, 2 users, load average: 2.96, 1.54, 1.44
Tasks: 179 total, 1 running, 178 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.7%us, 2.4%sy, 0.0%ni, 82.5%id, 0.2%wa, 0.1%hi, 0.1%si, 0.0%st
Mem: 3923980k total, 2717952k used, 1206028k free, 135204k buffers
Swap: 2064380k total, 0k used, 2064380k free, 1858232k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7439 nagios 20 0 50672 2556 1036 S 5.0 0.1 0:11.13 ndo2db
7423 nagios 20 0 26356 8040 1312 S 4.7 0.2 0:04.60 nagios
7244 mysql 20 0 2220m 46m 6248 S 3.7 1.2 0:13.72 mysqld
509 root 20 0 0 0 0 S 1.0 0.0 248:22.60 jbd2/dm-0-8
4735 root 20 0 15292 1596 996 S 0.3 0.0 113:17.18 top
7428 nagios 20 0 10108 1020 684 S 0.3 0.0 0:00.25 nagios
1 root 20 0 19360 1248 944 S 0.0 0.0 0:14.93 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 69:39.40 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 25:55.81 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/0
6 root RT 0 0 0 0 S 0.0 0.0 37:36.04 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 19:12.37 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/1
9 root 20 0 0 0 0 S 0.0 0.0 6:50.69 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 16:03.89 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 46:09.38 migration/2
12 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/2
13 root 20 0 0 0 0 S 0.0 0.0 27:17.99 ksoftirqd/2
14 root RT 0 0 0 0 S 0.0 0.0 35:07.52 watchdog/2
15 root RT 0 0 0 0 S 0.0 0.0 19:40.30 migration/3
16 root RT 0 0 0 0 S 0.0 0.0 0:00.00 stopper/3
17 root 20 0 0 0 0 S 0.0 0.0 6:03.08 ksoftirqd/3
18 root RT 0 0 0 0 S 0.0 0.0 14:58.51 watchdog/3
19 root 20 0 0 0 0 S 0.0 0.0 209:38.93 events/0
20 root 20 0 0 0 0 S 0.0 0.0 56:13.19 events/1
21 root 20 0 0 0 0 S 0.0 0.0 86:45.62 events/2
22 root 20 0 0 0 0 S 0.0 0.0 85:00.76 events/3
23 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/0
24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/1
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/2
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events/3
27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/0
28 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/1
29 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/2
30 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_long/3
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
32 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
33 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
34 root 20 0 0 0 0 S 0.0 0.0 0:00.00 events_power_ef
35 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cgroup
36 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
37 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns
38 root 20 0 0 0 0 S 0.0 0.0 0:00.00 async/mgr
39 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pm
40 root 20 0 0 0 0 S 0.0 0.0 4:04.84 sync_supers
41 root 20 0 0 0 0 S 0.0 0.0 4:12.17 bdi-default
42 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/0
43 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/1
44 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/2
45 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/3
46 root 20 0 0 0 0 S 0.0 0.0 100:51.31 kblockd/0
47 root 20 0 0 0 0 S 0.0 0.0 69:40.04 kblockd/1
48 root 20 0 0 0 0 S 0.0 0.0 107:04.87 kblockd/2
49 root 20 0 0 0 0 S 0.0 0.0 67:08.68 kblockd/3
50 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpid
51 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify
52 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_hotplug
53 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_aux
54 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/0
55 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/1
56 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/2
57 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_sff/3
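For what it's worth, the per-CPU kernel threads in the listing (migration/0 through migration/3) suggest this box has four cores, so the raw load numbers can be put in per-core terms. This is only a rough sketch: the core count is my assumption from the top output, and `per_core_load` is just a helper name I made up for illustration.

```python
def per_core_load(load_averages, cores):
    """Divide each load average by the core count, rounded to two decimals."""
    return [round(value / cores, 2) for value in load_averages]


# Assumption: four cores, inferred from the migration/0..3 threads in top.
CORES = 4

overnight = (3.81, 4.04, 3.44)  # 1, 5 and 15 minute averages from the alert email
current = (0.66, 1.21, 1.34)    # averages reported by the plugin this morning

print("overnight per core:", per_core_load(overnight, CORES))
print("current per core:  ", per_core_load(current, CORES))
```

If the four-core guess is right, the overnight values work out to roughly one runnable task per core, while the daytime values are well below that.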
How many hosts and services are you monitoring with this XI server?
158 Hosts
1716 Services
What is the alert threshold for the load check?
Since I am not a Linux expert, I left it at the default. I assumed it would be correct.
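To show how I understand the threshold comparison, here is a small sketch. I have not confirmed the values actually configured on my server, so the warning and critical triplets below are assumptions (they are the ones I have seen in the stock Nagios Core localhost sample config), and `load_state` is a hypothetical helper that only mimics the idea of the load check, not the plugin itself.

```python
def load_state(load_averages, warn, crit):
    """Compare the 1/5/15 minute load averages against matching thresholds.

    Any single average above its critical threshold gives CRITICAL,
    otherwise any single average above its warning threshold gives WARNING.
    """
    if any(load > limit for load, limit in zip(load_averages, crit)):
        return "CRITICAL"
    if any(load > limit for load, limit in zip(load_averages, warn)):
        return "WARNING"
    return "OK"


# Assumed thresholds (1, 5, 15 minute); not verified against this XI server.
WARN = (5.0, 4.0, 3.0)
CRIT = (10.0, 6.0, 4.0)

print(load_state((3.81, 4.04, 3.44), WARN, CRIT))  # overnight email -> WARNING
print(load_state((0.66, 1.21, 1.34), WARN, CRIT))  # current values  -> OK
```

With those assumed numbers, the overnight reading would trip the warning level on the 5 and 15 minute averages alone, even though the 1 minute average stays under its limit.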