high current load alerts after upgrading to 2012R2.2

scobi
Posts: 67
Joined: Tue Feb 28, 2012 8:36 pm

high current load alerts after upgrading to 2012R2.2

Post by scobi »

Greetings! Since upgrading to 2012R2.2 on Friday, we're seeing high "Current load" alerts from Nagios.

Not sure where to start in troubleshooting this one...

Thanks in advance for the support!

Scobi
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by sreinhardt »

Is this the Nagios server reporting high load for localhost?
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
scobi
Posts: 67
Joined: Tue Feb 28, 2012 8:36 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by scobi »

Yuppers! Sorry for being so vague in the original post. :-)

***** Nagios XI Alert *****

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: Current Load
Host: localhost
Address: 127.0.0.1
State: WARNING
Info:
WARNING - load average: 4.41, 4.31, 3.44
Date/Time: 2013-07-29 08:49:09
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by sreinhardt »

Well, a load of 4 would not be terrible if you have it set up with quad cores. Since we normally set up localhost checks for single-core scenarios, I could see this alerting even if Nagios is working properly. Let's check a few things and see what comes out.

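Code:

# shows the number of processes/threads per process name (five most common)
ps aux | awk '{print $11}' | sort | uniq -c | sort -nk1 | tail -n5
# shows the number of processes per user running them (five busiest users)
ps aux | awk '{print $1}' | sort | uniq -c | sort -nk1 | tail -n5

# one batch-mode snapshot of load, CPU, memory, and the process list
top -b -n 1
# disk space usage per filesystem
df -h
# inode usage per filesystem
df -i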
Please post the results of all of the commands.
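
To put the quad-core remark in numbers: dividing a load average by the core count shows how busy each core is on average. This is a minimal sketch, assuming a Linux host; `per_core_load` is a hypothetical helper name, and the thread does not say how many cores this server actually has, so the core counts below are only illustrative.

```shell
# Hypothetical helper: divide a load average by the number of cores.
# $1 = load average, $2 = core count (an assumption supplied by the caller)
per_core_load() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The 1-minute value from the alert, on an assumed four-core machine:
per_core_load 4.41 4    # prints 1.10

# The same value if the machine has only one core:
per_core_load 4.41 1    # prints 4.41

# On a live Linux host the inputs could come from the system itself:
# read -r one five fifteen _ < /proc/loadavg
# per_core_load "$five" "$(nproc)"
```

A per-core value near or below 1 usually means the CPUs are keeping up; values well above 1 mean work is queuing.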
scobi
Posts: 67
Joined: Tue Feb 28, 2012 8:36 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by scobi »

First command:

5 /usr/bin/php
6 /bin/sh
6 /sbin/mingetty
21 /usr/sbin/httpd
30 postgres:

Second command:

5 ajaxterm
17 nagios
20 apache
31 postgres
88 root

Third command:

top - 13:49:33 up 135 days, 15:10, 1 user, load average: 0.34, 0.68, 0.96
Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.0%us, 2.1%sy, 0.0%ni, 80.9%id, 0.6%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 769120k total, 636456k used, 132664k free, 8500k buffers
Swap: 262136k total, 62396k used, 199740k free, 65824k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 2880 620 460 S 0.0 0.1 0:05.28 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:43.86 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.70 watchdog/0
7 root 20 0 0 0 0 S 0.0 0.0 0:00.95 events/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuset
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khelper
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 netns
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 async/mgr
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pm
13 root 20 0 0 0 0 S 0.0 0.0 0:00.01 sync_supers
14 root 20 0 0 0 0 S 0.0 0.0 0:00.09 bdi-default
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kintegrityd/0
16 root 20 0 0 0 0 S 0.0 0.0 8:29.17 kblockd/0
17 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpid
18 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kacpi_hotplug
20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata/0
21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ata_aux
22 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksuspend_usbd
23 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khubd
24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kseriod
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md/0
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 md_misc/0
27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khungtaskd
28 root 20 0 0 0 0 S 0.0 0.0 9:51.45 kswapd0
29 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
30 root 20 0 0 0 0 S 0.0 0.0 0:00.00 aio/0
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 crypto/0
36 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthrotld/0
37 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pciehpd
39 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
40 root 20 0 0 0 0 S 0.0 0.0 0:00.00 usbhid_resumer
70 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kstriped
187 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
190 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
199 root 20 0 0 0 0 S 0.0 0.0 0:00.28 mpt_poll_0
200 root 20 0 0 0 0 S 0.0 0.0 0:00.00 mpt/0
201 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_2
309 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdmflush
313 root 20 0 0 0 0 S 0.0 0.0 0:34.15 kdmflush
328 root 20 0 0 0 0 S 0.0 0.0 17:40.27 jbd2/dm-1-8
329 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ext4-dio-unwrit
392 root 16 -4 2476 8 4 S 0.0 0.0 0:00.02 udevd
572 root 20 0 0 0 0 S 0.0 0.0 0:00.55 vmmemctl
701 root 20 0 0 0 0 S 0.0 0.0 49:48.03 flush-253:1
798 root 20 0 0 0 0 S 0.0 0.0 0:35.47 kauditd
996 apache 20 0 70832 34m 4880 S 0.0 4.6 12:49.13 httpd
1180 root 20 0 5816 1228 988 S 0.0 0.2 25:41.53 vmtoolsd
1262 root 16 -4 13968 436 292 S 0.0 0.1 3:33.20 auditd
1278 root 20 0 36448 660 476 S 0.0 0.1 1:13.13 rsyslogd
1290 dbus 20 0 2972 180 72 S 0.0 0.0 0:00.00 dbus-daemon
1301 avahi 20 0 3108 212 140 S 0.0 0.0 0:00.20 avahi-daemon
1302 avahi 20 0 3108 8 4 S 0.0 0.0 0:00.00 avahi-daemon
1357 root 20 0 16668 396 252 S 0.0 0.1 1:08.48 snmptrapd
1366 root 20 0 15460 196 92 S 0.0 0.0 0:08.62 snmptt
1369 root 20 0 15516 580 308 S 0.0 0.1 2:58.26 snmptt
1378 root 20 0 9324 304 212 S 0.0 0.0 0:12.39 sshd
1386 root 20 0 3180 8 4 S 0.0 0.0 0:00.01 xinetd
1422 root 20 0 5116 8 4 S 0.0 0.0 0:00.00 mysqld_safe
1512 mysql 20 0 159m 22m 2468 S 0.0 3.0 1241:54 mysqld
1550 postgres 20 0 49260 340 212 S 0.0 0.0 9:41.83 postmaster
1629 root 20 0 12424 248 164 S 0.0 0.0 0:41.31 master
1636 postfix 20 0 12568 596 496 S 0.0 0.1 0:29.80 qmgr
1639 postgres 20 0 17432 96 56 S 0.0 0.0 0:04.56 postmaster
1653 root 20 0 5932 320 236 S 0.0 0.0 3:55.66 crond
1664 nagios 20 0 35076 588 456 S 0.0 0.1 4:58.40 npcd
1682 ajaxterm 20 0 22996 4044 1328 S 0.0 0.5 16:14.81 python
1713 postgres 20 0 50276 3220 2508 S 0.0 0.4 0:11.79 postmaster
1734 root 20 0 2004 8 4 S 0.0 0.0 0:00.00 mingetty
1736 root 20 0 2004 8 4 S 0.0 0.0 0:00.00 mingetty
1738 root 20 0 2004 8 4 S 0.0 0.0 0:00.00 mingetty
1741 root 18 -2 2472 8 4 S 0.0 0.0 0:00.00 udevd
1742 root 18 -2 2472 8 4 S 0.0 0.0 0:00.00 udevd
1743 root 20 0 2004 8 4 S 0.0 0.0 0:00.00 mingetty
1745 root 20 0 2004 8 4 S 0.0 0.0 0:00.00 mingetty
3148 apache 20 0 70832 37m 4920 S 0.0 5.0 0:30.74 httpd
3342 postgres 20 0 50276 4148 3376 S 0.0 0.5 0:00.39 postmaster
6928 apache 20 0 70836 36m 4676 S 0.0 4.9 13:31.73 httpd
7070 apache 20 0 70832 37m 4852 S 0.0 5.0 9:43.93 httpd
7254 postgres 20 0 50304 3220 2492 S 0.0 0.4 0:13.01 postmaster
7753 postgres 20 0 50304 3144 2416 S 0.0 0.4 0:08.52 postmaster
9649 nagios 20 0 8112 8 4 S 0.0 0.0 0:00.00 ndo2db
10545 root 20 0 37364 540 104 S 0.0 0.1 0:00.46 httpd
12551 apache 20 0 70832 35m 4916 S 0.0 4.7 0:23.68 httpd
12787 postgres 20 0 50276 4136 3372 S 0.0 0.5 0:00.35 postmaster
15304 apache 20 0 70832 37m 4944 S 0.0 5.0 0:38.42 httpd
15305 postgres 20 0 50276 4148 3396 S 0.0 0.5 0:00.46 postmaster
15308 apache 20 0 70832 35m 4920 S 0.0 4.8 0:35.87 httpd
15309 apache 20 0 70832 37m 4952 S 0.0 5.0 0:31.41 httpd
15840 postgres 20 0 50276 4156 3388 S 0.0 0.5 0:00.48 postmaster
15843 postgres 20 0 50276 4156 3400 S 0.0 0.5 0:00.51 postmaster
16996 root 20 0 21860 584 424 S 0.0 0.1 0:00.00 console-kit-dae
18100 apache 20 0 54184 21m 4904 S 0.0 2.9 0:10.43 httpd
18144 postgres 20 0 50276 4380 3396 S 0.0 0.6 0:00.17 postmaster
20327 apache 20 0 70832 37m 4924 S 0.0 5.0 0:35.10 httpd
20330 apache 20 0 70832 37m 4936 S 0.0 5.0 0:34.07 httpd
20331 apache 20 0 70832 37m 4920 S 0.0 5.0 0:32.81 httpd
20332 postgres 20 0 50276 4136 3376 S 0.0 0.5 0:00.49 postmaster
20389 ajaxterm 20 0 22648 2804 796 S 0.0 0.4 0:00.00 python
20443 postgres 20 0 50276 4132 3380 S 0.0 0.5 0:00.48 postmaster
20611 postgres 20 0 50276 4168 3404 S 0.0 0.5 0:00.51 postmaster
20616 ajaxterm 20 0 23004 3344 788 S 0.0 0.4 0:00.00 python
21099 ajaxterm 20 0 19204 13m 2180 S 0.0 1.8 0:00.15 ssh
21342 root 20 0 11568 3392 2612 S 0.0 0.4 0:00.08 sshd
21757 root 20 0 2004 500 444 S 0.0 0.1 0:00.00 mingetty
22129 root 20 0 5252 1680 1388 S 0.0 0.2 0:00.00 bash
25220 nagios 20 0 8112 116 76 S 0.0 0.0 0:22.32 ndo2db
25221 nagios 20 0 8652 628 332 S 0.0 0.1 2:52.63 ndo2db
25224 nagios 20 0 15332 2060 648 S 0.0 0.3 14:43.71 nagios
25734 postfix 20 0 12500 1700 1608 S 0.0 0.2 0:00.00 pickup
28890 postgres 20 0 49380 1436 1280 S 0.0 0.2 0:03.26 postmaster
28891 postgres 20 0 49260 188 136 S 0.0 0.0 0:01.02 postmaster
28893 postgres 20 0 49548 548 304 S 0.0 0.1 0:01.95 postmaster
28894 postgres 20 0 17708 348 164 S 0.0 0.0 0:11.23 postmaster
29101 apache 20 0 70832 37m 4708 S 0.0 4.9 12:30.76 httpd
29305 root 20 0 6328 1260 956 S 0.0 0.2 0:00.00 crond
29306 root 20 0 6328 1260 956 S 0.0 0.2 0:00.00 crond
29307 root 20 0 6328 1260 956 S 0.0 0.2 0:00.00 crond
29308 root 20 0 6328 1260 956 S 0.0 0.2 0:00.00 crond
29309 root 20 0 6328 1260 956 S 0.0 0.2 0:00.00 crond
29310 nagios 20 0 2980 1072 952 S 0.0 0.1 0:00.00 sh
29311 nagios 20 0 2980 1072 952 S 0.0 0.1 0:00.00 sh
29312 nagios 20 0 2980 1068 952 S 0.0 0.1 0:00.00 sh
29313 nagios 20 0 2980 1064 952 S 0.0 0.1 0:00.00 sh
29315 nagios 20 0 2980 1072 952 S 0.0 0.1 0:00.00 sh
29316 nagios 20 0 36812 15m 6580 S 0.0 2.0 0:00.11 php
29317 nagios 20 0 40780 18m 6640 S 0.0 2.5 0:00.16 php
29318 nagios 20 0 36824 15m 6592 S 0.0 2.0 0:00.12 php
29319 nagios 20 0 36940 15m 7016 S 0.0 2.1 0:00.14 php
29323 nagios 20 0 36940 15m 6616 S 0.0 2.0 0:00.12 php
29327 postgres 20 0 50284 4448 3420 S 0.0 0.6 0:00.02 postmaster
29329 postgres 20 0 50184 4156 3200 S 0.0 0.5 0:00.00 postmaster
29331 postgres 20 0 50184 4156 3204 S 0.0 0.5 0:00.00 postmaster
29374 postgres 20 0 50284 4472 3444 S 0.0 0.6 0:00.00 postmaster
29596 postgres 20 0 50192 4144 3196 S 0.0 0.5 0:00.00 postmaster
30821 postgres 20 0 50304 3276 2524 S 0.0 0.4 0:11.71 postmaster
31080 nagios 20 0 15336 1676 236 S 0.0 0.2 0:00.00 nagios
31081 nagios 20 0 4304 596 516 S 0.0 0.1 0:00.00 check_icmp
31100 root 20 0 2692 1016 768 R 0.0 0.1 0:00.00 top
31101 root 20 0 4172 644 568 S 0.0 0.1 0:00.00 more

Fourth command:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
9.5G 5.3G 3.8G 59% /
tmpfs 376M 0 376M 0% /dev/shm
/dev/sda1 97M 64M 29M 69% /boot

Fifth command:

Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00
632736 113017 519719 18% /
tmpfs 96140 1 96139 1% /dev/shm
/dev/sda1 25688 51 25637 1% /boot
Last edited by scobi on Mon Jul 29, 2013 6:57 pm, edited 1 time in total.
scobi
Posts: 67
Joined: Tue Feb 28, 2012 8:36 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by scobi »

Hi again!

Just to clarify, we hadn't seen any issues until we upgraded to 2012R2.2...

:-)
scobi
Posts: 67
Joined: Tue Feb 28, 2012 8:36 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by scobi »

Hi! Just thought I'd let you guys know it's happening right now in case there was anything special you wanted me to look at.

Thanks!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: high current load alerts after upgrading to 2012R2.2

Post by scottwilkerson »

If you could run the following while you are experiencing problems, it would be helpful:

Code:

top -b -n 1
ps -ef
Also, if you could grab the Admin -> System Profile while you are experiencing a problem and PM it, that would be helpful.

As Spenser said before, a load of 4 on a quad-core machine is not that high. You may want to bump up the warning threshold for that check on localhost.
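
Before changing the threshold, it can help to check by hand which load averages would trip it. The sketch below is a hypothetical helper (`load_state`), not the check_load plugin itself. It assumes the 1, 5, and 15 minute averages are compared pairwise against comma-separated warning and critical triples, and that only a value strictly above its threshold counts as a breach. The threshold values in the usage lines are assumptions for illustration; the thread does not state what this localhost check is configured with.

```shell
# Hypothetical helper, not the plugin's real implementation.
# Usage: load_state "L1,L5,L15" "W1,W5,W15" "C1,C5,C15"
load_state() {
    awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
        split(l, la, ","); split(w, wa, ","); split(c, ca, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # "+ 0" forces a numeric comparison instead of a string one
            if (la[i] + 0 > ca[i] + 0) { state = "CRITICAL"; break }
            if (la[i] + 0 > wa[i] + 0) state = "WARNING"
        }
        print state
    }'
}

# Load averages from the alert against assumed single-core-style thresholds:
load_state "4.41,4.31,3.44" "5.0,4.0,3.0" "10.0,6.0,4.0"    # prints WARNING

# The same averages against assumed, more relaxed warning thresholds:
load_state "4.41,4.31,3.44" "8.0,6.0,5.0" "10.0,8.0,6.0"    # prints OK
```

With the first set of assumed thresholds, the 5 and 15 minute averages are what push the state to WARNING, even though the 1 minute value is under its limit.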
Former Nagios employee
scobi
Posts: 67
Joined: Tue Feb 28, 2012 8:36 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by scobi »

Will do! Thanks!

Also, not sure if they're related, but we're seeing the following in the console:

Out of memory: kill process 1928 (httpd) score 17709 or a child
Killed process 1928 (httpd) vsz:70836kB, anon-rss:23776kB, file-rss:136kB

We're seeing these for both httpd and postmaster.

I'll PM in just a bit...

Thanks!
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: high current load alerts after upgrading to 2012R2.2

Post by abrist »

The OOM error may be your issue. When you experience problems, in addition to Scott's commands, also run:

Code:

free -m
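
To see which processes the OOM killer has been hitting, the "Killed process" console messages quoted earlier in the thread can be tallied by process name. This is a rough sketch that assumes the kernel log lines look like the ones posted above; `oom_victims` is a hypothetical helper name, and other kernel versions may word these messages differently.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "count name" for every process name found in a "Killed process" line.
# Names containing spaces are cut at the first space.
oom_victims() {
    sed -n 's/.*Killed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Possible usage on the affected server (may need root):
# dmesg | oom_victims
# grep -h 'Killed process' /var/log/messages | oom_victims
```

If the same services show up again and again, that points at memory pressure rather than CPU as the thing to investigate alongside the `free -m` output.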
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.