Nagios performance trouble

hhlodge · Post by **hhlodge** » Tue Sep 18, 2012 2:59 pm

Good so far. I'll know if this helped after the overnight and will follow up tomorrow. Thanks much!

scottwilkerson · Post by **scottwilkerson** » Tue Sep 18, 2012 3:13 pm

This looks a lot better. Do post back if anything else pops up.

hhlodge · Post by **hhlodge** » Wed Sep 19, 2012 7:30 am

Load is back up, with processes blocking again, and the XI web pages are very slow to load. This system has two quad-core 2.33 GHz CPUs, and /usr/local is on a hardware RAID mirrored set of 10K SAS drives. Again, this has been online for a year, and this issue only started a couple of weeks ago. Should I try the RAM drive, or is something in XI possibly broken?

Code:

[root@psm-itmon nagios]# ps aux | grep " [D]"
root       379  0.0  0.0      0     0 ?        D    Sep18   0:11 [pdflush]
root      2347  0.0  0.0      0     0 ?        D<   Sep18   0:12 [kjournald]
nagios    4638  0.3  0.0  32148  5488 ?        Dsl  Sep18   3:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   22739  0.6  0.3 181144 25448 ?        D    08:16   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
postgres 22749  0.0  0.0 122116  4528 ?        D    08:16   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(43855) UPDATE
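
A note on the filter used above: `grep " [D]"` matches a space followed by `D` anywhere on the line, so a command line that happens to contain " D" would also be listed. Below is a minimal sketch of a stricter filter that checks only the STAT column. It assumes the usual `ps aux` layout, where STAT is the eighth field, and the function name `blocked_procs` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps aux` output on stdin and prints
# "PID STAT COMMAND" for every process whose STAT field starts with "D"
# (uninterruptible sleep, usually waiting on I/O).
# Assumption: STAT is field 8 and COMMAND starts at field 11.
blocked_procs() {
    awk '$8 ~ /^D/ {
        cmd = $11
        for (i = 12; i <= NF; i++) cmd = cmd " " $i   # rejoin the command line
        print $2, $8, cmd
    }'
}

# Usage: ps aux | blocked_procs
```

Run against the listing above, this would print five lines, one for each process in state D.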

Code:

# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  5      0 2834376 335516 3526244    0    0     7    56   64   79  5  1 76 17  0
 0  9      0 2728628 335536 3526256    0    0     0   317 1194  718  5  1 42 52  0
 1  8      0 2724828 335536 3526256    0    0     0    30 1137  308  0  0 37 62  0
 1  5      0 2696412 335540 3526268    0    0     0   178 1040  247  4  1 51 44  0
 2  3      0 2682372 335544 3524816    0    0     0   550 1200  166  2  0 78 20  0
 0  3      0 2714128 335548 3524824    0    0     0   358 1160  413  3  0 82 15  0
 0  4      0 2713416 335548 3526184    0    0     0    95 1030  123  0  0 65 35  0
 0  4      0 2713936 335548 3526272    0    0     0    87 1039  104  0  0 52 48  0
 0  5      0 2698404 335548 3526272    0    0     0    10 1029  117  1  0 73 26  0
 0  5      0 2683616 335548 3526272    0    0     0    63 1023  117  1  0 62 37  0
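
To put a number on the samples above, here is a rough sketch that averages the blocked-process count (`b`) and the I/O wait percentage (`wa`) from `vmstat` output. It looks the columns up by name from the header row rather than by position, and it skips the first sample because vmstat reports averages since boot on that line. The function name `vmstat_summary` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `vmstat N M` output on stdin and prints the
# average of the "b" and "wa" columns, ignoring the first (since-boot) sample.
vmstat_summary() {
    awk '
        $1 == "r" {                              # header row: map names to column numbers
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $1 ~ /^[0-9]+$/ && ("wa" in col) {       # data rows start with a number
            if (++n == 1) next                   # skip the since-boot sample
            b += $(col["b"]); wa += $(col["wa"]); rows++
        }
        END {
            if (rows > 0)
                printf "samples=%d avg_blocked=%.1f avg_iowait=%.1f%%\n", rows, b / rows, wa / rows
        }'
}

# Usage: vmstat 5 10 | vmstat_summary
```

Fed the ten samples above, it works out to roughly 5 blocked processes and about 38% I/O wait on average over the last nine samples.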

hhlodge · Post by **hhlodge** » Wed Sep 19, 2012 7:50 am

I may have an issue with LVM. pvdisplay hangs for over a minute before showing my PVs.

Code:

# pvdisplay
  /dev/hda: open failed: No medium found
  --- Physical volume ---
  PV Name               /dev/cciss/c0d1p2
  VG Name               VolGroup01
  PV Size               136.65 GB / not usable 25.47 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              4372
  Free PE               0
  Allocated PE          4372
  PV UUID               bKyLqd-hfCd-sm5R-ZfF9-ldKH-MTAz-P4Cd4a
   
  --- Physical volume ---
  PV Name               /dev/cciss/c0d0p2
  VG Name               VolGroup00
  PV Size               68.23 GB / not usable 12.63 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              2183
  Free PE               0
  Allocated PE          2183
  PV UUID               xqy328-hjvL-N7fs-8z18-FrQu-Tqpr-8sdYka

And fdisk gives this scary error on the /usr/local partition, which I've not seen before.

Code:

# fdisk -l
Disk /dev/cciss/c0d1: 146.7 GB, 146778685440 bytes
255 heads, 32 sectors/track, 35132 cylinders
Units = cylinders of 8160 * 512 = 4177920 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d1p1   *           1          13       50781   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/cciss/c0d1p2              13       35132   143287778+  8e  Linux LVM
Partition 2 does not end on cylinder boundary.

hhlodge · Post by **hhlodge** » Wed Sep 19, 2012 9:03 am

I got rid of the boot partition on my second RAID device and rebooted. pvdisplay now responds instantly and all the errors are gone. Let's see if this helps. No clue why the CentOS installer would have put a second boot partition there.

scottwilkerson · Post by **scottwilkerson** » Wed Sep 19, 2012 10:14 am

Looking at the vmstat output you posted, you have quite a bit of wait time (the wa column), which usually points to disk I/O. I'll be interested to see if removing the extra boot partition helps. If not, setting up the RAM disk can certainly help with the disk I/O.

I would also circle back and look at mysqld.log again to make sure we don't have a persistent error:

Code:

tail /var/log/mysqld.log

hhlodge · Post by **hhlodge** » Wed Sep 19, 2012 12:13 pm

It was doing well for a couple of hours and then just shot through the roof, with blocked processes and load now around 10. Why this happened suddenly makes no sense, since I've added very few hosts and services over the past couple of months. I will try the RAM disk. Nothing in mysqld.log.

Code:

# tail /var/log/mysqld.log
120919  9:08:00  InnoDB: Starting shutdown...
120919  9:08:01  InnoDB: Shutdown completed; log sequence number 0 43655
120919  9:08:01 [Note] /usr/libexec/mysqld: Shutdown complete

120919 09:08:01  mysqld ended

120919 09:36:33  mysqld started
120919  9:36:33  InnoDB: Started; log sequence number 0 43655
120919  9:36:33 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77'  socket: '/usr/local/var/lib/mysql/mysql.sock'  port: 3306  Source distribution

slansing · Post by **slansing** » Wed Sep 19, 2012 12:17 pm

Let us know how the RAM disk works for you. Are the majority of your checks active, passive, or both?

hhlodge · Post by **hhlodge** » Wed Sep 19, 2012 12:54 pm

All active. There are 190 hosts and 749 services, and for most of them I doubled the check interval from 5 to 10 minutes.
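
For a sense of scale, the check rate these numbers imply can be worked out with a quick back-of-the-envelope calculation. This is only a sketch: it assumes every service uses the same interval (the post says most, not all), it ignores host checks, and the function name `checks_per_second` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: average checks per second for a given number of
# checks ($1) spread evenly over one interval of $2 minutes.
checks_per_second() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.2f\n", n / (m * 60) }'
}

checks_per_second 749 10   # prints 1.25 (all services at 10 minutes)
checks_per_second 749 5    # prints 2.50 (all services at 5 minutes)
```

Under those assumptions the load is between about 1.25 and 2.5 service checks per second.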

slansing · Post by **slansing** » Wed Sep 19, 2012 1:03 pm

Okay, that would not be causing the issue. I would pick up iotop and take a look at the top I/O producers as well:

http://guichaz.free.fr/iotop/
