Nagios performance trouble

hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

Good so far. I'll know if this helped after the overnight and will follow up tomorrow. Thanks much!
- Kyle
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios performance trouble

Post by scottwilkerson »

This looks a lot better. Do post back if anything else pops up.
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

Load is back up, with processes blocking again, and XI web pages are very slow to load. This system has two quad-core 2.33GHz CPUs, and /usr/local is on a hardware RAID mirrored set of 10K SAS drives. Again, this has been online for a year and the issue only started a couple of weeks ago. Should I try the RAM drive, or is something in XI possibly broken?

Code:

[root@psm-itmon nagios]# ps aux | grep " [D]"
root       379  0.0  0.0      0     0 ?        D    Sep18   0:11 [pdflush]
root      2347  0.0  0.0      0     0 ?        D<   Sep18   0:12 [kjournald]
nagios    4638  0.3  0.0  32148  5488 ?        Dsl  Sep18   3:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   22739  0.6  0.3 181144 25448 ?        D    08:16   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
postgres 22749  0.0  0.0 122116  4528 ?        D    08:16   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(43855) UPDATE         

Code:

# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  5      0 2834376 335516 3526244    0    0     7    56   64   79  5  1 76 17  0
 0  9      0 2728628 335536 3526256    0    0     0   317 1194  718  5  1 42 52  0
 1  8      0 2724828 335536 3526256    0    0     0    30 1137  308  0  0 37 62  0
 1  5      0 2696412 335540 3526268    0    0     0   178 1040  247  4  1 51 44  0
 2  3      0 2682372 335544 3524816    0    0     0   550 1200  166  2  0 78 20  0
 0  3      0 2714128 335548 3524824    0    0     0   358 1160  413  3  0 82 15  0
 0  4      0 2713416 335548 3526184    0    0     0    95 1030  123  0  0 65 35  0
 0  4      0 2713936 335548 3526272    0    0     0    87 1039  104  0  0 52 48  0
 0  5      0 2698404 335548 3526272    0    0     0    10 1029  117  1  0 73 26  0
 0  5      0 2683616 335548 3526272    0    0     0    63 1023  117  1  0 62 37  0
- Kyle
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

I may have an issue with LVM. pvdisplay hangs for over a minute before showing my PVs.

Code:

# pvdisplay
  /dev/hda: open failed: No medium found
  --- Physical volume ---
  PV Name               /dev/cciss/c0d1p2
  VG Name               VolGroup01
  PV Size               136.65 GB / not usable 25.47 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              4372
  Free PE               0
  Allocated PE          4372
  PV UUID               bKyLqd-hfCd-sm5R-ZfF9-ldKH-MTAz-P4Cd4a
   
  --- Physical volume ---
  PV Name               /dev/cciss/c0d0p2
  VG Name               VolGroup00
  PV Size               68.23 GB / not usable 12.63 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              2183
  Free PE               0
  Allocated PE          2183
  PV UUID               xqy328-hjvL-N7fs-8z18-FrQu-Tqpr-8sdYka
And fdisk gives this scary error on the /usr/local partition, which I've not seen before.

Code:

# fdisk -l
Disk /dev/cciss/c0d1: 146.7 GB, 146778685440 bytes
255 heads, 32 sectors/track, 35132 cylinders
Units = cylinders of 8160 * 512 = 4177920 bytes

           Device Boot      Start         End      Blocks   Id  System
/dev/cciss/c0d1p1   *           1          13       50781   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/cciss/c0d1p2              13       35132   143287778+  8e  Linux LVM
Partition 2 does not end on cylinder boundary.
- Kyle
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

I got rid of the boot partition on my second RAID device and rebooted. pvdisplay now responds instantly and all the errors are gone. Let's see if this helps. No clue why the CentOS installer would have put a second boot partition there.
- Kyle
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios performance trouble

Post by scottwilkerson »

Looking at the vmstat you posted, it looks like you have quite a bit of wait time, which is usually disk IO. I'll be interested to see if removing the extra boot partition helps. If not, setting up the RAM Disk can certainly help with the disk IO.
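
To put a number on that wait time, here is a rough sketch (not an official Nagios tool) that averages the b (blocked processes) and wa (I/O wait) columns of vmstat output. The function name vmstat_summary is made up for illustration, and it assumes the standard vmstat column header shown in the output above.

```shell
# Summarize vmstat output read from stdin: average blocked processes (b)
# and average I/O wait percentage (wa). The first sample row is skipped
# because vmstat's first row reports averages since boot.
vmstat_summary() {
    awk '
        # Header row: record which columns hold "b" and "wa".
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "b")  bcol = i
                if ($i == "wa") wcol = i
            }
            next
        }
        # Sample rows start with a number and only count after a header.
        bcol && $1 ~ /^[0-9]+$/ {
            if (++rows == 1) next        # skip the since-boot row
            n++; bsum += $bcol; wsum += $wcol
        }
        END {
            if (n) printf "samples=%d avg_b=%.1f avg_wa=%.1f\n", n, bsum / n, wsum / n
        }
    '
}

# Usage: vmstat 5 10 | vmstat_summary
```

Fed the ten rows posted above, this prints samples=9 avg_b=5.1 avg_wa=37.7, so roughly five processes blocked and about 38% of CPU time spent waiting on I/O during the sample.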

I would also circle back around and look at the mysqld.log again to make sure we don't have a persistent error.

Code:

tail /var/log/mysqld.log
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

It was doing well for a couple hours and then just shot through the roof with blocked processes and load now around 10. Why this happened suddenly makes no sense since I've added very few hosts and services over the past couple months. I will try the RAM disk. Nothing in mysqld.log.

Code:

# tail /var/log/mysqld.log
120919  9:08:00  InnoDB: Starting shutdown...
120919  9:08:01  InnoDB: Shutdown completed; log sequence number 0 43655
120919  9:08:01 [Note] /usr/libexec/mysqld: Shutdown complete

120919 09:08:01  mysqld ended

120919 09:36:33  mysqld started
120919  9:36:33  InnoDB: Started; log sequence number 0 43655
120919  9:36:33 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77'  socket: '/usr/local/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
- Kyle
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios performance trouble

Post by slansing »

Let us know how the RAM Disk works for you. Are the majority of your checks active, passive, or both?
hhlodge
Posts: 206
Joined: Tue Mar 08, 2011 2:13 pm

Re: Nagios performance trouble

Post by hhlodge »

All active. There are 190 hosts and 749 services, and for most of them I doubled the check interval from 5 to 10 minutes.
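
For a sense of scale, here is some back-of-the-envelope arithmetic on the check rate. It is only a sketch: check_rate is a throwaway helper, and it assumes every host and service is checked exactly once per interval with no retries.

```shell
# Rough active-check rate: (hosts + services) / interval in seconds.
# Assumption: each object is checked once per interval, no retries.
check_rate() {
    # $1 = hosts, $2 = services, $3 = check interval in minutes
    awk -v h="$1" -v s="$2" -v m="$3" \
        'BEGIN { printf "%.1f checks/sec\n", (h + s) / (m * 60) }'
}

check_rate 190 749 5    # prints 3.1 checks/sec
check_rate 190 749 10   # prints 1.6 checks/sec
```

Since only most of the checks moved to 10 minutes, the real rate sits somewhere between those two figures.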
- Kyle
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios performance trouble

Post by slansing »

Okay, that would not be causing the issue. I would pick up iotop and take a look at the top I/O producers as well:

http://guichaz.free.fr/iotop/
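
Until iotop is installed, a stopgap with stock tools is to list the processes sitting in uninterruptible sleep (state D), which is usually where I/O-bound processes show up. This is only a sketch: blocked_procs is a made-up helper name, and it expects the state code in the first column of the input.

```shell
# Filter "ps" output (read from stdin) down to processes in state D
# (uninterruptible sleep, normally waiting on disk I/O).
# Expects the state code in the first column, as produced by:
#   ps -eo state,pid,user,args
blocked_procs() {
    awk '$1 ~ /^D/'
}

# Usage with stock tools:
#   ps -eo state,pid,user,args | blocked_procs
# With iotop installed (it needs root), a non-interactive sample of only
# the processes actually doing I/O would be:
#   iotop -o -b -n 3
```

Running the ps pipeline a few times in a row shows whether the same processes stay blocked or whether it is a rotating cast.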