Nagios performance trouble
Re: Nagios performance trouble
Good so far. I'll know if this helped after the overnight and will follow up tomorrow. Thanks much!
- Kyle
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios performance trouble
This looks a lot better. Do post back if anything else pops up.
Re: Nagios performance trouble
Load is back up with processes blocking again. XI web pages very slow to load. This system has 2 quad core 2.33GHz CPUs and /usr/local is a hardware RAID mirrored set of 10K SAS drives. Again, this has been online for a year and this issue just started a couple weeks ago. Should I try the RAM drive or is something in XI possibly broken?
Code:
[root@psm-itmon nagios]# ps aux | grep " [D]"
root 379 0.0 0.0 0 0 ? D Sep18 0:11 [pdflush]
root 2347 0.0 0.0 0 0 ? D< Sep18 0:12 [kjournald]
nagios 4638 0.3 0.0 32148 5488 ? Dsl Sep18 3:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 22739 0.6 0.3 181144 25448 ? D 08:16 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
postgres 22749 0.0 0.0 122116 4528 ? D 08:16 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(43855) UPDATE
Code:
# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 5 0 2834376 335516 3526244 0 0 7 56 64 79 5 1 76 17 0
0 9 0 2728628 335536 3526256 0 0 0 317 1194 718 5 1 42 52 0
1 8 0 2724828 335536 3526256 0 0 0 30 1137 308 0 0 37 62 0
1 5 0 2696412 335540 3526268 0 0 0 178 1040 247 4 1 51 44 0
2 3 0 2682372 335544 3524816 0 0 0 550 1200 166 2 0 78 20 0
0 3 0 2714128 335548 3524824 0 0 0 358 1160 413 3 0 82 15 0
0 4 0 2713416 335548 3526184 0 0 0 95 1030 123 0 0 65 35 0
0 4 0 2713936 335548 3526272 0 0 0 87 1039 104 0 0 52 48 0
0 5 0 2698404 335548 3526272 0 0 0 10 1029 117 1 0 73 26 0
0 5 0 2683616 335548 3526272 0 0 0 63 1023 117 1 0 62 37 0
- Kyle
Re: Nagios performance trouble
I may have an issue with LVM. pvdisplay hangs for over a minute before showing my PVs.
fdisk also gives this scary warning on the /usr/local partition, which I've not seen before.
Code:
# pvdisplay
/dev/hda: open failed: No medium found
--- Physical volume ---
PV Name /dev/cciss/c0d1p2
VG Name VolGroup01
PV Size 136.65 GB / not usable 25.47 MB
Allocatable yes (but full)
PE Size (KByte) 32768
Total PE 4372
Free PE 0
Allocated PE 4372
PV UUID bKyLqd-hfCd-sm5R-ZfF9-ldKH-MTAz-P4Cd4a
--- Physical volume ---
PV Name /dev/cciss/c0d0p2
VG Name VolGroup00
PV Size 68.23 GB / not usable 12.63 MB
Allocatable yes (but full)
PE Size (KByte) 32768
Total PE 2183
Free PE 0
Allocated PE 2183
PV UUID xqy328-hjvL-N7fs-8z18-FrQu-Tqpr-8sdYka
Code:
# fdisk -l
Disk /dev/cciss/c0d1: 146.7 GB, 146778685440 bytes
255 heads, 32 sectors/track, 35132 cylinders
Units = cylinders of 8160 * 512 = 4177920 bytes
Device Boot Start End Blocks Id System
/dev/cciss/c0d1p1 * 1 13 50781 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/cciss/c0d1p2 13 35132 143287778+ 8e Linux LVM
Partition 2 does not end on cylinder boundary.
- Kyle
Re: Nagios performance trouble
I got rid of the boot partition on my second RAID device and rebooted. pvdisplay now responds instantly and all the errors are gone. Let's see if this helps. No clue why the CentOS installer would have put a second boot partition there.
- Kyle
scottwilkerson
Re: Nagios performance trouble
Looking at the vmstat output you posted, you have quite a bit of wait time (the wa column), which usually points to disk I/O. I'll be interested to see whether removing the extra boot partition helps; if not, setting up the RAM disk can certainly help with the disk I/O.
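To put a number on that wait time, here is a minimal sketch that averages the wa (I/O wait) and b (blocked processes) columns of vmstat output. The function name vmstat_wait_summary is made up for illustration, and it assumes the column header row starts with "r b" as in the output posted earlier in this thread.

```shell
# Summarise vmstat output read on stdin: average I/O wait (wa) and
# average number of blocked processes (b).
# Assumption: the column header row starts with "r b", as in the thread's output.
vmstat_wait_summary() {
    awk '
        # Remember the position of each column from the header row.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number; the "procs ---memory---" banner does not.
        ("wa" in col) && $1 ~ /^[0-9]+$/ { wa += $(col["wa"]); b += $(col["b"]); n++ }
        END {
            if (n) printf "samples=%d avg_wa=%.1f avg_b=%.1f\n", n, wa / n, b / n
        }
    '
}
```

It would be used as `vmstat 5 10 | vmstat_wait_summary`. Fed the ten rows posted above, it prints samples=10 avg_wa=35.6 avg_b=5.1. Keep in mind that the first row vmstat prints is an average since boot, so it pulls the result toward the long-term figure.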
I would also circle back and look at mysqld.log again to make sure we don't have a persistent error:
Code:
tail /var/log/mysqld.log
Re: Nagios performance trouble
It was doing well for a couple hours and then just shot through the roof with blocked processes and load now around 10. Why this happened suddenly makes no sense since I've added very few hosts and services over the past couple months. I will try the RAM disk. Nothing in mysqld.log.
Code:
# tail /var/log/mysqld.log
120919 9:08:00 InnoDB: Starting shutdown...
120919 9:08:01 InnoDB: Shutdown completed; log sequence number 0 43655
120919 9:08:01 [Note] /usr/libexec/mysqld: Shutdown complete
120919 09:08:01 mysqld ended
120919 09:36:33 mysqld started
120919 9:36:33 InnoDB: Started; log sequence number 0 43655
120919 9:36:33 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.77' socket: '/usr/local/var/lib/mysql/mysql.sock' port: 3306 Source distribution
- Kyle
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Nagios performance trouble
Let us know how the RAM disk works for you. Are the majority of your checks active, passive, or both?
Re: Nagios performance trouble
All active. There are 190 hosts and 749 services, and for most of them I doubled the check interval from 5 to 10 minutes.
- Kyle
slansing
Re: Nagios performance trouble
Okay, that would not be causing the issue. I would pick up iotop and take a look at the top I/O producers as well:
http://guichaz.free.fr/iotop/
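Alongside iotop, it can help to snapshot which processes are sitting in uninterruptible sleep (state D), as the `ps aux | grep " [D]"` earlier in the thread did. The sketch below filters on the STAT column itself, so it will not match a " D" that happens to appear in a command line. The function name blocked_procs is hypothetical, and it assumes the standard `ps aux` layout where STAT is the eighth column.

```shell
# Print only processes whose STAT field starts with D (uninterruptible sleep,
# usually waiting on disk I/O). Reads `ps aux` output on stdin.
# Assumption: standard `ps aux` layout, where STAT is column 8.
# The header row (line 1) is passed through for context.
blocked_procs() {
    awk 'NR == 1 || $8 ~ /^D/'
}
```

It would be used as `ps aux | blocked_procs`, repeated a few times while the load is high to see which processes keep showing up.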