Re: Nagios hanging during start-up/reboot
Posted: Tue May 28, 2013 2:28 pm
by lce411
sreinhardt wrote: Let's get some basic system specs and load settings, ideally while it's running sluggishly. This may also be easier to attach as a text file than to type in.
Code:
ps ax | wc -l
ulimit -a
cat /proc/loadavg
cat /proc/sys/kernel/threads-max
grep -i rlimit /usr/local/apache/conf/httpd.conf
uptime
free -m
df -h
df -i
tail /var/log/messages
Code:
ps ax | wc -l
[nagios@cde-nagios ~]$ ps ax | wc -l
247
ulimit -a
[root@cde-nagios ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31568
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31568
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
cat /proc/loadavg
[root@cde-nagios ~]# cat /proc/loadavg
105.81 105.64 103.82 106/249 13486
cat /proc/sys/kernel/threads-max
[root@cde-nagios ~]# cat /proc/sys/kernel/threads-max
63137
grep -i rlimit /usr/local/apache/conf/httpd.conf
No such file or directory
uptime
[root@cde-nagios ~]# uptime
15:23:33 up 7:04, 3 users, load average: 107.27, 106.10, 104.11
free -m
[root@cde-nagios ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3949       3095        853          0        162        391
-/+ buffers/cache:       2541       1407
Swap:          501          0        501
df -h
[root@cde-nagios ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             3.8G  2.7G  959M  74% /
/dev/sda3              15G  9.0G  5.0G  65% /var
/dev/sda2             487M   29M  433M   7% /home
tmpfs                 2.0G     0  2.0G   0% /dev/shm
df -i
[root@cde-nagios ~]# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            1025024   76548  948476    8% /
/dev/sda3            3961056   23657 3937399    1% /var
/dev/sda2             128520      81  128439    1% /home
tmpfs                 505472       1  505471    1% /dev/shm
I have to pick up my child from school, so I will reply to any questions you have tomorrow morning. I appreciate any help you may have.
Re: Nagios hanging during start-up/reboot
Posted: Tue May 28, 2013 3:33 pm
by sreinhardt
Your memory and process utilization don't seem bad; however, your load average is through the roof. 100+ across the 1-, 5-, and 15-minute intervals is extremely high. Let's get a few more stats and see what else might be going on. Also, how many checks are you running in total, and at what intervals?
Code:
top > /tmp/top   # press Ctrl+C after a few seconds to stop it
cat /tmp/top
iostat -x
Also note, holy cow man, McAfee and LinuxShield appear to be kicking your system's butt. Did this happen to start when they were both installed?
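To put the load figures quoted above in perspective, a load average is usually read relative to the number of CPUs (on Linux it also counts tasks stuck in uninterruptible I/O wait, so it is not a pure CPU measure). The sketch below divides the 1-minute load by a CPU count. The function name `load_per_core` is made up for illustration, and the CPU count of this server is not stated in the thread, so the values passed in are only examples.

```shell
#!/bin/sh
# Hypothetical helper: divide the 1-minute load average by the CPU count.
# Both values are passed as arguments so the function can be tried
# without a live system.
load_per_core() {
    # $1 = contents of /proc/loadavg, $2 = number of CPUs (assumed by caller)
    echo "$1" | awk -v cpus="$2" '{ printf "%.2f\n", $1 / cpus }'
}

# On a live system, taking the CPU count from /proc/cpuinfo:
# load_per_core "$(cat /proc/loadavg)" "$(grep -c ^processor /proc/cpuinfo)"
```

A result well above 1.00 sustained over all three intervals generally means more work is queued than the machine can run or service.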
Re: Nagios hanging during start-up/reboot
Posted: Wed May 29, 2013 9:05 am
by lce411
top > /tmp/top (press ctrl+c after a few seconds to break it)
cat /tmp/top
Code:
[root@cde-nagios ~]# cat /tmp/top
top - 09:27:50 up 6 min, 1 user, load average: 3.02, 2.40, 1.11
Tasks: 112 total, 3 running, 109 sleeping, 0 stopped, 0 zombie
Cpu(s): 89.8%us, 10.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4043780k total, 792008k used, 3251772k free, 109892k buffers
Swap: 514040k total, 0k used, 514040k free, 366184k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4265 root 25 0 114m 47m 46m R 9.6 1.2 0:09.13 scanner
2691 root 25 0 359m 300m 107m R 7.6 7.6 1:16.03 scanner
2769 nagios 15 0 42588 2812 1312 S 0.7 0.1 0:01.09 nagios
4 root 10 -5 0 0 0 S 0.3 0.0 0:00.22 events/0
2686 root 15 0 19120 4168 3352 S 0.3 0.1 0:00.21 nailslogd
1 root 15 0 10372 704 592 S 0.0 0.0 0:00.43 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
5 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
46 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
50 root 10 -5 0 0 0 S 0.0 0.0 0:00.02 kblockd/0
51 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
54 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
56 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
203 root 15 0 0 0 0 S 0.0 0.0 0:00.00 khungtaskd
204 root 25 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
205 root 15 0 0 0 0 S 0.0 0.0 0:00.07 pdflush
206 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kswapd0
207 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
411 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
441 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 mpt_poll_0
442 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 mpt/0
443 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
446 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
447 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
454 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kstriped
463 root 10 -5 0 0 0 S 0.0 0.0 0:00.07 kjournald
488 root 10 -5 0 0 0 S 0.0 0.0 0:00.24 kauditd
521 root 16 -4 12700 840 420 S 0.0 0.0 0:00.23 udevd
1516 root 12 -5 0 0 0 S 0.0 0.0 0:00.00 kmpathd/0
1517 root 12 -5 0 0 0 S 0.0 0.0 0:00.00 kmpath_handlerd
1539 root 10 -5 0 0 0 S 0.0 0.0 0:02.17 kjournald
1541 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kjournald
1813 root 15 0 49996 2968 2372 S 0.0 0.1 0:00.33 vmtoolsd
1860 root 12 -5 0 0 0 S 0.0 0.0 0:00.00 iscsi_eh
1892 root 16 -5 0 0 0 S 0.0 0.0 0:00.00 cnic_wq
1897 root 2 -20 0 0 0 S 0.0 0.0 0:00.00 bnx2i_thread/0
1910 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 ib_addr
1917 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 ib_mcast
1918 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 ib_inform
1919 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 local_sa
1922 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 iw_cm_wq
1926 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 ib_cm/0
1928 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 rdma_cm
1944 root 10 -10 28704 22m 1744 S 0.0 0.6 0:00.00 iscsiuio
1949 root 18 0 4596 452 372 S 0.0 0.0 0:00.00 iscsid
1950 root 5 -10 5100 3048 1904 S 0.0 0.1 0:00.00 iscsid
2207 root 11 -4 27376 828 540 S 0.0 0.0 0:02.87 auditd
iostat -x
Code:
[root@cde-nagios ~]# iostat -x
Linux 2.6.18-348.3.1.el5 (cde-nagios) 05/29/2013
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          55.52    0.00   15.68   14.47    0.00   14.33
Device:  rrqm/s  wrqm/s    r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda      124.06  774.10  78.64  281.53  3423.49  8433.25    32.92     2.08   5.78   2.52  90.80
sda1      36.20  142.63  36.39    9.50  2808.61  1217.03    87.74     0.63  13.72   4.62  21.22
sda2       1.85    0.01   0.62    0.10    14.77     0.58    21.26     0.01   8.17   8.12   0.59
sda3      78.81  631.45  41.32  271.93   589.85  7215.64    24.92     1.45   4.62   2.56  80.35
sda4       0.00    0.00   0.02    0.00     0.04     0.00     2.00     0.00   3.00   3.00   0.01
sda5       7.04    0.00   0.19    0.00     8.09     0.00    43.59     0.00   4.49   3.76   0.07
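For anyone reading output like the above, the column to watch is %util. A rough sketch of a filter that lists only the busy devices follows. The function name `busy_devices` and the threshold are illustrative assumptions, and the script assumes %util is the last column of each device row, as it is in the output posted here.

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -x` output on stdin and print the
# devices whose %util (assumed to be the last column) is >= threshold $1.
busy_devices() {
    awk -v limit="$1" '
        /^avg-cpu/ { in_table = 0; next }    # CPU summary is not a device row
        /^Device/  { in_table = 1; next }    # device rows follow this header
        in_table && NF > 1 && $NF + 0 >= limit { print $1, $NF }
    '
}

# On a live system:
# iostat -x | busy_devices 80
```

Note that a single `iostat -x` run reports averages since boot. Giving an interval and count, for example `iostat -x 5 3`, shows what the disks are doing right now in the later reports.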
Re: Nagios hanging during start-up/reboot
Posted: Wed May 29, 2013 4:02 pm
by sreinhardt
It would seem that your largest issue is disk utilization, followed closely by fluctuating CPU usage. Have you taken a look at offloading MySQL and using a ramdisk for check results and performance data?
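As a rough idea of what the ramdisk suggestion involves: the check result files get written to a tmpfs mount instead of to disk, and Nagios is pointed at it with the `check_result_path` directive in nagios.cfg. The sketch below only prints the steps rather than running them. The function name `ramdisk_plan`, the example path and size, and the `nagios:nagios` ownership are all assumptions for illustration, not values taken from this server.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the steps for a tmpfs-backed
# check result directory. $1 = mount point, $2 = tmpfs size (example values).
ramdisk_plan() {
    dir="$1"
    size="$2"
    echo "mkdir -p $dir"
    echo "mount -t tmpfs -o size=$size tmpfs $dir"
    echo "chown nagios:nagios $dir"       # assumed user and group names
    echo "check_result_path=$dir"         # line to set in nagios.cfg
}

# Example with made-up values:
# ramdisk_plan /var/nagiosramdisk/checkresults 100m
```

Anything on tmpfs is lost at reboot, so the mount (and the directory ownership) would need to be recreated at boot, for example through /etc/fstab or an init script, before Nagios starts.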
Re: Nagios hanging during start-up/reboot
Posted: Thu May 30, 2013 7:28 am
by lce411
sreinhardt wrote: It would seem that your largest issue is disk utilization, followed closely by fluctuating CPU usage. Have you taken a look at offloading MySQL and using a ramdisk for check results and performance data?
I've tried offloading MySQL to another server, but was unable to get data to actually be sent to it. I was able to log into the MySQL server from the Nagios server, so connectivity was there, but the database never grew. A coworker and I tried and checked everything we could think of, but nothing made a difference. The CPU usage is a new issue: someone is testing out HBSS and it's hammering the Nagios server's CPU. I've never had a RAM disk suggested to me, but I feel like I would need to get the other things resolved first for a RAM disk to be fully effective. This Nagios server is a VMware image. Would simply adding a CPU or RAM help resolve this?
Re: Nagios hanging during start-up/reboot
Posted: Thu May 30, 2013 10:25 am
by slansing
Well, the fact that you have someone testing something on your production Nagios server is a big indicator. Besides what we have suggested, and possibly adding more memory and another CPU or core, there is not much else to try. The problem seems to be the current environment the server is in, as Spenser pointed out. It does not look like this is an issue with Nagios specifically.
Re: Nagios hanging during start-up/reboot
Posted: Thu May 30, 2013 11:33 am
by lce411
Well, the problem has been resolved. It was not anything Nagios related. I didn't think it was, but wanted to do my due diligence. One of our VMware guys found that, although there was 4 GB of RAM configured, there was a 512 MB reserve set. Once the reserve was removed, the CPU usage dropped drastically. I appreciate all the help in confirming it was not in fact a problem with Nagios. I was able to report that this morning, and the VMware guys had it resolved before lunch.
Re: Nagios hanging during start-up/reboot
Posted: Thu May 30, 2013 11:50 am
by sreinhardt
Fantastic, glad to hear it's working! I'll lock this up then.