Nagios hanging during start-up/reboot

lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Re: Nagios hanging during start-up/reboot

Post by lce411 »

sreinhardt wrote: Let's get some basic system specs and load settings, ideally while it's running sluggishly. This may also be easier to attach as a text file than to type in.

Code: Select all

ps ax | wc -l

ulimit -a

cat /proc/loadavg

cat /proc/sys/kernel/threads-max

grep -i rlimit /usr/local/apache/conf/httpd.conf

uptime

free -m

df -h

df -i

tail /var/log/messages

Code: Select all

ps ax | wc -l

[nagios@cde-nagios ~]$ ps ax | wc -l
247

ulimit -a

[root@cde-nagios ~]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31568
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31568
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

cat /proc/loadavg

[root@cde-nagios ~]# cat /proc/loadavg
105.81 105.64 103.82 106/249 13486

cat /proc/sys/kernel/threads-max

[root@cde-nagios ~]# cat /proc/sys/kernel/threads-max
63137

grep -i rlimit /usr/local/apache/conf/httpd.conf

No such file or directory

uptime

[root@cde-nagios ~]# uptime
 15:23:33 up  7:04,  3 users,  load average: 107.27, 106.10, 104.11

free -m

[root@cde-nagios ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3949       3095        853          0        162        391
-/+ buffers/cache:       2541       1407
Swap:          501          0        501

df -h

[root@cde-nagios ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             3.8G  2.7G  959M  74% /
/dev/sda3              15G  9.0G  5.0G  65% /var
/dev/sda2             487M   29M  433M   7% /home
tmpfs                 2.0G     0  2.0G   0% /dev/shm

df -i

[root@cde-nagios ~]# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1            1025024   76548  948476    8% /
/dev/sda3            3961056   23657 3937399    1% /var
/dev/sda2             128520      81  128439    1% /home
tmpfs                 505472       1  505471    1% /dev/shm
I have to pick up my child from school, so I will reply to any questions you have tomorrow morning. I appreciate any help you may have.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Nagios hanging during start-up/reboot

Post by sreinhardt »

Your memory and process utilization don't seem bad; however, your load average is through the roof. 100+ across the 1-, 5-, and 15-minute intervals is extremely high. Let's get a few more stats and see what else might be going on. Also, how many checks are you running in total, and at what intervals?

Code: Select all

top > /tmp/top    # press Ctrl+C after a few seconds to stop it
cat /tmp/top

iostat -x
Also note: holy cow, man, McAfee and LinuxShield appear to be kicking your system's butt. Did this happen to start when they were both installed?
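For context on the load figures discussed in this post: a load average is usually judged against the number of CPUs, so a value near 1.0 per CPU means the machine is roughly saturated. The following is a minimal sketch of that comparison; the function name `load_per_cpu` is hypothetical and not part of any Nagios tooling.

```shell
#!/bin/sh
# load_per_cpu LOAD NCPU
# Print the load average divided by the CPU count, to two decimals.
# (Hypothetical helper for illustration only.)
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Example use on a live Linux system (left commented out):
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" \
#                "$(grep -c '^processor' /proc/cpuinfo)"
```

With the 1-minute figure reported earlier in the thread (105.81), any realistic CPU count for a small VM still gives a per-CPU load far above 1.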
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Re: Nagios hanging during start-up/reboot

Post by lce411 »

top > /tmp/top (press ctrl+c after a few seconds to break it)
cat /tmp/top

Code: Select all

[root@cde-nagios ~]# cat /tmp/top
top - 09:27:50 up 6 min,  1 user,  load average: 3.02, 2.40, 1.11
Tasks: 112 total,   3 running, 109 sleeping,   0 stopped,   0 zombie
Cpu(s): 89.8%us, 10.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4043780k total,   792008k used,  3251772k free,   109892k buffers
Swap:   514040k total,        0k used,   514040k free,   366184k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4265 root      25   0  114m  47m  46m R  9.6  1.2   0:09.13 scanner
 2691 root      25   0  359m 300m 107m R  7.6  7.6   1:16.03 scanner
 2769 nagios    15   0 42588 2812 1312 S  0.7  0.1   0:01.09 nagios
    4 root      10  -5     0    0    0 S  0.3  0.0   0:00.22 events/0
 2686 root      15   0 19120 4168 3352 S  0.3  0.1   0:00.21 nailslogd
    1 root      15   0 10372  704  592 S  0.0  0.0   0:00.43 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
   46 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   50 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 kblockd/0
   51 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 cqueue/0
   54 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
   56 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  203 root      15   0     0    0    0 S  0.0  0.0   0:00.00 khungtaskd
  204 root      25   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  205 root      15   0     0    0    0 S  0.0  0.0   0:00.07 pdflush
  206 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kswapd0
  207 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  411 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  441 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 mpt_poll_0
  442 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 mpt/0
  443 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  446 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  447 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ata_aux
  454 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kstriped
  463 root      10  -5     0    0    0 S  0.0  0.0   0:00.07 kjournald
  488 root      10  -5     0    0    0 S  0.0  0.0   0:00.24 kauditd
  521 root      16  -4 12700  840  420 S  0.0  0.0   0:00.23 udevd
 1516 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kmpathd/0
 1517 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kmpath_handlerd
 1539 root      10  -5     0    0    0 S  0.0  0.0   0:02.17 kjournald
 1541 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kjournald
 1813 root      15   0 49996 2968 2372 S  0.0  0.1   0:00.33 vmtoolsd
 1860 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 iscsi_eh
 1892 root      16  -5     0    0    0 S  0.0  0.0   0:00.00 cnic_wq
 1897 root       2 -20     0    0    0 S  0.0  0.0   0:00.00 bnx2i_thread/0
 1910 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 ib_addr
 1917 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ib_mcast
 1918 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ib_inform
 1919 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 local_sa
 1922 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 iw_cm_wq
 1926 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 ib_cm/0
 1928 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 rdma_cm
 1944 root      10 -10 28704  22m 1744 S  0.0  0.6   0:00.00 iscsiuio
 1949 root      18   0  4596  452  372 S  0.0  0.0   0:00.00 iscsid
 1950 root       5 -10  5100 3048 1904 S  0.0  0.1   0:00.00 iscsid
 2207 root      11  -4 27376  828  540 S  0.0  0.0   0:02.87 auditd
iostat -x

Code: Select all

[root@cde-nagios ~]# iostat -x
Linux 2.6.18-348.3.1.el5 (cde-nagios)   05/29/2013

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          55.52    0.00   15.68   14.47    0.00   14.33

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda             124.06   774.10 78.64 281.53  3423.49  8433.25    32.92     2.08    5.78   2.52  90.80
sda1             36.20   142.63 36.39  9.50  2808.61  1217.03    87.74     0.63   13.72   4.62  21.22
sda2              1.85     0.01  0.62  0.10    14.77     0.58    21.26     0.01    8.17   8.12   0.59
sda3             78.81   631.45 41.32 271.93   589.85  7215.64    24.92     1.45    4.62   2.56  80.35
sda4              0.00     0.00  0.02  0.00     0.04     0.00     2.00     0.00    3.00   3.00   0.01
sda5              7.04     0.00  0.19  0.00     8.09     0.00    43.59     0.00    4.49   3.76   0.07
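When reading output like the above, the last column (%util) is the quickest indicator of a saturated device. As a rough sketch, assuming the column layout shown here (a `Device` header row with %util as the final field), a small filter can list only the busy devices. The function name `busy_devices` and the threshold are illustrative assumptions.

```shell
#!/bin/sh
# busy_devices THRESHOLD
# Read `iostat -x` output on stdin and print "device %util" for every
# device whose %util (assumed to be the last column) is >= THRESHOLD.
busy_devices() {
    awk -v limit="$1" '
        /^Device/            { in_table = 1; next }  # device rows follow this header
        in_table && NF == 0  { in_table = 0 }        # a blank line ends the table
        in_table && $NF + 0 >= limit + 0 { print $1, $NF }
    '
}

# Example use on a live system (left commented out):
#   iostat -x | busy_devices 80
```

Run against the output in this post with a threshold of 80, this would report sda (90.80) and sda3 (80.35), the /var partition.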
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Nagios hanging during start-up/reboot

Post by sreinhardt »

It would seem that your largest issue is disk utilization followed closely by fluctuating CPU usage. Have you taken a look at offloading mysql and using a ramdisk for check results and performance data?
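As a rough illustration of the ramdisk suggestion, the fragment below assumes a tmpfs mount at a hypothetical path, /var/nagios/ramdisk, with an arbitrary size. Neither value comes from this thread. The nagios.cfg directive names are standard Nagios Core settings, but the values shown are assumptions to adapt to your install.

```
# /etc/fstab (hypothetical mount point and size)
tmpfs  /var/nagios/ramdisk  tmpfs  size=256m,mode=0775  0 0

# nagios.cfg: point the high-churn files at the ramdisk (assumed paths)
check_result_path=/var/nagios/ramdisk/checkresults
status_file=/var/nagios/ramdisk/status.dat
temp_path=/var/nagios/ramdisk
service_perfdata_file=/var/nagios/ramdisk/service-perfdata
host_perfdata_file=/var/nagios/ramdisk/host-perfdata
```

Anything on tmpfs is lost at reboot, so the checkresults directory has to be recreated, and made writable by the nagios user, each time the system boots and before Nagios starts.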
lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Re: Nagios hanging during start-up/reboot

Post by lce411 »

sreinhardt wrote:It would seem that your largest issue is disk utilization followed closely by fluctuating CPU usage. Have you taken a look at offloading mysql and using a ramdisk for check results and performance data?
I've tried offloading MySQL to another server, but was unable to get data to actually be sent to it. I was able to log into the MySQL server from the Nagios server, so connectivity was there, but the database never grew. I tried and checked everything my coworker and I could think of, but nothing made a difference. The CPU usage is a new issue: someone is testing out HBSS and it's hammering the Nagios server's CPU. I've never had a RAM disk suggested to me, but I feel like I would need to get the other things resolved for a RAM disk to be fully effective. This Nagios server is a VMware image. Would simply adding a CPU or RAM help resolve this?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios hanging during start-up/reboot

Post by slansing »

Well, the fact that you have someone testing something on your production Nagios server is a big indicator. Beyond what we have already suggested, and possibly adding more memory and another CPU or core, there is not much else to try. The problem seems to be the environment the server is currently running in, as Spenser pointed out. It does not look like this is an issue with Nagios specifically.
lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Re: Nagios hanging during start-up/reboot

Post by lce411 »

Well, the problem has been resolved. It was not anything Nagios related. I didn't think it was, but wanted to do my due diligence. One of our VMware guys found that, although there was 4GB of RAM configured, there was a 512MB reserve set. Once the reserve was removed, the CPU usage dropped drastically. I appreciate all the help in confirming it was not in fact a problem with Nagios. I was able to report that this morning and the VMware guys had it resolved before lunch.
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Nagios hanging during start-up/reboot

Post by sreinhardt »

Fantastic, glad to hear it's working! I'll lock this up then.
Locked