Nagios 4.1.1: too many zombie processes and 100% CPU usage

sainudani
Posts: 28
Joined: Mon Sep 05, 2016 8:59 pm


Post by sainudani »

Hi,
Nagios 4.1.1 is consuming 100% CPU. Is there any fix for this issue?
My Nagios Core is running on Solaris 10 64bit SPARC.


load averages: 9.27, 5.26, 2.32; up 3+21:46:20
540 processes: 66 sleeping, 1 running, 465 zombie, 8 on cpu
CPU states: 0.0% idle, 71.7% user, 28.3% kernel, 0.0% iowait, 0.0% swap
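The `top` summary above reports 465 zombies out of 540 processes. Below is a minimal Python sketch that turns that summary line into per-state counts, which makes it easy to track the zombie share from one snapshot to the next. The function name `parse_process_summary` is hypothetical, and the sketch assumes the exact `N processes: N state, ...` layout shown here.

```python
import re


def parse_process_summary(line):
    """Split a top-style summary line such as
    '540 processes: 66 sleeping, 1 running, 465 zombie, 8 on cpu'
    into the total process count and a dict of per-state counts."""
    total_part, _, states_part = line.partition(":")
    total = int(total_part.split()[0])  # leading "540" before "processes"
    counts = {}
    for item in states_part.split(","):
        # Each item looks like " 465 zombie" or " 8 on cpu".
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", item)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return total, counts


total, counts = parse_process_summary(
    "540 processes: 66 sleeping, 1 running, 465 zombie, 8 on cpu"
)
print(total, counts["zombie"])                    # 540 465
print(f"{counts['zombie'] / total:.1%} zombies")  # 86.1% zombies
```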


Thanks,
Sainu Daniel
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Nagios 4.1.1: too many zombie processes and 100% CPU usage

Post by rkennedy »

How many host and service checks are running on the machine? Which processes are at the top of the list right now?

Please also post the full output of ps -ef.
Former Nagios Employee
sainudani
Posts: 28
Joined: Mon Sep 05, 2016 8:59 pm

Re: Nagios 4.1.1: too many zombie processes and 100% CPU usage

Post by sainudani »


bash-3.2# ps -ef
     UID   PID  PPID   C    STIME TTY         TIME CMD
noaccess 19840 19186   0   Sep 02 ?          13:07 /usr/java/bin/java -server -Xmx128m -XX:+UseParallelGC -XX:ParallelGCThreads=4
  nagios  5950 20495   0        - ?           0:00 <defunct>
    root 19195 19186   0   Sep 02 ?           0:02 /sbin/init
  nagios  6247 20501   0        - ?           0:00 <defunct>
  nagios  5192 20501   0        - ?           0:00 <defunct>
  nagios  5191 20492   0        - ?           0:00 <defunct>
  nagios  6530 20494   0        - ?           0:00 <defunct>
  nagios  6892 20494   0        - ?           0:00 <defunct>
    root 19199 19186   0   Sep 02 ?           0:22 /lib/svc/bin/svc.configd
  daemon 19377 19186   0   Sep 02 ?           0:00 /usr/sbin/rpcbind
  nagios  6696 20494   0        - ?           0:00 <defunct>
  nagios  5751 20491   0        - ?           0:00 <defunct>
  nagios  6361 20492   0        - ?           0:00 <defunct>
    root 19186 19186   0   Sep 02 ?           0:00 zsched
  nagios  6772 20492   0        - ?           0:00 <defunct>
  nagios  6910 20499   0        - ?           0:00 <defunct>
  nagios  5870 20494   0        - ?           0:00 <defunct>
  daemon 19387 19186   0   Sep 02 ?           0:00 /usr/lib/nfs/nfs4cbd
  nagios  6799 20493   0        - ?           0:00 <defunct>
  nagios  5183 20502   0        - ?           0:00 <defunct>
  nagios  6981 20494   0        - ?           0:00 <defunct>
  nagios  6909 20495   0        - ?           0:00 <defunct>
    root 19435 19186   0   Sep 02 ?           0:01 /usr/lib/ssh/sshd
  nagios  6791 20501   0        - ?           0:00 <defunct>
  nagios  6116 20492   0        - ?           0:00 <defunct>
    root 19492 19186   0   Sep 02 ?           1:23 /usr/lib/sendmail -bl -q15m
  nagios  6092 20492   0        - ?           0:00 <defunct>
  nagios  6444 20492   0        - ?           0:00 <defunct>
  nagios 20494 20489  12   Sep 13 ?        3944:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
  nagios  6445 20493   0        - ?           0:00 <defunct>
  nagios  6389 20494   0        - ?           0:00 <defunct>
webservd 27335  7131   0 10:39:28 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6374 20501   0        - ?           0:00 <defunct>
  nagios  6397 20495   0        - ?           0:00 <defunct>
  nagios  6422 20493   0        - ?           0:00 <defunct>
  nagios  5314 20494   0        - ?           0:00 <defunct>
  nagios  6365 20501   0        - ?           0:00 <defunct>
  daemon 19273 19186   0   Sep 02 ?           0:00 /usr/lib/crypto/kcfd
  nagios  6724 20499   0        - ?           0:00 <defunct>
  nagios  6206 20495   0        - ?           0:00 <defunct>
    root 19280 19186   0   Sep 02 ?          10:37 /usr/sbin/nscd
  nagios  6215 20499   0        - ?           0:00 <defunct>
  nagios  6599 20502   0        - ?           0:00 <defunct>
  nagios  5766 20491   0        - ?           0:00 <defunct>
  nagios  6716 20502   0        - ?           0:00 <defunct>
  nagios  6447 20494   0        - ?           0:00 <defunct>
  nagios  6972 20492   0        - ?           0:00 <defunct>
  nagios  6427 20502   0        - ?           0:00 <defunct>
  nagios  6115 20491   0        - ?           0:00 <defunct>
  nagios  6428 20491   0        - ?           0:00 <defunct>
  nagios  5833 20491   0        - ?           0:00 <defunct>
  nagios  5237 20491   0        - ?           0:00 <defunct>
  nagios  5875 20501   0        - ?           0:00 <defunct>
  nagios  6795 20501   0        - ?           0:00 <defunct>
  nagios  5496 20501   0        - ?           0:00 <defunct>
  nagios  6730 20493   0        - ?           0:00 <defunct>
  nagios  6893 20495   0        - ?           0:00 <defunct>
  nagios 20493 20489  12   Sep 13 ?        3946:06 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
  nagios  5856 20491   0        - ?           0:00 <defunct>
  nagios  6121 20499   0        - ?           0:00 <defunct>
  nagios  5896 20493   0        - ?           0:00 <defunct>
  nagios  5320 20501   0        - ?           0:00 <defunct>
  nagios  6561 20501   0        - ?           0:00 <defunct>
  nagios  5770 20493   0        - ?           0:00 <defunct>
  nagios  6824 20502   0        - ?           0:00 <defunct>
  nagios  6888 20491   0        - ?           0:00 <defunct>
  nagios  5494 20494   0        - ?           0:00 <defunct>
  nagios  5334 20492   0        - ?           0:00 <defunct>
  nagios  5689 20499   0        - ?           0:00 <defunct>
  nagios  5866 20492   0        - ?           0:00 <defunct>
  nagios  5166 20492   0        - ?           0:00 <defunct>
  nagios  6649 20495   0        - ?           0:00 <defunct>
  nagios  6264 20501   0        - ?           0:00 <defunct>
  nagios  6259 20493   0        - ?           0:00 <defunct>
  nagios  6691 20502   0        - ?           0:00 <defunct>
  nagios  5855 20499   0        - ?           0:00 <defunct>
  nagios  6742 20495   0        - ?           0:00 <defunct>
  nagios  6760 20499   0 10:50:38 ?           0:00 /bin/perl /usr/local/nagios/libexec/psu_check melsmpmednbu01
  nagios  6380 20493   0        - ?           0:00 <defunct>
    root  5353 19186   0 10:49:10 pts/4       0:00 -bash
  nagios  6110 20494   0        - ?           0:00 <defunct>
  nagios  6578 20492   0        - ?           0:00 <defunct>
  nagios  6797 20499   0        - ?           0:00 <defunct>
  nagios  6725 20494   0        - ?           0:00 <defunct>
  nagios  6936 20499   0        - ?           0:00 <defunct>
  nagios  6558 20492   0        - ?           0:00 <defunct>
  nagios  5331 20491   0        - ?           0:00 <defunct>
  nagios  6739 20494   0        - ?           0:00 <defunct>
  nagios  5688 20502   0        - ?           0:00 <defunct>
  nagios  6583 20493   0        - ?           0:00 <defunct>
  nagios  5172 20493   0        - ?           0:00 <defunct>
  nagios  6408 20502   0        - ?           0:00 <defunct>
  nagios  6093 20493   0        - ?           0:00 <defunct>
  nagios  5768 20492   0        - ?           0:00 <defunct>
  nagios  6802 20494   0        - ?           0:00 <defunct>
  nagios  5826 20502   0        - ?           0:00 <defunct>
  nagios  5845 20502   0        - ?           0:00 <defunct>
  nagios  6991 20492   0        - ?           0:00 <defunct>
  nagios  6911 20502   0        - ?           0:00 <defunct>
  nagios  6400 20501   0        - ?           0:00 <defunct>
  nagios  6632 20501   0        - ?           0:00 <defunct>
  nagios  6087 20501   0        - ?           0:00 <defunct>
  nagios  6208 20499   0        - ?           0:00 <defunct>
    root 19772 19186   0   Sep 02 ?           0:00 /esm/bin/solaris-sparc/esmd -fv
  nagios  6808 20495   0        - ?           0:00 <defunct>
  nagios  6818 20495   0        - ?           0:00 <defunct>
webservd 29488  7131   0 10:42:30 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6786  6779   0 10:50:38 pts/5       0:00 /bin/ssh admin@melsmpmednbu02-alom
  nagios  6719 20491   0        - ?           0:00 <defunct>
  nagios  5339 20495   0        - ?           0:00 <defunct>
  nagios  5906 20494   0        - ?           0:00 <defunct>
  nagios  6443 20502   0        - ?           0:00 <defunct>
  nagios  5493 20493   0        - ?           0:00 <defunct>
  daemon 19392 19186   0   Sep 02 ?           0:00 /usr/lib/nfs/statd
  nagios  6113 20499   0        - ?           0:00 <defunct>
  nagios  6378 20502   0        - ?           0:00 <defunct>
  nagios  6442 20499   0        - ?           0:00 <defunct>
  nagios  6078 20495   0        - ?           0:00 <defunct>
  nagios  5920 20495   0        - ?           0:00 <defunct>
  nagios  6420 20492   0        - ?           0:00 <defunct>
  nagios  6170 20492   0        - ?           0:00 <defunct>
  nagios  6114 20502   0        - ?           0:00 <defunct>
  nagios  6101 20492   0        - ?           0:00 <defunct>
  nagios  6063 20495   0        - ?           0:00 <defunct>
  nagios  6825 20491   0        - ?           0:00 <defunct>
  nagios  6876 20494   0        - ?           0:00 <defunct>
  nagios  6122 20502   0        - ?           0:00 <defunct>
  nagios  6793 20492   0        - ?           0:00 <defunct>
  nagios  6807 20495   0        - ?           0:00 <defunct>
  nagios  6100 20493   0        - ?           0:00 <defunct>
  nagios  6218 20493   0        - ?           0:00 <defunct>
  nagios  6551 20493   0        - ?           0:00 <defunct>
  nagios  5220 20493   0        - ?           0:00 <defunct>
  nagios  6743 20499   0        - ?           0:00 <defunct>
  nagios  5830 20494   0        - ?           0:00 <defunct>
  nagios  5932 20499   0        - ?           0:00 <defunct>
    root 19428 19418   0   Sep 02 ?           0:00 /usr/lib/saf/ttymon
  nagios  6629 20492   0        - ?           0:00 <defunct>
  nagios  6073 20491   0        - ?           0:00 <defunct>
    root 19463 19186   0   Sep 02 ?           0:14 /usr/sfw/sbin/snmpd
    root 19418 19197   0   Sep 02 ?           0:00 /usr/lib/saf/sac -t 300
  nagios  6538 20501   0        - ?           0:00 <defunct>
  nagios  6079 20499   0        - ?           0:00 <defunct>
  nagios  6089 20499   0        - ?           0:00 <defunct>
  nagios  6107 20491   0        - ?           0:00 <defunct>
  nagios  5257 20495   0        - ?           0:00 <defunct>
  nagios  6060 20494   0        - ?           0:00 <defunct>
  nagios  5750 20502   0        - ?           0:00 <defunct>
  nagios  5495 20495   0        - ?           0:00 <defunct>
webservd 29475  7131   0 10:42:26 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  5326 20499   0        - ?           0:00 <defunct>
  nagios  6359 20502   0        - ?           0:00 <defunct>
  nagios  6731 20495   0        - ?           0:00 <defunct>
  nagios  6803 20492   0        - ?           0:00 <defunct>
  nagios  6993 20494   0        - ?           0:00 <defunct>
  nagios  6172 20494   0        - ?           0:00 <defunct>
  nagios  6637 20502   0        - ?           0:00 <defunct>
  nagios  6120 20501   0        - ?           0:00 <defunct>
  nagios  5899 20502   0        - ?           0:00 <defunct>
  nagios  6613 20494   0        - ?           0:00 <defunct>
  nagios  6179 20492   0        - ?           0:00 <defunct>
  nagios  6984 20495   0        - ?           0:00 <defunct>
  nagios  5236 20499   0        - ?           0:00 <defunct>
  nagios  6243 20491   0        - ?           0:00 <defunct>
  nagios  6214 20501   0        - ?           0:00 <defunct>
  nagios  6595 20501   0        - ?           0:00 <defunct>
  nagios  6619 20502   0        - ?           0:00 <defunct>
  nagios  6424 20495   0        - ?           0:00 <defunct>
  nagios  6272 20501   0        - ?           0:00 <defunct>
  nagios  5170 20501   0        - ?           0:00 <defunct>
  nagios  6798 20491   0        - ?           0:00 <defunct>
  nagios  5850 20501   0        - ?           0:00 <defunct>
  nagios 19782 19186   0   Sep 02 ?           0:00 /bin/sh ./APAgent
  nagios  6370 20491   0        - ?           0:00 <defunct>
  nagios  6884 20495   0        - ?           0:00 <defunct>
  nagios  6387 20492   0        - ?           0:00 <defunct>
  nagios  6779  6776   0 10:50:38 ?           0:00 /usr/local/bin/expect -- /usr/local/nagios/libexec/get_alom_data.exp melsmpmedn
  nagios  6095 20495   0        - ?           0:00 <defunct>
  nagios  6088 20494   0        - ?           0:00 <defunct>
  nagios  6590 20492   0        - ?           0:00 <defunct>
  nagios  6939 20502   0        - ?           0:00 <defunct>
  nagios  6810 20491   0        - ?           0:00 <defunct>
  nagios  5867 20502   0        - ?           0:00 <defunct>
  nagios  6405 20494   0        - ?           0:00 <defunct>
  nagios  6341 20499   0        - ?           0:00 <defunct>
  nagios  5952 20499   0        - ?           0:00 <defunct>
  nagios  6123 20491   0        - ?           0:00 <defunct>
  nagios  5178 20493   0        - ?           0:00 <defunct>
  nagios  5908 20499   0        - ?           0:00 <defunct>
  nagios  6549 20501   0        - ?           0:00 <defunct>
  nagios  6286 20492   0        - ?           0:00 <defunct>
  nagios  5385 20491   0        - ?           0:00 <defunct>
  nagios  6232 20499   0        - ?           0:00 <defunct>
  nagios  6390 20501   0        - ?           0:00 <defunct>
  nagios  6542 20494   0        - ?           0:00 <defunct>
webservd  1494  7131   0 10:45:30 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  daemon 19423 19186   0   Sep 02 ?           0:00 /usr/lib/nfs/lockd
  nagios 20500 20489   0        - ?           0:00 <defunct>
webservd  3992  7131   0 10:48:30 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6097 20499   0        - ?           0:00 <defunct>
  nagios  5941 20495   0        - ?           0:00 <defunct>
  nagios  5214 20491   0        - ?           0:00 <defunct>
    root 19417 19186   0   Sep 02 ?           0:00 /usr/lib/inet/inetd start
  nagios  6210 20492   0        - ?           0:00 <defunct>
  nagios  6417 20495   0        - ?           0:00 <defunct>
  nagios  5497 20499   0        - ?           0:00 <defunct>
  nagios  6951 20501   0        - ?           0:00 <defunct>
  nagios  6475 20493   0        - ?           0:00 <defunct>
  nagios  5948 20493   0        - ?           0:00 <defunct>
  nagios  5167 20493   0        - ?           0:00 <defunct>
  nagios  5246 20501   0        - ?           0:00 <defunct>
  nagios  6019 20494   0        - ?           0:00 <defunct>
  nagios  6555 20494   0        - ?           0:00 <defunct>
  nagios  6119 20495   0        - ?           0:00 <defunct>
  nagios  6715 20499   0        - ?           0:00 <defunct>
  nagios 19787 19782   0   Sep 02 ?          35:09 /alarmpoint/apagent/jre/bin/java -Xms64M -Xmx128M -Dapagent.home=/alarmpoint/ap
  nagios  6406 20493   0        - ?           0:00 <defunct>
  nagios  5241 20501   0        - ?           0:00 <defunct>
  nagios  6992 20493   0        - ?           0:00 <defunct>
  nagios  6106 20502   0        - ?           0:00 <defunct>
  nagios  6371 20493   0        - ?           0:00 <defunct>
  nagios  6250 20494   0        - ?           0:00 <defunct>
  nagios  6952 20495   0        - ?           0:00 <defunct>
    root 19432 19186   0   Sep 02 ?          38:51 /usr/sbin/syslogd
  nagios  6886 20499   0        - ?           0:00 <defunct>
  nagios  5847 20494   0        - ?           0:00 <defunct>
  nagios  6237 20495   0        - ?           0:00 <defunct>
  nagios  6871 20502   0        - ?           0:00 <defunct>
    root 19790 19186   0   Sep 02 ?          10:46 /bin/java -Djava.library.path=/alarmpoint/apagent/WUGIntegrator -cp /alarmpoint
  nagios  5877 20499   0        - ?           0:00 <defunct>
  nagios  5174 20495   0        - ?           0:00 <defunct>
  nagios  6924 20493   0        - ?           0:00 <defunct>
  nagios  6174 20499   0        - ?           0:00 <defunct>
    root 19197 19186   0   Sep 02 ?           0:12 /lib/svc/bin/svc.startd
  nagios  6989 20491   0        - ?           0:00 <defunct>
  nagios  5749 20499   0        - ?           0:00 <defunct>
  nagios  5338 20494   0        - ?           0:00 <defunct>
  nagios  6777  6760   0 10:50:38 ?           0:00 sh -c /usr/local/nagios/libexec/get_alom_data.exp melsmpmednbu01-alom > /usr/lo
  nagios  6780 20495   0        - ?           0:00 <defunct>
  nagios  6085 20493   0        - ?           0:00 <defunct>
  nagios  5754 20495   0        - ?           0:00 <defunct>
  nagios  6091 20491   0        - ?           0:00 <defunct>
  nagios  5837 20494   0        - ?           0:00 <defunct>
  nagios  5862 20492   0        - ?           0:00 <defunct>
webservd  3985  7131   0 10:48:29 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6071 20502   0        - ?           0:00 <defunct>
  nagios  6519 20494   0        - ?           0:00 <defunct>
  nagios  6059 20493   0        - ?           0:00 <defunct>
  nagios  5273 20502   0        - ?           0:00 <defunct>
  nagios 20492 20489  12   Sep 13 ?        3943:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
webservd  6833  7131   0 10:50:49 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  5757 20499   0        - ?           0:00 <defunct>
  nagios  5748 20501   0        - ?           0:00 <defunct>
  nagios  5218 20491   0        - ?           0:00 <defunct>
  nagios  6763 20494   0        - ?           0:00 <defunct>
  nagios  6450 20501   0        - ?           0:00 <defunct>
  nagios  6575 20499   0        - ?           0:00 <defunct>
  nagios  6917 20501   0        - ?           0:00 <defunct>
  nagios  6640 20493   0        - ?           0:00 <defunct>
  nagios  6058 20492   0        - ?           0:00 <defunct>
  nagios  6436 20491   0        - ?           0:00 <defunct>
  nagios  5239 20501   0        - ?           0:00 <defunct>
    root 19427 19186   0   Sep 02 ?           0:02 /usr/lib/utmpd
webservd  5604  7131   0 10:49:19 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  5863 20495   0        - ?           0:00 <defunct>
  nagios  5767 20495   0        - ?           0:00 <defunct>
  nagios  6402 20499   0        - ?           0:00 <defunct>
  nagios  6605 20495   0        - ?           0:00 <defunct>
  nagios  6401 20493   0        - ?           0:00 <defunct>
  nagios  6372 20494   0        - ?           0:00 <defunct>
  nagios  6963 20495   0        - ?           0:00 <defunct>
  nagios 20498 20489   0        - ?           0:00 <defunct>
  nagios  6090 20502   0        - ?           0:00 <defunct>
  nagios  6722 20492   0        - ?           0:00 <defunct>
  nagios  6723 20501   0        - ?           0:00 <defunct>
webservd  2745  7131   0 10:46:59 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6213 20495   0        - ?           0:00 <defunct>
  nagios  6816 20492   0        - ?           0:00 <defunct>
  nagios  6438 20493   0        - ?           0:00 <defunct>
  nagios  6112 20501   0        - ?           0:00 <defunct>
  nagios  6381 20495   0        - ?           0:00 <defunct>
  nagios  6384 20502   0        - ?           0:00 <defunct>
  nagios  6950 20491   0        - ?           0:00 <defunct>
  nagios  5854 20501   0        - ?           0:00 <defunct>
  nagios  6441 20501   0        - ?           0:00 <defunct>
  nagios  6789 20491   0        - ?           0:00 <defunct>
  nagios  6526 20492   0        - ?           0:00 <defunct>
  nagios  5205 20502   0        - ?           0:00 <defunct>
  nagios  5187 20492   0        - ?           0:00 <defunct>
  nagios  6228 20495   0        - ?           0:00 <defunct>
  nagios  6074 20492   0        - ?           0:00 <defunct>
  nagios  6109 20493   0        - ?           0:00 <defunct>
  nagios  6440 20495   0        - ?           0:00 <defunct>
  nagios  5864 20491   0        - ?           0:00 <defunct>
  nagios  5829 20491   0        - ?           0:00 <defunct>
  nagios  6108 20492   0        - ?           0:00 <defunct>
  nagios 20497 20489   0        - ?           0:00 <defunct>
  nagios  6611 20493   0        - ?           0:00 <defunct>
  nagios  6577 20491   0        - ?           0:00 <defunct>
  nagios  6413 20492   0        - ?           0:00 <defunct>
  nagios  5878 20494   0        - ?           0:00 <defunct>
  nagios 20495 20489  12   Sep 13 ?        3941:52 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
  nagios  6216 20492   0        - ?           0:00 <defunct>
  nagios  6531 20499   0        - ?           0:00 <defunct>
  nagios  6982 20491   0        - ?           0:00 <defunct>
  nagios  6105 20501   0        - ?           0:00 <defunct>
  nagios  6572 20493   0        - ?           0:00 <defunct>
  nagios  5327 20502   0        - ?           0:00 <defunct>
  nagios  6694 20491   0        - ?           0:00 <defunct>
  nagios  6178 20493   0        - ?           0:00 <defunct>
  nagios  5944 20499   0        - ?           0:00 <defunct>
  nagios  6960 20493   0        - ?           0:00 <defunct>
  nagios  6375 20499   0        - ?           0:00 <defunct>
  nagios  6885 20502   0        - ?           0:00 <defunct>
  nagios  6879 20501   0        - ?           0:00 <defunct>
  nagios  6192 20501   0        - ?           0:00 <defunct>
  nagios  5538 20492   0        - ?           0:00 <defunct>
  nagios  6976 20495   0        - ?           0:00 <defunct>
  nagios  6717 20501   0        - ?           0:00 <defunct>
  nagios  6083 20491   0        - ?           0:00 <defunct>
  nagios  6126 20493   0        - ?           0:00 <defunct>
  nagios  5311 20492   0        - ?           0:00 <defunct>
  nagios 20496 20489   0        - ?           0:00 <defunct>
  nagios  6616 20501   0        - ?           0:00 <defunct>
  nagios  6096 20501   0        - ?           0:00 <defunct>
  nagios  5954 20491   0        - ?           0:00 <defunct>
  nagios  6762 20493   0        - ?           0:00 <defunct>
  nagios  6098 20502   0        - ?           0:00 <defunct>
webservd  6282  7131   0 10:50:03 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6598 20494   0        - ?           0:00 <defunct>
  nagios  6180 20494   0        - ?           0:00 <defunct>
  nagios  6769 20499   0        - ?           0:00 <defunct>
  nagios  6713 20495   0        - ?           0:00 <defunct>
  nagios  6945 20495   0        - ?           0:00 <defunct>
  nagios  5841 20501   0        - ?           0:00 <defunct>
  nagios  6906 20492   0        - ?           0:00 <defunct>
  nagios  6729 20492   0        - ?           0:00 <defunct>
  nagios  6111 20495   0        - ?           0:00 <defunct>
  nagios 20515 19186   0   Sep 13 ?           0:16 /usr/local/nagios/bin/nrpe -n -c /usr/local/nagios/etc/nrpe.cfg -d
  nagios  5880 20501   0        - ?           0:00 <defunct>
  nagios  6236 20493   0        - ?           0:00 <defunct>
  nagios  5195 20491   0        - ?           0:00 <defunct>
  nagios  6360 20491   0        - ?           0:00 <defunct>
  nagios  5221 20492   0        - ?           0:00 <defunct>
  nagios  6774 20494   0 10:50:38 ?           0:00 /bin/perl /usr/local/nagios/libexec/psu_check melsmpmednbu02
  nagios  6938 20501   0        - ?           0:00 <defunct>
    root  6995  6994   0 10:51:02 pts/4       0:00 ps -ef
  nagios  6430 20493   0        - ?           0:00 <defunct>
  nagios  6520 20495   0        - ?           0:00 <defunct>
  nagios  6184 20502   0        - ?           0:00 <defunct>
  nagios  6353 20493   0        - ?           0:00 <defunct>
  nagios  6127 20494   0        - ?           0:00 <defunct>
  nagios  6584 20499   0        - ?           0:00 <defunct>
  nagios  6914 20494   0        - ?           0:00 <defunct>
  nagios  6875 20493   0        - ?           0:00 <defunct>
  nagios  6782  6778   0 10:50:38 pts/3       0:00 /bin/ssh admin@melsmpmednbu01-alom
  nagios  5315 20495   0        - ?           0:00 <defunct>
  nagios  6585 20495   0        - ?           0:00 <defunct>
  nagios  6204 20493   0        - ?           0:00 <defunct>
  nagios  5189 20499   0        - ?           0:00 <defunct>
  nagios  5386 20492   0        - ?           0:00 <defunct>
  nagios  5188 20494   0        - ?           0:00 <defunct>
  nagios  5857 20502   0        - ?           0:00 <defunct>
  nagios  6269 20493   0        - ?           0:00 <defunct>
    root 19403 19186   0   Sep 02 ?           0:03 /usr/sbin/cron
    root 19500 19186   0   Sep 02 ?           0:00 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
  nagios  6720 20495   0        - ?           0:00 <defunct>
  nagios  6399 20492   0        - ?           0:00 <defunct>
  nagios  6912 20491   0        - ?           0:00 <defunct>
  nagios  6977 20501   0        - ?           0:00 <defunct>
  nagios  6248 20502   0        - ?           0:00 <defunct>
  nagios  6737 20491   0        - ?           0:00 <defunct>
  nagios  6426 20499   0        - ?           0:00 <defunct>
    root  7131 19186   0   Sep 06 ?           2:06 /opt/coolstack/apache2/bin/httpd -k start
  daemon 19385 19186   0   Sep 02 ?           0:04 /usr/lib/nfs/nfsmapid
  nagios  6383 20501   0        - ?           0:00 <defunct>
  nagios  6891 20493   0        - ?           0:00 <defunct>
  nagios  6253 20495   0        - ?           0:00 <defunct>
  nagios  6874 20491   0        - ?           0:00 <defunct>
  nagios  5842 20499   0        - ?           0:00 <defunct>
  nagios  6425 20501   0        - ?           0:00 <defunct>
  nagios  6773 20502   0        - ?           0:00 <defunct>
  nagios  6102 20494   0        - ?           0:00 <defunct>
  nagios  6094 20494   0        - ?           0:00 <defunct>
  nagios  6907 20493   0        - ?           0:00 <defunct>
  nagios  6811 20494   0        - ?           0:00 <defunct>
  nagios  6947 20493   0        - ?           0:00 <defunct>
  nagios  6241 20502   0        - ?           0:00 <defunct>
  nagios  6070 20495   0        - ?           0:00 <defunct>
  nagios  6103 20495   0        - ?           0:00 <defunct>
  nagios  6362 20493   0        - ?           0:00 <defunct>
  nagios  6176 20491   0        - ?           0:00 <defunct>
  nagios  6217 20502   0        - ?           0:00 <defunct>
  nagios  6181 20495   0        - ?           0:00 <defunct>
  nagios  5825 20499   0        - ?           0:00 <defunct>
  nagios  5163 20495   0        - ?           0:00 <defunct>
  nagios  5260 20499   0        - ?           0:00 <defunct>
  nagios  6959 20499   0        - ?           0:00 <defunct>
  nagios  6823 20499   0        - ?           0:00 <defunct>
  nagios  6532 20502   0        - ?           0:00 <defunct>
  nagios  5200 20493   0        - ?           0:00 <defunct>
  nagios  6890 20492   0        - ?           0:00 <defunct>
  nagios  6448 20491   0        - ?           0:00 <defunct>
  nagios  6942 20491   0        - ?           0:00 <defunct>
  nagios  6086 20495   0        - ?           0:00 <defunct>
  nagios  6606 20493   0        - ?           0:00 <defunct>
  nagios  6726 20502   0        - ?           0:00 <defunct>
  nagios  6240 20501   0        - ?           0:00 <defunct>
  nagios 20499 20489  12   Sep 13 ?        3943:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
  nagios  5219 20494   0        - ?           0:00 <defunct>
    root 19434 19197   0   Sep 02 console     0:00 /usr/lib/saf/ttymon -g -d /dev/console -l console -T vt100 -m ldterm,ttcompat -
  nagios 20489 19186   0   Sep 13 ?          59:28 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
  nagios 20491 20489  12   Sep 13 ?        3945:25 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
  nagios  6764 20502   0        - ?           0:00 <defunct>
  nagios  6792 20499   0        - ?           0:00 <defunct>
  nagios  5165 20494   0        - ?           0:00 <defunct>
  nagios  5859 20494   0        - ?           0:00 <defunct>
  nagios  5898 20502   0        - ?           0:00 <defunct>
  nagios  6576 20502   0        - ?           0:00 <defunct>
  nagios  6188 20493   0        - ?           0:00 <defunct>
  nagios  6628 20501   0        - ?           0:00 <defunct>
  nagios  6946 20502   0        - ?           0:00 <defunct>
  nagios  6212 20494   0        - ?           0:00 <defunct>
  nagios  5168 20499   0        - ?           0:00 <defunct>
  nagios  6423 20494   0        - ?           0:00 <defunct>
  nagios  6778  6777   0 10:50:38 ?           0:00 /usr/local/bin/expect -- /usr/local/nagios/libexec/get_alom_data.exp melsmpmedn
  nagios  6183 20499   0        - ?           0:00 <defunct>
   smmsp 19484 19186   0   Sep 02 ?           0:03 /usr/lib/sendmail -Ac -q15m
  nagios  6935 20495   0        - ?           0:00 <defunct>
  nagios  6364 20494   0        - ?           0:00 <defunct>
  nagios  5169 20495   0        - ?           0:00 <defunct>
  nagios  6084 20492   0        - ?           0:00 <defunct>
  nagios  6175 20502   0        - ?           0:00 <defunct>
  nagios  6354 20494   0        - ?           0:00 <defunct>
  nagios  6376 20492   0        - ?           0:00 <defunct>
  nagios  6591 20492   0        - ?           0:00 <defunct>
  nagios  5946 20491   0        - ?           0:00 <defunct>
  nagios  6245 20493   0        - ?           0:00 <defunct>
  nagios  6759 20501   0        - ?           0:00 <defunct>
  nagios  6573 20494   0        - ?           0:00 <defunct>
  nagios  5310 20491   0        - ?           0:00 <defunct>
  nagios  6970 20492   0        - ?           0:00 <defunct>
  nagios  6579 20501   0        - ?           0:00 <defunct>
  nagios  6592 20493   0        - ?           0:00 <defunct>
  nagios  6171 20493   0        - ?           0:00 <defunct>
  nagios  6646 20493   0        - ?           0:00 <defunct>
  nagios 20502 20489  12   Sep 13 ?        3944:47 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
  nagios  5879 20499   0        - ?           0:00 <defunct>
  nagios  6815 20502   0        - ?           0:00 <defunct>
  nagios  6342 20491   0        - ?           0:00 <defunct>
  nagios  6718 20493   0        - ?           0:00 <defunct>
  nagios  5352 20499   0        - ?           0:00 <defunct>
  nagios  6733 20499   0        - ?           0:00 <defunct>
  nagios  5224 20502   0        - ?           0:00 <defunct>
  nagios  6913 20492   0        - ?           0:00 <defunct>
  nagios  6918 20502   0        - ?           0:00 <defunct>
  nagios  6266 20502   0        - ?           0:00 <defunct>
    root  6994  5353   0 10:51:00 pts/4       0:00 bash
webservd 25402  7131   0 10:36:25 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6209 20491   0        - ?           0:00 <defunct>
  nagios  6173 20495   0        - ?           0:00 <defunct>
  nagios  6985 20502   0        - ?           0:00 <defunct>
  nagios  5194 20499   0        - ?           0:00 <defunct>
  nagios  6337 20502   0        - ?           0:00 <defunct>
  nagios  6410 20501   0        - ?           0:00 <defunct>
  nagios  6586 20502   0        - ?           0:00 <defunct>
  nagios  6651 20499   0        - ?           0:00 <defunct>
  nagios  6075 20493   0        - ?           0:00 <defunct>
  nagios  5752 20492   0        - ?           0:00 <defunct>
  nagios  5384 20502   0        - ?           0:00 <defunct>
  nagios  6988 20502   0        - ?           0:00 <defunct>
  nagios  5244 20493   0        - ?           0:00 <defunct>
  nagios  6980 20492   0        - ?           0:00 <defunct>
  nagios  6124 20492   0        - ?           0:00 <defunct>
  nagios  6755 20493   0        - ?           0:00 <defunct>
  nagios  6880 20492   0        - ?           0:00 <defunct>
  nagios  6812 20493   0        - ?           0:00 <defunct>
  nagios  6953 20502   0        - ?           0:00 <defunct>
  nagios  5758 20502   0        - ?           0:00 <defunct>
  nagios  6776  6774   0 10:50:38 ?           0:00 sh -c /usr/local/nagios/libexec/get_alom_data.exp melsmpmednbu02-alom > /usr/lo
  nagios  6550 20491   0        - ?           0:00 <defunct>
  nagios  6766 20495   0        - ?           0:00 <defunct>
  nagios  6822 20501   0        - ?           0:00 <defunct>
  nagios  6597 20501   0        - ?           0:00 <defunct>
  nagios  6099 20491   0        - ?           0:00 <defunct>
  nagios  6933 20493   0        - ?           0:00 <defunct>
  nagios  6434 20499   0        - ?           0:00 <defunct>
  nagios  5756 20501   0        - ?           0:00 <defunct>
  nagios  5900 20493   0        - ?           0:00 <defunct>
  nagios  6877 20495   0        - ?           0:00 <defunct>
  nagios  5753 20493   0        - ?           0:00 <defunct>
  nagios  6728 20491   0        - ?           0:00 <defunct>
webservd   503  7131   0 10:43:57 ?           0:00 /opt/coolstack/apache2/bin/httpd -k start
  nagios  6642 20495   0        - ?           0:00 <defunct>
  nagios  5836 20493   0        - ?           0:00 <defunct>
  nagios  5335 20493   0        - ?           0:00 <defunct>
  nagios  6740 20501   0        - ?           0:00 <defunct>
  nagios  5902 20492   0        - ?           0:00 <defunct>
  nagios  5171 20495   0        - ?           0:00 <defunct>
  nagios  6621 20502   0        - ?           0:00 <defunct>
  nagios  6813 20499   0        - ?           0:00 <defunct>
  nagios  5514 20501   0        - ?           0:00 <defunct>
  nagios 20503 20489   0   Sep 13 ?           0:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
  nagios  6118 20494   0        - ?           0:00 <defunct>
  nagios  6238 20492   0        - ?           0:00 <defunct>
  nagios  5755 20494   0        - ?           0:00 <defunct>
  nagios  5703 20501   0        - ?           0:00 <defunct>
  nagios  6831 20501   0        - ?           0:00 <defunct>
  nagios  6392 20502   0        - ?           0:00 <defunct>
  nagios  6255 20499   0        - ?           0:00 <defunct>
  nagios  5824 20495   0        - ?           0:00 <defunct>
  nagios 20501 20489  12   Sep 13 ?        3941:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
  nagios  6626 20493   0        - ?           0:00 <defunct>
  nagios  5747 20494   0        - ?           0:00 <defunct>
  nagios  6948 20493   0        - ?           0:00 <defunct>
  nagios  6554 20502   0        - ?           0:00 <defunct>
  nagios  5838 20501   0        - ?           0:00 <defunct>
  nagios  6207 20501   0        - ?           0:00 <defunct>
  nagios  6072 20499   0        - ?           0:00 <defunct>
  nagios  6908 20494   0        - ?           0:00 <defunct>
  nagios  6735 20492   0        - ?           0:00 <defunct>
bash-3.2#
Last edited by tmcdonald on Fri Sep 16, 2016 9:36 am, edited 1 time in total.
Reason: Please use [code][/code] tags around long output
sainudani
Posts: 28
Joined: Mon Sep 05, 2016 8:59 pm

Re: Nagios 4.1.1 too many zombie process and 100% cpu usage

Post by sainudani »

Hi,
Apologies for the delayed response. The configuration has:
2027 services
122 hosts
34 host groups
36 commands

I have also updated my earlier post with the 'ps -ef' output.

Thanks,
sainudani
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios 4.1.1 too many zombie process and 100% cpu usage

Post by tmcdonald »

It might help to check the nagios.log file. Based on what you have posted so far, it is most likely located at /usr/local/nagios/var/nagios.log. Please run the following as root and post the output:

tail -40 /usr/local/nagios/var/nagios.log
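If the log is long, a per-worker summary can make the pattern easier to spot than the last 40 lines alone. The sketch below is a hypothetical helper (`summarize_worker_timeouts` is not part of Nagios). It assumes log lines in the `wproc: Core Worker <pid>: job ...` format, counts the "timed out. Killing it" lines for each worker, and reports the last `timeouts=`/`started=` counters it saw. On Solaris 10 the default /usr/bin/awk is the old awk, so you may need to substitute nawk or /usr/xpg4/bin/awk.

```shell
# Hypothetical helper: summarize Nagios worker timeouts from nagios.log (read on stdin).
# Assumes lines like:
#   [ts] wproc: Core Worker 4335: job 246298 (pid=29027) timed out. Killing it
#   [ts] wproc: Core Worker 4335: job 246298 with pid 29027 reaped at timeout. timeouts=N; started=M
summarize_worker_timeouts() {
  awk '
    /wproc: Core Worker [0-9]+:/ {
      # The worker PID is the 5th field, e.g. "4335:"; drop the trailing colon.
      w = $5
      sub(/:$/, "", w)
      seen[w] = 1
      if ($0 ~ /timed out\. Killing it/) killed[w]++
      # Remember the most recent timeouts=/started= counters for this worker.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^timeouts=/) { t = $i; gsub(/[^0-9]/, "", t); last_t[w] = t }
        if ($i ~ /^started=/)  { s = $i; gsub(/[^0-9]/, "", s); last_s[w] = s }
      }
    }
    END {
      for (w in seen)
        printf "worker %s: killed=%d timeouts=%s started=%s\n", w, killed[w] + 0,
          ((w in last_t) ? last_t[w] : "?"), ((w in last_s) ? last_s[w] : "?")
    }
  ' | sort
}
```

For example: `summarize_worker_timeouts < /usr/local/nagios/var/nagios.log`. A worker whose timeouts counter is close to its started counter is timing out on almost every job it runs, which suggests looking at the check commands themselves and their timeout settings.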
Former Nagios employee
sainudani
Posts: 28
Joined: Mon Sep 05, 2016 8:59 pm

Re: Nagios 4.1.1 too many zombie process and 100% cpu usage

Post by sainudani »

Hi,

Please see the Nagios log below:

Code: Select all

root@melsmpnagios01z:/# tail -40 /usr/local/nagios/var/nagios.log
[1474333112] wproc: Core Worker 4335: job 246298 (pid=29027) timed out. Killing it
[1474333112] wproc: Core Worker 4336: job 246299 with pid 29031 reaped at timeout. timeouts=246279; started=246336
[1474333112] wproc: Core Worker 4335: job 246298 with pid 29027 reaped at timeout. timeouts=246282; started=246344
[1474333112] wproc: Core Worker 4326: job 246299 (pid=29033) timed out. Killing it
[1474333112] wproc: Core Worker 4326: job 246299 with pid 29033 reaped at timeout. timeouts=246286; started=246343
[1474333112] wproc: Core Worker 4335: job 246299 (pid=29035) timed out. Killing it
[1474333112] wproc: Core Worker 4335: job 246299 with pid 29035 reaped at timeout. timeouts=246283; started=246345
[1474333112] wproc: Core Worker 4335: job 246300 (pid=29037) timed out. Killing it
[1474333112] wproc: Core Worker 4335: job 246300 with pid 29037 reaped at timeout. timeouts=246284; started=246345
[1474333112] wproc: Core Worker 4336: job 246300 (pid=29039) timed out. Killing it
[1474333112] wproc: Core Worker 4336: job 246300 with pid 29039 reaped at timeout. timeouts=246280; started=246337
[1474333112] wproc: Core Worker 4325: job 246299 (pid=29029) timed out. Killing it
[1474333112] wproc: Core Worker 4325: job 246299 with pid 29029 reaped at timeout. timeouts=246284; started=246344
[1474333112] wproc: Core Worker 4325: job 246300 (pid=29032) timed out. Killing it
[1474333112] wproc: Core Worker 4325: job 246300 with pid 29032 reaped at timeout. timeouts=246285; started=246344
[1474333112] wproc: Core Worker 4325: job 246301 (pid=29040) timed out. Killing it
[1474333112] wproc: Core Worker 4325: job 246301 with pid 29040 reaped at timeout. timeouts=246286; started=246344
[1474333112] wproc: Core Worker 4336: job 246301 (pid=29046) timed out. Killing it
[1474333112] wproc: Core Worker 4336: job 246301 with pid 29046 reaped at timeout. timeouts=246281; started=246337
[1474333112] wproc: Core Worker 4332: job 246298 with pid 29026 reaped at timeout. timeouts=246269; started=246334
[1474333112] wproc: Core Worker 4332: job 246299 (pid=29034) timed out. Killing it
[1474333112] wproc: Core Worker 4332: job 246299 with pid 29034 reaped at timeout. timeouts=246270; started=246334
[1474333112] wproc: Core Worker 4332: job 246300 (pid=29052) timed out. Killing it
[1474333112] wproc: Core Worker 4332: job 246300 with pid 29052 reaped at timeout. timeouts=246271; started=246334
[1474333112] wproc: Core Worker 4332: job 246301 (pid=29053) timed out. Killing it
[1474333112] wproc: Core Worker 4332: job 246301 with pid 29053 reaped at timeout. timeouts=246272; started=246334
[1474333112] wproc: Core Worker 4326: job 246300 (pid=29041) timed out. Killing it
[1474333112] wproc: Core Worker 4326: job 246300 with pid 29041 reaped at timeout. timeouts=246287; started=246344
[1474333112] wproc: Core Worker 4326: job 246301 (pid=29050) timed out. Killing it
[1474333112] wproc: Core Worker 4327: job 246299 (pid=29042) timed out. Killing it
[1474333112] wproc: Core Worker 4326: job 246301 with pid 29050 reaped at timeout. timeouts=246288; started=246344
[1474333112] wproc: Core Worker 4327: job 246299 with pid 29042 reaped at timeout. timeouts=246290; started=246348
[1474333112] wproc: Core Worker 4327: job 246300 (pid=29045) timed out. Killing it
[1474333112] wproc: Core Worker 4327: job 246300 with pid 29045 reaped at timeout. timeouts=246291; started=246348
[1474333112] wproc: Core Worker 4327: job 246301 (pid=29049) timed out. Killing it
[1474333112] wproc: Core Worker 4327: job 246301 with pid 29049 reaped at timeout. timeouts=246292; started=246348
[1474333112] wproc: Core Worker 4327: job 246302 (pid=29054) timed out. Killing it
[1474333112] wproc: Core Worker 4327: job 246302 with pid 29054 reaped at timeout. timeouts=246293; started=246348
[1474333112] wproc: Core Worker 4325: job 246302 (pid=29048) timed out. Killing it
[1474333112] wproc: Core Worker 4325: job 246302 with pid 29048 reaped at timeout. timeouts=246287; started=246345
Last edited by tmcdonald on Thu Sep 22, 2016 2:20 pm, edited 1 time in total.
Reason: Please use [code][/code] tags around long output
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios 4.1.1 too many zombie process and 100% cpu usage

Post by ssax »

I believe you are hitting a bug that was fixed in later versions of Nagios Core/NDOUtils. Please try upgrading Nagios Core to 4.2.1 and, if you're running it, NDOUtils to 2.1.1, then let us know whether that resolves the problem. If you do run NDOUtils, you are required to update both of them so they work together properly.

Nagios Core:

https://www.nagios.org/downloads/nagios-core/

https://assets.nagios.com/downloads/nag ... ading.html


NDOUtils: (If you're running it)

https://sourceforge.net/projects/nagios ... ils-2.1.1/

https://github.com/NagiosEnterprises/nd ... ter/README


Let us know if you have any questions.


Thank you
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios 4.1.1 too many zombie process and 100% cpu usage

Post by ssax »

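Also, you will need to clear out all those defunct processes. A zombie cannot be killed directly; it goes away once its parent process exits, so kill everything running as the nagios user and then restart Nagios: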

Code: Select all

pkill -u nagios
service nagios restart
If running NDOUtils:

Code: Select all

service ndo2db restart
If running PNP4Nagios:

Code: Select all

service npcd restart
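Because a zombie is only removed when its parent reaps it, it can help to see which parent PIDs own the defunct processes before and after the restart. This is a minimal sketch, assuming a `ps` that accepts the POSIX `-o ppid= -o s=` form (Solaris 10 /usr/bin/ps and Linux procps should both do so) and reports zombies with state `Z`. The function name `zombies_by_parent` is hypothetical.

```shell
# Hypothetical helper: count zombie processes per parent PID.
# Input on stdin: one "ppid state" pair per line, e.g. from
#   ps -e -o ppid= -o s=
# Output: "ppid count", largest count first.
zombies_by_parent() {
  awk '$2 == "Z" { n[$1]++ } END { for (p in n) print p, n[p] }' |
    sort -k2,2nr -k1,1n
}
```

Run it as `ps -e -o ppid= -o s= | zombies_by_parent`. If the top entries are `nagios --worker` PIDs (for example 20501 and 20502 in the `ps -ef` output posted earlier), the zombies are children that the workers have not yet reaped.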
sainudani
Posts: 28
Joined: Mon Sep 05, 2016 8:59 pm

Re: Nagios 4.1.1 too many zombie process and 100% cpu usage

Post by sainudani »

Hi, upgrading to 4.2.1 didn't help.

Please find the output below.

Code: Select all

bash-3.2# tail -40 /usr/local/nagios/var/nagios.log
[1474513790] wproc: Core Worker 9748: job 10 (pid=10436) timed out. Killing it
[1474513790] wproc: Core Worker 9748: job 10 with pid 10436 reaped at timeout. timeouts=7; started=14
[1474513790] wproc: Core Worker 9749: job 10 (pid=10437) timed out. Killing it
[1474513790] wproc: Core Worker 9749: job 10 with pid 10437 reaped at timeout. timeouts=7; started=14
[1474513791] wproc: Core Worker 9772: job 6 (pid=10221) timed out. Killing it
[1474513791] wproc: Core Worker 9772: job 6 with pid 10221 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9773: job 6 (pid=10222) timed out. Killing it
[1474513791] wproc: Core Worker 9773: job 6 with pid 10222 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9774: job 6 (pid=10223) timed out. Killing it
[1474513791] wproc: Core Worker 9774: job 6 with pid 10223 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9775: job 6 (pid=10224) timed out. Killing it
[1474513791] wproc: Core Worker 9775: job 6 with pid 10224 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9776: job 6 (pid=10225) timed out. Killing it
[1474513791] wproc: Core Worker 9776: job 6 with pid 10225 reaped at timeout. timeouts=7; started=13
[1474513791] wproc: Core Worker 9777: job 6 (pid=10226) timed out. Killing it
[1474513791] wproc: Core Worker 9777: job 6 with pid 10226 reaped at timeout. timeouts=7; started=13
[1474513791] wproc: Core Worker 9778: job 6 (pid=10227) timed out. Killing it
[1474513791] wproc: Core Worker 9778: job 6 with pid 10227 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9779: job 6 (pid=10228) timed out. Killing it
[1474513791] wproc: Core Worker 9779: job 6 with pid 10228 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9758: job 10 (pid=10479) timed out. Killing it
[1474513791] wproc: Core Worker 9758: job 10 with pid 10479 reaped at timeout. timeouts=7; started=14
[1474513791] wproc: Core Worker 9780: job 6 (pid=10229) timed out. Killing it
[1474513791] wproc: Core Worker 9780: job 6 with pid 10229 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9781: job 6 (pid=10230) timed out. Killing it
[1474513791] wproc: Core Worker 9781: job 6 with pid 10230 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9782: job 6 (pid=10231) timed out. Killing it
[1474513791] wproc: Core Worker 9782: job 6 with pid 10231 reaped at timeout. timeouts=6; started=13
[1474513791] wproc: Core Worker 9783: job 6 (pid=10232) timed out. Killing it
[1474513791] wproc: Core Worker 9783: job 6 with pid 10232 reaped at timeout. timeouts=7; started=13
[1474513791] wproc: Core Worker 9762: job 10 (pid=10483) timed out. Killing it
[1474513791] wproc: Core Worker 9762: job 10 with pid 10483 reaped at timeout. timeouts=8; started=14
[1474513792] wproc: Core Worker 9786: job 6 (pid=10235) timed out. Killing it
[1474513792] wproc: Core Worker 9786: job 6 with pid 10235 reaped at timeout. timeouts=7; started=13
[1474513792] wproc: Core Worker 9785: job 6 (pid=10236) timed out. Killing it
[1474513792] wproc: Core Worker 9785: job 6 with pid 10236 reaped at timeout. timeouts=7; started=13
[1474513792] wproc: Core Worker 9787: job 6 (pid=10237) timed out. Killing it
[1474513792] wproc: Core Worker 9787: job 6 with pid 10237 reaped at timeout. timeouts=7; started=13
[1474513792] wproc: Core Worker 9774: job 10 (pid=10496) timed out. Killing it
[1474513792] wproc: Core Worker 9774: job 10 with pid 10496 reaped at timeout. timeouts=7; started=14
bash-3.2# sar 2 6

SunOS melsmnnagios01z 5.10 Generic_150400-30 sun4v    09/22/2016

13:10:20    %usr    %sys    %wio   %idle
13:10:22      80       7       0      13
13:10:24      81       7       0      12
13:10:26      78       7       0      14
13:10:28      79       7       0      15
13:10:30      78       7       0      15
13:10:32      78       6       0      16

Average       79       7       0      14
bash-3.2# ps -ef|grep -i nagios|grep -i defunct|wc -l
     129
bash-3.2# ps -ef|grep -i nagios|grep -i defunct|wc -l
     123
bash-3.2# ps -ef|grep -i nagios|grep -i defunct|wc -l
     136
Last edited by tmcdonald on Thu Sep 22, 2016 2:21 pm, edited 1 time in total.
Reason: Please use [code][/code] tags around long output
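Rather than re-running the `grep | wc -l` pipeline by hand, the zombie count can be sampled at a fixed interval to see whether it grows steadily or hovers around a level. This is a sketch under the same assumptions as above (POSIX `ps -o s=` with zombies reported as `Z`); `count_zombies` and `watch_zombies` are hypothetical helper names, and the 10-second default interval is arbitrary.

```shell
# Hypothetical helper: count lines whose first field is the zombie state "Z".
# Input on stdin: one process state per line, e.g. from  ps -u nagios -o s=
count_zombies() {
  awk '$1 == "Z" { n++ } END { print n + 0 }'
}

# Hypothetical helper: print a timestamped zombie count for the nagios user
# every $1 seconds (default 10) until interrupted with Ctrl-C.
watch_zombies() {
  interval=${1:-10}
  while :; do
    printf '%s %s\n' "$(date '+%H:%M:%S')" "$(ps -u nagios -o s= | count_zombies)"
    sleep "$interval"
  done
}
```

For example, `watch_zombies 5` prints one line every five seconds. Counting by process state is a little more robust than grepping the text of `ps -ef`, since it does not depend on the word `defunct` appearing in the command column.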
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios 4.1.1 too many zombie process and 100% cpu usage

Post by tmcdonald »

I'm not a Solaris guy, but the following one-liner should grab the PID of the nagios process which is consuming the most CPU and run a trace on it to see what it is doing:

PID=`ps aux | grep nagios | sort -nrk 3,3 | head -1 | awk '{print $2}'`; strace -p $PID -s 256

Some of the arguments might be different between CentOS and Solaris, and you might need to install strace, but if this works it should give us a good idea of what is running when it hangs and eats all the clock cycles.
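On Solaris the system-call tracer is truss (the Linux strace is not available there), and BSD-style `ps aux` normally needs /usr/ucb/ps. The sketch below applies the same idea with the POSIX `ps -o` form, assuming the busiest process of interest belongs to the nagios user. `top_cpu_pid` and `trace_busiest_nagios` are hypothetical helpers, not Nagios tools.

```shell
# Hypothetical helper: given "pid pcpu" lines on stdin, print the PID with the highest %CPU.
top_cpu_pid() {
  sort -k2,2nr | head -1 | awk '{ print $1 }'
}

# Hypothetical helper: attach truss (Solaris' system-call tracer) to the
# nagios-owned process currently using the most CPU. Run as root; Ctrl-C detaches.
trace_busiest_nagios() {
  pid=$(ps -u nagios -o pid= -o pcpu= | top_cpu_pid)
  if [ -z "$pid" ]; then
    echo "no processes owned by nagios found" >&2
    return 1
  fi
  echo "tracing PID $pid" >&2
  # -f also follows any children the process forks (e.g. plugin executions).
  truss -f -p "$pid"
}
```

Calling `trace_busiest_nagios` prints each system call as it happens; for a one-off snapshot of where the process is spending its time, `pstack <pid>` is an alternative.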
Former Nagios employee