Getting a lot of total process warning from XI Server

Posted: Thu Mar 07, 2013 6:37 am
by arnab.roy
Hi Guys,

We are getting a lot of these warnings:

Notification Type: PROBLEM

Service: Total Processes
Host: localhost
Address: 127.0.0.1
State: WARNING
Info:
PROCS WARNING: 269 processes with STATE = RSZDT
Date/Time: 07/03/2013 08:51:07

Load average on the box is around 1.5 to 2, so it is not really under stress. This seems to have started happening recently. Nothing looks odd in the output of top.

So I am wondering what's going on.

Re: Getting a lot of total process warning from XI Server

Posted: Thu Mar 07, 2013 10:25 am
by slansing
Hello arnab,

Has your team made any major system changes to the XI server, such as hosting another major piece of software on it?

This could also be caused by an increase in the number of checks being run by the Nagios process, since it forks a child process for each check it executes.

Re: Getting a lot of total process warning from XI Server

Posted: Fri Mar 08, 2013 10:46 am
by arnab.roy
Hi Slan,

No, not at all. We are seeing a lot of these on both of our XI servers.

Re: Getting a lot of total process warning from XI Server

Posted: Fri Mar 08, 2013 11:15 am
by abrist
Could you run the following command (when the total process count is high) and post the output in code wraps?

Code:

ps -aef

Re: Getting a lot of total process warning from XI Server

Posted: Tue Mar 19, 2013 12:22 pm
by arnab.roy
Sorry guys, this got left behind as I got busy with some other stuff. Here is the output:

Code:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Mar07 ?        00:00:07 init [3]                       
root         2     1  0 Mar07 ?        00:00:47 [migration/0]
root         3     1  0 Mar07 ?        00:00:20 [ksoftirqd/0]
root         4     1  0 Mar07 ?        00:00:37 [migration/1]
root         5     1  0 Mar07 ?        00:00:59 [ksoftirqd/1]
root         6     1  0 Mar07 ?        00:00:38 [migration/2]
root         7     1  0 Mar07 ?        00:00:01 [ksoftirqd/2]
root         8     1  0 Mar07 ?        00:00:39 [migration/3]
root         9     1  0 Mar07 ?        00:00:00 [ksoftirqd/3]
root        10     1  0 Mar07 ?        00:10:02 [events/0]
root        11     1  0 Mar07 ?        00:00:08 [events/1]
root        12     1  0 Mar07 ?        00:00:01 [events/2]
root        13     1  0 Mar07 ?        00:00:01 [events/3]
root        14     1  0 Mar07 ?        00:00:02 [khelper]
root        87     1  0 Mar07 ?        00:00:00 [kthread]
root        94    87  0 Mar07 ?        00:00:44 [kblockd/0]
root        95    87  0 Mar07 ?        00:00:56 [kblockd/1]
root        96    87  0 Mar07 ?        00:00:30 [kblockd/2]
root        97    87  0 Mar07 ?        00:00:29 [kblockd/3]
root        98    87  0 Mar07 ?        00:00:00 [kacpid]
root       261    87  0 Mar07 ?        00:00:00 [cqueue/0]
root       262    87  0 Mar07 ?        00:00:00 [cqueue/1]
root       263    87  0 Mar07 ?        00:00:00 [cqueue/2]
root       264    87  0 Mar07 ?        00:00:00 [cqueue/3]
root       267    87  0 Mar07 ?        00:00:00 [khubd]
root       269    87  0 Mar07 ?        00:00:00 [kseriod]
root       363    87  0 Mar07 ?        00:00:00 [khungtaskd]
root       364    87  0 Mar07 ?        00:00:00 [pdflush]
root       365    87  0 Mar07 ?        00:37:06 [pdflush]
root       366    87  0 Mar07 ?        00:00:27 [kswapd0]
root       367    87  0 Mar07 ?        00:00:00 [aio/0]
root       368    87  0 Mar07 ?        00:00:00 [aio/1]
root       369    87  0 Mar07 ?        00:00:00 [aio/2]
root       370    87  0 Mar07 ?        00:00:00 [aio/3]
root       576    87  0 Mar07 ?        00:00:00 [kpsmoused]
root       645    87  0 Mar07 ?        00:00:08 [mpt_poll_0]
root       646    87  0 Mar07 ?        00:00:00 [mpt/0]
root       647    87  0 Mar07 ?        00:00:00 [scsi_eh_0]
root       653    87  0 Mar07 ?        00:00:00 [ata/0]
root       654    87  0 Mar07 ?        00:00:00 [ata/1]
root       655    87  0 Mar07 ?        00:00:00 [ata/2]
root       656    87  0 Mar07 ?        00:00:00 [ata/3]
root       657    87  0 Mar07 ?        00:00:00 [ata_aux]
root       668    87  0 Mar07 ?        00:00:00 [kstriped]
root       689    87  0 Mar07 ?        00:00:00 [ksnapd]
root       712    87  0 Mar07 ?        01:39:40 [kjournald]
root       737    87  0 Mar07 ?        00:00:32 [kauditd]
root       770     1  0 Mar07 ?        00:00:00 /sbin/udevd -d
apache    2051  4369  1 14:05 ?        00:01:59 /usr/sbin/httpd
postgres  2231  4297  0 14:05 ?        00:00:02 postgres: nagiosxi nagiosxi 127.
root      2274    87  0 Mar07 ?        00:00:00 [kmpathd/0]
root      2275    87  0 Mar07 ?        00:00:00 [kmpathd/1]
root      2276    87  0 Mar07 ?        00:00:00 [kmpathd/2]
root      2277    87  0 Mar07 ?        00:00:00 [kmpathd/3]
root      2278    87  0 Mar07 ?        00:00:00 [kmpath_handlerd]
root      2345    87  0 Mar07 ?        00:00:00 [kjournald]
root      2827    87  0 Mar07 ?        00:43:30 [vmmemctl]
root      2963     1  0 Mar07 ?        00:01:15 /usr/sbin/vmtoolsd
root      3046    87  0 Mar07 ?        00:00:00 [iscsi_eh]
root      3108    87  0 Mar07 ?        00:00:00 [cnic_wq]
root      3113    87  0 Mar07 ?        00:00:00 [bnx2i_thread/0]
root      3114    87  0 Mar07 ?        00:00:00 [bnx2i_thread/1]
root      3116    87  0 Mar07 ?        00:00:00 [bnx2i_thread/2]
root      3117    87  0 Mar07 ?        00:00:00 [bnx2i_thread/3]
root      3134    87  0 Mar07 ?        00:00:00 [ib_addr]
root      3149    87  0 Mar07 ?        00:00:00 [ib_mcast]
root      3150    87  0 Mar07 ?        00:00:00 [ib_inform]
root      3151    87  0 Mar07 ?        00:00:00 [local_sa]
root      3157    87  0 Mar07 ?        00:00:00 [iw_cm_wq]
root      3163    87  0 Mar07 ?        00:00:00 [ib_cm/0]
root      3165    87  0 Mar07 ?        00:00:00 [ib_cm/1]
root      3166    87  0 Mar07 ?        00:00:00 [ib_cm/2]
root      3167    87  0 Mar07 ?        00:00:00 [ib_cm/3]
root      3173    87  0 Mar07 ?        00:00:00 [rdma_cm]
root      3194     1  0 Mar07 ?        00:00:00 iscsiuio
root      3201     1  0 Mar07 ?        00:00:00 iscsid
root      3202     1  0 Mar07 ?        00:00:00 iscsid
root      3527     1  0 Mar07 ?        00:06:23 auditd
root      3529  3527  0 Mar07 ?        00:01:21 /sbin/audispd
root      3559     1  0 Mar07 ?        00:01:57 syslogd -m 0
root      3562     1  0 Mar07 ?        00:00:00 klogd -x
root      3666     1  0 Mar07 ?        00:00:45 irqbalance
rpc       3697     1  0 Mar07 ?        00:00:00 portmap
root      3734    87  0 Mar07 ?        00:00:00 [rpciod/0]
root      3735    87  0 Mar07 ?        00:00:00 [rpciod/1]
root      3736    87  0 Mar07 ?        00:00:00 [rpciod/2]
root      3737    87  0 Mar07 ?        00:00:00 [rpciod/3]
rpcuser   3746     1  0 Mar07 ?        00:00:00 rpc.statd
root      3783     1  0 Mar07 ?        00:00:00 rpc.idmapd
dbus      3813     1  0 Mar07 ?        00:00:00 dbus-daemon --system
root      3856     1  0 Mar07 ?        00:00:00 pcscd
root      3870     1  0 Mar07 ?        00:00:00 /usr/sbin/acpid
68        3883     1  0 Mar07 ?        00:00:41 hald
root      3884  3883  0 Mar07 ?        00:00:00 hald-runner
68        3892  3884  0 Mar07 ?        00:00:00 hald-addon-acpi: listening on ac
68        3898  3884  0 Mar07 ?        00:00:00 hald-addon-keyboard: listening o
root      3907  3884  0 Mar07 ?        00:01:43 hald-addon-storage: polling /dev
root      3945     1  0 Mar07 ?        00:00:00 /usr/bin/hidd --server
root      3993     1  0 Mar07 ?        00:00:01 automount --pid-file /var/run/au
root      4014     1  0 Mar07 ?        00:12:55 /usr/sbin/snmptrapd -Lsd -On -p
root      4032     1  0 Mar07 ?        00:00:03 /usr/sbin/sshd
root      4050     1  0 Mar07 ?        00:00:50 xinetd -stayalive -pidfile /var/
ntp       4066     1  0 Mar07 ?        00:00:01 ntpd -u ntp:ntp -p /var/run/ntpd
root      4084     1  0 Mar07 ?        00:00:00 /usr/sbin/vsftpd /etc/vsftpd/vsf
root      4125     1  0 Mar07 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --d
mysql     4207  4125  2 Mar07 ?        06:27:20 /usr/libexec/mysqld --basedir=/u
postgres  4297     1  0 Mar07 ?        00:01:26 /usr/bin/postmaster -p 5432 -D /
root      4326     1  0 Mar07 ?        00:00:02 sendmail: accepting connections
smmsp     4336     1  0 Mar07 ?        00:00:00 sendmail: Queue runner@01:00:00 
root      4350     1  0 Mar07 ?        00:00:00 gpm -m /dev/input/mice -t exps2
postgres  4364  4297  0 Mar07 ?        00:00:00 postgres: logger process       
postgres  4366  4297  0 Mar07 ?        00:00:12 postgres: writer process       
postgres  4367  4297  0 Mar07 ?        00:00:12 postgres: stats buffer process  
postgres  4368  4367  0 Mar07 ?        00:00:08 postgres: stats collector proces
root      4369     1  0 Mar07 ?        00:00:03 /usr/sbin/httpd
root      4382     1  0 Mar07 ?        00:00:18 crond
xfs       4405     1  0 Mar07 ?        00:00:00 xfs -droppriv -daemon
nagios    4413     1  0 Mar07 ?        00:00:59 /usr/local/nagios/bin/npcd -d -f
root      4437     1  0 Mar07 ?        00:00:00 /usr/sbin/atd
avahi     4463     1  0 Mar07 ?        00:00:01 avahi-daemon: running [karma.loc
avahi     4464  4463  0 Mar07 ?        00:00:00 avahi-daemon: chroot helper
ajaxterm  4481     1  0 Mar07 ?        00:00:02 python /usr/share/ajaxterm/ajaxt
nagios    4564     1  0 Mar07 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c
root      4591     1  0 Mar07 ?        00:00:00 /usr/sbin/smartd -q never
root      4602     1  0 Mar07 tty1     00:00:00 /sbin/mingetty tty1
root      4606     1  0 Mar07 tty2     00:00:00 /sbin/mingetty tty2
root      4608     1  0 Mar07 tty3     00:00:00 /sbin/mingetty tty3
root      4610     1  0 Mar07 tty4     00:00:00 /sbin/mingetty tty4
root      4611     1  0 Mar07 tty5     00:00:00 /sbin/mingetty tty5
root      4614     1  0 Mar07 tty6     00:00:00 /sbin/mingetty tty6
root      4627     1  0 Mar07 ?        00:00:01 /usr/bin/python -tt /usr/sbin/yu
root      4629     1  0 Mar07 ?        00:00:01 /usr/libexec/gam_server
apache    8483  4369  1 11:22 ?        00:03:44 /usr/sbin/httpd
root      9792     1  0 17:18 ?        00:00:00 sudo /usr/local/nagios/libexec/c
root      9799  9792  0 17:18 ?        00:00:00 /usr/local/nagios/libexec/check_
root      9800  9799  0 17:18 ?        00:00:00 /usr/bin/ssh 2.0.1.163 /root/RMS
apache   10552  4369  1 11:55 ?        00:03:21 /usr/sbin/httpd
apache   10675  4369  1 11:55 ?        00:03:30 /usr/sbin/httpd
postgres 10824  4297  0 11:55 ?        00:00:04 postgres: nagiosxi nagiosxi 127.
postgres 11024  4297  0 11:55 ?        00:00:04 postgres: nagiosxi nagiosxi 127.
postgres 11347  4297  0 11:23 ?        00:00:04 postgres: nagiosxi nagiosxi 127.
nagios   12133 14553  0 17:18 ?        00:00:00 /usr/local/nagios/bin/nagios -d
root     12134 12133  0 17:18 ?        00:00:00 sudo /usr/local/nagios/libexec/c
root     12135 12134  0 17:18 ?        00:00:00 /usr/local/nagios/libexec/check_
root     12136 12135  0 17:18 ?        00:00:00 /usr/bin/ssh 2.0.1.164 /root/RMS
nagios   13607 14553  0 17:18 ?        00:00:00 /usr/local/nagios/bin/nagios -d
root     13608 13607  0 17:18 ?        00:00:00 sudo /usr/local/nagios/libexec/c
nagios   13609 14553  0 17:18 ?        00:00:00 /usr/local/nagios/bin/nagios -d
root     13610 13609  0 17:18 ?        00:00:00 sudo /usr/local/nagios/libexec/c
nagios   13611 14553  0 17:18 ?        00:00:00 /usr/local/nagios/bin/nagios -d
root     13612 13611  0 17:18 ?        00:00:00 sudo /usr/local/nagios/libexec/c
root     13613 13608  0 17:18 ?        00:00:00 /usr/local/nagios/libexec/check_
root     13614 13610  0 17:18 ?        00:00:00 /usr/local/nagios/libexec/check_
root     13615 13612  0 17:18 ?        00:00:00 /usr/local/nagios/libexec/check_
root     13616 13613  0 17:18 ?        00:00:00 /usr/bin/ssh 2.0.1.163 /root/RMS
root     13617 13615  0 17:18 ?        00:00:00 /usr/bin/ssh 2.0.1.164 /root/RMS
root     13618 13614  0 17:18 ?        00:00:00 /usr/bin/ssh 2.0.1.164 /root/RMS
nagios   14549  4564  0 Mar14 ?        00:02:59 /usr/local/nagios/bin/ndo2db -c
nagios   14550 14549  0 Mar14 ?        00:30:44 /usr/local/nagios/bin/ndo2db -c
nagios   14553     1  1 Mar14 ?        01:56:01 /usr/local/nagios/bin/nagios -d
apache   14631  4369  1 10:08 ?        00:04:56 /usr/sbin/httpd
postgres 14989  4297  0 10:08 ?        00:00:06 postgres: nagiosxi nagiosxi 127.
nagios   15205  4382  0 17:19 ?        00:00:00 crond
nagios   15206  4382  0 17:19 ?        00:00:00 crond
nagios   15209  4382  0 17:19 ?        00:00:00 crond
nagios   15210  4382  0 17:19 ?        00:00:00 crond
nagios   15211  4382  0 17:19 ?        00:00:00 crond
nagios   15215 15206  0 17:19 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/
nagios   15217 15209  0 17:19 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/
nagios   15218 15210  0 17:19 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/
nagios   15219 15211  0 17:19 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/
nagios   15220 15205  0 17:19 ?        00:00:00 /bin/sh -c /usr/bin/php -q /usr/
nagios   15221 15220  0 17:19 ?        00:00:00 /usr/bin/php -q /usr/local/nagio
nagios   15223 15215  0 17:19 ?        00:00:00 /usr/bin/php -q /usr/local/nagio
nagios   15224 15218  0 17:19 ?        00:00:00 /usr/bin/php -q /usr/local/nagio
nagios   15234 15217  1 17:19 ?        00:00:00 /usr/bin/php -q /usr/local/nagio
nagios   15235 15219  0 17:19 ?        00:00:00 /usr/bin/php -q /usr/local/nagio
postgres 15237  4297  0 17:19 ?        00:00:00 postgres: nagiosxi nagiosxi 127.
postgres 15257  4297  0 17:19 ?        00:00:00 postgres: nagiosxi nagiosxi 127.
postgres 15272  4297  0 17:19 ?        00:00:00 postgres: nagiosxi nagiosxi 127.
postgres 15301  4297  1 17:19 ?        00:00:00 postgres: nagiosxi nagiosxi 127.
nagios   15680 14553  0 17:19 ?        00:00:00 /usr/local/nagios/bin/nagios -d
root     15681 15680  0 17:19 ?        00:00:00 sudo /usr/local/nagios/libexec/c
root     15686 15681  0 17:19 ?        00:00:00 /usr/local/nagios/libexec/check_
root     15687 15686  0 17:19 ?        00:00:00 /usr/bin/ssh 2.0.1.163 /root/RMS
apache   15715  4369  1 16:05 ?        00:00:56 /usr/sbin/httpd
root     15887  4032  0 17:19 ?        00:00:00 sshd: root@pts/0 
postgres 15979  4297  0 16:05 ?        00:00:01 postgres: nagiosxi nagiosxi 127.
postgres 16428  4297  0 17:19 ?        00:00:00 postgres: nagiosxi nagiosxi 127.
apache   16678  4369  1 15:36 ?        00:01:11 /usr/sbin/httpd
root     16855 15887  0 17:19 pts/0    00:00:00 -bash
nagios   16931 14553  0 17:19 ?        00:00:00 /usr/local/nagios/bin/nagios -d
nagios   16932 16931  0 17:19 ?        00:00:00 /usr/bin/perl /usr/local/nagios/
apache   17098  4369  1 10:08 ?        00:05:01 /usr/sbin/httpd
postgres 17521  4297  0 10:08 ?        00:00:06 postgres: nagiosxi nagiosxi 127.
postgres 17564  4297  0 15:36 ?        00:00:01 postgres: nagiosxi nagiosxi 127.
apache   17744  4369  1 14:01 ?        00:02:05 /usr/sbin/httpd
apache   17816  4369  1 10:50 ?        00:04:15 /usr/sbin/httpd
postgres 18000  4297  0 10:50 ?        00:00:05 postgres: nagiosxi nagiosxi 127.
postgres 18132  4297  0 14:01 ?        00:00:02 postgres: nagiosxi nagiosxi 127.
nagios   19113 14553  0 17:19 ?        00:00:00 /usr/local/nagios/bin/nagios -d
nagios   19114 19113  0 17:19 ?        00:00:00 /usr/local/nagios/libexec/check_
nagios   19186 14553  0 17:19 ?        00:00:00 /usr/local/nagios/bin/nagios -d
root     19187 19186  0 17:19 ?        00:00:00 sudo /usr/local/nagios/libexec/c
root     19199 19187  0 17:19 ?        00:00:00 /usr/local/nagios/libexec/check_
root     19200 19199  0 17:19 ?        00:00:00 /usr/bin/ssh 2.0.1.164 /root/RMS
nagios   19549 16932  0 17:19 ?        00:00:00 sh -c snmpwalk -c 35k1m05 10.1.1
nagios   19550 19549  5 17:19 ?        00:00:00 snmpwalk -c         10.1.10.67 -
nagios   19551 19549  0 17:19 ?        00:00:00 head -1
nagios   19812 15235  0 17:19 ?        00:00:00 sh -c /usr/bin/iostat -c 5 2 | t
nagios   19813 19812  0 17:19 ?        00:00:00 /usr/bin/iostat -c 5 2
nagios   19814 19812  0 17:19 ?        00:00:00 tail --lines=2
nagios   19815 19812  0 17:19 ?        00:00:00 head --lines=1
nagios   19816 19812  0 17:19 ?        00:00:00 awk { print $1,$2,$3,$4,$5,$6 }
nagios   19832  4050  0 17:19 ?        00:00:00 nsca -c /usr/local/nagios/etc/ns
nagios   19849 14553  0 17:19 ?        00:00:00 /usr/local/nagios/bin/nagios -d
nagios   19850 19849  0 17:19 ?        00:00:00 /usr/bin/php /usr/local/nagiosxi
root     19851 16855  0 17:19 pts/0    00:00:00 ps -aef
apache   25380  4369  1 10:27 ?        00:04:32 /usr/sbin/httpd
postgres 25663  4297  0 10:27 ?        00:00:05 postgres: nagiosxi nagiosxi 127.
apache   27992  4369  1 09:22 ?        00:05:35 /usr/sbin/httpd
postgres 28052  4297  0 09:22 ?        00:00:06 postgres: nagiosxi nagiosxi 127.
apache   28616  4369  1 09:05 ?        00:05:41 /usr/sbin/httpd
postgres 28668  4297  0 09:05 ?        00:00:07 postgres: nagiosxi nagiosxi 127.
apache   28675  4369  1 09:05 ?        00:05:47 /usr/sbin/httpd
postgres 28684  4297  0 09:05 ?        00:00:07 postgres: nagiosxi nagiosxi 127.

Re: Getting a lot of total process warning from XI Server

Posted: Tue Mar 19, 2013 12:31 pm
by abrist
None of these processes looks problematic, though the total is on the higher side. Did you add any new checks or hosts, or decrease the interval on any checks, recently? What was the average number of processes before this issue arose?
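
If it helps to see where the count is coming from, here is a rough sketch for summarizing `ps -ef` output. The function names `count_by_user` and `count_children` are made up for illustration and are not part of Nagios XI; both read `ps -ef` style text on stdin and assume the usual column layout (UID first, PPID third, one header row).

```shell
# Summarize `ps -ef` style output (read from stdin) as "count user" lines,
# busiest user first. Column 1 is UID; the header row is skipped.
count_by_user() {
    awk 'NR > 1 { n[$1]++ } END { for (u in n) print n[u], u }' \
        | sort -k1,1nr -k2,2
}

# Count direct children of the parent PID given as $1.
# Column 3 of `ps -ef` output is PPID; the header row is skipped.
count_children() {
    awk -v ppid="$1" 'NR > 1 && $3 == ppid { c++ } END { print c + 0 }'
}

# Example usage on a live system:
#   ps -ef | count_by_user
#   ps -ef | count_children 14553   # 14553 is the parent nagios daemon in the listing posted above
```

Running both a few times while the warning is active, and again when it is not, should give a feel for whether the extra processes are forked checks, cron jobs, or httpd/postgres workers.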