Nagios reporting more services running than actually are

Posted: Mon Jul 08, 2013 12:02 pm
by lce411
A few of the servers I monitor are reporting hundreds of running processes (some as high as the 700s), but a query on the server (ps -ef | wc -l) shows fewer than 100. Why or how would Nagios be so far off? I was not able to find any zombie processes. Is it possible that Nagios is also counting threads within processes? Is there a way to tell Nagios not to count threads?
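
One way to test the thread theory on a procps-based Linux host is to compare the number of processes with the number of threads (lightweight processes). The sketch below is only an illustration: the function names are made up for this example, and whether the Nagios check actually counts threads depends on how the plugin invokes ps, which is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Nagios).

# One line per process; "-o pid=" prints only the PID, with no header line.
count_procs() {
    ps -e -o pid= | wc -l | tr -d ' '
}

# One line per thread; -L makes ps list every lightweight process.
count_threads() {
    ps -eL -o lwp= | wc -l | tr -d ' '
}

echo "processes: $(count_procs)"
echo "threads:   $(count_threads)"
```

If the thread count lands close to the figure Nagios reports while the process count matches ps -ef, threads become a plausible explanation.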

Re: Nagios reporting more services running than actually are

Posted: Mon Jul 08, 2013 1:58 pm
by abrist
Are there many forks on these servers?

Re: Nagios reporting more services running than actually are

Posted: Mon Jul 08, 2013 2:02 pm
by lmiltchev
Can you show the actual command that you are running in the command line, along with the output of it?
Note: After you run your check, run also "ps -ef | wc -l" and show the output.

Re: Nagios reporting more services running than actually are

Posted: Mon Jul 08, 2013 2:04 pm
by lce411
Not at all. A random administrator may log in (usually just myself), but some of the servers have no one accessing them most of the time (squid, smtp). I ran 'ps -eaFm' to try to view the thread count of the currently running procs. I didn't see a "smoking gun", but I will admit that I'm not 100% sure what I should be looking for in the first place.

Code:

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
root         1     0  0  2592   760   - Jun12 ?        00:00:02 init [3]
root         -     -  0     -     -   0 Jun12 -        00:00:02 -
root         2     1  0     0     0   - Jun12 ?        00:00:00 [migration/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         3     1  0     0     0   - Jun12 ?        00:00:00 [ksoftirqd/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         4     1  0     0     0   - Jun12 ?        00:00:14 [events/0]
root         -     -  0     -     -   0 Jun12 -        00:00:14 -
root         5     1  0     0     0   - Jun12 ?        00:00:00 [khelper]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root        14     1  0     0     0   - Jun12 ?        00:00:00 [kthread]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root        18    14  0     0     0   - Jun12 ?        00:00:00 [kblockd/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root        19    14  0     0     0   - Jun12 ?        00:00:00 [kacpid]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       179    14  0     0     0   - Jun12 ?        00:00:00 [cqueue/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       182    14  0     0     0   - Jun12 ?        00:00:00 [khubd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       184    14  0     0     0   - Jun12 ?        00:00:00 [kseriod]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       254    14  0     0     0   - Jun12 ?        00:00:00 [khungtaskd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       255    14  0     0     0   - Jun12 ?        00:00:00 [pdflush]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       256    14  0     0     0   - Jun12 ?        00:00:00 [pdflush]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       257    14  0     0     0   - Jun12 ?        00:00:00 [kswapd0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       258    14  0     0     0   - Jun12 ?        00:00:00 [aio/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       464    14  0     0     0   - Jun12 ?        00:00:00 [kpsmoused]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       495    14  0     0     0   - Jun12 ?        00:00:00 [mpt_poll_0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       496    14  0     0     0   - Jun12 ?        00:00:00 [mpt/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       497    14  0     0     0   - Jun12 ?        00:00:00 [scsi_eh_0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       498    14  0     0     0   - Jun12 ?        00:00:00 [mpt_poll_1]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       499    14  0     0     0   - Jun12 ?        00:00:00 [mpt/1]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       500    14  0     0     0   - Jun12 ?        00:00:00 [scsi_eh_1]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       503    14  0     0     0   - Jun12 ?        00:00:00 [ata/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       504    14  0     0     0   - Jun12 ?        00:00:00 [ata_aux]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       509    14  0     0     0   - Jun12 ?        00:00:00 [kstriped]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       518    14  0     0     0   - Jun12 ?        00:00:00 [ksnapd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       529    14  0     0     0   - Jun12 ?        00:00:05 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:05 -
root       559    14  0     0     0   - Jun12 ?        00:00:00 [kauditd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       592     1  0  3516  2340   - Jun12 ?        00:00:00 /sbin/udevd -d
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1794    14  0     0     0   - Jun12 ?        00:00:00 [kmpathd/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1795    14  0     0     0   - Jun12 ?        00:00:00 [kmpath_handlerd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1829    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1835    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1841    14  0     0     0   - Jun12 ?        00:00:06 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:06 -
root      1846    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1849    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1918     1  0 16522  1616   - Jun12 ?        00:00:00 /bin/bash /etc/rc.d/rc 3
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2387     1  0 12498  3008   - Jun12 ?        00:02:42 /usr/sbin/vmtoolsd
root         -     -  0     -     -   0 Jun12 -        00:02:42 -
root      2519    14  0     0     0   - Jun12 ?        00:00:00 [iscsi_eh]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2553    14  0     0     0   - Jun12 ?        00:00:00 [cnic_wq]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2556    14  0     0     0   - Jun12 ?        00:00:00 [bnx2i_thread/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2569    14  0     0     0   - Jun12 ?        00:00:00 [ib_addr]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2576    14  0     0     0   - Jun12 ?        00:00:00 [ib_mcast]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2577    14  0     0     0   - Jun12 ?        00:00:00 [ib_inform]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2578    14  0     0     0   - Jun12 ?        00:00:00 [local_sa]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2581    14  0     0     0   - Jun12 ?        00:00:00 [iw_cm_wq]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2585    14  0     0     0   - Jun12 ?        00:00:00 [ib_cm/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2587    14  0     0     0   - Jun12 ?        00:00:00 [rdma_cm]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2603     1  0  7173 22544   - Jun12 ?        00:00:00 iscsiuio
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2607     1  0  1147   508   - Jun12 ?        00:00:00 iscsid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2609     1  0  1273  3044   - Jun12 ?        00:00:00 iscsid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2699     1  0  3645   544   - Jun12 ?        00:00:00 mcstransd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2892     1  0  6319   804   - Jun12 ?        00:00:10 auditd
root         -     -  0     -     -   0 Jun12 -        00:00:05 -
root         -     -  0     -     -   0 Jun12 -        00:00:04 -
root      2894  2892  0  4072   756   - Jun12 ?        00:00:05 /sbin/audispd
root         -     -  0     -     -   0 Jun12 -        00:00:02 -
root         -     -  0     -     -   0 Jun12 -        00:00:03 -
root      2918     1  0  7046 18404   - Jun12 ?        00:00:00 /usr/sbin/restorecond
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
rpc       2965     1  0  2018   604   - Jun12 ?        00:00:00 portmap
rpc          -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3000    14  0     0     0   - Jun12 ?        00:00:00 [rpciod/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
rpcuser   3006     1  0  3595   904   - Jun12 ?        00:00:00 rpc.statd
rpcuser      -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3044     1  0  6219   516   - Jun12 ?        00:00:00 rpc.idmapd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
dbus      3077     1  0 11295  1568   - Jun12 ?        00:00:00 dbus-daemon --system
dbus         -     -  0     -     -   0 Jun12 -        00:00:00 -
dbus         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3139     1  0  5266  1400   - Jun12 ?        00:00:00 pcscd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3154     1  0   955   556   - Jun12 ?        00:00:00 /usr/sbin/acpid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
68        3168     1  0 12931  5716   - Jun12 ?        00:00:03 hald
68           -     -  0     -     -   0 Jun12 -        00:00:03 -
root      3169  3168  0  5430  1208   - Jun12 ?        00:00:00 hald-runner
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
68        3177  3169  0  3085   916   - Jun12 ?        00:00:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
68           -     -  0     -     -   0 Jun12 -        00:00:00 -
68        3183  3169  0  3086   904   - Jun12 ?        00:00:00 hald-addon-keyboard: listening on /dev/input/event0
68           -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3192  3169  0  2562   756   - Jun12 ?        00:07:53 hald-addon-storage: polling /dev/hdc
root         -     -  0     -     -   0 Jun12 -        00:07:53 -
root      3228     1  0  2134   492   - Jun12 ?        00:00:00 /usr/bin/hidd --server
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3300     1  0 15727  2044   - Jun12 ?        00:00:00 automount --pid-file /var/run/autofs.pid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3336     1  0  6587   860   - Jun12 ?        00:00:00 ./hpiod
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3341     1  0 38725  6668   - Jun12 ?        00:00:00 /usr/bin/python ./hpssd.py
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3359     1  0 15671  1208   - Jun12 ?        00:00:00 /usr/sbin/sshd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
ntp       3380     1  0  9467  9468   - Jun12 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ntp          -     -  0     -     -   0 Jun12 -        00:00:00 -
nagios    3394     1  0 11984  3256   - Jun12 ?        00:00:07 nrpe -c /etc/nagios/nrpe.cfg -d
nagios       -     -  0     -     -   0 Jun12 -        00:00:07 -
root      3418     1  0 17262  2104   - Jun12 ?        00:00:00 sendmail: accepting connections
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
smmsp     3427     1  0 14949  1796   - Jun12 ?        00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
smmsp        -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3442     1  0  1619   504   - Jun12 ?        00:00:00 gpm -m /dev/input/mice -t exps2
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3446  1918  0 16522  1616   - Jun12 ?        00:00:00 /bin/bash /etc/rc3.d/S85httpd start
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3455  3446  0  2704  1196   - Jun12 ?        00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/sbin/httpd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3456  3455  0 51234  5924   - Jun12 ?        00:00:00 /usr/sbin/httpd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root     15558     1  0  7649   940   - Jun27 ?        00:00:00 supervising syslog-ng
root         -     -  0     -     -   0 Jun27 -        00:00:00 -
root     15559 15558  0 20864 24916   - Jun27 ?        00:00:19 /opt/syslog-ng/sbin/syslog-ng --no-caps
root         -     -  0     -     -   0 Jun27 -        00:00:19 -
root     17241  3359  0 28285  5588   - 14:53 ?        00:00:00 sshd: USTC_admin [priv]
root         -     -  0     -     -   0 14:53 -        00:00:00 -
528      17244 17241  0 28285  3496   - 14:53 ?        00:00:00 sshd: USTC_admin@pts/0
528          -     -  0     -     -   0 14:53 -        00:00:00 -
528      17245 17244  0 16522  1632   - 14:53 pts/0    00:00:00 -bash
528          -     -  0     -     -   0 14:53 -        00:00:00 -
root     17284 17245  0 34634  3412   - 14:58 pts/0    00:00:00 sudo ps -eaFm
root         -     -  0     -     -   0 14:58 -        00:00:00 -
root     17287 17284  0 16404  1028   - 14:58 pts/0    00:00:00 ps -eaFm
root         -     -  0     -     -   0 14:58 -        00:00:00 -

Code:

ps -ef | wc -l
89

Nagios shows 713 procs on this particular server.

Re: Nagios reporting more services running than actually are

Posted: Mon Jul 08, 2013 2:41 pm
by abrist
Nothing there is out of the ordinary either (186 lines). Hmmm. How about the proc directory itself:

Code:

ls /proc | wc -l
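
Note that /proc also holds non-process entries (cpuinfo, meminfo and so on), so the raw count runs higher than the number of processes. A rough sketch, assuming a Linux /proc layout, that counts only the numeric PID directories and, separately, the per-thread entries under each task directory (the function names are hypothetical):

```shell
#!/bin/sh
# Count only the all-numeric entries in /proc (one directory per process).
count_proc_pids() {
    ls /proc | grep -c '^[0-9][0-9]*$'
}

# Count the per-thread entries under /proc/<pid>/task (one per thread).
# Processes that exit while the glob is expanded are silently skipped.
count_proc_tasks() {
    ls -d /proc/[0-9]*/task/[0-9]* 2>/dev/null | wc -l | tr -d ' '
}

echo "processes in /proc: $(count_proc_pids)"
echo "tasks in /proc:     $(count_proc_tasks)"
```

A task count far above the process count would point toward threads being included in whatever total Nagios is reporting.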

Re: Nagios reporting more services running than actually are

Posted: Tue Jul 09, 2013 9:28 am
by lce411
Sorry for the late response. It seems our working hours are offset. Here are the results:
ls /proc | wc -l
138

Re: Nagios reporting more services running than actually are

Posted: Tue Jul 09, 2013 12:25 pm
by abrist
lce411 wrote: Sorry for the late response. It seems our working hours are offset.
That is quite common around here, as we are at GMT -6/-5 (CST/CDT).
What is the full command and check definition for the service check that is reporting 700+ processes?
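
On hosts checked through NRPE, the command definition usually lives in the agent's config file. The ps listing earlier in the thread shows nrpe started with -c /etc/nagios/nrpe.cfg, so a minimal sketch for pulling out the relevant lines could look like this (the helper name is hypothetical, and the check may be defined elsewhere, for example in an included directory or on the Nagios server itself):

```shell
#!/bin/sh
# Default path taken from the nrpe command line in the ps listing;
# override it with the NRPE_CFG environment variable if yours differs.
NRPE_CFG="${NRPE_CFG:-/etc/nagios/nrpe.cfg}"

# Hypothetical helper: print NRPE command definitions that call check_procs.
show_proc_checks() {
    if [ -r "$1" ]; then
        grep -n 'check_procs' "$1" || echo "no check_procs command in $1"
    else
        echo "cannot read $1"
    fi
}

show_proc_checks "$NRPE_CFG"
```

Running the printed command line by hand on the monitored host, as the nagios user, shows exactly what the plugin returns.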

Re: Nagios reporting more services running than actually are

Posted: Tue Jul 09, 2013 1:52 pm
by lce411
I think I might have figured out what the problem was, but I'm unsure why this was the cause of it. I had to reboot one of the servers that was showing an incorrect Total Procs amount. As I watched the boot process, I saw a number of orphaned inodes being removed. After researching what causes orphaned inodes in the first place, the only thing I can think of that would have caused them is a recent upgrade of syslog-ng. Could a bunch of orphaned inodes cause the problem I was having in Nagios? I rebooted all the machines that were erroring out in Nagios and, at last check, all seem to be working fine for now.

Re: Nagios reporting more services running than actually are

Posted: Tue Jul 09, 2013 2:45 pm
by abrist
lce411 wrote: Could a bunch of orphaned inodes cause the problem I was having in Nagios?
Maybe? It all depends on whether there was another process trying to read from those old inodes and hanging . . . But that would be a very odd case . . .
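
For anyone hitting the same symptom, one thing that can be checked before rebooting is whether any process still holds files that were deleted from disk (for example, logs replaced during a package upgrade), since such files are typically what get cleaned up as orphaned inodes after an unclean shutdown. A rough sketch, assuming Linux /proc; the function name is hypothetical, and nothing in this thread confirms that this was the cause of the inflated count:

```shell
#!/bin/sh
# Hypothetical helper: count open file descriptors whose target was deleted.
# The kernel appends " (deleted)" to the link target shown in /proc/<pid>/fd.
# Without root, only the caller's own processes are visible.
count_deleted_open_files() {
    ls -l /proc/[0-9]*/fd 2>/dev/null | grep -c ' (deleted)$'
}

echo "deleted files still open: $(count_deleted_open_files)"
```

Where lsof is installed, lsof +L1 gives a similar list along with the owning process names.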