Nagios reporting more services running than actually are

lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Nagios reporting more services running than actually are

Post by lce411 »

A few of the servers I monitor are reporting hundreds of processes running (some in the 700s), but a query on the server itself (ps -ef | wc -l) shows fewer than 100. Why or how would Nagios be so far off? I was not able to find any zombie processes. Is it possible that Nagios is also counting threads within processes? If so, is there a way to tell Nagios not to count threads?
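
One way to test the thread hypothesis on a Linux host is to compare the number of processes with the total number of threads. The sketch below is only an illustration: it reads the "Threads:" field from each /proc/<pid>/status file, the function names are made up for this example, and it says nothing about how the Nagios check itself counts.

```shell
#!/bin/sh
# Sketch only: assumes a Linux /proc filesystem. Function names are hypothetical.

# Count processes: one status file per PID directory under /proc.
count_processes() {
    awk 'FNR == 1 { n++ } END { print n + 0 }' /proc/[0-9]*/status 2>/dev/null || true
}

# Count threads: sum the "Threads:" field across all processes.
count_threads() {
    awk '/^Threads:/ { sum += $2 } END { print sum + 0 }' /proc/[0-9]*/status 2>/dev/null || true
}

echo "processes: $(count_processes)"
echo "threads:   $(count_threads)"
```

If the thread total is close to the figure Nagios reports while the process total matches ps -ef, threads are a plausible explanation; if both are far below it, the cause is probably elsewhere.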
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios reporting more services running than actually are

Post by abrist »

Are there many forks on these servers?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Nagios reporting more services running than actually are

Post by lmiltchev »

Can you show the actual command that you are running in the command line, along with the output of it?
Note: After you run your check, run also "ps -ef | wc -l" and show the output.
Be sure to check out our Knowledgebase for helpful articles and solutions!
lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Re: Nagios reporting more services running than actually are

Post by lce411 »

Not at all. An administrator may occasionally log in (usually just me), but some of these servers have no one accessing them most of the time (squid, smtp). I ran 'ps -eaFm' to view the thread count of the currently running processes and didn't see a "smoking gun", though I will admit that I'm not 100% sure what I should be looking for in the first place.

UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
root         1     0  0  2592   760   - Jun12 ?        00:00:02 init [3]
root         -     -  0     -     -   0 Jun12 -        00:00:02 -
root         2     1  0     0     0   - Jun12 ?        00:00:00 [migration/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         3     1  0     0     0   - Jun12 ?        00:00:00 [ksoftirqd/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         4     1  0     0     0   - Jun12 ?        00:00:14 [events/0]
root         -     -  0     -     -   0 Jun12 -        00:00:14 -
root         5     1  0     0     0   - Jun12 ?        00:00:00 [khelper]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root        14     1  0     0     0   - Jun12 ?        00:00:00 [kthread]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root        18    14  0     0     0   - Jun12 ?        00:00:00 [kblockd/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root        19    14  0     0     0   - Jun12 ?        00:00:00 [kacpid]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       179    14  0     0     0   - Jun12 ?        00:00:00 [cqueue/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       182    14  0     0     0   - Jun12 ?        00:00:00 [khubd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       184    14  0     0     0   - Jun12 ?        00:00:00 [kseriod]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       254    14  0     0     0   - Jun12 ?        00:00:00 [khungtaskd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       255    14  0     0     0   - Jun12 ?        00:00:00 [pdflush]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       256    14  0     0     0   - Jun12 ?        00:00:00 [pdflush]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       257    14  0     0     0   - Jun12 ?        00:00:00 [kswapd0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       258    14  0     0     0   - Jun12 ?        00:00:00 [aio/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       464    14  0     0     0   - Jun12 ?        00:00:00 [kpsmoused]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       495    14  0     0     0   - Jun12 ?        00:00:00 [mpt_poll_0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       496    14  0     0     0   - Jun12 ?        00:00:00 [mpt/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       497    14  0     0     0   - Jun12 ?        00:00:00 [scsi_eh_0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       498    14  0     0     0   - Jun12 ?        00:00:00 [mpt_poll_1]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       499    14  0     0     0   - Jun12 ?        00:00:00 [mpt/1]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       500    14  0     0     0   - Jun12 ?        00:00:00 [scsi_eh_1]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       503    14  0     0     0   - Jun12 ?        00:00:00 [ata/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       504    14  0     0     0   - Jun12 ?        00:00:00 [ata_aux]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       509    14  0     0     0   - Jun12 ?        00:00:00 [kstriped]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       518    14  0     0     0   - Jun12 ?        00:00:00 [ksnapd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       529    14  0     0     0   - Jun12 ?        00:00:05 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:05 -
root       559    14  0     0     0   - Jun12 ?        00:00:00 [kauditd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root       592     1  0  3516  2340   - Jun12 ?        00:00:00 /sbin/udevd -d
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1794    14  0     0     0   - Jun12 ?        00:00:00 [kmpathd/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1795    14  0     0     0   - Jun12 ?        00:00:00 [kmpath_handlerd]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1829    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1835    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1841    14  0     0     0   - Jun12 ?        00:00:06 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:06 -
root      1846    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1849    14  0     0     0   - Jun12 ?        00:00:00 [kjournald]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      1918     1  0 16522  1616   - Jun12 ?        00:00:00 /bin/bash /etc/rc.d/rc 3
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2387     1  0 12498  3008   - Jun12 ?        00:02:42 /usr/sbin/vmtoolsd
root         -     -  0     -     -   0 Jun12 -        00:02:42 -
root      2519    14  0     0     0   - Jun12 ?        00:00:00 [iscsi_eh]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2553    14  0     0     0   - Jun12 ?        00:00:00 [cnic_wq]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2556    14  0     0     0   - Jun12 ?        00:00:00 [bnx2i_thread/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2569    14  0     0     0   - Jun12 ?        00:00:00 [ib_addr]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2576    14  0     0     0   - Jun12 ?        00:00:00 [ib_mcast]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2577    14  0     0     0   - Jun12 ?        00:00:00 [ib_inform]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2578    14  0     0     0   - Jun12 ?        00:00:00 [local_sa]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2581    14  0     0     0   - Jun12 ?        00:00:00 [iw_cm_wq]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2585    14  0     0     0   - Jun12 ?        00:00:00 [ib_cm/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2587    14  0     0     0   - Jun12 ?        00:00:00 [rdma_cm]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2603     1  0  7173 22544   - Jun12 ?        00:00:00 iscsiuio
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2607     1  0  1147   508   - Jun12 ?        00:00:00 iscsid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2609     1  0  1273  3044   - Jun12 ?        00:00:00 iscsid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2699     1  0  3645   544   - Jun12 ?        00:00:00 mcstransd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      2892     1  0  6319   804   - Jun12 ?        00:00:10 auditd
root         -     -  0     -     -   0 Jun12 -        00:00:05 -
root         -     -  0     -     -   0 Jun12 -        00:00:04 -
root      2894  2892  0  4072   756   - Jun12 ?        00:00:05 /sbin/audispd
root         -     -  0     -     -   0 Jun12 -        00:00:02 -
root         -     -  0     -     -   0 Jun12 -        00:00:03 -
root      2918     1  0  7046 18404   - Jun12 ?        00:00:00 /usr/sbin/restorecond
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
rpc       2965     1  0  2018   604   - Jun12 ?        00:00:00 portmap
rpc          -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3000    14  0     0     0   - Jun12 ?        00:00:00 [rpciod/0]
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
rpcuser   3006     1  0  3595   904   - Jun12 ?        00:00:00 rpc.statd
rpcuser      -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3044     1  0  6219   516   - Jun12 ?        00:00:00 rpc.idmapd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
dbus      3077     1  0 11295  1568   - Jun12 ?        00:00:00 dbus-daemon --system
dbus         -     -  0     -     -   0 Jun12 -        00:00:00 -
dbus         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3139     1  0  5266  1400   - Jun12 ?        00:00:00 pcscd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3154     1  0   955   556   - Jun12 ?        00:00:00 /usr/sbin/acpid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
68        3168     1  0 12931  5716   - Jun12 ?        00:00:03 hald
68           -     -  0     -     -   0 Jun12 -        00:00:03 -
root      3169  3168  0  5430  1208   - Jun12 ?        00:00:00 hald-runner
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
68        3177  3169  0  3085   916   - Jun12 ?        00:00:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
68           -     -  0     -     -   0 Jun12 -        00:00:00 -
68        3183  3169  0  3086   904   - Jun12 ?        00:00:00 hald-addon-keyboard: listening on /dev/input/event0
68           -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3192  3169  0  2562   756   - Jun12 ?        00:07:53 hald-addon-storage: polling /dev/hdc
root         -     -  0     -     -   0 Jun12 -        00:07:53 -
root      3228     1  0  2134   492   - Jun12 ?        00:00:00 /usr/bin/hidd --server
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3300     1  0 15727  2044   - Jun12 ?        00:00:00 automount --pid-file /var/run/autofs.pid
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3336     1  0  6587   860   - Jun12 ?        00:00:00 ./hpiod
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3341     1  0 38725  6668   - Jun12 ?        00:00:00 /usr/bin/python ./hpssd.py
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3359     1  0 15671  1208   - Jun12 ?        00:00:00 /usr/sbin/sshd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
ntp       3380     1  0  9467  9468   - Jun12 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ntp          -     -  0     -     -   0 Jun12 -        00:00:00 -
nagios    3394     1  0 11984  3256   - Jun12 ?        00:00:07 nrpe -c /etc/nagios/nrpe.cfg -d
nagios       -     -  0     -     -   0 Jun12 -        00:00:07 -
root      3418     1  0 17262  2104   - Jun12 ?        00:00:00 sendmail: accepting connections
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
smmsp     3427     1  0 14949  1796   - Jun12 ?        00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
smmsp        -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3442     1  0  1619   504   - Jun12 ?        00:00:00 gpm -m /dev/input/mice -t exps2
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3446  1918  0 16522  1616   - Jun12 ?        00:00:00 /bin/bash /etc/rc3.d/S85httpd start
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3455  3446  0  2704  1196   - Jun12 ?        00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/sbin/httpd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root      3456  3455  0 51234  5924   - Jun12 ?        00:00:00 /usr/sbin/httpd
root         -     -  0     -     -   0 Jun12 -        00:00:00 -
root     15558     1  0  7649   940   - Jun27 ?        00:00:00 supervising syslog-ng
root         -     -  0     -     -   0 Jun27 -        00:00:00 -
root     15559 15558  0 20864 24916   - Jun27 ?        00:00:19 /opt/syslog-ng/sbin/syslog-ng --no-caps
root         -     -  0     -     -   0 Jun27 -        00:00:19 -
root     17241  3359  0 28285  5588   - 14:53 ?        00:00:00 sshd: USTC_admin [priv]
root         -     -  0     -     -   0 14:53 -        00:00:00 -
528      17244 17241  0 28285  3496   - 14:53 ?        00:00:00 sshd: USTC_admin@pts/0
528          -     -  0     -     -   0 14:53 -        00:00:00 -
528      17245 17244  0 16522  1632   - 14:53 pts/0    00:00:00 -bash
528          -     -  0     -     -   0 14:53 -        00:00:00 -
root     17284 17245  0 34634  3412   - 14:58 pts/0    00:00:00 sudo ps -eaFm
root         -     -  0     -     -   0 14:58 -        00:00:00 -
root     17287 17284  0 16404  1028   - 14:58 pts/0    00:00:00 ps -eaFm
root         -     -  0     -     -   0 14:58 -        00:00:00 -

ps -ef | wc -l
89
Nagios shows 713 procs on this particular server.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios reporting more services running than actually are

Post by abrist »

Nothing there is out of the ordinary either (186 lines). Hmmm. How about the proc directory itself:

ls /proc | wc -l
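
Note that a plain listing of /proc also includes non-process entries (cpuinfo, meminfo, and so on), so the raw line count overstates the number of processes. A minimal sketch that counts only the numeric (PID) entries, with hypothetical function names and assuming a Linux /proc:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Every entry in /proc, including cpuinfo, meminfo, and similar files.
count_proc_entries() {
    ls /proc | wc -l
}

# Only entries whose names are entirely digits, i.e. one per process ID.
count_proc_pids() {
    ls /proc | grep -c '^[0-9][0-9]*$'
}

echo "all /proc entries: $(count_proc_entries)"
echo "PID entries only:  $(count_proc_pids)"
```

The PID-only figure is the one to compare against ps -ef | wc -l (which itself includes one header line).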
lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Re: Nagios reporting more services running than actually are

Post by lce411 »

Sorry for the late response. It seems our working hours are offset. Here are the results:
ls /proc | wc -l
138
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios reporting more services running than actually are

Post by abrist »

lce411 wrote:Sorry for the late response. It seems our working hours are offset.
That is quite common around here, as we are in US Central Time (UTC-6, or UTC-5 during daylight saving).
What is the full command definition and service check that is reporting 700+ processes?
lce411
Posts: 41
Joined: Thu Jun 07, 2012 1:28 pm

Re: Nagios reporting more services running than actually are

Post by lce411 »

I think I might have figured out what the problem was, but I'm unsure why this was the cause of it. I had to reboot one of the servers that was showing an incorrect Total Procs amount. As I watched the boot process, I saw a number of orphaned inodes being removed. After researching what causes orphaned inodes in the first place, the only thing I can think of that would have caused them is a recent upgrade of syslog-ng. Could a bunch of orphaned inodes cause the problem I was having in Nagios? I rebooted all the machines that were erroring out in Nagios and, at last check, all seem to be working fine for now.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios reporting more services running than actually are

Post by abrist »

lce411 wrote:Could a bunch of orphaned inodes cause the problem I was having in Nagios?
Maybe? It all depends on whether there was another process trying to read from those old inodes and hanging... But that would be a very odd case...
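
For anyone wanting to check this before rebooting: on Linux, a file that has been deleted while a process still holds it open shows up as a /proc/<pid>/fd link whose target ends in "(deleted)". The sketch below lists such descriptors. It is only an illustration under the assumption that the orphaned inodes were files of this kind, the function name is made up, and it needs root to see other users' processes.

```shell
#!/bin/sh
# Sketch only: assumes a Linux /proc filesystem. Function name is hypothetical.

# Print every open file descriptor whose target has been unlinked.
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors we cannot read or that vanished meanwhile.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *"(deleted)") echo "$fd -> $target" ;;
        esac
    done
}

list_deleted_open_files
echo "deleted-but-open descriptors: $(list_deleted_open_files | wc -l)"
```

The PID in each reported path identifies the process still holding the old file, for example a daemon that was upgraded but never restarted.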