Nagios reporting more services running than actually are
A few of the servers I monitor are reporting hundreds of processes running (some as high as the 700s), but a query on the server (ps -ef | wc -l) shows fewer than 100. Why or how would Nagios be so far off? I was not able to find any zombie processes. Is it possible that Nagios is also picking up threads within processes? Is there a way to tell Nagios not to count threads?
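One way to test the thread theory directly is to compare the number of processes with the total number of threads the kernel reports. The sketch below assumes a Linux /proc layout in which each /proc/<pid>/status file carries a "Threads:" line. The function name proc_thread_totals is made up for illustration and is not part of Nagios or its plugins.

```shell
#!/bin/sh
# Hypothetical helper: tally processes and threads from /proc status files.
# Assumes Linux, where /proc/<pid>/status contains a "Threads:" line.
proc_thread_totals() {
    root="${1:-/proc}"      # optional alternate root, handy for testing
    procs=0
    threads=0
    for status in "$root"/[0-9]*/status; do
        # Skip unmatched globs and processes that exited mid-scan.
        [ -r "$status" ] || continue
        n=$(awk '/^Threads:/ {print $2}' "$status" 2>/dev/null)
        [ -n "$n" ] || continue
        procs=$((procs + 1))
        threads=$((threads + n))
    done
    echo "processes=$procs threads=$threads"
}

# Report the totals for the live system.
proc_thread_totals
```

If the thread total is close to the figure Nagios reports while the process total matches ps -ef, thread counting is a plausible explanation. If neither number is close, the cause probably lies elsewhere.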
Re: Nagios reporting more services running than actually are
Are there many forks on these servers?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Nagios reporting more services running than actually are
Can you show the actual command that you are running in the command line, along with the output of it?
Note: After you run your check, also run "ps -ef | wc -l" and show the output.
Re: Nagios reporting more services running than actually are
Not at all. A random administrator may log in (usually just myself), but some have no one accessing them most of the time (squid, smtp). I ran 'ps -eaFm' to try to view the thread count of the currently running procs. I didn't see a "smoking gun", but I will admit that I'm not 100% sure what I should be looking for in the first place.
Nagios shows 713 procs on this particular server.
Code: Select all
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
root 1 0 0 2592 760 - Jun12 ? 00:00:02 init [3]
root - - 0 - - 0 Jun12 - 00:00:02 -
root 2 1 0 0 0 - Jun12 ? 00:00:00 [migration/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3 1 0 0 0 - Jun12 ? 00:00:00 [ksoftirqd/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 4 1 0 0 0 - Jun12 ? 00:00:14 [events/0]
root - - 0 - - 0 Jun12 - 00:00:14 -
root 5 1 0 0 0 - Jun12 ? 00:00:00 [khelper]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 14 1 0 0 0 - Jun12 ? 00:00:00 [kthread]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 18 14 0 0 0 - Jun12 ? 00:00:00 [kblockd/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 19 14 0 0 0 - Jun12 ? 00:00:00 [kacpid]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 179 14 0 0 0 - Jun12 ? 00:00:00 [cqueue/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 182 14 0 0 0 - Jun12 ? 00:00:00 [khubd]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 184 14 0 0 0 - Jun12 ? 00:00:00 [kseriod]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 254 14 0 0 0 - Jun12 ? 00:00:00 [khungtaskd]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 255 14 0 0 0 - Jun12 ? 00:00:00 [pdflush]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 256 14 0 0 0 - Jun12 ? 00:00:00 [pdflush]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 257 14 0 0 0 - Jun12 ? 00:00:00 [kswapd0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 258 14 0 0 0 - Jun12 ? 00:00:00 [aio/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 464 14 0 0 0 - Jun12 ? 00:00:00 [kpsmoused]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 495 14 0 0 0 - Jun12 ? 00:00:00 [mpt_poll_0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 496 14 0 0 0 - Jun12 ? 00:00:00 [mpt/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 497 14 0 0 0 - Jun12 ? 00:00:00 [scsi_eh_0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 498 14 0 0 0 - Jun12 ? 00:00:00 [mpt_poll_1]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 499 14 0 0 0 - Jun12 ? 00:00:00 [mpt/1]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 500 14 0 0 0 - Jun12 ? 00:00:00 [scsi_eh_1]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 503 14 0 0 0 - Jun12 ? 00:00:00 [ata/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 504 14 0 0 0 - Jun12 ? 00:00:00 [ata_aux]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 509 14 0 0 0 - Jun12 ? 00:00:00 [kstriped]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 518 14 0 0 0 - Jun12 ? 00:00:00 [ksnapd]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 529 14 0 0 0 - Jun12 ? 00:00:05 [kjournald]
root - - 0 - - 0 Jun12 - 00:00:05 -
root 559 14 0 0 0 - Jun12 ? 00:00:00 [kauditd]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 592 1 0 3516 2340 - Jun12 ? 00:00:00 /sbin/udevd -d
root - - 0 - - 0 Jun12 - 00:00:00 -
root 1794 14 0 0 0 - Jun12 ? 00:00:00 [kmpathd/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 1795 14 0 0 0 - Jun12 ? 00:00:00 [kmpath_handlerd]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 1829 14 0 0 0 - Jun12 ? 00:00:00 [kjournald]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 1835 14 0 0 0 - Jun12 ? 00:00:00 [kjournald]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 1841 14 0 0 0 - Jun12 ? 00:00:06 [kjournald]
root - - 0 - - 0 Jun12 - 00:00:06 -
root 1846 14 0 0 0 - Jun12 ? 00:00:00 [kjournald]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 1849 14 0 0 0 - Jun12 ? 00:00:00 [kjournald]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 1918 1 0 16522 1616 - Jun12 ? 00:00:00 /bin/bash /etc/rc.d/rc 3
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2387 1 0 12498 3008 - Jun12 ? 00:02:42 /usr/sbin/vmtoolsd
root - - 0 - - 0 Jun12 - 00:02:42 -
root 2519 14 0 0 0 - Jun12 ? 00:00:00 [iscsi_eh]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2553 14 0 0 0 - Jun12 ? 00:00:00 [cnic_wq]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2556 14 0 0 0 - Jun12 ? 00:00:00 [bnx2i_thread/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2569 14 0 0 0 - Jun12 ? 00:00:00 [ib_addr]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2576 14 0 0 0 - Jun12 ? 00:00:00 [ib_mcast]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2577 14 0 0 0 - Jun12 ? 00:00:00 [ib_inform]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2578 14 0 0 0 - Jun12 ? 00:00:00 [local_sa]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2581 14 0 0 0 - Jun12 ? 00:00:00 [iw_cm_wq]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2585 14 0 0 0 - Jun12 ? 00:00:00 [ib_cm/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2587 14 0 0 0 - Jun12 ? 00:00:00 [rdma_cm]
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2603 1 0 7173 22544 - Jun12 ? 00:00:00 iscsiuio
root - - 0 - - 0 Jun12 - 00:00:00 -
root - - 0 - - 0 Jun12 - 00:00:00 -
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2607 1 0 1147 508 - Jun12 ? 00:00:00 iscsid
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2609 1 0 1273 3044 - Jun12 ? 00:00:00 iscsid
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2699 1 0 3645 544 - Jun12 ? 00:00:00 mcstransd
root - - 0 - - 0 Jun12 - 00:00:00 -
root 2892 1 0 6319 804 - Jun12 ? 00:00:10 auditd
root - - 0 - - 0 Jun12 - 00:00:05 -
root - - 0 - - 0 Jun12 - 00:00:04 -
root 2894 2892 0 4072 756 - Jun12 ? 00:00:05 /sbin/audispd
root - - 0 - - 0 Jun12 - 00:00:02 -
root - - 0 - - 0 Jun12 - 00:00:03 -
root 2918 1 0 7046 18404 - Jun12 ? 00:00:00 /usr/sbin/restorecond
root - - 0 - - 0 Jun12 - 00:00:00 -
rpc 2965 1 0 2018 604 - Jun12 ? 00:00:00 portmap
rpc - - 0 - - 0 Jun12 - 00:00:00 -
root 3000 14 0 0 0 - Jun12 ? 00:00:00 [rpciod/0]
root - - 0 - - 0 Jun12 - 00:00:00 -
rpcuser 3006 1 0 3595 904 - Jun12 ? 00:00:00 rpc.statd
rpcuser - - 0 - - 0 Jun12 - 00:00:00 -
root 3044 1 0 6219 516 - Jun12 ? 00:00:00 rpc.idmapd
root - - 0 - - 0 Jun12 - 00:00:00 -
dbus 3077 1 0 11295 1568 - Jun12 ? 00:00:00 dbus-daemon --system
dbus - - 0 - - 0 Jun12 - 00:00:00 -
dbus - - 0 - - 0 Jun12 - 00:00:00 -
root 3139 1 0 5266 1400 - Jun12 ? 00:00:00 pcscd
root - - 0 - - 0 Jun12 - 00:00:00 -
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3154 1 0 955 556 - Jun12 ? 00:00:00 /usr/sbin/acpid
root - - 0 - - 0 Jun12 - 00:00:00 -
68 3168 1 0 12931 5716 - Jun12 ? 00:00:03 hald
68 - - 0 - - 0 Jun12 - 00:00:03 -
root 3169 3168 0 5430 1208 - Jun12 ? 00:00:00 hald-runner
root - - 0 - - 0 Jun12 - 00:00:00 -
68 3177 3169 0 3085 916 - Jun12 ? 00:00:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
68 - - 0 - - 0 Jun12 - 00:00:00 -
68 3183 3169 0 3086 904 - Jun12 ? 00:00:00 hald-addon-keyboard: listening on /dev/input/event0
68 - - 0 - - 0 Jun12 - 00:00:00 -
root 3192 3169 0 2562 756 - Jun12 ? 00:07:53 hald-addon-storage: polling /dev/hdc
root - - 0 - - 0 Jun12 - 00:07:53 -
root 3228 1 0 2134 492 - Jun12 ? 00:00:00 /usr/bin/hidd --server
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3300 1 0 15727 2044 - Jun12 ? 00:00:00 automount --pid-file /var/run/autofs.pid
root - - 0 - - 0 Jun12 - 00:00:00 -
root - - 0 - - 0 Jun12 - 00:00:00 -
root - - 0 - - 0 Jun12 - 00:00:00 -
root - - 0 - - 0 Jun12 - 00:00:00 -
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3336 1 0 6587 860 - Jun12 ? 00:00:00 ./hpiod
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3341 1 0 38725 6668 - Jun12 ? 00:00:00 /usr/bin/python ./hpssd.py
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3359 1 0 15671 1208 - Jun12 ? 00:00:00 /usr/sbin/sshd
root - - 0 - - 0 Jun12 - 00:00:00 -
ntp 3380 1 0 9467 9468 - Jun12 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ntp - - 0 - - 0 Jun12 - 00:00:00 -
nagios 3394 1 0 11984 3256 - Jun12 ? 00:00:07 nrpe -c /etc/nagios/nrpe.cfg -d
nagios - - 0 - - 0 Jun12 - 00:00:07 -
root 3418 1 0 17262 2104 - Jun12 ? 00:00:00 sendmail: accepting connections
root - - 0 - - 0 Jun12 - 00:00:00 -
smmsp 3427 1 0 14949 1796 - Jun12 ? 00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
smmsp - - 0 - - 0 Jun12 - 00:00:00 -
root 3442 1 0 1619 504 - Jun12 ? 00:00:00 gpm -m /dev/input/mice -t exps2
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3446 1918 0 16522 1616 - Jun12 ? 00:00:00 /bin/bash /etc/rc3.d/S85httpd start
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3455 3446 0 2704 1196 - Jun12 ? 00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/sbin/httpd
root - - 0 - - 0 Jun12 - 00:00:00 -
root 3456 3455 0 51234 5924 - Jun12 ? 00:00:00 /usr/sbin/httpd
root - - 0 - - 0 Jun12 - 00:00:00 -
root 15558 1 0 7649 940 - Jun27 ? 00:00:00 supervising syslog-ng
root - - 0 - - 0 Jun27 - 00:00:00 -
root 15559 15558 0 20864 24916 - Jun27 ? 00:00:19 /opt/syslog-ng/sbin/syslog-ng --no-caps
root - - 0 - - 0 Jun27 - 00:00:19 -
root 17241 3359 0 28285 5588 - 14:53 ? 00:00:00 sshd: USTC_admin [priv]
root - - 0 - - 0 14:53 - 00:00:00 -
528 17244 17241 0 28285 3496 - 14:53 ? 00:00:00 sshd: USTC_admin@pts/0
528 - - 0 - - 0 14:53 - 00:00:00 -
528 17245 17244 0 16522 1632 - 14:53 pts/0 00:00:00 -bash
528 - - 0 - - 0 14:53 - 00:00:00 -
root 17284 17245 0 34634 3412 - 14:58 pts/0 00:00:00 sudo ps -eaFm
root - - 0 - - 0 14:58 - 00:00:00 -
root 17287 17284 0 16404 1028 - 14:58 pts/0 00:00:00 ps -eaFm
root - - 0 - - 0 14:58 - 00:00:00 -
Code: Select all
ps -ef | wc -l
89
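In 'ps -eaFm' output, each process row shows a numeric PID and is followed by one row per thread, where the PID column is "-". Tallied by hand, the listing above has 88 process rows and 98 thread rows, so threads alone do not get anywhere near 713 on this server. A small filter can do that tally automatically. This is only a sketch: summarize_ps_m is a made-up name, and it assumes the PID sits in the second column, as it does with the -F format shown above.

```shell
#!/bin/sh
# Hypothetical filter: read "ps -eaFm" style output on stdin and count
# process rows (numeric PID) separately from thread rows (PID shown as "-").
summarize_ps_m() {
    awk 'NR == 1 { next }                  # skip the header row
         $2 == "-" { threads++; next }     # per-thread row
         $2 ~ /^[0-9]+$/ { procs++ }       # per-process row
         END { printf "processes=%d threads=%d\n", procs, threads }'
}

# Usage on a live system (requires procps ps):
#   ps -eaFm | summarize_ps_m
```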
Re: Nagios reporting more services running than actually are
Nothing there is out of the ordinary either (186 lines). Hmmm. How about the proc directory itself:
Code: Select all
ls /proc | wc -l
Former Nagios employee
Re: Nagios reporting more services running than actually are
Sorry for the late response. It seems our working hours are offset. Here are the results:
ls /proc | wc -l
138
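Note that 'ls /proc | wc -l' also counts entries that are not processes (cpuinfo, meminfo, self and so on), so 138 overstates the process count somewhat. A sketch that counts only the purely numeric directories, which on Linux correspond to PIDs, could look like this. The function name count_pid_dirs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count only the all-numeric directories under /proc,
# which on Linux correspond to process IDs.
count_pid_dirs() {
    root="${1:-/proc}"      # optional alternate root, handy for testing
    count=0
    for entry in "$root"/[0-9]*; do
        [ -d "$entry" ] || continue
        name=${entry##*/}
        # Reject names that contain anything other than digits.
        case "$name" in
            *[!0-9]*) continue ;;
        esac
        count=$((count + 1))
    done
    echo "$count"
}

# Print the number of PID directories on the live system.
count_pid_dirs
```

The result should be close to the 'ps -ef | wc -l' figure minus the header line.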
Re: Nagios reporting more services running than actually are
lce411 wrote: Sorry for the late response. It seems our working hours are offset.
That is quite common around here as we are in -6/-5 GMT (CDT).
What is the full command for the service check that is reporting 700+ processes?
Former Nagios employee
Re: Nagios reporting more services running than actually are
I think I might have figured out what the problem was, but I'm unsure why this was the cause of it. I had to reboot one of the servers that was showing an incorrect Total Procs amount. As I watched the boot process, I saw a number of orphaned inodes being removed. After researching what causes orphaned inodes in the first place, the only thing I can think of that would cause that is a recent upgrade of syslog-ng. Could a bunch of orphaned inodes cause the problem I was having in Nagios? I rebooted all the machines that were erroring out in Nagios and, at last check, all seem to be working fine for now.
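Orphaned inodes generally come from files that were deleted (for example, replaced during a package upgrade) while a process still held them open. On a running Linux system such files show up as "(deleted)" targets under /proc/<pid>/fd, so it is possible to look for them before rebooting. The following is a rough sketch only: list_deleted_open_files is a made-up name, and whether these files had anything to do with the inflated count in Nagios is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list open file descriptors whose target has been
# deleted, as "<pid> <target>". Assumes the Linux /proc/<pid>/fd layout.
list_deleted_open_files() {
    root="${1:-/proc}"      # optional alternate root, handy for testing
    for fd in "$root"/[0-9]*/fd/*; do
        # Unreadable or vanished descriptors are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                pid=${fd#"$root"/}   # strip the root prefix
                pid=${pid%%/*}       # keep only the PID component
                echo "$pid $target"
                ;;
        esac
    done
}

# Usage on a live system (run as root to see every process):
#   list_deleted_open_files
```

Where lsof is installed, 'lsof +L1' reports similar information.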
Re: Nagios reporting more services running than actually are
lce411 wrote: Could a bunch of orphaned inodes cause the problem I was having in Nagios?
Maybe? It all depends on whether there was another process trying to read from those old inodes and hanging . . . But that would be a very odd case . . .
Former Nagios employee