
check_proc alerts don't match system results

Posted: Wed Mar 27, 2019 2:11 pm
by dcj
We've suddenly developed an issue with check_procs. Nagios is reporting a critical number of processes, but checking the system directly shows far fewer.

We're using the standard NRPE config file, with the following command:
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 275 -c 300

Our Nagios server says:
Total Processes PROCS CRITICAL: 1994 processes

However, checking the processes on the machine gives a very different answer:
root:~# ps -ef | wc -l
253
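A note on that count: `ps -ef` prints a header line, so `ps -ef | wc -l` reports one more than the number of processes. A minimal sketch that counts processes without the header, assuming the procps version of `ps` that ships with Ubuntu (giving a column an empty name, as in `pid=`, suppresses the header line):

```shell
#!/bin/sh
# Count running processes without the header line that `ps -ef` adds.
# `-o pid=` prints only the PID column; the empty name after `=` removes its header.
# `tr -d ' '` strips any padding that some `wc` implementations add.
count=$(ps -e -o pid= | wc -l | tr -d ' ')
echo "processes: $count"
```

The result still includes transient processes such as the counting pipeline itself, so a small gap from the check_procs figure is expected; a gap of well over a thousand is not.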

Running the check_procs plugin locally on the client also disagrees with the Nagios server:
root:~# /usr/lib/nagios/plugins/check_procs -w 275 -c 300
PROCS OK: 250 processes | procs=250;275;300;0;

The process count reported by the Nagios server rises by 30-50 with every notification, while the actual process count on the system stays fairly stable.

The problem appeared over the weekend. Two other similar systems show the same issue. We're running Ubuntu 16.04 on virtual machines.

Any help would be appreciated.

Re: check_proc alerts don't match system results

Posted: Wed Mar 27, 2019 2:48 pm
by npolovenko
Hello, @dcj. So running this command from the command line gives 250, but when Nagios executes the same command it shows 1994?
/usr/lib/nagios/plugins/check_procs -w 275 -c 300
Can you show me the command and service definitions from the Nagios server?
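One way to check this from the Nagios server side is to run the NRPE check by hand against an explicit address and compare the result with the client's local check_procs output; if the two disagree by a wide margin, the service is likely querying a different host. A minimal sketch: the plugin path and the address 192.0.2.10 are placeholders (assumptions, not values from this thread), and check_total_procs is the NRPE command from the config line quoted earlier.

```shell
#!/bin/sh
# Run the same NRPE command the Nagios server runs, against an explicit address.
# Both defaults are hypothetical placeholders: set CHECK_NRPE to your plugin path
# and NRPE_HOST to the address the service's host is configured with.
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"
NRPE_HOST="${NRPE_HOST:-192.0.2.10}"

if [ -x "$CHECK_NRPE" ]; then
  # -H: NRPE agent to query; -c: command name defined in nrpe.cfg on that agent
  "$CHECK_NRPE" -H "$NRPE_HOST" -c check_total_procs
  status=$?
else
  echo "check_nrpe not found at $CHECK_NRPE" >&2
  status=3  # treat a missing plugin as UNKNOWN, following the Nagios exit-code convention
fi
echo "exit status: $status"
```

Running it with NRPE_HOST set to the address you expect, and then to the address actually in the host definition, makes a mismatch like the one in this thread easy to spot.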

Re: check_proc alerts don't match system results

Posted: Wed Mar 27, 2019 3:32 pm
by dcj
Hi @npolovenko. Thanks for the quick response. It turns out the configuration file on the Nagios server was accidentally pointing at the wrong machine. When we checked the correct machine, the process counts matched. Please close this as user error.