We've suddenly developed an issue with check_procs. Nagios is registering a critical number of processes, but checking the system does not show the same number.
We're using the standard NRPE config file, with the following command:
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 275 -c 300
Our Nagios server says:
Total Processes PROCS CRITICAL: 1994 processes
However, checking the processes on the machine gives a very different answer:
root:~# ps -ef | wc -l
253
Even running the NRPE command's plugin directly on the machine gives a different answer:
root:~# /usr/lib/nagios/plugins/check_procs -w 275 -c 300
PROCS OK: 250 processes | procs=250;275;300;0;
The Nagios server proc count goes up by 30-50 procs every notification, but the actual procs on the system are pretty stable.
The problem manifested over the weekend. I have two other similar systems showing the same issue. We're using Ubuntu 16.04 on virtual machines.
Any help would be appreciated.
check_proc alerts don't match system results
Re: check_proc alerts don't match system results
Hello, @dcj. So running this command from the command line gives 250, but when Nagios executes the same command it shows 1994?
/usr/lib/nagios/plugins/check_procs -w 275 -c 300
Can you show me the command and service definitions from the Nagios server?
Re: check_proc alerts don't match system results
Hi @npolovenko. Thanks for the quick response. It turns out the server file was accidentally pointed at the wrong machine. When we looked at the proper machine, the process counts were correct. Please close this as user error.
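For anyone who lands here with the same symptom, one way to spot a check aimed at the wrong machine is to run the NRPE command from the Nagios server against the address in the host definition and compare the count with what the plugin reports locally. The following is only a rough sketch: the `check_nrpe` path, the `procs_count` and `remote_procs` helper names, and the example address are assumptions, not taken from this thread's configuration.

```shell
#!/bin/sh
# Sketch only. Assumed: check_nrpe lives in the same plugin directory as
# check_procs; override CHECK_NRPE if your install differs.
CHECK_NRPE=${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}

# Hypothetical helper: pull the number out of check_procs-style output,
# e.g. "PROCS OK: 250 processes | procs=250;275;300;0;"
procs_count() {
    sed -n 's/.*PROCS [A-Z]*: \([0-9][0-9]*\) processes.*/\1/p'
}

# Hypothetical helper: ask the NRPE daemon at address $1 for the count.
# Run this on the Nagios server, using the address from the host definition.
remote_procs() {
    "$CHECK_NRPE" -H "$1" -c check_total_procs | procs_count
}

# Example usage (192.0.2.10 is a placeholder address):
#   remote_procs 192.0.2.10
# Then, on the monitored machine itself:
#   /usr/lib/nagios/plugins/check_procs -w 275 -c 300 | procs_count
```

If the two numbers differ widely, as 1994 versus 250 did here, it is worth checking whether the address in the host definition really belongs to the machine you are logged in to.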