check_procs alerts don't match system results
Posted: Wed Mar 27, 2019 2:11 pm
We've suddenly developed an issue with check_procs. Nagios is registering a critical number of processes, but checking the system does not show the same number.
We're using the standard NRPE config file, with the following command:
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 275 -c 300
Our Nagios server says:
Total Processes PROCS CRITICAL: 1994 processes
However, checking the processes on the machine gives a very different answer:
root:~# ps -ef | wc -l
253
Even the NRPE plugin, run directly on the machine, gives a different answer:
root:~# /usr/lib/nagios/plugins/check_procs -w 275 -c 300
PROCS OK: 250 processes | procs=250;275;300;0;
The process count reported by the Nagios server goes up by 30-50 with every notification, but the actual process count on the system is pretty stable.
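To narrow down where the extra processes might come from, something like the rough sketch below could be run on the affected machine between notifications. The function names (count_procs, count_zombies, procs_by_command) are made up for illustration, and it assumes the procps version of ps that ships with Ubuntu. Note that `ps -ef | wc -l` also counts the header line, so it reads one higher than the real process count. This does not identify the cause by itself; it only shows which command names are accumulating and whether defunct processes are involved.

```shell
#!/bin/sh
# Hypothetical helpers for comparing process counts; names are illustrative only.

# Count processes without the header line that `ps -ef | wc -l` includes.
# The ps and wc processes themselves may be part of the count.
count_procs() {
    ps -e -o pid= | wc -l | tr -d ' '
}

# Count zombie (defunct) processes, i.e. state starting with Z.
count_zombies() {
    ps -e -o stat= | awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Group processes by command name, most frequent first,
# to see which command is piling up over time.
procs_by_command() {
    ps -e -o comm= | sort | uniq -c | sort -rn
}

echo "processes: $(count_procs)"
echo "zombies:   $(count_zombies)"
echo "top commands:"
procs_by_command | head -n 5
```

Running it a few times and comparing the output should show whether one command name grows by roughly the same 30-50 per interval. Since the NRPE daemon usually runs checks as a dedicated user rather than root (which user is an assumption that depends on nrpe.cfg), it may also be worth running the check_procs command as that user and comparing the result.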
The problem appeared over the weekend. I have two other similar systems showing the same issue. We're using Ubuntu 16.04 on virtual machines.
Any help would be appreciated.