CPU usage of a single process measured by ncpa

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
toper
Posts: 57
Joined: Tue Jul 31, 2012 7:04 am

Post by toper »

Hi

I need to monitor a single process's CPU usage and set up notifications for it, because we sometimes have a Java process going wild and using up almost all of the CPU capacity.
I am using NCPA to monitor it.

This is my check command:
/usr/local/nagios/libexec/check_ncpa.py -H 10.117.55.6 -t 'DC-vest' -P 9991 -M 'processes' -q 'name=java_ase' -w 60 -c 100
OK: Process count for processes named java_ase was 1 | 'process_count'=1;60;100; 'cpu'=0.15%;;; 'memory'=12.85%;;; 'memory_vms'=5.98GB;;; 'memory_rss'=1.06GB;;;
Processes Matched
PID: Name: Username: Exe: Memory: CPU
-----------------------------------
23906: java_ase: aseuser: 12.85 % (VMS 5.98 GB, RSS 1.06 GB): 0.15 %

Total Memory: 12.85 % (VMS 5.98 GB, RSS 1.06 GB)
Total CPU: 0.15 %

The 'cpu'=0.15%;;; entry in the performance data suggests it should be possible to set limits on CPU, but I can't find any documentation on how to do that.
Peter Calum
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: CPU usage of a single process measured by ncpa

Post by dchurch »

You can add filters to the check so that it only includes processes that exceed certain memory or cpu usage. For example, I'm applying filters to return the number of processes that match test.exe and have a memory usage of 80% or more:

./check_ncpa.py -H IP -t '<your token>' -M 'processes' -q 'mem_percent=80,exe=test.exe' -c 0
The -c 0 means that if it finds even ONE process exceeding the threshold, the check results in a critical. (A bare -c 1 would only go critical once more than one process matched.)

Most checks accept "critical" and "warning" thresholds with lower and upper bounds in the form [LOWER]:[UPPER]. That is, if critical=1:5, any value outside the range 1-5 inclusive (for example 0 or 7) is considered "critical." When UPPER is empty, it's assumed to be infinity. Likewise, when LOWER is empty, it's assumed to be 0.
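
To make the range rule concrete, here is a minimal sketch of it in bash. It is not NCPA's own code; the function name outside_range is made up for illustration. It assumes a bare number such as 5 is treated as the upper bound (range 0-5), as in the standard Nagios plugin range format, and it leaves out that format's "@" and "~" prefixes.

```shell
#!/bin/bash
# Sketch of the [LOWER]:[UPPER] rule; hypothetical helper, not part of NCPA.
# outside_range VALUE RANGE
#   exit status 0 -> VALUE is outside RANGE (threshold breached)
#   exit status 1 -> VALUE is inside RANGE
outside_range() {
    local value=$1 range=$2 lower upper
    if [[ $range == *:* ]]; then
        lower=${range%%:*}   # text before the colon
        upper=${range#*:}    # text after the colon (may be empty)
    else
        lower=""             # assumption: a bare number is the upper bound
        upper=$range
    fi
    lower=${lower:-0}        # empty LOWER defaults to 0
    # Empty UPPER means "no upper limit"; awk handles decimal values.
    awk -v v="$value" -v lo="$lower" -v hi="$upper" 'BEGIN {
        if (v + 0 < lo + 0) exit 0
        if (hi != "" && v + 0 > hi + 0) exit 0
        exit 1
    }'
}
```

For example, outside_range 7 1:5 succeeds (7 is outside 1-5), outside_range 1 0 succeeds (a threshold of 0 trips on a single match), and outside_range 1000 0: fails, which is why 0: effectively disables a threshold.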

Most of the time, you can disable the critical and warning thresholds by simply not specifying them. But if you're dealing with a check script that falls back to a default critical or warning threshold when none is specified, you can use the value critical=0: to keep the check from going critical.

NCPA API reference
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.

Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
toper
Posts: 57
Joined: Tue Jul 31, 2012 7:04 am

Re: CPU usage of a single process measured by ncpa

Post by toper »

Hi

Sure, this can give me a warning/critical if this process exceeds a specific limit.
But then I need to make 3 checks like this:

To give me a warning:
/usr/local/nagios/libexec/check_ncpa.py -H xxx.yyy.zzz.www -t 'DC-vest' -P 9991 -M 'processes' -q 'name=java_ase,cpu_percent=0.1' -w 0

And this to give me a critical:
/usr/local/nagios/libexec/check_ncpa.py -H xxx.yyy.zzz.www -t 'DC-vest' -P 9991 -M 'processes' -q 'name=java_ase,cpu_percent=0.1' -c 0

And one like this to follow the specific process:
/usr/local/nagios/libexec/check_ncpa.py -H xxx.yyy.zzz.www -t 'DC-vest' -P 9991 -M 'processes' -q 'name=java_ase' -c 0:1

The first two are needed because you can't set a threshold on the CPU usage itself; you can only check the number of processes that exceed the CPU limit defined in the filter.

And this is not a nice way of doing checks.

The nice way of doing those checks would have been something like this:

/usr/local/nagios/libexec/check_ncpa.py -H xxx.yyy.zzz.www -t 'DC-vest' -P 9991 -M 'processes' -q 'name=java_ase' -cprocammount 0:1 -wcpu 85% -ccpu 95% -wmem 25% -cmem 50%

Then you would have only one line, with separate alerts for each metric.
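
Until something like that exists, one possible workaround is a thin wrapper that runs the plain process check once and applies its own limits to the performance data it returns. The sketch below only covers the parsing and threshold step. The function names (perf_value, exceeds, evaluate) and the limits are hypothetical, and it assumes the perfdata format shown in the first post ('cpu'=0.15%;;; 'memory'=12.85%;;;).

```shell
#!/bin/bash
# Hypothetical wrapper logic; not part of check_ncpa.py or NCPA.

# perf_value LABEL OUTPUT -> prints the number from 'LABEL'=<number>... in OUTPUT
perf_value() {
    local label=$1 output=$2
    sed -n "s/.*'$label'=\([0-9.]*\).*/\1/p" <<<"$output" | head -n 1
}

# exceeds VALUE LIMIT -> succeeds when VALUE > LIMIT (awk handles decimals)
exceeds() { awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'; }

# evaluate OUTPUT WCPU CCPU WMEM CMEM
# Prints one status line and returns the matching Nagios exit code.
evaluate() {
    local output=$1 cpu mem
    cpu=$(perf_value cpu "$output")
    mem=$(perf_value memory "$output")
    if [[ -z $cpu || -z $mem ]]; then
        echo "UNKNOWN: cpu/memory perfdata not found"; return 3
    fi
    if exceeds "$cpu" "$3" || exceeds "$mem" "$5"; then
        echo "CRITICAL: cpu=${cpu}% memory=${mem}%"; return 2
    elif exceeds "$cpu" "$2" || exceeds "$mem" "$4"; then
        echo "WARNING: cpu=${cpu}% memory=${mem}%"; return 1
    fi
    echo "OK: cpu=${cpu}% memory=${mem}%"; return 0
}
```

It could then be called as evaluate "$(/usr/local/nagios/libexec/check_ncpa.py -H xxx.yyy.zzz.www -t 'DC-vest' -P 9991 -M 'processes' -q 'name=java_ase')" 85 95 25 50, which keeps it to one check per server at the cost of maintaining the wrapper.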

And in a Nagios XI system with 20k checks, it matters whether you have to add 2 extra checks for each of the 2000 servers. ;)
Peter Calum
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: CPU usage of a single process measured by ncpa

Post by dchurch »

Something else you can do is write your own plugin for NCPA to execute remotely. Create runaway_process.sh in /usr/local/ncpa/plugins with a shell script like this in it:

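#!/bin/bash
# vi: et sts=4 sw=4 ts=4

BINARY=java_ase   # process name passed to pgrep
THRESH_W=70       # warning threshold (% CPU)
THRESH_C=90       # critical threshold (% CPU)

# Sum %CPU over every matching process: pgrep lists the PIDs, tr joins them
# with commas for ps -p, and perl skips the header line and adds up the rest.
PCT=$(
ps -p $(echo $(pgrep "$BINARY") | tr ' ' ,) -o %cpu |
perl -mList::Util -e '<>; print List::Util::sum(<>);'
)

# [[ a > b ]] compares strings, not numbers (so "100.5" would sort below "90"),
# so let awk do the numeric comparison.
exceeds() { awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'; }

if exceeds "$PCT" "$THRESH_C"; then
    printf 'CRITICAL: %s taking up %.2f%% CPU\n' "$BINARY" "$PCT"
    exit 2
fi
if exceeds "$PCT" "$THRESH_W"; then
    printf 'WARNING: %s taking up %.2f%% CPU\n' "$BINARY" "$PCT"
    exit 1
fi

printf 'OK: %s taking up %.2f%% CPU\n' "$BINARY" "$PCT"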
It should show up in the NCPA wizard automatically.