check_ncpa get wrong alert from CPUs

sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Post by sacom01 »

Hi team,
I use check_ncpa to check CPU usage, and our server has many CPUs.

When I set 'aggregate=avg' it reports the wrong value: the server is running at 80-100%, but the check shows only a little over 60%.
When I set 'aggregate=max' it is always Critical; it seems to take the value from the busiest CPU.
If I remove that argument, it returns values for all CPUs.

What can I do to get accurate CPU information and alert on it?

thanks.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: check_ncpa get wrong alert from CPUs

Post by ssax »

AVG will get you the average of all CPUs combined.

MAX will use the highest value of all CPUs.

MIN will use the lowest value of all CPUs.
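As a rough illustration of how the three aggregates differ, here is a minimal Python sketch. The per-CPU values are made up for the example (they are not taken from the attached output), and `aggregate` is a hypothetical helper, not NCPA's own code.

```python
from statistics import mean

def aggregate(per_cpu, mode):
    """Collapse per-CPU percentages into one value (hypothetical helper, not NCPA code)."""
    if mode == "avg":
        return mean(per_cpu)
    if mode == "max":
        return max(per_cpu)
    if mode == "min":
        return min(per_cpu)
    raise ValueError(f"unknown aggregate: {mode}")

# Made-up sample: 10 busy CPUs and 6 idle ones.
per_cpu = [95.0] * 10 + [0.0] * 6

print(aggregate(per_cpu, "avg"))  # 59.375
print(aggregate(per_cpu, "max"))  # 95.0
print(aggregate(per_cpu, "min"))  # 0.0
```

With a sample like this, a handful of idle CPUs pulls the average well below what the busy CPUs show, while max follows only the single busiest CPU.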

Please run that check command again without arguments (so I can see all CPU percentages) and then attach the text version of it so that I can use the values to see if it's calculating it properly.

Thank you!
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: check_ncpa get wrong alert from CPUs

Post by sacom01 »

Please find the attached file for details.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: check_ncpa get wrong alert from CPUs

Post by ssax »

You'll want to use the aggregate=avg:

/usr/local/nagios/libexec/check_ncpa.py ... egate=avg'

Please note that top and NCPA sample the CPU at different moments, so you will rarely see their numbers match.

What NCPA version are you running on the remote system? What OS/version is it?
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: check_ncpa get wrong alert from CPUs

Post by sacom01 »

NCPA Agent_version was ['2.2.1']
os level: AIX 7.2.3
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: check_ncpa get wrong alert from CPUs

Post by ssax »

What is the full output of this command?

Code:

/usr/local/nagios/libexec/check_ncpa.py -H X.X.X.X -t 'yourtoken' -P 5693 -M cpu/percent -w '20' -c '40' -q 'aggregate=avg'
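For reference, a plain numeric threshold such as -w '20' -c '40' is conventionally read as "alert when the value goes above this number". The sketch below assumes the check applies the thresholds that way to the single aggregated value; `classify` is a hypothetical helper and does not cover the full Nagios range syntax.

```python
def classify(value, warning, critical):
    """Map a value to a Nagios state using simple 'alert above N' thresholds.

    Hypothetical helper: ignores negative values and range forms such as '10:20'.
    """
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"

print(classify(13.3, 20, 40))  # OK
print(classify(25.0, 20, 40))  # WARNING
print(classify(62.0, 20, 40))  # CRITICAL
```

Under that assumption, an average a little over 60% would already be CRITICAL with these thresholds.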
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: check_ncpa get wrong alert from CPUs

Post by sacom01 »

I told you, ssax: avg gives the wrong value.
Our system is running at about 80-100%, but it shows only a little over 60%.
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: check_ncpa get wrong alert from CPUs

Post by vtrac »

Hi,
Hope you are having a good day!! ... :-)

Let's try this: we will work out the average CPU usage on your remote NCPA host.
I am assuming that your NCPA agent is running on Windows.

Please open a command prompt on your Nagios XI server, then run the steps below.

1) Get the number of CPU cores on your remote NCPA agent.
NOTE: x.x.x.x is the address of your remote NCPA agent, and "yourToken" is the NCPA token defined in the "ncpa.cfg" file.

Code:

curl -k "https://x.x.x.x:5693/api/cpu/count?token=yourToken"
Example output:

Code:

{
    "count": [
        [
            8
        ],
        "cores"
    ]
}

2) Now, get the sum of the CPU percentages across all CPUs:

Code:

curl -k "https://x.x.x.x:5693/api/cpu/percent?token=yourToken" | egrep "[0-9]" | awk -F',' '{sum+=$1;}END{print sum;}'
Example output of step (2):

Code:

106.3
3) Now, take the output of step (2) and divide it by the number of CPU cores from step (1).
This gives you the average CPU percentage.

Example: average CPU percent

Code:

106.3 / 8 = 13.2875
Does this match your other average?
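
The three steps above can also be sketched in Python without the egrep/awk pipeline. This is only a sketch: the cpu/count body is the example output from step (1), while the cpu/percent body is an assumed shape (mirroring cpu/count) with made-up values chosen to sum to 106.3. `average_cpu` is a hypothetical helper.

```python
import json

# Sample API responses. The cpu/count body is the example output from step 1;
# the cpu/percent body is an assumed shape with made-up values that sum to 106.3.
count_body = '{"count": [[8], "cores"]}'
percent_body = '{"percent": [[50.0, 30.0, 10.0, 6.3, 5.0, 3.0, 2.0, 0.0], "%"]}'

def average_cpu(count_json, percent_json):
    """Return the sum of per-CPU percentages divided by the core count."""
    cores = json.loads(count_json)["count"][0][0]
    per_cpu = json.loads(percent_json)["percent"][0]
    return sum(per_cpu) / cores

print(round(average_cpu(count_body, percent_body), 4))  # 13.2875
```

In practice the two strings would be replaced by the bodies returned from the curl calls in steps (1) and (2).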


Best regards,
Vinh
sacom01
Posts: 194
Joined: Wed Dec 23, 2020 10:15 pm

Re: check_ncpa get wrong alert from CPUs

Post by sacom01 »

Well, let me give you the details; it's a bit complex :D

Our environment has both Red Hat and AIX servers.
When I run the command and compare it with Nagios XI, the avg from the command matches the value in Nagios XI.

When I compare the avg from the command with the top and topas commands on the servers:
Red Hat: avg from the command > value in top (e.g. 20% > 3%)
AIX: avg from the command < value in topas (e.g. 20% < 45%)

Now I'm very confused and don't know which number to trust :((

By the way, Vinh, please check your inbox.
vtrac
Posts: 903
Joined: Tue Oct 27, 2020 1:35 pm

Re: check_ncpa get wrong alert from CPUs

Post by vtrac »

Hi Sacom01,
CPU usage changes so quickly that it is very hard to get an exact number.
In your case, it looks like your system is very large, with many CPUs, some of which are not being used at all (many are at 0%).

I noticed in your opening post that your system is set to be checked every 1 minute.
I would suggest changing that to 5 minutes, since your system is large. Checking every minute does not make much sense when many CPUs are not even in use (at 0%, based on the picture).

What we really want to watch for is the CPU staying at 90% or higher for a long period.
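
To make the "high for a long period" idea concrete, here is a minimal Python sketch that only flags a run of consecutive high readings rather than a single spike. The function name, the 90% threshold default, and the three-in-a-row rule are assumptions for illustration, and the sample readings are made up.

```python
def sustained_high(samples, threshold=90.0, min_consecutive=3):
    """Return True if `min_consecutive` samples in a row are at or above `threshold`."""
    streak = 0
    for value in samples:
        # Extend the streak on a high reading, reset it on anything lower.
        streak = streak + 1 if value >= threshold else 0
        if streak >= min_consecutive:
            return True
    return False

# Made-up averages from consecutive checks.
print(sustained_high([95.0, 20.0, 97.0, 15.0, 92.0]))  # False: isolated spikes only
print(sustained_high([60.0, 91.0, 94.0, 96.0, 40.0]))  # True: three in a row
```

In Nagios itself, a similar effect usually comes from the retry settings, where a problem has to persist across several check attempts before a notification is sent.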

Please let me know if there is anything else that we can do to help!!


Best Regards,
Vinh