False alert

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Frédéric GRANAT
Posts: 445
Joined: Mon Nov 19, 2012 11:36 am

False alert

Post by Frédéric GRANAT »

Hi,
For two service checks using the check_win_cpu command ($USER1$/check_wmi_plus.pl -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -m checkcpu -w $ARG3$ -c $ARG4$ $ARG5$), a CRITICAL status is displayed (Average CPU utilisation 99.99%) even though the server is not using anywhere near that much CPU.
I deleted and re-created the check, but the critical alert is still displayed.

Please help me.

Frederic
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: False alert

Post by tgriep »

Can you log in to the XI server, run the following and post the output?
su nagios
/usr/local/nagios/libexec/check_wmi_plus.pl -H xxx.xxx.xxx.xxx -u <username> -p <password> -m checkcpu -d

Be sure to replace xxx.xxx.xxx.xxx with the IP address of the server you are having the issue with, and replace <username> and <password> with that server's credentials.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Frédéric GRANAT
Posts: 445
Joined: Mon Nov 19, 2012 11:36 am

Re: False alert

Post by Frédéric GRANAT »

Hi,
Here it is.

[nagios@nagiosxi root]$ /usr/local/nagios/libexec/check_wmi_plus.pl -H 172.16.1.14 -u domcompta/svc_riverbed -p dsisvc -m checkcpu -d
Command Line (v1.6): /usr/local/nagios/libexec/check_wmi_plus.pl -H 172.16.1.14 -u USER -p PASS -m checkcpu -d
Base Dir: /usr/local/nagios/libexec
Conf File Dir: /usr/local/nagios/libexec
Loaded Conf File /usr/local/nagios/libexec/check_wmi_plus.conf
Starting Keep State Mode
STATE FILE: /tmp/cwpss_checkcpu__17216114___.state
Checking previous data's expiry - Timestamp 1481552582 vs Expiry After 1481549020 (Keep State Expiry setting is 3600sec)
Using Existing WMI DATA of:$VAR1 = [
          [
            {
              '_ChecksOK' => 1,
              'PercentProcessorTime' => '145426072384437',
              'Timestamp_Sys100NS' => '131260261824668403',
              'Name' => '_Total',
              '_ItemCount' => '1',
              '_KeepStateCreateTimestamp' => 1481552582
            }
          ]
        ];
Round #2 of 2
QUERY: /usr/bin/wmic '-U' 'USER%PASS' '--namespace' 'root/cimv2' '//172.16.1.14' 'select PercentProcessorTime,Timestamp_Sys100NS from Win32_PerfRawData_PerfOS_Processor where Name="_Total"'
OUTPUT: CLASS: Win32_PerfRawData_PerfOS_Processor
Name|PercentProcessorTime|Timestamp_Sys100NS
_Total|138539935468750|140801746826000

COLUMNS(last index=2):Name|PercentProcessorTime|Timestamp_Sys100NS
Now looking for (.*?)\n (use_split=1)
FIELDS (via Split):COLNAME=Name,FIELD=_Total
COLNAME=PercentProcessorTime,FIELD=138539935468750
COLNAME=Timestamp_Sys100NS,FIELD=140801746826000

Row Data Found OK
WMI DATA:$VAR1 = [
          [
            {
              '_KeepStateSamplePeriod' => 38,
              '_ChecksOK' => 2,
              'PercentProcessorTime' => '145426072384437',
              'Timestamp_Sys100NS' => '131260261824668403',
              'Name' => '_Total',
              '_ItemCount' => '1',
              '_KeepStateCreateTimestamp' => 1481552582
            }
          ],
          [
            {
              'Timestamp_Sys100NS' => '140801746826000',
              'PercentProcessorTime' => '138539935468750',
              '_ItemCount' => 1,
              'Name' => '_Total'
            }
          ]
        ];
Storing new WMI results in the state file $VAR1 = [
          [
            {
              '_ChecksOK' => 1,
              'Timestamp_Sys100NS' => '140801746826000',
              'PercentProcessorTime' => '138539935468750',
              '_KeepStateCreateTimestamp' => 1481552620,
              '_ItemCount' => 1,
              'Name' => '_Total'
            }
          ]
        ];
Copying predefined fields to the last WMI result set [0] to [1]
NEW WMI DATA:$VAR1 = [
          [
            {
              'PercentProcessorTime' => '145426072384437',
              'Timestamp_Sys100NS' => '131260261824668403',
              'Name' => '_Total',
              '_ItemCount' => '1'
            }
          ],
          [
            {
              '_KeepStateSamplePeriod' => 38,
              '_ChecksOK' => 2,
              'Timestamp_Sys100NS' => '140801746826000',
              'PercentProcessorTime' => '138539935468750',
              '_KeepStateCreateTimestamp' => 1481552582,
              '_ItemCount' => 1,
              'Name' => '_Total'
            }
          ]
        ];
Creating '_AvgCPU' (WMIQuery:1, Row:0) using 'PERF_100NSEC_TIMER_INV' (Parameters: PercentProcessorTime,%.2f,100)
Core Calc: (1 - (138539935468750 - 145426072384437) /
                           (140801746826000 - 131260261824668403)  ) * 100 =  99.9947481961018
   Setting _AvgCPU to 99.99
Testing TEST VALUES $VAR1 = {
          '_KeepStateSamplePeriod' => 38,
          '_ChecksOK' => 2,
          '_AvgCPU' => '99.99',
          'Timestamp_Sys100NS' => '140801746826000',
          'PercentProcessorTime' => '138539935468750',
          '_KeepStateCreateTimestamp' => 1481552582,
          '_ItemCount' => 1,
          'Name' => '_Total'
        };
WARNING SPECS: $VAR1 = undef;
CRITICAL SPECS: $VAR1 = undef;
------------ Critical Check ------------
------------ Warning Check ------------
------------ End Check ------------
Data Passed back from check: $VAR1 = {
          '_AvgCPU' => '99.99',
          '_DisplayMsg' => 'OK (Sample Period 38 sec)',
          '_TestResult' => 0,
          'Timestamp_Sys100NS' => '140801746826000',
          'PercentProcessorTime' => '138539935468750',
          '_KeepStateCreateTimestamp' => 1481552582,
          '_ChecksOK' => 2,
          '_KeepStateSamplePeriod' => 38,
          '_StatusType' => 'OK (Sample Period 38 sec)',
          'Name' => '_Total',
          '_ItemCount' => 1,
          '_Triggers' => ''
        };
---------- Building Up Display
Incoming Data $VAR1 = {
          '_submode' => '',
          '_nodatastring' => 'WMI Query returned no data. The item you were looking for may NOT exist or the software that creates the WMI Class may not be running, or all data has been excluded.
',
          '_TestResult' => 0,
          'PercentProcessorTime' => '138539935468750',
          '_KeepStateCreateTimestamp' => 1481552582,
          '_arg5' => '',
          '_ChecksOK' => 2,
          '_KeepStateSamplePeriod' => 38,
          '_host' => '172.16.1.14',
          '_nodatamode' => '',
          '_mode' => 'checkcpu',
          '_savedbytefactor' => '',
          '_ItemCount' => 1,
          '_arg3' => '',
          '_AvgCPU' => '99.99',
          '_DisplayMsg' => 'OK (Sample Period 38 sec)',
          '_arg1' => '',
          'Timestamp_Sys100NS' => '140801746826000',
          '_truncate_output' => 8192,
          '_arg2' => '',
          '_timeout' => '',
          '_StatusType' => 'OK (Sample Period 38 sec)',
          '_delay' => 38,
          '_bytefactor' => 1024,
          'Name' => '_Total',
          '_arg4' => undef,
          '_nodataexit' => '',
          '_Triggers' => ''
        };
------- Processing _DisplayMsg||~|~| - ||
Complex Format:_DisplayMsg,,~,~, - ,,
_DisplayMsg||~|~| - || ----> OK (Sample Period 38 sec) -
------- Processing _AvgCPU|%|Average CPU Utilisation| |~||
Complex Format:_AvgCPU,%,Average CPU Utilisation, ,~,,
_AvgCPU|%|Average CPU Utilisation| |~|| ----> Average CPU Utilisation 99.99%
---------- Building Up Performance Data
------- Processing _AvgCPU|%|Avg CPU Utilisation
Complex Format:_AvgCPU,%,Avg CPU Utilisation
_AvgCPU|%|Avg CPU Utilisation (Field=_AvgCPU) ----> 'Avg CPU Utilisation'=99.99%;;;
---------- Done
OUT:OK (Sample Period 38 sec) - Average CPU Utilisation 99.99%|'Avg CPU Utilisation'=99.99%;

OK (Sample Period 38 sec) - Average CPU Utilisation 99.99%|'Avg CPU Utilisation'=99.99%;
[nagios@nagiosxi root]$
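The 99.99% figure can be reproduced from the "Core Calc" line in the debug output above. Below is a minimal POSIX shell/awk recomputation of that PERF_100NSEC_TIMER_INV formula, using only the four counter values printed in the output; `avg_cpu` and `filetime_to_unix` are hypothetical helper names, not part of the plugin. The Timestamp_Sys100NS loaded from the state file (131260261824668403) is more than 900 times larger than the one from the fresh WMI query (140801746826000), so both differences are negative, their ratio is close to zero, and the result is pinned near 100% whatever the real load is. Read as 100 ns ticks since 1601-01-01 (the Windows FILETIME epoch), the stored value converts to Unix time 1481552582, which is the state file's own _KeepStateCreateTimestamp. This suggests, though it does not prove, that the two samples do not share a time base, which fits the bad-state-file explanation given later in the thread.

```shell
#!/bin/sh
# Minimal sketch: recompute check_wmi_plus's PERF_100NSEC_TIMER_INV result
#   (1 - (N2 - N1) / (D2 - D1)) * 100
# N = PercentProcessorTime, D = Timestamp_Sys100NS;
# 1 = sample loaded from the state file, 2 = sample from the new WMI query.
avg_cpu() {
  awk -v n1="$1" -v d1="$2" -v n2="$3" -v d2="$4" \
    'BEGIN { printf "%.2f\n", (1 - (n2 - n1) / (d2 - d1)) * 100 }'
}

# Interpret a Timestamp_Sys100NS value as 100 ns ticks since 1601-01-01
# and convert it to Unix seconds (11644473600 s separate the two epochs).
filetime_to_unix() {
  awk -v t="$1" 'BEGIN { printf "%d\n", int(t / 10000000) - 11644473600 }'
}

# Values copied from the "Core Calc" line of the debug run
avg_cpu 145426072384437 131260261824668403 138539935468750 140801746826000   # prints 99.99
filetime_to_unix 131260261824668403                                          # prints 1481552582
```

With consistent samples, for example a counter that advanced 75 of every 100 ticks (`avg_cpu 0 0 75 100`), the same formula gives 25.00.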

tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: False alert

Post by tgriep »

You may have to reset the performance counters on the Windows server to see whether that fixes the issue.
The links below have instructions for resetting the counters.
https://support.microsoft.com/en-us/kb/2554336
https://community.whatsupgold.com/libra ... dowsdevice
Frédéric GRANAT
Posts: 445
Joined: Mon Nov 19, 2012 11:36 am

Re: False alert

Post by Frédéric GRANAT »

Hi,
We tried resetting the counters, but it had no effect.

Frederic
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: False alert

Post by tgriep »

It could be that you are running an older version of the plugin, which may have a bug.
To check the version, run the following:

/usr/local/nagios/libexec/check_wmi_plus.pl --version
If the plugin version is older than 1.59, you may want to upgrade to a newer version. The following link has the instructions for upgrading the plugin:
https://assets.nagios.com/downloads/nag ... pgrade.pdf
Frédéric GRANAT
Posts: 445
Joined: Mon Nov 19, 2012 11:36 am

Re: False alert

Post by Frédéric GRANAT »

Hi,

[root@nagiosxi ~]# /usr/local/nagios/libexec/check_wmi_plus.pl --version
Version: 1.6
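Whether the reported "Version: 1.6" falls below the 1.59 threshold mentioned above depends on how the number is read. The sketch below assumes, without confirmation from the thread, that check_wmi_plus prints its version as a decimal number, so 1.6 means 1.60 and is newer than 1.59; a dotted-version comparison such as `sort -V` would instead treat 1.6 as older (6 < 59). `version_older_than` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when version $1 is numerically lower than $2.
# Assumption: versions are plain decimals, so "1.6" compares as 1.60.
version_older_than() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) < (b + 0)) }'
}

# Version string taken from the --version output posted above
if version_older_than 1.6 1.59; then
  echo "1.6 is older than 1.59: consider upgrading"
else
  echo "1.6 is 1.59 or newer"
fi
```

If the plugin actually uses dotted version numbers, this decimal comparison would give the wrong answer, so check the plugin's changelog before relying on it.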
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: False alert

Post by tgriep »

What OS and patch level is running on the Windows system showing the invalid info?
How many CPUs is it running?
Can you log in to the Windows host, run the following in a command prompt and post the output?

wmic cpu get loadpercentage
Frédéric GRANAT
Posts: 445
Joined: Mon Nov 19, 2012 11:36 am

Re: False alert

Post by Frédéric GRANAT »

What OS and patch level is running on the Windows system showing the invalid info?
=> Windows Server 2003 R2 Service Pack 2
How many CPUs is it running?
=> It's a virtual machine (1 vCPU)
Can you log in to the Windows host, run the following in a command prompt and post the output?
=> C:\Documents and Settings\support>wmic cpu get loadpercentage
LoadPercentage
4
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: False alert

Post by tgriep »

It could be a bad state file causing the issue. When the plugin runs, it saves the data from that run to a state file so that the next check can calculate its result from the difference between the previous sample and the current one.
Run the following as root on the server to delete the state files. Wait until the check has run twice after deleting the files, then see whether it reports the correct load for the server.

rm -f /tmp/cwpss_checkcpu*
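The command above removes the checkcpu state files for every monitored host. If only one host is affected, a narrower cleanup may be preferable. The sketch below is a hypothetical helper (`remove_cpu_state` is not part of the plugin) that assumes, based only on the single file name in the debug output earlier in the thread (/tmp/cwpss_checkcpu__17216114___.state for 172.16.1.14), that the state file name contains the host address with its dots removed.

```shell
#!/bin/sh
# Hypothetical helper: delete the checkcpu state file(s) for a single host.
# Assumption: the file name embeds the host address without dots, e.g.
# cwpss_checkcpu__17216114___.state for 172.16.1.14 (as seen in the debug output).
remove_cpu_state() {
  host_digits=$(printf '%s' "$1" | tr -d '.')
  dir=${2:-/tmp}            # state file directory; /tmp in the debug output
  for f in "$dir"/cwpss_checkcpu*"$host_digits"*.state; do
    [ -e "$f" ] || continue # the glob matched nothing
    echo "removing $f"
    rm -f -- "$f"
  done
}

# Usage (as root): remove_cpu_state 172.16.1.14
```

Matching is by digits only, so an address such as 172.161.1.4 produces the same digit string; check the "removing" lines to confirm only the intended files were deleted.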