False alert
-
Frédéric GRANAT
- Posts: 445
- Joined: Mon Nov 19, 2012 11:36 am
False alert
Hi,
Two service checks that use the command check_win_cpu ($USER1$/check_wmi_plus.pl -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -m checkcpu -w $ARG3$ -c $ARG4$ $ARG5$) show a CRITICAL status (Average CPU utilisation 99,99 %), although the server is not actually using that much CPU.
I deleted and recreated the check, but the critical alert is still displayed.
Please help me.
Frederic
Re: False alert
Can you log in to the XI server, run the following commands, and post the output?
su nagios
/usr/local/nagios/libexec/check_wmi_plus.pl -H xxx.xxx.xxx.xxx -u <username> -p <password> -m checkcpu -d
Be sure to replace xxx.xxx.xxx.xxx with the IP address of the server that has the issue, and replace <username> and <password> with valid credentials.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
Frédéric GRANAT
- Posts: 445
- Joined: Mon Nov 19, 2012 11:36 am
Re: False alert
Hi,
Here it is.
Code: Select all
[nagios@nagiosxi root]$ /usr/local/nagios/libexec/check_wmi_plus.pl -H 172.16.1.14 -u domcompta/svc_riverbed -p dsisvc -m checkcpu -d
Command Line (v1.6): /usr/local/nagios/libexec/check_wmi_plus.pl -H 172.16.1.14 -u USER -p PASS -m checkcpu -d
Base Dir: /usr/local/nagios/libexec
Conf File Dir: /usr/local/nagios/libexec
Loaded Conf File /usr/local/nagios/libexec/check_wmi_plus.conf
Starting Keep State Mode
STATE FILE: /tmp/cwpss_checkcpu__17216114___.state
Checking previous data's expiry - Timestamp 1481552582 vs Expiry After 1481549020 (Keep State Expiry setting is 3600sec)
Using Existing WMI DATA of:$VAR1 = [
[
{
'_ChecksOK' => 1,
'PercentProcessorTime' => '145426072384437',
'Timestamp_Sys100NS' => '131260261824668403',
'Name' => '_Total',
'_ItemCount' => '1',
'_KeepStateCreateTimestamp' => 1481552582
}
]
];
Round #2 of 2
QUERY: /usr/bin/wmic '-U' 'USER%PASS' '--namespace' 'root/cimv2' '//172.16.1.14' 'select PercentProcessorTime,Timestamp_Sys100NS from Win32_PerfRawData_PerfOS_Processor where Name="_Total"'
OUTPUT: CLASS: Win32_PerfRawData_PerfOS_Processor
Name|PercentProcessorTime|Timestamp_Sys100NS
_Total|138539935468750|140801746826000
COLUMNS(last index=2):Name|PercentProcessorTime|Timestamp_Sys100NS
Now looking for (.*?)\n (use_split=1)
FIELDS (via Split):COLNAME=Name,FIELD=_Total
COLNAME=PercentProcessorTime,FIELD=138539935468750
COLNAME=Timestamp_Sys100NS,FIELD=140801746826000
Row Data Found OK
WMI DATA:$VAR1 = [
[
{
'_KeepStateSamplePeriod' => 38,
'_ChecksOK' => 2,
'PercentProcessorTime' => '145426072384437',
'Timestamp_Sys100NS' => '131260261824668403',
'Name' => '_Total',
'_ItemCount' => '1',
'_KeepStateCreateTimestamp' => 1481552582
}
],
[
{
'Timestamp_Sys100NS' => '140801746826000',
'PercentProcessorTime' => '138539935468750',
'_ItemCount' => 1,
'Name' => '_Total'
}
]
];
Storing new WMI results in the state file $VAR1 = [
[
{
'_ChecksOK' => 1,
'Timestamp_Sys100NS' => '140801746826000',
'PercentProcessorTime' => '138539935468750',
'_KeepStateCreateTimestamp' => 1481552620,
'_ItemCount' => 1,
'Name' => '_Total'
}
]
];
Copying predefined fields to the last WMI result set [0] to [1]
NEW WMI DATA:$VAR1 = [
[
{
'PercentProcessorTime' => '145426072384437',
'Timestamp_Sys100NS' => '131260261824668403',
'Name' => '_Total',
'_ItemCount' => '1'
}
],
[
{
'_KeepStateSamplePeriod' => 38,
'_ChecksOK' => 2,
'Timestamp_Sys100NS' => '140801746826000',
'PercentProcessorTime' => '138539935468750',
'_KeepStateCreateTimestamp' => 1481552582,
'_ItemCount' => 1,
'Name' => '_Total'
}
]
];
Creating '_AvgCPU' (WMIQuery:1, Row:0) using 'PERF_100NSEC_TIMER_INV' (Parameters: PercentProcessorTime,%.2f,100)
Core Calc: (1 - (138539935468750 - 145426072384437) /
(140801746826000 - 131260261824668403) ) * 100 = 99.9947481961018
Setting _AvgCPU to 99.99
Testing TEST VALUES $VAR1 = {
'_KeepStateSamplePeriod' => 38,
'_ChecksOK' => 2,
'_AvgCPU' => '99.99',
'Timestamp_Sys100NS' => '140801746826000',
'PercentProcessorTime' => '138539935468750',
'_KeepStateCreateTimestamp' => 1481552582,
'_ItemCount' => 1,
'Name' => '_Total'
};
WARNING SPECS: $VAR1 = undef;
CRITICAL SPECS: $VAR1 = undef;
------------ Critical Check ------------
------------ Warning Check ------------
------------ End Check ------------
Data Passed back from check: $VAR1 = {
'_AvgCPU' => '99.99',
'_DisplayMsg' => 'OK (Sample Period 38 sec)',
'_TestResult' => 0,
'Timestamp_Sys100NS' => '140801746826000',
'PercentProcessorTime' => '138539935468750',
'_KeepStateCreateTimestamp' => 1481552582,
'_ChecksOK' => 2,
'_KeepStateSamplePeriod' => 38,
'_StatusType' => 'OK (Sample Period 38 sec)',
'Name' => '_Total',
'_ItemCount' => 1,
'_Triggers' => ''
};
---------- Building Up Display
Incoming Data $VAR1 = {
'_submode' => '',
'_nodatastring' => 'WMI Query returned no data. The item you were looking for may NOT exist or the software that creates the WMI Class may not be running, or all data has been excluded.
',
'_TestResult' => 0,
'PercentProcessorTime' => '138539935468750',
'_KeepStateCreateTimestamp' => 1481552582,
'_arg5' => '',
'_ChecksOK' => 2,
'_KeepStateSamplePeriod' => 38,
'_host' => '172.16.1.14',
'_nodatamode' => '',
'_mode' => 'checkcpu',
'_savedbytefactor' => '',
'_ItemCount' => 1,
'_arg3' => '',
'_AvgCPU' => '99.99',
'_DisplayMsg' => 'OK (Sample Period 38 sec)',
'_arg1' => '',
'Timestamp_Sys100NS' => '140801746826000',
'_truncate_output' => 8192,
'_arg2' => '',
'_timeout' => '',
'_StatusType' => 'OK (Sample Period 38 sec)',
'_delay' => 38,
'_bytefactor' => 1024,
'Name' => '_Total',
'_arg4' => undef,
'_nodataexit' => '',
'_Triggers' => ''
};
------- Processing _DisplayMsg||~|~| - ||
Complex Format:_DisplayMsg,,~,~, - ,,
_DisplayMsg||~|~| - || ----> OK (Sample Period 38 sec) -
------- Processing _AvgCPU|%|Average CPU Utilisation| |~||
Complex Format:_AvgCPU,%,Average CPU Utilisation, ,~,,
_AvgCPU|%|Average CPU Utilisation| |~|| ----> Average CPU Utilisation 99.99%
---------- Building Up Performance Data
------- Processing _AvgCPU|%|Avg CPU Utilisation
Complex Format:_AvgCPU,%,Avg CPU Utilisation
_AvgCPU|%|Avg CPU Utilisation (Field=_AvgCPU) ----> 'Avg CPU Utilisation'=99.99%;;;
---------- Done
OUT:OK (Sample Period 38 sec) - Average CPU Utilisation 99.99%|'Avg CPU Utilisation'=99.99%;
OK (Sample Period 38 sec) - Average CPU Utilisation 99.99%|'Avg CPU Utilisation'=99.99%;
[nagios@nagiosxi root]$
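A note on the "Core Calc" lines in the debug output above: the arithmetic can be reproduced with a short Python sketch. The function name perf_100nsec_timer_inv is made up for this illustration and is not part of check_wmi_plus; the formula and the four input values are copied from the debug output.

```python
def perf_100nsec_timer_inv(prev_value, prev_ts, curr_value, curr_ts):
    """Hypothetical helper mirroring the 'Core Calc' line of the debug output:
    (1 - (delta counter / delta timestamp)) * 100."""
    delta_time = curr_ts - prev_ts
    if delta_time == 0:
        raise ValueError("both samples have the same timestamp")
    return (1 - (curr_value - prev_value) / delta_time) * 100


# Previous sample, read from the state file (values from the debug output).
PREV_VALUE = 145426072384437
PREV_TS = 131260261824668403

# Current sample, returned by the WMI query (values from the debug output).
CURR_VALUE = 138539935468750
CURR_TS = 140801746826000

avg_cpu = perf_100nsec_timer_inv(PREV_VALUE, PREV_TS, CURR_VALUE, CURR_TS)
delta_ticks = CURR_TS - PREV_TS

print(f"Average CPU: {avg_cpu:.2f}%")          # Average CPU: 99.99%
print(f"Timestamp delta (100 ns ticks): {delta_ticks}")
```

With these inputs the timestamp delta is negative and on the order of 1.3e17 ticks, whereas a 38 second sample period corresponds to about 3.8e8 ticks of 100 ns. The two Timestamp_Sys100NS values therefore do not look like consecutive readings of the same clock, and that mismatch, rather than real CPU load, may be what pushes the result to 99.99%.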
Re: False alert
You may have to reset the performance counters on the Windows server to see if that fixes the issue.
Take a look at the links below for instructions to reset the counters.
https://support.microsoft.com/en-us/kb/2554336
https://community.whatsupgold.com/libra ... dowsdevice
-
Frédéric GRANAT
- Posts: 445
- Joined: Mon Nov 19, 2012 11:36 am
Re: False alert
Hi,
We tried resetting the counters, but it had no effect.
Frederic
Re: False alert
It could be that you are running an older version of the plugin and it may have a bug.
To check the version, you can run the following.
Code: Select all
/usr/local/nagios/libexec/check_wmi_plus.pl --version
If the version of the plugin is older than 1.59, you may want to upgrade to a newer version. The following link has instructions for upgrading the plugin.
https://assets.nagios.com/downloads/nag ... pgrade.pdf
-
Frédéric GRANAT
- Posts: 445
- Joined: Mon Nov 19, 2012 11:36 am
Re: False alert
Hi,
[root@nagiosxi ~]# /usr/local/nagios/libexec/check_wmi_plus.pl --version
Version: 1.6
Re: False alert
What OS and patch level is running on the Windows system showing the invalid info?
How many CPUs is it running?
Can you log in to the Windows host, run the following in a command prompt, and post the output?
Code: Select all
wmic cpu get loadpercentage
-
Frédéric GRANAT
- Posts: 445
- Joined: Mon Nov 19, 2012 11:36 am
Re: False alert
What OS and patch level is running on the Windows system showing the invalid info?
=> Windows 2003 R2 service pack 2
How many CPUs is it running?
=> It's a virtual machine (1 vCPU)
Can you login to the Windows host, run the following in a command prompt and post the output?
=> C:\Documents and Settings\support>wmic cpu get loadpercentage
LoadPercentage
4
Re: False alert
It could be a bad state file causing the issue. When the plugin runs, it saves the results of the last run in a state file so that it can calculate the value from the difference between the previous sample and the current one.
Run the following as root on the server to delete the state files. Wait until the check has run twice after deleting the files and see if it is reporting the correct load for the server.
Code: Select all
rm -f /tmp/cwpss_checkcpu*
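Before deleting anything, it can help to see which state files the pattern actually matches. The following is a minimal Python sketch, assuming the /tmp location and the cwpss_checkcpu prefix that appear in the rm command; find_state_files is a hypothetical helper written for this illustration and is not part of the plugin.

```python
import glob
import os


def find_state_files(directory="/tmp", prefix="cwpss_checkcpu"):
    """Return the files in `directory` whose names start with `prefix`.

    Hypothetical helper: it only lists candidates and deletes nothing.
    """
    pattern = os.path.join(directory, prefix + "*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Dry run: print what "rm -f /tmp/cwpss_checkcpu*" would remove.
    for path in find_state_files():
        print(path)
```

If the list contains only the expected checkcpu state files, the rm command can then be run as root.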