
service frequently goes to unknown state

Posted: Wed Mar 21, 2018 8:45 am
by padu_3891
Hello Team,

Please help.
We are monitoring Windows CPU usage with the counter below in a VBScript. The check is configured on almost all of our Windows servers.

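' Bind through a WMI moniker directly to the _Total instance of the
' formatted processor counter class (default WMI namespace)
Set objWMIService = GetObject("winmgmts:" _
& "Win32_PerfFormattedData_PerfOS_Processor." _
& "name='_Total'")
' Percentage of time the processors spent running non-idle threads
cpuused = objWMIService.PercentProcessorTime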
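The manual check_nrpe runs further down in this post return output in the usual Nagios plugin format: a status text, performance data after a pipe in the form 'label'=value;warn;crit, and an exit code that carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Since the contents of check_cpu_mem.vbe are not shown here, the Python sketch below is only a hypothetical illustration of how a value such as cpuused could be turned into that output with the 90/95 thresholds. The function name evaluate_cpu, the "at or above the threshold" rule, and putting the perfdata on the same line as the status text are assumptions, not the script's actual logic.

```python
# Hypothetical sketch: check_cpu_mem.vbe itself is not shown, so the
# threshold rule (value at or above a threshold triggers it) is an assumption.
from typing import Tuple

# Standard Nagios plugin exit codes and their state names.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Perfdata label as it appears in the manual check_nrpe output.
LABEL = r"\Processor(_Total)\% Processor Time"


def evaluate_cpu(cpu_used: int, warn: int, crit: int) -> Tuple[int, str]:
    """Return (exit_code, plugin_output) for a CPU usage percentage."""
    if cpu_used >= crit:
        code = CRITICAL
    elif cpu_used >= warn:
        code = WARNING
    else:
        code = OK
    # Status text, then performance data after the pipe: 'label'=value;warn;crit
    output = (
        f"{STATE_NAMES[code]}:Processor(_Total)%Processor Time:{cpu_used}"
        f"|'{LABEL}'={cpu_used};{warn};{crit}"
    )
    return code, output


code, text = evaluate_cpu(0, 90, 95)
print(code, text)
```

With cpu_used = 0 this yields exit code 0 and the same status text and perfdata values seen in the manual runs.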

On most of the servers, the service flips between OK and UNKNOWN on every check, as shown in the logs below:

Mar 21 04:35:10 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;HARD;3;OK:Processor(_Total)%Processor Time:0
Mar 21 04:46:10 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 05:00:52 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 05:11:28 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 05:26:03 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 05:37:16 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 05:51:57 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 06:02:52 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 06:18:21 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 06:29:39 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
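To make the pattern in these log lines easier to see, the Python sketch below parses the SERVICE ALERT entries (host;service;state;state type;attempt;output) and reports each state together with the seconds elapsed since the previous alert. The regular expression is written for the syslog format shown here and may not match other lines in the file; the names parse_alerts and summarize are illustrative.

```python
import re
from datetime import datetime

# Matches Nagios "SERVICE ALERT" lines in the syslog format shown above:
# <timestamp> <host> nagios: SERVICE ALERT: host;service;state;type;attempt;output
ALERT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) \S+ nagios: SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alerts(lines):
    """Yield (timestamp, state, state_type, attempt, output) per alert line."""
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a SERVICE ALERT
        # Syslog timestamps carry no year; a placeholder leap year is
        # prepended, since only the differences between entries matter here.
        ts = datetime.strptime("2000 " + m["ts"], "%Y %b %d %H:%M:%S")
        yield ts, m["state"], m["type"], int(m["attempt"]), m["output"]


def summarize(lines):
    """Return (state, seconds since previous alert) for each alert."""
    result = []
    prev = None
    for ts, state, _type, _attempt, _output in parse_alerts(lines):
        gap = int((ts - prev).total_seconds()) if prev is not None else 0
        result.append((state, gap))
        prev = ts
    return result
```

Run on the ten lines above, it shows the state alternating between OK and UNKNOWN on every alert, with roughly 10 to 16 minutes between entries.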


Entry in nsclient.ini

check_cpu_mem=cscript.exe /NoLogo scripts\\custom\\check_cpu_mem.vbe $ARG1$ $ARG2$ $ARG3$ $ARG4$
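When check_nrpe is called with -a CPU 90 95 5, the agent fills $ARG1$ to $ARG4$ in the command line above with those arguments. The Python sketch below mimics that substitution so the resulting command can be reproduced by hand with cscript.exe on the Windows host. The function expand_command is hypothetical and not part of NSClient++, and leaving a missing argument empty is an assumption of this sketch.

```python
import re
from typing import List

# Command template copied from the nsclient.ini entry above.
COMMAND = r"cscript.exe /NoLogo scripts\\custom\\check_cpu_mem.vbe $ARG1$ $ARG2$ $ARG3$ $ARG4$"


def expand_command(template: str, args: List[str]) -> str:
    """Replace each $ARGn$ placeholder with the n-th argument (1-based)."""
    def repl(match):
        index = int(match.group(1)) - 1
        # Missing arguments become empty strings in this sketch.
        return args[index] if index < len(args) else ""
    return re.sub(r"\$ARG(\d+)\$", repl, template)


print(expand_command(COMMAND, ["CPU", "90", "95", "5"]))
```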

But when I run the same check manually with check_nrpe, everything looks fine:

[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:2
|'\Processor(_Total)\% Processor Time'=2;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
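Because these manual runs all succeed while the scheduled checks fail about every other time, it may help to run the same command many times and record the state, duration and first output line of each attempt, so that intermittent failures or responses close to the 30-second -t value become visible. The Python sketch below does that with subprocess; the function name probe, the run count, the pause and the local safety timeout are illustrative choices, while the check_nrpe path and arguments are copied from the commands above.

```python
import subprocess
import time
from collections import Counter

# Command copied from the manual runs above.
CHECK_CMD = [
    "/usr/local/nagios/libexec/check_nrpe",
    "-H", "10.209.41.158", "-p", "56660", "-t", "30",
    "-c", "check_cpu_mem", "-a", "CPU", "90", "95", "5",
]

# Standard Nagios plugin exit codes.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def probe(cmd, runs=20, pause=1.0, timeout=60):
    """Run cmd `runs` times; return a Counter of states and per-run details.

    Each detail is (state, elapsed_seconds, first_output_line).
    """
    tally = Counter()
    details = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # The local safety timeout fired before the command finished.
            state, first_line = "TIMEOUT", ""
        else:
            state = STATE_NAMES.get(proc.returncode, f"EXIT {proc.returncode}")
            lines = proc.stdout.strip().splitlines()
            first_line = lines[0] if lines else ""
        elapsed = round(time.monotonic() - start, 2)
        tally[state] += 1
        details.append((state, elapsed, first_line))
        time.sleep(pause)
    return tally, details
```

On the Nagios server, probe(CHECK_CMD) runs the check 20 times one second apart; raising pause toward the scheduled check interval may make the test closer to what the scheduler does. UNKNOWN entries or long durations in the results may suggest a problem with the agent or the network rather than with the script itself.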

Re: service frequently goes to unknown state

Posted: Wed Mar 21, 2018 8:48 am
by padu_3891
Errors from the nsclient.log file on one of the Windows servers:

2018-03-18 09:21:05: e:..\..\..\..\nscp\modules\CheckEventLog\eventlog_wrapper.cpp:31: Failed to close eventlog: 1717: The interface is unknown.
2018-03-18 09:21:05: e:..\..\..\..\nscp\modules\CheckEventLog\eventlog_wrapper.cpp:173: Failed to read eventlog record(0): 6: The handle is invalid.
2018-03-18 09:21:12: e:..\..\..\..\nscp\modules\CheckSystem\PDHCollector.cpp:141: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.
2018-03-18 09:21:12: e:..\..\..\..\nscp\modules\CheckSystem\PDHCollector.cpp:141: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.

Thx.

Re: service frequently goes to unknown state

Posted: Thu Mar 22, 2018 1:03 pm
by lmiltchev
padu_3891 wrote:
"In most of the servers, the service fluctuates from OK to unknown for every checks as per the below logs."
I wonder whether you are hitting an issue with a specific version of the NSClient++ agent. Which NSClient++ version are you currently using?

I see the following error in the log:

2018-03-18 09:21:12: e:..\..\..\..\nscp\modules\CheckSystem\PDHCollector.cpp:141: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.
Similar issues have been reported in the past. Here is an old post I found on the NSClient++ support forum:
https://forums.nsclient.org/t/performan ... oblem/3522
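For reference, the negative number in that PDH error is a 32-bit status code printed as a signed integer. Masking it to 32 bits gives 0x800007D5, which the Windows SDK header pdhmsg.h defines as PDH_NO_DATA; the PdhCollectQueryData documentation lists this status for a query that does not currently contain any counters. A short Python check of the conversion (the helper name to_status_hex is illustrative):

```python
def to_status_hex(code: int) -> str:
    """Reinterpret a signed 32-bit status code as unsigned and format it as hex."""
    return f"0x{code & 0xFFFFFFFF:08X}"


# Value taken from the nsclient.log line quoted above.
print(to_status_hex(-2147481643))  # 0x800007D5
```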

Could you provide us with a download link to the "check_cpu_mem.vbe" script? Alternatively, you can rename it with a *.txt extension and post it on the forum. We will try to test it in-house.