service frequently goes to unknown state

padu_3891
Posts: 50
Joined: Thu Sep 05, 2013 10:12 pm

service frequently goes to unknown state

Post by padu_3891 »

Hello Team,

Please help.
We are monitoring CPU usage on Windows using the counter below in a VBScript check. It is configured on almost all of our Windows servers.

Set objWMIService = GetObject("winmgmts:" _
& "Win32_PerfFormattedData_PerfOS_Processor." _
& "name='_Total'")
cpuused = objWMIService.PercentProcessorTime

On most of the servers, the service flips between OK and UNKNOWN on nearly every check, as the logs below show.

Mar 21 04:35:10 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;HARD;3;OK:Processor(_Total)%Processor Time:0
Mar 21 04:46:10 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 05:00:52 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 05:11:28 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 05:26:03 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 05:37:16 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 05:51:57 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 06:02:52 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
Mar 21 06:18:21 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;OK;SOFT;2;OK:Processor(_Total)%Processor Time:0
Mar 21 06:29:39 saclx127 nagios: SERVICE ALERT: HKDNT877;COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;Unable to establish communication with Agent
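
To put numbers on the flapping, the log lines above can be tallied with a short script. This is only a sketch in Python, assuming the semicolon-separated SERVICE ALERT layout shown above (host;service;state;state type;attempt;output). The names ALERT_RE, SAMPLE and summarize_alerts are made up for this illustration and are not part of Nagios.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines in this post:
# "... SERVICE ALERT: host;service;state;state_type;attempt;plugin output"
ALERT_RE = re.compile(
    r"SERVICE ALERT: (?P<host>[^;]+);(?P<service>[^;]+);"
    r"(?P<state>[^;]+);(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)

# Two lines copied from the log above, used as sample input.
SAMPLE = [
    "Mar 21 04:35:10 saclx127 nagios: SERVICE ALERT: HKDNT877;"
    "COUNTER-Processor_Total-Processor-Time;OK;HARD;3;"
    "OK:Processor(_Total)%Processor Time:0",
    "Mar 21 04:46:10 saclx127 nagios: SERVICE ALERT: HKDNT877;"
    "COUNTER-Processor_Total-Processor-Time;UNKNOWN;SOFT;1;"
    "Unable to establish communication with Agent",
]


def summarize_alerts(lines):
    """Count (state, state_type) pairs and the outputs reported for UNKNOWN."""
    counts = Counter()
    unknown_outputs = Counter()
    for line in lines:
        match = ALERT_RE.search(line)
        if match is None:
            continue  # skip anything that is not a SERVICE ALERT line
        counts[(match["state"], match["state_type"])] += 1
        if match["state"] == "UNKNOWN":
            unknown_outputs[match["output"].strip()] += 1
    return counts, unknown_outputs


counts, unknown_outputs = summarize_alerts(SAMPLE)
for (state, state_type), n in sorted(counts.items()):
    print(state, state_type, n)
```

Fed the full log instead of SAMPLE, this would show whether every UNKNOWN is a SOFT first attempt with the same "Unable to establish communication with Agent" output, which is what the excerpt above suggests.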


Entry in nsclient.ini

check_cpu_mem=cscript.exe /NoLogo scripts\\custom\\check_cpu_mem.vbe $ARG1$ $ARG2$ $ARG3$ $ARG4$

But when I run the check command manually, everything looks fine.

[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:2
|'\Processor(_Total)\% Processor Time'=2;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
[gsspmuth@SACLX127 etc]$ /usr/local/nagios/libexec/check_nrpe -H 10.209.41.158 -p 56660 -t 30 -c check_cpu_mem -a CPU 90 95 5
OK:Processor(_Total)%Processor Time:0
|'\Processor(_Total)\% Processor Time'=0;90;95
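
The manual runs above all pass with -t 30, so one thing that may be worth measuring is how long each run actually takes. A slow cscript/WMI call could exceed whatever timeout the scheduled check uses, but that is only a guess. The sketch below is Python, and time_check is a hypothetical helper, not part of Nagios or NRPE. The command in the trailing comment is the one from this post.

```python
import subprocess
import time


def time_check(cmd, runs=5, timeout=60):
    """Run cmd several times; return (seconds, exit code, first output line) per run."""
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            code = proc.returncode
            lines = proc.stdout.strip().splitlines()
            first_line = lines[0] if lines else ""
        except subprocess.TimeoutExpired:
            # The local timeout fired before the command returned.
            code, first_line = None, "timed out locally"
        elapsed = round(time.monotonic() - start, 2)
        results.append((elapsed, code, first_line))
    return results


# Example call with the command from this post (uncomment on the Nagios server):
# cmd = ["/usr/local/nagios/libexec/check_nrpe", "-H", "10.209.41.158",
#        "-p", "56660", "-t", "30", "-c", "check_cpu_mem",
#        "-a", "CPU", "90", "95", "5"]
# for elapsed, code, line in time_check(cmd, runs=20):
#     print(elapsed, code, line)
```

Nagios plugins conventionally return exit code 3 for UNKNOWN, so any run with code 3 or an unusually long elapsed time would be the one to compare against nsclient.log on the Windows side.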
padu_3891
Posts: 50
Joined: Thu Sep 05, 2013 10:12 pm

Re: service frequently goes to unknown state

Post by padu_3891 »

Here are the errors from the nsclient.log file on one of the Windows servers:

2018-03-18 09:21:05: e:..\..\..\..\nscp\modules\CheckEventLog\eventlog_wrapper.cpp:31: Failed to close eventlog: 1717: The interface is unknown.


2018-03-18 09:21:05: e:..\..\..\..\nscp\modules\CheckEventLog\eventlog_wrapper.cpp:173: Failed to read eventlog record(0): 6: The handle is invalid.


2018-03-18 09:21:12: e:..\..\..\..\nscp\modules\CheckSystem\PDHCollector.cpp:141: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.


2018-03-18 09:21:12: e:..\..\..\..\nscp\modules\CheckSystem\PDHCollector.cpp:141: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.



Thx.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: service frequently goes to unknown state

Post by lmiltchev »

"In most of the servers, the service fluctuates from OK to unknown for every checks as per the below logs."
I wonder if you have an issue with a specific version of NSClient++ agent. What is the NSClient++ version that you are currently using?

I see the following error in the log:


2018-03-18 09:21:12: e:..\..\..\..\nscp\modules\CheckSystem\PDHCollector.cpp:141: Failed to query performance counters: PdhCollectQueryData failed: : -2147481643: No data to return.
Similar issues have been reported in the past. Here's an old post I found on the NSClient++ support forum:
https://forums.nsclient.org/t/performan ... oblem/3522

Can you provide us with a download link to the "check_cpu_mem.vbe" script? You can also rename it with the *.txt extension, and post it on the forum. We will try to test it in-house.