"UNKNOWN" service check status while monitoring our Windows servers

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
talalmog
Posts: 18
Joined: Thu Jan 24, 2019 5:20 am

Post by talalmog »

Hi,

We are seeing many service checks with an "UNKNOWN" status while monitoring our Windows servers.
The Nagios log is full of error messages like the following:
"CURRENT SERVICE STATE: DUUH-XXXX-01.XXXX.corp;Performance - TCPv4/Connection Failures;UNKNOWN;HARD;3;UNKNOWN: Error occurred while running the plugin. Use the verbose flag for more details."
"SERVICE ALERT: XXXX-XX-04.XXX.local;Telnet Socket - pcp+ptr snap_players_count;UNKNOWN;SOFT;1;UNKNOWN: Error occurred while running the plugin. Use the verbose flag for more details.."

It seems that many service checks fail on their first attempt and then succeed on the next check.
We were NOT able to reproduce the failure manually using verbose mode.
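Because the failures are intermittent, one way to try to catch them is to run the same check many times in a row from the Nagios XI server and count how often it exits with code 3 (UNKNOWN, per the Nagios plugin exit-code convention). This is a minimal sketch assuming Python 3 on the XI server; the host name and token in `CHECK_CMD` are hypothetical placeholders, and `-v` is assumed to be check_ncpa.py's verbose flag (the plugin's error message refers to one).

```python
import collections
import subprocess

# Nagios plugin exit code convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
UNKNOWN = 3

# Hypothetical example command: host name and token are placeholders, and
# "-v" is assumed to be check_ncpa.py's verbose flag.
CHECK_CMD = [
    "/usr/local/nagios/libexec/check_ncpa.py",
    "-H", "windows-host.example.local",
    "-t", "TOKEN", "-P", "5693",
    "-M", "cpu/percent", "-w", "90", "-c", "95",
    "-q", "aggregate=avg",
    "-v",
]


def tally_exit_codes(cmd, runs):
    """Run a plugin command `runs` times; count exit codes and keep UNKNOWN output."""
    counts = collections.Counter()
    unknown_outputs = []
    for _ in range(runs):
        # Merge stderr into stdout so any traceback is captured with the status line.
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        counts[result.returncode] += 1
        if result.returncode == UNKNOWN:
            unknown_outputs.append(result.stdout.strip())
    return dict(counts), unknown_outputs
```

Calling `tally_exit_codes(CHECK_CMD, runs=50)` returns a mapping of exit code to count, plus the captured verbose output of every run that came back UNKNOWN, which is the output that was missing when reproducing by hand.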

On the monitored Windows server, the local NCPA log contains the following error messages:
"2019-03-28 12:35:04,694:ERROR:windowscounters:(-1073738822, 'GetFormattedCounterValue', 'The returned data is not valid.')
Traceback (most recent call last):
  File "C:\ncpa\agent\listener\windowscounters.py", line 43, in counter_method
    return WindowsCountersNode.get_counter_val(self.name, *args, **kwargs)
  File "C:\ncpa\agent\listener\windowscounters.py", line 79, in get_counter_val
    _, value = win32pdh.GetFormattedCounterValue(counter, win32pdh.PDH_FMT_DOUBLE)
error: (-1073738822, 'GetFormattedCounterValue', 'The returned data is not valid.')
2019-03-28 12:35:08,891:ERROR:windowscounters:(-1073738822, 'GetFormattedCounterValue', 'The returned data is not valid.')
Traceback (most recent call last):
  File "C:\ncpa\agent\listener\windowscounters.py", line 43, in counter_method
    return WindowsCountersNode.get_counter_val(self.name, *args, **kwargs)
  File "C:\ncpa\agent\listener\windowscounters.py", line 79, in get_counter_val
    _, value = win32pdh.GetFormattedCounterValue(counter, win32pdh.PDH_FMT_DOUBLE)
error: (-1073738822, 'GetFormattedCounterValue', 'The returned data is not valid.')"
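pywin32 reports the PDH status as a signed 32-bit integer, which makes it hard to look up. Converting it to its unsigned hexadecimal form gives 0xC0000BBA; as far as we can tell, this corresponds to PDH_CSTATUS_INVALID_DATA in the Windows SDK header pdhmsg.h, whose message text matches the one in the log. This is only a decoding aid, not a diagnosis of why the counter returned invalid data.

```python
def to_unsigned_hex(code):
    """Render a signed 32-bit Windows status code as unsigned hex (two's complement)."""
    return "0x%08X" % (code & 0xFFFFFFFF)


print(to_unsigned_hex(-1073738822))  # prints 0xC0000BBA
```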

The strange thing about this is that our NCPA installation is not in this location; it is on the D:\ drive.

Our environment is VMware-based.
Our current configuration is:
Nagios XI version: 5.5.5
Number of services: 6560
Number of hosts: 456
NCPA version: 2.1.5

We would appreciate your assistance with this problem.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Ignore the internal drive path that NCPA reports; it's unrelated.

Please post the check command you're running, including the arguments passed to it.

I'm wondering which counter you're checking; maybe you're hitting this issue:

https://github.com/NagiosEnterprises/ncpa/issues/520
Try upgrading to NCPA 2.1.6 and see if that resolves the issue. If not, please send us the ncpa.cfg file from the remote system.

What version of the check_ncpa plugin are you using? Please send the output of this command:

/usr/local/nagios/libexec/check_ncpa.py -V
talalmog
Posts: 18
Joined: Thu Jan 24, 2019 5:20 am

Post by talalmog »

Hi,
Thank you for your reply.
With NCPA version 2.1.6 we get the same results.

As requested:
/usr/local/nagios/libexec/check_ncpa.py -V
check_ncpa.py, Version 1.1.4


It happens for more than one service: sometimes it is the CPU check, sometimes a check that uses a PowerShell plugin.

For example:
-t 'TOKEN' -P 5693 -M 'plugins/PerformanceCounters.ps1' -a 'AuthenticateService,FinancialService'
-t 'TOKEN' -P 5693 -M cpu/percent -w 90 -c 95 -q 'aggregate=avg'

Thanks
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Do you see any Windows event logs that could be related? Do you have any security software on the remote system that could be interfering?
talalmog
Posts: 18
Joined: Thu Jan 24, 2019 5:20 am

Post by talalmog »

Hi,
No, there are no events and no security software on the server.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

I think you (and another customer) are hitting this:

https://www.reddit.com/r/networking/com ... ort_reuse/

I don't have a solution for you at this time. I've notified the NCPA developer, and he says it may be the case (we saw the port being reused in packet captures).
17g
Posts: 1
Joined: Tue Jan 10, 2017 11:26 am

Post by 17g »

Hi talalmog,

I am the other Nagios/NCPA user with this issue. Are you still experiencing the problem? If so, maybe we can compare environments to see if they have anything in common?
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Post by benjaminsmith »

Hi @talalmog,

Thanks @17g. Yes, please let us know your status: resolved or not?

If you have the time: we are trying to replicate the issue here, so the Nagios/NCPA logs, along with environment details for the Windows server (e.g. its version), would be very helpful.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
