Windows Check_WMI checks sporadic failures

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

Windows Check_WMI checks sporadic failures

Post by tonyleatwork »

Hi -

We have a fairly large Windows environment that we monitor using Nagios XI with the WMI plugin (some hosts set up via the wizard and some via customizations). Some hosts have random, temporary check failures, while others have some checks that fail permanently.

All the hosts are configured to receive the same 5 checks:

CPU, CPU Queue Length, Disk IO, Page File, Free Disk Space.

The vast majority of systems work fine, but some randomly return "UNKNOWN" for a single check (CPU Queue Length, for example): four of the five checks work and one fails. Most of our Windows servers are Windows 2008+ (up to Windows 2012 R2). There are some Windows 2000 / 2003 that we

The error is always the generic:

Code: Select all

UNKNOWN - The WMI query had problems. The error text from wmic is: [librpc/rpc/dcerpc_connect.c:329:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_ncacn_ip_tcp_recv [truncated].
Here's a test against one host (Windows 2008 R2 64-bit) with the debug flag on:

Code: Select all

[root@nwd2ng01 libexec]# ./check_wmi_plus.pl -H nwd2wdprd2.ad.analog.com -d -u analog/svcwmi -p xxxxxxx -m checkpage
Command Line (v1.49): ./check_wmi_plus.pl -H nwd2wdprd2.ad.analog.com -d -u USER -p PASS -m checkpage
Conf File Dir: /usr/local/nagios/libexec
Loaded Conf File /usr/local/nagios/libexec/check_wmi_plus.conf
Round #1 of 1
QUERY: /usr/bin/wmic --namespace root/cimv2 -U USER%PASS //nwd2wdprd2.ad.analog.com 'Select AllocatedBaseSize,CurrentUsage,PeakUsage from Win32_PageFileUsage'
OUTPUT: [librpc/rpc/dcerpc_connect.c:329:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_ncacn_ip_tcp_recv
[librpc/rpc/dcerpc_connect.c:790:dcerpc_pipe_connect_b_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_b_recv

Could not find the CLASS: line - an error occurred
WMI DATA:$VAR1 = [];
UNKNOWN - The WMI query had problems. The error text from wmic is: [librpc/rpc/dcerpc_connect.c:329:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_ncacn_ip_tcp_recv
[librpc/rpc/dcerpc_connect.c:790:dcerpc_pipe_connect_b_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_b_recv
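
For reference, c00000b5 is the NT status code for NT_STATUS_IO_TIMEOUT, which suggests a connection attempt that timed out rather than a rejected login. The sketch below is one way to pull the code out of saved wmic output and translate it. The helper names first_nt_status and nt_status_name are made up for illustration, and the table covers only a handful of common codes.

```shell
# Hypothetical helper: print the first "NT status (xxxxxxxx)" code found on stdin.
first_nt_status() {
    sed -n 's/.*NT status (\([0-9a-fA-F]\{8\}\)).*/\1/p' | head -n 1
}

# Hypothetical helper: translate a few well-known NT status codes to their names.
# This is not a complete table; unknown codes are reported as such.
nt_status_name() {
    case "$(printf '%s' "$1" | tr 'A-F' 'a-f')" in
        c0000022) echo "NT_STATUS_ACCESS_DENIED" ;;
        c000006d) echo "NT_STATUS_LOGON_FAILURE" ;;
        c00000b5) echo "NT_STATUS_IO_TIMEOUT" ;;
        c0000236) echo "NT_STATUS_CONNECTION_REFUSED" ;;
        *)        echo "unknown NT status: $1" ;;
    esac
}

# Example (wmic_error.txt is an assumed file holding the plugin or wmic output):
#   nt_status_name "$(first_nt_status < wmic_error.txt)"
```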


Here's the direct wmic command on the same box, first without debugging and then with the debug level turned up to 7:

Code: Select all

[root@nwd2ng01 libexec]#  /usr/bin/wmic --namespace root/cimv2 -U domain/svcwmi%xxxxxxx //nwd2wdprd2.ad.analog.com 'Select AllocatedBaseSize,CurrentUsage,PeakUsage from Win32_PageFileUsage'
[librpc/rpc/dcerpc_connect.c:329:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_ncacn_ip_tcp_recv
[librpc/rpc/dcerpc_connect.c:790:dcerpc_pipe_connect_b_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_b_recv
[root@nwd2ng01 libexec]#  /usr/bin/wmic -d 7 --namespace root/cimv2 -U analog/svcwmi%xxxxxxx //nwd2wdprd2.ad.analog.com 'Select AllocatedBaseSize,CurrentUsage,PeakUsage from Win32_PageFileUsage'
[param/loadparm.c:587:init_globals()] Initialising global parameters
[param/loadparm.c:2462:lp_load()] lp_load: refreshing parameters from /dev/null
[param/params.c:556:pm_process()] params.c:pm_process() - Processing configuration file "/dev/null"
[param/loadparm.c:2471:lp_load()] pm_process() returned Yes
[param/loadparm.c:1343:lp_add_hidden()] adding hidden service IPC$
[param/loadparm.c:1343:lp_add_hidden()] adding hidden service ADMIN$
[auth/credentials/credentials_krb5.c:171:cli_credentials_set_ccache()] failed to get principal from default ccache: No such file or directory: open(/tmp/krb5cc_0): No such file or directory
[auth/gensec/gensec.c:1229:gensec_register()] GENSEC backend 'sasl-DIGEST-MD5' registered
[auth/auth.c:447:auth_register()] AUTH backend 'winbind_samba3' registered
[auth/auth.c:447:auth_register()] AUTH backend 'winbind' registered
[auth/auth.c:447:auth_register()] AUTH backend 'name_to_ntstatus' registered
[auth/auth.c:447:auth_register()] AUTH backend 'fixed_challenge' registered
[auth/auth.c:447:auth_register()] AUTH backend 'unix' registered
[auth/auth.c:447:auth_register()] AUTH backend 'anonymous' registered
[auth/auth.c:447:auth_register()] AUTH backend 'sam' registered
[auth/auth.c:447:auth_register()] AUTH backend 'sam_ignoredomain' registered
[auth/gensec/gensec.c:1229:gensec_register()] GENSEC backend 'krb5' registered
[auth/gensec/gensec.c:1205:gensec_register()] gensec subsystem fake_gssapi_krb5 is disabled
[auth/gensec/gensec.c:1229:gensec_register()] GENSEC backend 'schannel' registered
[auth/gensec/gensec.c:1229:gensec_register()] GENSEC backend 'spnego' registered
[auth/gensec/gensec.c:1205:gensec_register()] gensec subsystem gssapi_spnego is disabled
[auth/gensec/gensec.c:1229:gensec_register()] GENSEC backend 'gssapi_krb5' registered
[auth/gensec/gensec.c:1229:gensec_register()] GENSEC backend 'gssapi_krb5_sasl' registered
[auth/gensec/gensec.c:1229:gensec_register()] GENSEC backend 'ntlmssp' registered
[lib/com/dcom/main.c:528:dcom_determine_rpc_binding()] Using binding ncacn_ip_tcp:nwd2wdprd2.ad.analog.com
[librpc/rpc/dcerpc_connect.c:513:continue_map_binding()] Mapped to DCERPC endpoint 135
[lib/com/dcom/main.c:413:determine_rpc_binding_continue2()] dcerpc_ndr_request_recv returned NT_STATUS_OK
[lib/com/dcom/main.c:417:determine_rpc_binding_continue2()] IObjectExporter::ServerAlive returned NT_STATUS_OK
[auth/gensec/gensec.c:599:gensec_start_mech()] Starting GENSEC mechanism spnego
[auth/gensec/gensec.c:599:gensec_start_mech()] Starting GENSEC submechanism gssapi_krb5
[auth/kerberos/kerberos_util.c:236:kinit_to_ccache()] kinit for svcwmi@ANALOG failed (Cannot contact any KDC for requested realm: unable to reach any KDC in realm ANALOG)
[auth/credentials/credentials_krb5.c:300:cli_credentials_get_client_gss_creds()] Failed to get CCACHE for GSSAPI client: Cannot contact any KDC for requested realm
[auth/gensec/gensec_gssapi.c:354:gensec_gssapi_client_start()] Cannot reach a KDC we require
[auth/gensec/gensec.c:606:gensec_start_mech()] Failed to start GENSEC client mech gssapi_krb5: NT_STATUS_INVALID_PARAMETER
[auth/gensec/gensec.c:599:gensec_start_mech()] Starting GENSEC submechanism ntlmssp
[auth/ntlmssp/ntlmssp_client.c:128:ntlmssp_client_challenge()] Got challenge flags:
[auth/ntlmssp/ntlmssp.c:72:debug_ntlmssp_flags()] Got NTLMSSP neg_flags=0x62898205
  NTLMSSP_NEGOTIATE_UNICODE
  NTLMSSP_REQUEST_TARGET
  NTLMSSP_NEGOTIATE_NTLM
  NTLMSSP_NEGOTIATE_ALWAYS_SIGN
  NTLMSSP_NEGOTIATE_NTLM2
  NTLMSSP_CHAL_TARGET_INFO
  NTLMSSP_NEGOTIATE_128
  NTLMSSP_NEGOTIATE_KEY_EXCH
[auth/credentials/credentials_ntlm.c:130:cli_credentials_get_ntlm_response()] NTLMSSP challenge set by NTLM2
[auth/credentials/credentials_ntlm.c:131:cli_credentials_get_ntlm_response()] challenge is:
[000] 36 EC E3 A6 28 AD 25 D4                           6...(.%.
[auth/ntlmssp/ntlmssp_client.c:242:ntlmssp_client_challenge()] NTLMSSP: Set final flags:
[auth/ntlmssp/ntlmssp.c:72:debug_ntlmssp_flags()] Got NTLMSSP neg_flags=0x60088205
  NTLMSSP_NEGOTIATE_UNICODE
  NTLMSSP_REQUEST_TARGET
  NTLMSSP_NEGOTIATE_NTLM
  NTLMSSP_NEGOTIATE_ALWAYS_SIGN
  NTLMSSP_NEGOTIATE_NTLM2
  NTLMSSP_NEGOTIATE_128
  NTLMSSP_NEGOTIATE_KEY_EXCH
[librpc/ndr/ndr_string.c:214:ndr_pull_string()] long string ''
[lib/com/dcom/main.c:570:complete_activation()] Negotiated COM version: 5.1 using binding ncacn_ip_tcp:nwd2wdprd2.ad.analog.com[135]
[lib/com/dcom/main.c:1172:bind_new_pipe()] lib/com/dcom/main.c:1172: dcom_get_pipe: host=nwd2wdprd2.ad.analog.com, similar=NWD2WDPRD2[49154]
[lib/util/util.c:334:interpret_addr()] sys_gethostbyname: Unknown host. NWD2WDPRD2
[lib/socket/interface.c:103:add_interface()] added interface ip=10.64.52.120 nmask=255.255.255.0
[librpc/rpc/dcerpc_connect.c:329:dcerpc_pipe_connect_ncacn_ip_tcp_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_ncacn_ip_tcp_recv
[librpc/rpc/dcerpc_connect.c:790:dcerpc_pipe_connect_b_recv()] failed NT status (c00000b5) in dcerpc_pipe_connect_b_recv
[auth/gensec/gensec.c:599:gensec_start_mech()] Starting GENSEC mechanism ntlmssp
[auth/ntlmssp/ntlmssp_client.c:128:ntlmssp_client_challenge()] Got challenge flags:
[auth/ntlmssp/ntlmssp.c:72:debug_ntlmssp_flags()] Got NTLMSSP neg_flags=0x62898215
  NTLMSSP_NEGOTIATE_UNICODE
  NTLMSSP_REQUEST_TARGET
  NTLMSSP_NEGOTIATE_SIGN
  NTLMSSP_NEGOTIATE_NTLM
  NTLMSSP_NEGOTIATE_ALWAYS_SIGN
  NTLMSSP_NEGOTIATE_NTLM2
  NTLMSSP_CHAL_TARGET_INFO
  NTLMSSP_NEGOTIATE_128
  NTLMSSP_NEGOTIATE_KEY_EXCH
[auth/credentials/credentials_ntlm.c:130:cli_credentials_get_ntlm_response()] NTLMSSP challenge set by NTLM2
[auth/credentials/credentials_ntlm.c:131:cli_credentials_get_ntlm_response()] challenge is:
[000] 5A 39 DB AE 9A B7 B5 7E                           Z9.....~
[auth/ntlmssp/ntlmssp_client.c:242:ntlmssp_client_challenge()] NTLMSSP: Set final flags:
[auth/ntlmssp/ntlmssp.c:72:debug_ntlmssp_flags()] Got NTLMSSP neg_flags=0x60088215
  NTLMSSP_NEGOTIATE_UNICODE
  NTLMSSP_REQUEST_TARGET
  NTLMSSP_NEGOTIATE_SIGN
  NTLMSSP_NEGOTIATE_NTLM
  NTLMSSP_NEGOTIATE_ALWAYS_SIGN
  NTLMSSP_NEGOTIATE_NTLM2
  NTLMSSP_NEGOTIATE_128
  NTLMSSP_NEGOTIATE_KEY_EXCH
[auth/ntlmssp/ntlmssp_sign.c:318:ntlmssp_sign_init()] NTLMSSP Sign/Seal - Initialising with flags:
[auth/ntlmssp/ntlmssp.c:72:debug_ntlmssp_flags()] Got NTLMSSP neg_flags=0x60088215
  NTLMSSP_NEGOTIATE_UNICODE
  NTLMSSP_REQUEST_TARGET
  NTLMSSP_NEGOTIATE_SIGN
  NTLMSSP_NEGOTIATE_NTLM
  NTLMSSP_NEGOTIATE_ALWAYS_SIGN
  NTLMSSP_NEGOTIATE_NTLM2
  NTLMSSP_NEGOTIATE_128
  NTLMSSP_NEGOTIATE_KEY_EXCH
[librpc/ndr/ndr_string.c:214:ndr_pull_string()] long string ''
[wmi/wmic.c:196:main()] OK   : Login to remote object.
[librpc/ndr/ndr_string.c:214:ndr_pull_string()] long string ''
[wmi/wmic.c:200:main()] OK   : WMI query execute.
[librpc/ndr/ndr_string.c:214:ndr_pull_string()] long string ''
[wmi/wmic.c:203:main()] OK   : Reset result of WMI query.
[librpc/ndr/ndr_string.c:214:ndr_pull_string()] long string ''
[librpc/ndr/ndr_string.c:214:ndr_pull_string()] long string ''
[librpc/ndr/ndr_string.c:214:ndr_pull_string()] long string ''
[wmi/wmic.c:212:main()] OK   : Retrieve result data.
[root@nwd2ng01 libexec]#
We are in the middle of a migration from SiteScope to Nagios, and the same hosts that have permanently failed checks work fine there (also via WMI). How do we troubleshoot this further? We are scheduled to go live at the end of January!

Profile Below:

Code: Select all

Nagios XI Installation Profile
System:
Nagios XI Version : 2014R2.3
nwd2ng01.corp.analog.com 2.6.32-504.3.3.el6.x86_64 x86_64
CentOS release 6.6 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Server Name: nwd2ng01.corp.analog.com
Server Address: 10.64.52.120
Server Port: 80
Date/Time
PHP Timezone: America/New_York
PHP Time: Sat, 24 Jan 2015 22:35:30 -0500
System Time: Sat, 24 Jan 2015 22:35:30 -0500
Nagios XI Data
License ends in: MSTNQS

nagios (pid 25579) is running...
NPCD running (pid 1953).
ndo2db (pid 2079) is running...
CPU Load 15: 16.14
Total Hosts: 445
Total Services: 2895
Function 'get_base_uri' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_base_url' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://nwd2ng01.corp.analog.com/nagiosxi/includes/components/profile/profile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost
Running:

/bin/ping -c 3 localhost 2>&1 

PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.039 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.038 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.038/0.038/0.039/0.005 ms
Test wget To localhost
WGET From URL: http://localhost/nagiosxi/includes/components/ccm/
Running:

/usr/bin/wget http://localhost/nagiosxi/includes/components/ccm/ 

--2015-01-24 22:35:33-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "/usr/local/nagiosxi/tmp/ccm_index.tmp"

0K ........ 342M=0s

2015-01-24 22:35:33 (342 MB/s) - "/usr/local/nagiosxi/tmp/ccm_index.tmp" saved [8407]
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Windows Check_WMI checks sporadic failures

Post by lmiltchev »

In my opinion, this could be either a DNS issue or a firewall issue. I would recommend changing some of your checks (for testing purposes) to use IP address instead of FQDN and see if this is going to resolve the issue. Also, check to see if you are having the same issue if firewall is disabled.
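
One way to test the DNS theory against a specific host: in `wmic -d 7` output, the dcom_get_pipe line appears to name the host and port used for the follow-up DCOM connection (shown as similar=NAME[PORT]), which may differ from the FQDN and port 135 used for the first connection. The rough sketch below assumes the debug output was saved to a file. dcom_endpoints and name_resolves are hypothetical helper names, and getent is assumed to be available on the Nagios server.

```shell
# Hypothetical helper: print "name port" for every "similar=NAME[PORT]"
# entry found in wmic debug output on stdin.
dcom_endpoints() {
    sed -n 's/.*similar=\([^[]*\)\[\([0-9][0-9]*\)].*/\1 \2/p'
}

# Hypothetical helper: report whether a name resolves on this machine.
# getent consults /etc/hosts as well as DNS, as the system resolver would.
name_resolves() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "$1 resolves"
    else
        echo "$1 does not resolve"
    fi
}

# Example (wmic_debug.txt is an assumed file holding the -d 7 output):
#   dcom_endpoints < wmic_debug.txt | while read -r name port; do
#       name_resolves "$name"
#   done
```

If the short name does not resolve while the FQDN does, that would be worth ruling out before looking elsewhere. The port printed next to it is the one a firewall would have to allow in addition to 135.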
Be sure to check out our Knowledgebase for helpful articles and solutions!
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

Re: Windows Check_WMI checks sporadic failures

Post by tonyleatwork »

lmiltchev wrote:In my opinion, this could be either a DNS issue or a firewall issue. I would recommend changing some of your checks (for testing purposes) to use IP address instead of FQDN and see if this is going to resolve the issue. Also, check to see if you are having the same issue if firewall is disabled.
Hi -

On the systems in scope, some WMI checks work while others don't. To me, this rules out the firewall / DNS, as I believe that would be an "all or nothing" type of issue.

One additional data point: the same WMI checks that are failing in Nagios work in SiteScope (the tool that we are migrating from).

Another way of putting it is that 80% of the WMI checks work for a given Windows system in Nagios, while 100% work for the same system (same checks, same WMI namespaces) in SiteScope.

How do I get this escalated to a ticket?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Windows Check_WMI checks sporadic failures

Post by lmiltchev »

How do I get this escalated to a ticket?
Send us an email at [email protected] and type "Re: Windows Check_WMI checks sporadic failures" in the email's subject field.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Windows Check_WMI checks sporadic failures

Post by tgriep »

Try changing the check_wmi_plus check to use a longer timeout, for example -t 60.
The default is 15 seconds and a longer timeout might cure some of the random issues.
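
Because the failures are sporadic, a single manual run says little about whether the longer timeout helps. A rough way to compare is to run the same check several times with and without -t 60 and count the UNKNOWN results. count_unknown below is a hypothetical helper, not part of check_wmi_plus.

```shell
# Hypothetical helper: run a check command N times and report how many
# runs printed output starting with UNKNOWN.
# Usage: count_unknown N command [args...]
count_unknown() {
    runs=$1
    shift
    unknown=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Nagios plugins exit non-zero on problems; keep looping regardless.
        out=$("$@" 2>&1) || true
        case "$out" in
            UNKNOWN*) unknown=$((unknown + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "$unknown/$runs UNKNOWN"
}

# Example (HOST, USER and PASS are placeholders):
#   count_unknown 20 ./check_wmi_plus.pl -H HOST -u USER -p PASS -m checkpage
#   count_unknown 20 ./check_wmi_plus.pl -H HOST -u USER -p PASS -m checkpage -t 60
```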
GldRush98
Posts: 259
Joined: Wed May 25, 2011 10:51 am
Location: Springfield, IL

Re: Windows Check_WMI checks sporadic failures

Post by GldRush98 »

I have seen this issue before with WMI checks as well.
The best I could ever determine was that there was an issue on the Windows WMI side of things.
We never found a way to get WMI to work reliably on the machines that exhibited this problem, so we switched to nsclient.
Prod XI: Debian 12 - Nagios XI 2026R1.2
Dev XI: Debian 12 - Nagios XI 2026R1.2
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Windows Check_WMI checks sporadic failures

Post by lmiltchev »

Thanks for the feedback, GldRush98!
tonyleatwork, have you tried increasing the timeout as suggested by tgriep?
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

Re: Windows Check_WMI checks sporadic failures

Post by tonyleatwork »

lmiltchev wrote:Thanks for the feedback, GldRush98!
tonyleatwork, have you tried increasing the timeout as suggested by tgriep?
For this specific case that did not work (I did not expect it to, though). I do like the suggestion and may use it for our timeout issues.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Windows Check_WMI checks sporadic failures

Post by tgriep »

This message from the wmic output you provided says that it could not connect to a Domain Controller:
Cannot contact any KDC for requested realm: unable to reach any KDC in realm ANALOG
Can you check the logs for any error messages on your Domain Controllers?
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

Re: Windows Check_WMI checks sporadic failures

Post by tonyleatwork »

I'm not 100% sure, but I think it just failed on the first KDC in the list. Going further down in the log, you'll see that it actually authenticates OK (with an OK status).
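
A quick way to check this reading against a saved `wmic -d 7` log is to look for the Kerberos failure, the NTLMSSP fallback, and the final OK lines. The sketch below does that with grep. wmic_auth_summary and log_has are hypothetical names, and the patterns are copied from the output earlier in this thread, so they may not match other wmic builds.

```shell
# Hypothetical helper: succeed if the text in $1 has a line matching pattern $2.
log_has() {
    printf '%s\n' "$1" | grep "$2" >/dev/null
}

# Hypothetical helper: summarize the authentication path seen in wmic debug
# output on stdin. Prints one line per marker found.
wmic_auth_summary() {
    log=$(cat)
    if log_has "$log" 'Cannot contact any KDC'; then
        echo "kerberos: KDC unreachable"
    fi
    if log_has "$log" 'Starting GENSEC .*mechanism ntlmssp'; then
        echo "ntlmssp: started"
    fi
    if log_has "$log" 'OK *: Login to remote object'; then
        echo "login: OK"
    fi
    if log_has "$log" 'OK *: Retrieve result data'; then
        echo "query: results retrieved"
    fi
    return 0
}

# Example (wmic_debug.txt is an assumed file holding the -d 7 output):
#   wmic_auth_summary < wmic_debug.txt
```

If all four lines are printed, the Kerberos attempt failed but the NTLMSSP fallback logged in and the query returned data, so the KDC message on its own would not explain a failed check.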