check_wmi_plus service check returns critical status when up

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
brdr
Posts: 312
Joined: Mon Jun 02, 2014 12:49 pm

Post by brdr »

Hi folks,

We use XI 5.3.4 and rely heavily on the check_wmi_plus plugin.

We regularly see the check_wmi_plus plugin return an UNKNOWN status when it checks a Windows host. I believe this happens because the host is too busy to handle the check, and the next check against the same host succeeds. That much is normal. What we are also seeing, however, is that when check_wmi_plus checks a service on a host, it sometimes returns CRITICAL for a service that is up and running. It may return CRITICAL over multiple checks before it goes back to OK.
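Whether intermittent results like these ever escalate to a notification depends on Nagios's soft/hard state logic, which is where the SOFT state type and attempt counter in the log lines later in this thread come from. The sketch below is a simplified model, not Nagios Core's actual code: it assumes every consecutive non-OK result (UNKNOWN or CRITICAL alike) raises the attempt count, any OK resets it, and a problem only turns HARD once the count reaches the service's max_check_attempts. The function name `first_hard_problem` is hypothetical.

```python
def first_hard_problem(results, max_check_attempts):
    """Return the index at which a problem would become HARD, or None.

    Simplified model of Nagios soft/hard states (not the real Nagios Core code):
    each consecutive non-OK result, whether UNKNOWN or CRITICAL, raises the
    attempt count by one; any OK result resets it; the problem turns HARD when
    the count reaches max_check_attempts.
    """
    attempt = 0
    for index, state in enumerate(results):
        if state == "OK":
            attempt = 0  # recovery: the next problem starts again at attempt 1
            continue
        attempt += 1
        if attempt >= max_check_attempts:
            return index
    return None


if __name__ == "__main__":
    # States in the order they appear in the log excerpt posted in this thread.
    observed = ["UNKNOWN", "OK", "UNKNOWN", "OK", "UNKNOWN", "UNKNOWN",
                "OK", "UNKNOWN", "OK", "CRITICAL", "OK", "UNKNOWN"]
    for attempts in (1, 2, 3):
        print(attempts, first_hard_problem(observed, attempts))
```

Fed the states from the log excerpt, it returns None for max_check_attempts of 3 or more, because no run of problems there is longer than two checks; with max_check_attempts set to 2 it returns index 5, the second consecutive UNKNOWN.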

Have you seen this behavior before?

Thx
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: check_wmi_plus service check returns critical status whe

Post by mcapra »

Can you share some historical Nagios logs that show the status outputs that are generated when a CRITICAL occurs? Feel free to PM them if you have security concerns.

The historical logs are typically found here:

/usr/local/nagios/var/archives
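If the archives are large, a short script can pull out just the alerts for one service. This is a minimal sketch, assuming the archive files end in `.log` and contain SERVICE ALERT lines in the format shown in the excerpt later in this thread (`[epoch] SERVICE ALERT: host;service;STATE;SOFT|HARD;attempt;output`). `parse_alert` and `scan_archives` are hypothetical helper names.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Matches lines such as:
# [1493367856] SERVICE ALERT: host;service;STATE;STATETYPE;ATTEMPT;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([A-Z]+);(SOFT|HARD);(\d+);(.*)$"
)


def parse_alert(line):
    """Return a dict for a SERVICE ALERT line, or None for any other line."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    ts, host, service, state, state_type, attempt, output = m.groups()
    return {
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),  # epoch -> UTC
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }


def scan_archives(archive_dir, host, service):
    """Collect alerts for one host/service from every *.log file in archive_dir."""
    alerts = []
    for path in Path(archive_dir).glob("*.log"):
        with path.open(errors="replace") as fh:
            for line in fh:
                alert = parse_alert(line)
                if alert and alert["host"] == host and alert["service"] == service:
                    alerts.append(alert)
    # File names may not sort chronologically, so order by the logged timestamp.
    return sorted(alerts, key=lambda a: a["time"])


if __name__ == "__main__":
    for a in scan_archives("/usr/local/nagios/var/archives",
                           "sea-600-61", "LZ SS ODB 01 Sentinel DataServer"):
        if a["state"] != "OK":
            print(a["time"].isoformat(), a["state"], a["state_type"],
                  a["attempt"], a["output"])
```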
Former Nagios employee
https://www.mcapra.com/
brdr
Posts: 312
Joined: Mon Jun 02, 2014 12:49 pm

Re: check_wmi_plus service check returns critical status whe

Post by brdr »

Sure. Here is a snippet from today's log showing the service going from UNKNOWN to OK to CRITICAL. The entries are sequential:

[1493363878] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493363995] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493366669] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493366790] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493366967] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493367085] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;2;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493367209] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;3;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493367559] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493367680] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493367856] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;CRITICAL;SOFT;1;CRITICAL - [Triggered by _NumGood<1] - Found 0 Services(s), 0 OK and 0 with problems (0 excluded).
[1493367975] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493368509] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
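One detail in this excerpt stands out: every OK line reports "Found 1 Services(s)", while the CRITICAL line reports "Found 0 Services(s), 0 OK and 0 with problems" and was triggered by `_NumGood<1`. Read literally, that CRITICAL came from a query that matched no service at all rather than from one that found the service stopped. The sketch below tallies alert lines by that distinction. It is a heuristic that assumes the output wording seen in this log; the labels (`wmi_error`, `empty_result` and so on) and the helper names are made up for illustration.

```python
import re
from collections import Counter

# The counts check_wmi_plus prints in this log, e.g.
# "Found 0 Services(s), 0 OK and 0 with problems (0 excluded)."
COUNTS_RE = re.compile(r"Found (\d+) Services\(s\), (\d+) OK and (\d+) with problems")


def classify(line):
    """Return a rough label for one SERVICE ALERT line (heuristic)."""
    if "The WMI query had problems" in line:
        return "wmi_error"        # wmic itself failed (the UNKNOWN lines)
    m = COUNTS_RE.search(line)
    if m is None:
        return "unrecognised"
    found, _ok, problems = (int(g) for g in m.groups())
    if found == 0:
        return "empty_result"     # query returned no matching service at all
    if problems > 0:
        return "service_problem"  # presumably: service found but not running
    return "ok"


def tally(lines):
    """Count labels across many alert lines."""
    return Counter(classify(line) for line in lines)
```

For example, `tally(line for line in open(log_path) if "SERVICE ALERT" in line)` over a day's archive would show how many CRITICALs were empty results versus genuine service problems, where `log_path` is whichever archive file you are inspecting.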
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: check_wmi_plus service check returns critical status whe

Post by mcapra »

Honestly, WMI checks are a bit heavy in general and take a long time to make the trip. It might make sense to increase the timeout of your checks using the -t argument for check_wmi_plus. You could probably do that in the command definition itself and have it apply to all your current checks.

If that doesn't help, this KB article has some debugging steps; I'd recommend running them and sharing the output:
https://support.nagios.com/kb/article.php?id=579

In particular, enable more verbose wmic debugging by adding this argument to the check:

 --extrawmicarg "--debuglevel=4"
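Both suggestions can be tried on the command line when testing a check by hand. The sketch below only assembles the argument list. It assumes the plugin path /usr/local/nagios/libexec/check_wmi_plus.pl, credentials in an auth file passed with `-A`, and `checkservice` mode with the service name in `-a`; none of these come from this thread, so match them to your existing command definition. `build_check` is a hypothetical helper. The debug argument is written in `--name=value` form, which Perl's Getopt::Long should treat the same as the quoted form above.

```python
import shlex

# Assumed plugin location; adjust to wherever check_wmi_plus.pl lives on your system.
PLUGIN = "/usr/local/nagios/libexec/check_wmi_plus.pl"


def build_check(host, authfile, service, timeout=60, wmic_debug=False):
    """Return an argv list for a check_wmi_plus service check (illustrative only)."""
    argv = [
        PLUGIN,
        "-H", host,
        "-A", authfile,          # credentials file instead of -u/-p on the command line
        "-m", "checkservice",
        "-a", service,
        "-t", str(timeout),      # plugin timeout in seconds
    ]
    if wmic_debug:
        # Passed through to wmic; debug level 4 as suggested in this thread.
        argv.append("--extrawmicarg=--debuglevel=4")
    return argv


if __name__ == "__main__":
    # Print a shell-ready command line to run by hand as the nagios user.
    print(shlex.join(build_check("sea-600-61", "/path/to/wmi-auth.txt",
                                 "lz_odb_ss_sentineldataserver", wmic_debug=True)))
```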
Former Nagios employee
https://www.mcapra.com/
brdr
Posts: 312
Joined: Mon Jun 02, 2014 12:49 pm

Re: check_wmi_plus service check returns critical status whe

Post by brdr »

Thanks. I tried increasing the timeout to 60 seconds, and it had no effect. I know WMI is heavy... I will definitely try the debugging steps on Monday.

Thanks for your help. I'll circle back then.

brdr
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: check_wmi_plus service check returns critical status whe

Post by tmcdonald »

We'll keep this open for you.
Former Nagios employee