check_wmi_plus service check returns critical status when up

Posted: Fri Apr 28, 2017 9:00 am
by brdr
Hi folks,

We run XI 5.3.4 and use the check_wmi_plus plugin a lot.

We regularly see the check_wmi_plus plugin return UNKNOWN status when it checks a Windows host. I believe this happens because the host is too busy to handle the check; the next time XI checks the same host, it succeeds. That is normal. However, we are also seeing check_wmi_plus return CRITICAL status for services that are UP when it runs a service check against a host. It may return CRITICAL over multiple checks before it goes back to OK.

Have you seen this behavior before?

Thx

Re: check_wmi_plus service check returns critical status whe

Posted: Fri Apr 28, 2017 10:28 am
by mcapra
Can you share some historical Nagios logs that show the status outputs that are generated when a CRITICAL occurs? Feel free to PM them if you have security concerns.

The historical logs are typically found here:

/usr/local/nagios/var/archives

Re: check_wmi_plus service check returns critical status whe

Posted: Fri Apr 28, 2017 1:30 pm
by brdr
Sure. Here is a snippet from today's log. It shows UNKNOWN to OK to CRITICAL. The log entries below are sequential.

[1493363878] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493363995] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493366669] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493366790] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493366967] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493367085] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;2;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493367209] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;3;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493367559] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
[1493367680] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493367856] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;CRITICAL;SOFT;1;CRITICAL - [Triggered by _NumGood<1] - Found 0 Services(s), 0 OK and 0 with problems (0 excluded).
[1493367975] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.
[1493368509] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.
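
For anyone who wants to scan the archives for this pattern, here is a rough Python sketch that parses SERVICE ALERT lines in the format shown above, tallies the states, and picks out CRITICAL results whose output says "Found 0 Services". The function names and the three-line sample are only illustrative (the sample lines are copied from the log above); nothing here is part of Nagios or check_wmi_plus.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines like:
# [epoch] SERVICE ALERT: host;service;STATE;STATE_TYPE;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alert(line):
    """Return a dict for a SERVICE ALERT line, or None for any other line."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # The bracketed number is a Unix epoch timestamp; convert it to UTC.
    record["time"] = datetime.fromtimestamp(int(record["ts"]), tz=timezone.utc)
    record["attempt"] = int(record["attempt"])
    return record


def summarize(lines):
    """Count alerts per state and collect CRITICALs that found zero services."""
    records = [r for r in map(parse_alert, lines) if r is not None]
    counts = Counter(r["state"] for r in records)
    empty_criticals = [
        r for r in records
        if r["state"] == "CRITICAL" and "Found 0 Services" in r["output"]
    ]
    return counts, empty_criticals


# Illustrative sample: three lines copied from the log snippet above.
SAMPLE = [
    "[1493367680] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;OK;SOFT;2;OK - Found 1 Services(s), 1 OK and 0 with problems (0 excluded). 'LZ SS ODB 01 Sentinel DataServer' (lz_odb_ss_sentineldataserver) is Running.",
    "[1493367856] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;CRITICAL;SOFT;1;CRITICAL - [Triggered by _NumGood<1] - Found 0 Services(s), 0 OK and 0 with problems (0 excluded).",
    "[1493368509] SERVICE ALERT: sea-600-61;LZ SS ODB 01 Sentinel DataServer;UNKNOWN;SOFT;1;UNKNOWN - The WMI query had problems. The error text from wmic is: [wmi/wmic.c:212:main()] ERROR: Retrieve result data.",
]

if __name__ == "__main__":
    counts, empty_criticals = summarize(SAMPLE)
    print(dict(counts))
    for rec in empty_criticals:
        print(rec["time"].isoformat(), rec["host"], rec["service"])
```

To run it against a real archive file, pass the file's lines to `summarize()` instead of `SAMPLE`.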

Re: check_wmi_plus service check returns critical status whe

Posted: Fri Apr 28, 2017 2:06 pm
by mcapra
Honestly, WMI checks are a bit heavy in general and take a long time to make the trip. It might make sense to increase the timeout of your checks using the -t argument for check_wmi_plus. You could probably do that in the command definition itself and have it apply to all your current checks.

If that doesn't work, this document has some debug steps I'd recommend running and sharing the outputs for:
https://support.nagios.com/kb/article.php?id=579

Specifically, enabling better wmic debugging with this:

 --extrawmicarg "--debuglevel=4"
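
If it helps to test by hand, here is a rough Python sketch that assembles a check_wmi_plus command line combining the longer timeout with the debug argument above. Treat the details as assumptions: the plugin path is the usual libexec location but may differ on your system, the user and password are placeholders, and I'm assuming the check uses the checkservice mode with the service name from the log output. It only prints the command; it does not run it.

```python
import shlex

# Assumed install location; adjust to wherever the plugin lives on your system.
PLUGIN = "/usr/local/nagios/libexec/check_wmi_plus.pl"


def build_debug_command(host, user, password, service, timeout=60):
    """Build an argument list for a manual check_wmi_plus debug run."""
    return [
        PLUGIN,
        "-H", host,
        "-u", user,
        "-p", password,
        "-m", "checkservice",   # assumed mode for a Windows service check
        "-a", service,
        "-t", str(timeout),     # plugin timeout in seconds
        "--extrawmicarg", "--debuglevel=4",
    ]


if __name__ == "__main__":
    # Placeholder credentials; host and service name are taken from the log.
    cmd = build_debug_command(
        "sea-600-61", "DOMAIN/user", "secret", "lz_odb_ss_sentineldataserver"
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```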

Re: check_wmi_plus service check returns critical status whe

Posted: Fri Apr 28, 2017 2:22 pm
by brdr
Thanks. I tried increasing the timeout to 60 seconds and it had no effect. I know that WMI is heavy... Will def try out the debug on Monday.

Thx for your help. Will circle back on Monday.

brdr

Re: check_wmi_plus service check returns critical status whe

Posted: Mon May 01, 2017 9:07 am
by tmcdonald
We'll keep this open for you.