Service checks via rrd return zero values for cluster disks

Post by **faajbuhr** » Fri Jan 10, 2020 3:45 am

Hi,

We are having problems with performance counter checks that use the rrd collection strategy to report average values (over 5, 10, or 15 minutes).
For instance, we have a Microsoft failover cluster (2 Windows servers) with the following configuration:
- 2 local disks
- 1 clustered disk that is mounted on only one server at a time but can fail over to the other server.

When the performance checks use the rrd config for average values, they run fine on the server where the clustered disk is mounted.
On the server where the clustered disk is not mounted, however, the following error is thrown, and all other service checks that involve performance counter averages return zero.
I understand that a performance counter will throw an error about a disk that isn't found, but why should all the other performance counters return zero as well?

Has anyone by any chance experienced the same issue and found a way to resolve it?
Thanks in advance.

Error message
2020-01-10 09:16:07: error:c:\source\master\modules\CheckSystem\pdh_thread.cpp:247: Failed to query performance counters: PercentDiskNReadTimeAvg Failed to poll counter \LogicalDisk(N:)\% Disk Read Time: c0000bc6: The data is not valid.

Nagios XI configuration example
Command = $USER1$/check_nrpe -2 -H $HOSTADDRESS$ -t 90 -c $ARG1$ $ARG2$
$ARG1$ = check_pdh
$ARG2$ = -a 'counter:N: % Write Time=PercentDiskNWriteTimeAvg' 'critical=value>100' 'perf-config=*(suffix:none)' 'time=5m'

Local server configuration in nsclient.ini
[/settings/system/windows/counters/PercentDiskNWriteTimeAvg]
; ---------------------------------------------
counter=\LogicalDisk(N:)\% Disk Write Time
collection strategy=rrd
buffer size=1h

Post by **tacolover101** » Fri Jan 10, 2020 2:16 pm

I believe this is expected, since the clustered disk isn't mounted.

To fix the perf data, you could write a wrapper either on the Windows side in PowerShell or on the Nagios side in bash.
1. On the Windows side, write a PowerShell script that checks whether the clustered disk is mounted. If it is, run the normal command; if not, run the command excluding the clustered disk.

2. On the Nagios side, write a command that uses check_nrpe to run the check as usual, with logic that checks whether the perf data is null. If it is, run the command again excluding the clustered disk.

Both approaches essentially check whether the disk is mounted and then run the command needed to return perf data.

You could likely write the logic a few other ways; I just wanted to give you an idea of what's possible.
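The Nagios-side approach (option 2 above) could look roughly like the sketch below. It is written in Python rather than bash, and it is a minimal illustration under several assumptions: the plugin path `/usr/local/nagios/libexec/check_nrpe`, the `--fallback` argument convention, and the function names are all hypothetical, and treating "no perf data or every value is zero" as a failed poll is a heuristic (an idle disk can legitimately report 0). If, as described in the original post, the missing N: counter makes every rrd-averaged counter on that server read zero, the fallback run may return zeros as well. Requires Python 3.7+.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-side wrapper around check_nrpe (a sketch, not a tested plugin).

Runs the primary check; if the perf data comes back missing or all zero,
re-runs check_nrpe with a fallback argument list that excludes the clustered disk.
"""
import re
import subprocess
import sys

# Assumed plugin location; adjust to wherever check_nrpe lives on your Nagios XI host.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"

# Matches label=value pairs in perf data; labels may be single-quoted (e.g. 'a b'=1.5).
PERF_RE = re.compile(r"('[^']*'|[^\s=]+)=(-?\d*\.?\d+)")


def parse_perfdata(output):
    """Return {label: value} from the perf data after the first '|' in plugin output."""
    if "|" not in output:
        return {}
    perf = output.split("|", 1)[1]
    return {label.strip("'"): float(value) for label, value in PERF_RE.findall(perf)}


def perf_is_empty(values):
    """Treat 'no perf data' or 'every value is zero' as a failed poll.

    Heuristic only: an idle disk can legitimately report 0, so this can misfire.
    """
    return not values or all(v == 0 for v in values.values())


def run_check(args):
    """Run check_nrpe with the given arguments; return (exit_code, stdout)."""
    proc = subprocess.run([CHECK_NRPE] + args, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def main(argv):
    # Hypothetical convention: <primary check_nrpe args> --fallback <args without the clustered disk>
    if "--fallback" not in argv:
        print("UNKNOWN: usage: wrapper <check_nrpe args> --fallback <fallback args>")
        return 3  # Nagios UNKNOWN state
    split = argv.index("--fallback")
    primary, fallback = argv[:split], argv[split + 1:]

    code, out = run_check(primary)
    if perf_is_empty(parse_perfdata(out)):
        # Primary poll looked empty: retry without the clustered disk.
        code, out = run_check(fallback)
    print(out)
    return code


# Only run when invoked with arguments, as Nagios would call it.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

As a hypothetical usage example, the Nagios XI command could call the wrapper with the original check_nrpe arguments, then `--fallback`, then the same arguments with a counter list that leaves out the clustered disk (for instance in a separate `$ARG3$`). The wrapper passes through the exit code of whichever run it reports, so Nagios still sees the normal OK/WARNING/CRITICAL states.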

Post by **scottwilkerson** » Mon Jan 13, 2020 9:28 am

Thanks @tacolover101!

Nagios Support Forum
