Service checks via rrd return zero values with cluster disks
Posted: Fri Jan 10, 2020 3:45 am
Hi,
We are experiencing problems with performance counter checks via rrd config for average values (over 5 minutes / 10 minutes / 15 minutes).
For instance, there's a Microsoft failover cluster (2 Windows servers) with the following config:
- 2 local disks
- 1 clustered disk that is only mounted on one server, but can fail over to the other server.
When configuring the performance checks via rrd config for average values, it runs fine on the server with the clustered disk mounted.
But on the server where the clustered disk is not mounted, the following error is thrown and all other service checks that involve performance counter averages return zero.
I know it's normal for a performance counter to throw an error about a disk that isn't found, but why should all the other performance counters return zero as well?
Has anyone experienced the same issue and found a way to resolve it?
Thanks in advance.
Error message
2020-01-10 09:16:07: error:c:\source\master\modules\CheckSystem\pdh_thread.cpp:247: Failed to query performance counters: PercentDiskNReadTimeAvg Failed to poll counter \LogicalDisk(N:)\% Disk Read Time: c0000bc6: The data is not valid.
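To see which counter aliases are failing on a node, the log line above can be picked apart with a small script. This is only a rough sketch based on the single line shown here; the regular expression and the function name `parse_poll_failure` are my own assumptions, not anything provided by NSClient++.

```python
import re

# Pattern derived from the one log line quoted above (an assumption,
# other NSClient++ versions may format the message differently).
POLL_FAILURE = re.compile(
    r"Failed to query performance counters: (?P<alias>\S+) "
    r"Failed to poll counter (?P<path>.+?): "
    r"(?P<code>[0-9a-fA-F]{8}): (?P<message>.*)$"
)


def parse_poll_failure(line):
    """Return alias, counter path, PDH code and message, or None if no match."""
    match = POLL_FAILURE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    log_line = (
        r"2020-01-10 09:16:07: error:c:\source\master\modules\CheckSystem"
        r"\pdh_thread.cpp:247: Failed to query performance counters: "
        r"PercentDiskNReadTimeAvg Failed to poll counter "
        r"\LogicalDisk(N:)\% Disk Read Time: c0000bc6: The data is not valid."
    )
    print(parse_poll_failure(log_line))
```

Running this over the whole log on the passive node should show whether only the N: counters are reported as failing, or others too.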
Nagios XI configuration example
Command = $USER1$/check_nrpe -2 -H $HOSTADDRESS$ -t 90 -c $ARG1$ $ARG2$
$ARG1$ = check_pdh
$ARG2$ = -a 'counter:N: % Write Time=PercentDiskNWriteTimeAvg' 'critical=value>100' 'perf-config=*(suffix:none)' 'time=5m'
Local server configuration in nsclient.ini
[/settings/system/windows/counters/PercentDiskNWriteTimeAvg]
; ---------------------------------------------
counter=\LogicalDisk(N:)\% Disk Write Time
collection strategy=rrd
buffer size=1h
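For reference, here is a minimal sketch of how one could list the counter sections in nsclient.ini that point at a LogicalDisk instance which is not mounted on the current node. It only reads the ini text; the function name `counters_for_missing_disks` and the idea of passing in the mounted drive letters by hand are assumptions for illustration, not an NSClient++ feature.

```python
import configparser
import re

# Section prefix as used in the nsclient.ini fragment above.
COUNTER_PREFIX = "/settings/system/windows/counters/"
# Matches counter paths such as \LogicalDisk(N:)\% Disk Write Time.
DISK_INSTANCE = re.compile(r"\\LogicalDisk\(([A-Za-z]):\)\\")


def counters_for_missing_disks(ini_text, mounted_drives):
    """Return counter aliases whose LogicalDisk drive is not in mounted_drives."""
    # interpolation=None because counter paths contain a literal '%'.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    mounted = {drive.upper().rstrip(":") for drive in mounted_drives}
    missing = []
    for section in parser.sections():
        if not section.startswith(COUNTER_PREFIX):
            continue
        path = parser.get(section, "counter", fallback="")
        match = DISK_INSTANCE.search(path)
        if match and match.group(1).upper() not in mounted:
            missing.append(section[len(COUNTER_PREFIX):])
    return missing


if __name__ == "__main__":
    sample_ini = r"""
[/settings/system/windows/counters/PercentDiskNWriteTimeAvg]
; ---------------------------------------------
counter=\LogicalDisk(N:)\% Disk Write Time
collection strategy=rrd
buffer size=1h
"""
    # Hypothetical passive node where only C: and D: are mounted.
    print(counters_for_missing_disks(sample_ini, ["C", "D"]))
```

On the node that does not own the clustered disk, this would flag the N: counters as the ones that cannot be resolved locally.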