Service checks via RRD return zero values with cluster disks

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.


Post by faajbuhr »

Hi,

We are experiencing problems with performance counter checks that use the RRD collection strategy to compute average values (over 5, 10, or 15 minutes).
For instance, we have a Microsoft failover cluster (two Windows servers) with the following configuration:
- two local disks
- one clustered disk that is mounted on only one server at a time, but can fail over to the other server.

With the performance checks configured via RRD for average values, everything works on the server that currently has the clustered disk mounted.
On the server where the clustered disk is not mounted, however, the error below is logged and all other service checks that involve performance counter averages return zero.
I understand that a performance counter for a disk that isn't present will throw an error, but why do all the other performance counters return zero as well?

Has anyone experienced the same issue and found a way to resolve it?
Thanks in advance.


Error message:
2020-01-10 09:16:07: error:c:\source\master\modules\CheckSystem\pdh_thread.cpp:247: Failed to query performance counters: PercentDiskNReadTimeAvg Failed to poll counter \LogicalDisk(N:)\% Disk Read Time: c0000bc6: The data is not valid.

Nagios XI configuration example:
Command = $USER1$/check_nrpe -2 -H $HOSTADDRESS$ -t 90 -c $ARG1$ $ARG2$
$ARG1$ = check_pdh
$ARG2$ = -a 'counter:N: % Write Time=PercentDiskNWriteTimeAvg' 'critical=value>100' 'perf-config=*(suffix:none)' 'time=5m'

Local server configuration in nsclient.ini:
[/settings/system/windows/counters/PercentDiskNWriteTimeAvg]
; ---------------------------------------------
counter=\LogicalDisk(N:)\% Disk Write Time
collection strategy=rrd
buffer size=1h

Re: Service checks via rrd return zero values cluster disks

Post by tacolover101 »

I believe this is expected, since the clustered disk isn't mounted.

Now, to fix the perf data, you could write a wrapper either on the Windows side in PowerShell or on the Nagios side in bash.
1. On the Windows side, write a PowerShell script that checks whether the disk is mounted. If it is, run the normal command; if not, run the command excluding the clustered disk.

2. On the Nagios side, write a command that uses check_nrpe and runs the check as usual. Add logic to check whether the perf data is null; if so, run the command again excluding the clustered disk.
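A rough sketch of option 2 follows, written in Python rather than bash. The script name (check_pdh_fallback.py), the check_nrpe path, and the idea of passing the primary and fallback check_pdh arguments as two quoted strings are hypothetical choices for illustration. It treats "no perf data, or every value is zero" as the null case, which would also trigger on a host whose counters are genuinely all idle.

```python
from __future__ import annotations

import re
import shlex
import subprocess

# Assumed location of check_nrpe ($USER1$ on a typical Nagios XI install).
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"

# label=value pairs in Nagios perf data; labels may be single-quoted.
PERF_RE = re.compile(r"('[^']*'|[^'\s=]+)=(-?\d+(?:\.\d+)?)")


def parse_perfdata(output: str) -> dict[str, float]:
    """Return {label: value} from the text after the first '|' in plugin output."""
    if "|" not in output:
        return {}
    perf = output.split("|", 1)[1]
    return {label.strip("'"): float(value) for label, value in PERF_RE.findall(perf)}


def perfdata_is_empty(perf: dict[str, float]) -> bool:
    """Treat 'no perf data' or 'every value is zero' as the null case."""
    return not perf or all(value == 0 for value in perf.values())


def run_check(cmd: list[str], timeout: int = 100) -> tuple[int, str]:
    """Run a plugin and return (exit code, stdout); UNKNOWN (3) if it can't run."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"UNKNOWN - could not run {cmd[0]}: {exc}"
    return proc.returncode, proc.stdout.strip()


def main(argv: list[str]) -> int:
    """argv: [script, HOST, PRIMARY_ARGS, FALLBACK_ARGS] (hypothetical interface)."""
    if len(argv) != 4:
        print("UNKNOWN - usage: check_pdh_fallback.py HOST 'ARGS' 'FALLBACK_ARGS'")
        return 3
    host, primary, fallback = argv[1:4]
    # Same options as the original command definition, with check_pdh as $ARG1$.
    base = [CHECK_NRPE, "-2", "-H", host, "-t", "90", "-c", "check_pdh"]
    rc, out = run_check(base + shlex.split(primary))
    if perfdata_is_empty(parse_perfdata(out)):
        # Null perf data: retry with the fallback arguments, which should
        # omit the clustered-disk counter.
        rc, out = run_check(base + shlex.split(fallback))
    print(out)
    return rc
```

To run it as a plugin, finish the file with `import sys` and `sys.exit(main(sys.argv))`, then call it as, for example, check_pdh_fallback.py $HOSTADDRESS$ "$ARG2$" "$ARG3$", where $ARG2$ is the existing `-a ...` string and $ARG3$ is a hypothetical second argument holding the same options minus the clustered disk. On the node without the disk, every poll then costs two NRPE round trips.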

Both approaches essentially check whether the disk is mounted and then run the command needed to return perf data.

You could likely write the logic a few other ways; I just wanted to give you an idea of what's possible.