Health Checks
Health checks let you query different aspects of a system so that you are alerted to potential issues. A good example is checking the SMART attributes of a hard drive.
Permissions
Some of these checks require elevated permissions. On Linux, this usually means adding entries to the sudoers file.
Nagios Plugins / NRPE / Linux
The following KB articles explain how to add sudoers entries and how to configure NRPE to execute checks using sudo:
Warning: This Plugin Must Be Either Run As Root Or Setuid
NRPE: Unable To Read Output (The Plugin Requires "sudo" Privileges)
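As a rough illustration of the kind of change those articles describe, the fragment below sketches a sudoers entry and a matching NRPE command definition for the check_ide_smart plugin. It assumes that NRPE runs as a user named nagios, that sudo is installed at /usr/bin/sudo, and that the sudoers drop-in file is named /etc/sudoers.d/nagios; none of these come from this article, so adjust them to your installation and treat the linked KB articles as the authoritative steps.

```
# /etc/sudoers.d/nagios  (assumed file name; edit it with: visudo -f /etc/sudoers.d/nagios)
# Assumption: the NRPE daemon runs as the "nagios" user.
# Allow that user to run only this one plugin as root, without a password.
nagios ALL=(root) NOPASSWD: /usr/local/nagios/libexec/check_ide_smart

# nrpe.cfg
# Prefix the plugin with sudo so NRPE runs it with root privileges.
command[check_ide_smart]=/usr/bin/sudo /usr/local/nagios/libexec/check_ide_smart -d /dev/sda
```

Listing the full plugin path in the sudoers entry, rather than granting ALL, keeps the extra privilege limited to the single check that needs it.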
Disk Health
Disk health can be checked by querying the SMART status of the drive.
Nagios Plugins
The check_ide_smart plugin is part of Nagios Plugins.
Command:
/usr/local/nagios/libexec/check_ide_smart -d /dev/sda
Output:
OK - Operational (15/15 tests passed)
NCPA
NCPA does not include a SMART disk module.
NSClient++ via check_nt
NSClient++ does not include a SMART disk module.
NSClient++ via check_nrpe
NSClient++ does not include a SMART disk module.
WMI
Check WMI Plus includes a checksmart module.
Command:
./check_wmi_plus.pl -H 10.25.11.2 -u wmiagent -p Str0ngP@ssw0rd -m checksmart
Output:
Overall Status - OK - Found 1 Disks(s), 1 OK and 0 failing |'WD-WXXXXXXXXXXX_Reallocated_Sector_Count'=0; 'WD-WXXXXXXXXXXX_Power_On_Hours'=12401;
'WD-WXXXXXXXXXXX_Power_Cycle_Count'=441; 'WD-WXXXXXXXXXXX_Temperature'=36; 'WD-WXXXXXXXXXXX_Current_Pending_Sector'=0; 'WD-WXXXXXXXXXXX_Offline_Uncorrectable'=0;
OK - Dev#0, WDC WD50000000-0000001, Serial#WD-WXXXXXXXXXXX, PredictFailure=False, Temperature=36
SNMP
Using SNMP to check disk health is possible, but it requires additional work that is beyond the scope of this KB article.
Sensors
Sensor data usually refers to temperature and fan readings.
Nagios Plugins
The check_sensors plugin is part of Nagios Plugins. It uses the sensors program (lm_sensors) to interrogate system hardware such as the CPU and motherboard, and it can report temperature and fan alerts.
Command:
./check_sensors
Output:
SENSOR CRITICAL - Sensor alarm detected!
NCPA
NCPA does not include any sensor capabilities.
NSClient++ via check_nt
NSClient++ does not include any sensor capabilities.
NSClient++ via check_nrpe
NSClient++ does not include any sensor capabilities.
WMI
Check WMI Plus does not include any sensor capabilities.
SNMP
The check_snmp plugin allows you to target any type of sensor using SNMP. Here is an example of checking the internal temperature sensor on an APC UPS:
Command:
./check_snmp -H 10.25.19.2 -C box293 -P 2c -o PowerNet-MIB::upsHighPrecBatteryTemperature.0 -c 450
Output:
SNMP OK - 391 | PowerNet-MIB::upsHighPrecBatteryTemperature.0=391;;450;
The UPS returns the value in tenths of a degree Celsius, so 391 means 39.1 °C. To make this easier to read, you would need to create a wrapper script that converts the result.
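A minimal sketch of such a wrapper is shown below. The function names tenths_to_celsius and run_wrapped are hypothetical and are not part of Nagios Plugins. The sketch assumes the plugin prints a single line whose text part ends with an integer reading, as in the output above. It rewrites only the human-readable text and leaves the performance data in raw tenths, so the -c 450 threshold still matches the stored values.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nagios Plugins).

# Rewrite one line of plugin output so that a trailing integer reading in
# the text part (before the "|") is shown in degrees Celsius.
# The performance data after the "|" is passed through unchanged.
tenths_to_celsius() {
    printf '%s\n' "$1" | LC_ALL=C awk '
    {
        pipe = index($0, "|")
        if (pipe > 0) { text = substr($0, 1, pipe - 1); perf = substr($0, pipe) }
        else          { text = $0; perf = "" }

        # Drop trailing blanks so the reading is the last field.
        sub(/[ \t]+$/, "", text)
        n = split(text, f, " ")

        # Convert only when the last field is a plain integer.
        if (n > 0 && f[n] ~ /^-?[0-9]+$/)
            f[n] = sprintf("%.1f C", f[n] / 10)

        out = f[1]
        for (i = 2; i <= n; i++) out = out " " f[i]
        if (perf != "") out = out " " perf
        print out
    }'
}

# Run the plugin command given as arguments, convert its output, and
# return the plugin exit status so Nagios still sees the correct state.
run_wrapped() {
    status=0
    output=$("$@") || status=$?
    tenths_to_celsius "$output"
    return "$status"
}
```

A wrapper script built on this could end with a call such as run_wrapped ./check_snmp followed by the same options as the command above, then exit with the returned status. For the sample output shown earlier, the text part would read "SNMP OK - 39.1 C".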
Final Thoughts
For any support-related questions, please visit the Nagios Support Forums at: