Disk Health Checks


Post by maxTim » Thu Aug 05, 2021 11:35 pm

Hello, I'm pretty new to using Nagios and was looking for a bit of advice.

I have a few servers with some old hardware and I'd like to check disk health*. I've installed smartmontools here on my local machine to get to know it, but the information is pretty complex and I'm not really sure how to aggregate it. I will be looking deeper into this tool as it does seem to be quite comprehensive and informative.

I've looked at plugins such as check_smart_attributes and check_smartmon. One thing stood out, however: smartctl requires sudo access. I'm not super stoked about giving sudo access to user nagios or modifying the sudoers file. Another thing I'm wondering about is which checks I should run and how to interpret the results. It's been my experience that just running a simple SMART health check and seeing 'passed' is not very proactive. In fact, I've seen failing drives that 'passed' the SMART health check in BIOS or live rescue environments.
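
To make the "which checks" question concrete, here is a rough sketch of what I imagine a more proactive check could look like. It assumes smartmontools 7.0 or later (for the `--json` flag) and an ATA/SATA drive, since NVMe drives report a different structure. The choice of watched attributes (IDs 5, 197, 198) and the rule that any non-zero raw value is a warning are my own assumptions, not an official recommendation, and the `evaluate` and `main` names are made up for illustration.

```python
import json
import subprocess
import sys

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Attributes whose raw value should normally stay at zero.
# This selection is an assumption, not an official list.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def evaluate(report):
    """Map a parsed `smartctl --json -H -A` report to (exit_code, message)."""
    status = report.get("smart_status", {})
    if "passed" not in status:
        return UNKNOWN, "UNKNOWN: no SMART status in report"
    if not status["passed"]:
        return CRITICAL, "CRITICAL: overall SMART health check failed"

    # Look past the overall verdict at the individual attribute counters.
    table = report.get("ata_smart_attributes", {}).get("table", [])
    nonzero = []
    for attr in table:
        raw = attr.get("raw", {}).get("value", 0)
        if attr.get("id") in WATCHED and raw > 0:
            nonzero.append(f"{WATCHED[attr['id']]}={raw}")
    if nonzero:
        return WARNING, "WARNING: " + ", ".join(nonzero)
    return OK, "OK: health passed, watched attributes are zero"


def main(device):
    # smartctl usually needs root; how that is granted is left open here.
    try:
        proc = subprocess.run(
            ["smartctl", "--json", "-H", "-A", device],
            capture_output=True, text=True,
        )
        report = json.loads(proc.stdout)
    except (OSError, json.JSONDecodeError):
        print("UNKNOWN: could not run smartctl or parse its output")
        return UNKNOWN
    code, message = evaluate(report)
    print(message)
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

The idea is that a drive can still report 'passed' while reallocated or pending sector counts creep up, so the attribute counters are checked separately from the overall verdict.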

Also, what if I'm running ZFS on Debian? I understand that it's a form of software RAID(?). In that case, am I still just checking the individual disks? And what about Windows hosts running NCPA?
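
For the ZFS side, my understanding is that the pool tracks its own health in addition to whatever the individual disks report, so a pool-level check might sit alongside the per-disk SMART checks. Below is a minimal sketch under that assumption. It relies on `zpool status -x` printing `all pools are healthy` when nothing is wrong and a `state:` line for each pool otherwise. The mapping of pool states to Nagios severities and the `classify` name are my own guesses.

```python
import subprocess
import sys

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Pool states treated as critical; this grouping is an assumption.
SEVERE = {"FAULTED", "UNAVAIL", "SUSPENDED"}


def classify(text):
    """Map the output of `zpool status -x` to (exit_code, message)."""
    text = text.strip()
    if text == "all pools are healthy":
        return OK, "OK: all pools are healthy"

    # With -x, only pools that need attention are listed.
    states = [
        line.split(":", 1)[1].strip()
        for line in text.splitlines()
        if line.strip().startswith("state:")
    ]
    if not states:
        return UNKNOWN, "UNKNOWN: unrecognized zpool output"
    summary = ", ".join(states)
    if any(state in SEVERE for state in states):
        return CRITICAL, "CRITICAL: pool state " + summary
    return WARNING, "WARNING: pool state " + summary


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "--run":
    try:
        out = subprocess.run(
            ["zpool", "status", "-x"], capture_output=True, text=True
        ).stdout
    except OSError:
        out = ""
    code, message = classify(out)
    print(message)
    sys.exit(code)
```

Anything listed by `-x` other than the healthy message is treated as at least a warning here, since a pool can be ONLINE and still have logged errors.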

Anyways, thanks for any advice.

*Note: Yes, I know - just proactively replace the drives. But that requires capital, which for some of these devices isn't always available. These aren't mission-critical devices, but I would like to stay ahead of the curve should the drives begin to fail.
