Simplest way to check all logical disks

jdsa4444 · Post by **jdsa4444** » Wed Jun 22, 2016 8:17 am

Hi everyone, I'm a total newbie to Nagios, bear with me.

I've come into an environment in the last year using Nagios Core 3.2.0 (I got that much from running the config checker). About all I've done is add some hosts and simple check_nt checks for system and service statuses. Now we're at the point where I'd like to try something new. Rather than manually specifying each disk to check (C:\, D:\, E:\, etc.), I'd like to run a command that checks them all. That way any server with any drive letter combination will be covered, and we won't miss some random server's Z:\ drive (if you follow me). I read somewhere that there is an NRPE check that can do that, and I've also seen people's custom plugins. But I'm wondering what the best approach is for a Linux and Nagios newbie like me.

I'm not sure we even have the NRPE plugin on our server yet. How can I check for that, and how can I add it if we don't? Our NSClient++ on the server does have NRPE enabled. Is this the easiest way?

I don't want to mess up our Nagios server.

Thanks.

rkennedy · Post by **rkennedy** » Wed Jun 22, 2016 10:42 am

Are you currently running NRPE on the Windows machine? If so, you should be able to check every drive with check_drivesize. Here's an example:

[root@localhost libexec]# ./check_nrpe -H 192.168.5.47 -c check_drivesize -a crit='used > 90%'
OK All 4 drive(s) are ok|'C:\ used'=64.228GB;374.89921;421.76161;0;468.62402 'C:\ used %'=13%;80;90;0;100 'D:\ used'=0B;0;0;0;0 'X:\ used'=217.42253GB;360.89218;406.0037;0;451.11523 'X:\ used %'=48%;79;89;0;100 '\\?\Volume{cc48a6c4-0899-11e5-befc-806e6f6e6963}\ used'=8.96619GB;9.38593;10.55917;0;11.73241 '\\?\Volume{cc48a6c4-0899-11e5-befc-806e6f6e6963}\ used %'=76%;79;89;0;100

This will check all drives on the machine and return CRITICAL if any of them is more than 90% used.
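For anyone who wants to post-process that output, here is a minimal Python sketch that pulls the per-drive "used %" values out of the performance data. It assumes the perfdata labels look exactly like the ones in the sample above; the function names `used_percent_by_drive` and `over_threshold` are made up for illustration and are not part of NSClient++ or Nagios.

```python
import re

# Output copied from the check_nrpe example above (perfdata follows the "|").
SAMPLE_OUTPUT = (
    r"OK All 4 drive(s) are ok|'C:\ used'=64.228GB;374.89921;421.76161;0;468.62402 "
    r"'C:\ used %'=13%;80;90;0;100 'D:\ used'=0B;0;0;0;0 "
    r"'X:\ used'=217.42253GB;360.89218;406.0037;0;451.11523 "
    r"'X:\ used %'=48%;79;89;0;100 "
    r"'\\?\Volume{cc48a6c4-0899-11e5-befc-806e6f6e6963}\ used'=8.96619GB;9.38593;10.55917;0;11.73241 "
    r"'\\?\Volume{cc48a6c4-0899-11e5-befc-806e6f6e6963}\ used %'=76%;79;89;0;100"
)

# Assumption: percentage entries are labelled '<drive> used %'.
PERCENT_ITEM = re.compile(r"'([^']+) used %'=(\d+(?:\.\d+)?)%")


def used_percent_by_drive(plugin_output):
    """Map each drive label to its used percentage (hypothetical helper)."""
    _, _, perfdata = plugin_output.partition("|")
    return {m.group(1): float(m.group(2)) for m in PERCENT_ITEM.finditer(perfdata)}


def over_threshold(usage, limit):
    """Return the drives whose used percentage is above the limit."""
    return [drive for drive, pct in usage.items() if pct > limit]


usage = used_percent_by_drive(SAMPLE_OUTPUT)
for drive, pct in usage.items():
    print(f"{drive} is {pct:.0f}% used")
print("Over 90%:", over_threshold(usage, 90))
```

Note that D:\ reports no "used %" entry in the sample, so it does not appear in the result.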

Post by **Box293** » Wed Jun 22, 2016 8:21 pm

This guide shows you how to install the check_nrpe plugin on your Nagios server.

http://sites.box293.com/nagios/guides/n ... centos-6-5

nozlaf · Post by **nozlaf** » Wed Jun 22, 2016 9:53 pm

jdsa4444 wrote: that way any server with any drive letter combination will be covered, and we won't miss some random server's Z:\ drive (if you follow me).

This is true, and I'm certainly on board with this method, but there is another side to this coin: if a disk vanishes from a system, there is no alert at all.

I know that sounds silly, right? But say you reboot a server daily or weekly, and a disk fails, or it's iSCSI and doesn't mount or attach when the machine comes back up. You get absolutely no notification for it.

Also, if you check all disks at once, the thresholds can't be adjusted per disk so easily.

For example:

Your C drive on a server may be 40 GB,
and you may have a few 500 GB disks for storage.

10% of the 40 GB disk (4 GB) is significantly different from 10% of the 500 GB disk (50 GB).

When you check them all at once, every disk gets the same threshold, which is less than ideal.
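To put numbers on that, a quick Python sketch of how much free space is left when each disk hits the same "90% used" threshold. The helper name is made up for illustration; the disk sizes are the ones from the example.

```python
def free_space_at_threshold(size_gb, used_percent_threshold):
    """Return the GB still free when a disk reaches the used-% threshold."""
    return size_gb * (100 - used_percent_threshold) / 100


for size_gb in (40, 500):
    free_gb = free_space_at_threshold(size_gb, 90)
    print(f"{size_gb} GB disk at 90% used: {free_gb:.0f} GB free")
# prints:
# 40 GB disk at 90% used: 4 GB free
# 500 GB disk at 90% used: 50 GB free
```

With one shared percentage, the small system disk alerts with only 4 GB of headroom while the storage disks alert with 50 GB still free.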

But at the same time, adding each disk individually is a pain, and it's easy for people to add disks to servers and forget to add them to monitoring.

So, as a wise girl once said... "Why not have both?"

Post by **eloyd** » Thu Jun 23, 2016 8:09 am

I have to agree with @nozlaf.

You should know what drives are on your servers. It's not hard to duplicate a service check, change "C" to "D", and repeat until you have drive configurations for all 24 letters C-Z (we even have some servers with a B drive!). Once you've got that, assign each service to its own hostgroup, such as "Windows has C drive" and "Windows has D drive", and so on for all of the C-Z drive hostgroups.

NOW you can simply add machines to hostgroups based on what disks they have. You'll get alerted about problems, missing disks, and so forth, rather than relying on one big ugly check that might get cut off due to buffer limits on the output. Plus, if your tmp drive is 95% full until tonight's reboot 14 hours from now, you won't get alerts on the massive single check_drivesize check just because one drive that you don't really care much about is full.

I know it sounds like a pain, but you also get capacity planning capabilities and downtime reporting that is much more granular if you set them up separately from the beginning.
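Rather than copying the service definition by hand, the per-letter definitions could be generated. A rough Python sketch follows, under several assumptions: the hostgroup names (`windows-has-c-drive` and so on) are invented here, the `generic-service` template and the `check_nt!USEDDISKSPACE!...` command form follow the stock Nagios Core sample Windows config, and the 80/90 thresholds are placeholders to adjust.

```python
import string

# Assumption: one hostgroup per drive letter, named windows-has-<letter>-drive.
SERVICE_TEMPLATE = """define service {{
    use                  generic-service
    hostgroup_name       windows-has-{lower}-drive
    service_description  {upper}:\\ Drive Space
    check_command        check_nt!USEDDISKSPACE!-l {lower} -w {warn} -c {crit}
}}
"""


def drive_letters():
    """Return the drive letters C through Z."""
    return list(string.ascii_uppercase[2:])


def render_services(letters, warn=80, crit=90):
    """Render one service definition per drive letter."""
    return "\n".join(
        SERVICE_TEMPLATE.format(lower=l.lower(), upper=l.upper(), warn=warn, crit=crit)
        for l in letters
    )


config = render_services(drive_letters())
print(config)
```

The matching `define hostgroup` blocks would still need to exist, and the output is meant to be written to a separate .cfg file and verified with the config checker before reloading Nagios.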

jdsa4444 · Post by **jdsa4444** » Thu Jun 23, 2016 8:43 am

I guess what I'm worried about is a drive getting added by another admin and a check never being set up in Nagios. Let's just say this sort of thing happens a lot.

Thanks for the thoughts though. I can see value on both sides and I may end up going a different route altogether. I'll play around with the different suggestions and revisit this. Thanks everybody.

Post by **eloyd** » Thu Jun 23, 2016 9:16 am

Since you're posting in a Nagios Core forum, I assume you have some scripting abilities.

You could always dynamically create a set of service checks to find drives on the remote host. You could, for instance, use cron to fire off the check_drivesize check, parse the output, find a list of drives on the machine, then create new checks for each of those drives and repopulate the Nagios configs. Run your drive checker at 3am and you'll have two welcome side-effects:

1) You'll always have an up-to-date list of drives to check on the server(s).
2) Since you're NOT deleting old drive configurations, you'll know if someone removes a drive because it will eventually go critical once the drive has been removed but Nagios is still checking it.
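A minimal Python sketch of that idea, under stated assumptions: the plugin output format is taken from the check_drivesize example earlier in the thread, the sample output and host name below are invented, the function names are hypothetical, and the generated checks use check_nt with placeholder thresholds. Volume GUID mount points are skipped because the pattern only matches lettered drives.

```python
import re

# Assumption: each lettered drive appears in perfdata as '<letter>:\ used'=...
DRIVE_LABEL = re.compile(r"'([A-Za-z]):\\ used'=")


def discover_drives(plugin_output):
    """Return the set of drive letters named in the perfdata."""
    _, _, perfdata = plugin_output.partition("|")
    return {m.group(1).upper() for m in DRIVE_LABEL.finditer(perfdata)}


def merge_drives(known, discovered):
    """Add newly seen drives; never drop drives seen on earlier runs."""
    return sorted(set(known) | set(discovered))


def render_service(host, letter):
    """Render one per-drive service definition (check_nt form is an assumption)."""
    return (
        "define service {\n"
        "    use                  generic-service\n"
        f"    host_name            {host}\n"
        f"    service_description  {letter}:\\ Drive Space\n"
        f"    check_command        check_nt!USEDDISKSPACE!-l {letter.lower()} -w 80 -c 90\n"
        "}\n"
    )


# Invented output for tonight's run: D: has vanished and E: is new.
tonight = (
    r"OK All 2 drive(s) are ok|'C:\ used'=64.228GB;374.89921;421.76161;0;468.62402 "
    r"'C:\ used %'=13%;80;90;0;100 'E:\ used'=1GB;8;9;0;10 'E:\ used %'=10%;80;90;0;100"
)
known_from_last_run = ["C", "D"]

drives = merge_drives(known_from_last_run, discover_drives(tonight))
print(drives)  # ['C', 'D', 'E']
for letter in drives:
    print(render_service("winserver01", letter))
```

Because D stays in the list, its check keeps running and should start failing once the drive is gone, which is the second side effect described above. A real cron job would also need to persist the known list between runs and reload Nagios after rewriting the config.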

rkennedy · Post by **rkennedy** » Thu Jun 23, 2016 9:33 am

eloyd wrote: Since you're posting in a Nagios Core forum, I assume you have some scripting abilities.

You could always dynamically create a set of service checks to find drives on the remote host. You could, for instance, use cron to fire off the check_drivesize check, parse the output, find a list of drives on the machine, then create new checks for each of those drives and repopulate the Nagios configs. Run your drive checker at 3am and you'll have two welcome side-effects:

1) You'll always have an up-to-date list of drives to check on the server(s).
2) Since you're NOT deleting old drive configurations, you'll know if someone removes a drive because it will eventually go critical once the drive has been removed but Nagios is still checking it.

^ This is a really great idea. Nice one, @eloyd! The blueprint has been completed.

@jdsa4444 - let us know if you have any further questions.

Post by **eloyd** » Thu Jun 23, 2016 9:41 am

I have a book full of good ideas. Unfortunately, it's got the word "manifesto" in the title and I'm having a hard time getting it published.

ssax · Post by **ssax** » Thu Jun 23, 2016 1:28 pm

Another problem I see: if a single check returns performance data for every disk that is currently present, and one of those disks is later removed, the performance data for all the drives in that check will stop updating, because the RRD still expects that additional datasource when inserting.

Nagios Support Forum