RAID controller monitoring

Posted: Wed Jan 23, 2019 11:46 am
by vy3734
Hi,
We have a new requirement to monitor RAID controller health. Does Nagios provide any plugins that can do this?
I searched for third-party Nagios plugins to monitor RAID controllers, but I'm not sure which one would work for my server.
This is the RAID info I found:
$ lspci | grep RAID
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
I want to know if I can use Nagios to monitor this RAID device.

Can anyone help?

Re: RAID controller monitoring

Posted: Wed Jan 23, 2019 11:59 am
by bomahony
I am monitoring about 2000 RAID controllers using check_lsi_raid (it requires storcli, the successor to MegaCli) from here:

https://github.com/thomas-krenn/check_lsi_raid

As long as it's not Cisco hardware. That crap is a nightmare due to their non-standard logs / ID codes :(

Re: RAID controller monitoring

Posted: Wed Jan 23, 2019 3:12 pm
by vy3734
Hi,
Thank you so much for responding. I was able to download the plugin and install storcli and the Perl dependencies.
$ lspci | grep RAID
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)
This means that I have a RAID device on my server, correct?
When I execute the following command I see:
$ /opt/MegaRAID/storcli/storcli64 /c0/eall/sall show all
Controller = 0
Status = Failure
Description = Controller 0 not found
I tried executing the plugin:
$ /usr/local/nagios/libexec/check_lsi_raid -C 1 -p /opt/MegaRAID/storcli/storcli64
Error: invalid controller number, controller not found!

Am I missing something here? This stuff is new to me. Can you please help?

Re: RAID controller monitoring

Posted: Wed Jan 23, 2019 5:02 pm
by scottwilkerson
Looks like your controller ID is 0, or it is saying there are no controllers found; not sure.

Try

/usr/local/nagios/libexec/check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64
Full disclosure: we do not have this equipment in house, so I cannot test it.

Re: RAID controller monitoring

Posted: Tue Jan 29, 2019 6:54 am
by bomahony
Try this to get your controller details:

bommer@XX-001 0 11:53:10 ~ $ sudo /opt/MegaRAID/storcli/storcli64 /call show
[sudo] password for bommer:
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.0606.0000.0000 Mar 20, 2018
Operating system = Linux 3.10.0-693.43.1.el7.x86_64
Controller = 0
Status = Success
Description = None

Product Name = Cisco 12G SAS Modular Raid Controller
Serial Number = SK74581808
SAS Address = 570708bff885a180
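
To work out which number to pass to -C, the /call show output above can be filtered for controllers that report Status = Success. This is only a minimal sketch, assuming the output keeps the "Controller = N" / "Status = ..." layout shown in this thread; list_ok_controllers is a made-up helper name, not part of storcli or the plugin.

```shell
# Hypothetical helper: print the ID of every controller that storcli
# reports with "Status = Success". Reads `storcli64 /call show` output
# on stdin, so it can also be tried on a saved copy of that output.
list_ok_controllers() {
    awk -F' *= *' '
        { sub(/[ \t\r]+$/, "") }                  # drop trailing whitespace / CR
        $1 == "Controller" { id = $2 }            # remember the most recent controller ID
        $1 == "Status" && $2 == "Success" && id != "" {
            print id                              # this controller answered successfully
            id = ""
        }
    '
}
```

It could be used as `sudo /opt/MegaRAID/storcli/storcli64 /call show | list_ok_controllers`. If nothing is printed (as with the "Controller 0 not found" output earlier in the thread), storcli does not see a controller it can manage, and no -C value will work.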

Re: RAID controller monitoring

Posted: Tue Jan 29, 2019 8:10 am
by scottwilkerson
Thanks for helping @bomahony

Re: RAID controller monitoring

Posted: Thu Jan 31, 2019 1:49 pm
by vy3734
@scottwilkerson
@bomahony
This is what I see after running the command:

# ./check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64
OK (CTR, LD, PD, CV)|CV_Temperature=24;70;85 ROC_Temperature=56;85;95 c0/e8/s0_Drive_Temperature=24;40;45 c0/e8/s1_Drive_Temperature=25;40;45 c0/e8/s2_Drive_Temperature=24;40;45 c0/e8/s3_Drive_Temperature=24;40;45 c0/e8/s4_Drive_Temperature=24;40;45 c0/e8/s5_Drive_Temperature=24;40;45

It seems to be giving out a set of temperature readings.
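
For context, in standard Nagios plugin output the text before the "|" is the status message and everything after it is performance data in the form label=value;warn;crit. The temperatures are therefore extra metrics for graphing, while the "OK (CTR, LD, PD, CV)" part is the actual check result. A rough sketch for splitting that line into readable rows follows; perfdata_table is a hypothetical helper name, and it assumes labels contain no spaces, as in the output above.

```shell
# Hypothetical helper: read one line of plugin output on stdin and print
# one "label value warn crit" row per performance-data item.
perfdata_table() {
    awk '
        {
            bar = index($0, "|")
            if (bar == 0) next                    # no performance data on this line
            n = split(substr($0, bar + 1), items, " ")
            for (i = 1; i <= n; i++) {
                eq = index(items[i], "=")
                if (eq == 0) continue             # skip anything that is not label=value
                label = substr(items[i], 1, eq - 1)
                m = split(substr(items[i], eq + 1), f, ";")
                printf "%s %s %s %s\n", label, f[1], (m >= 2 ? f[2] : ""), (m >= 3 ? f[3] : "")
            }
        }
    '
}
```

For example: `./check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64 | perfdata_table`.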

I attached a file that has the output of the command
/opt/MegaRAID/storcli/storcli64 /call show
Raid_info.txt
It seems to me that the plugin is working. I need to know if any of the disks have issues and alert based on that.
Any input on how to tweak this plugin to get the info I need?

Re: RAID controller monitoring

Posted: Thu Jan 31, 2019 3:03 pm
by scottwilkerson
vy3734 wrote: it seems to me that the plugin is working.
I agree.
vy3734 wrote: I need to know if any of the disks has any issues and alert based on that.
Any inputs as to how to tweak this plugin to get the info I need?
I looked over the plugin, and it checks for a whole bunch of issues automatically (you just don't have any, so it is reporting OK).
It looks for the following on the controller:

Degraded
Offline
Critical Disks
Failed Disks
Memory Correctable Errors
Memory Uncorrectable Errors
and a bunch of other items on the physical disks, virtual disks and the CV module. You can add -vvv to the plugin when running it from the command line to see more verbose output.

Re: RAID controller monitoring

Posted: Thu Jan 31, 2019 3:14 pm
by vy3734
Ahh, so if any of the parameters is in a critical state, it would return a critical status and say exactly what the issue is.

Re: RAID controller monitoring

Posted: Thu Jan 31, 2019 3:26 pm
by scottwilkerson
vy3734 wrote: ahh, so if any of the parameters is in a critical state, it would return a critical status and say exactly what the issue is.
Correct, at least that is what I deduced from looking over the plugin code.
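
The way Nagios learns about that state is the plugin's exit code: by convention 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN, and the first line of output is shown as the status text. Below is a small sketch for checking this by hand from the command line. nagios_state and run_check are hypothetical helper names, and treating every other exit code as UNKNOWN is an assumption of this sketch.

```shell
# Hypothetical helper: translate a plugin exit code into a Nagios state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;    # 3 is UNKNOWN; other codes are lumped in here (assumption)
    esac
}

# Hypothetical helper: run any plugin command line, then print its state,
# exit code and first line of output. Returns the plugin's own exit code.
run_check() {
    if output=$("$@"); then code=0; else code=$?; fi
    first_line=$(printf '%s\n' "$output" | head -n 1)
    printf '%s (exit %s): %s\n' "$(nagios_state "$code")" "$code" "$first_line"
    return "$code"
}
```

For example: `run_check /usr/local/nagios/libexec/check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64`. Once the same command is defined as a Nagios service check, a non-zero exit code is what triggers the alert.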