RAID controller monitoring

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
vy3734
Posts: 109
Joined: Tue Sep 29, 2015 4:48 pm

RAID controller monitoring

Post by vy3734 »

Hi,
We have a new requirement to monitor RAID controller health. Does Nagios provide any plugins that can do this?
I searched for third-party Nagios plugins to monitor RAID controllers, but I'm not sure which one would work for my server.
This is the RAID info I found:
$ lspci | grep RAID
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
I want to know if I can use Nagios to monitor this RAID device.

Can anyone help?
bomahony
Posts: 133
Joined: Wed Jul 04, 2018 10:46 am

Re: RAID controller monitoring

Post by bomahony »

I am monitoring about 2000 RAID controllers using check_lsi_raid (it requires storcli, the successor to the older MegaCli utility) from here:

https://github.com/thomas-krenn/check_lsi_raid

As long as it's not Cisco hardware. That crap is a nightmare due to their non-standard logs / ID codes :(
vy3734
Posts: 109
Joined: Tue Sep 29, 2015 4:48 pm

Re: RAID controller monitoring

Post by vy3734 »

Hi,
Thank you so much for responding. I was able to download the plugin and install storcli and the Perl dependencies.
$ lspci | grep RAID
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)
This means that I have a RAID device on my server, correct?
When I execute the following command I see:
$ /opt/MegaRAID/storcli/storcli64 /c0/eall/sall show all
Controller = 0
Status = Failure
Description = Controller 0 not found
I tried executing the plugin:
$ /usr/local/nagios/libexec/check_lsi_raid -C 1 -p /opt/MegaRAID/storcli/storcli64
Error: invalid controller number, controller not found!

Am I missing something here? This stuff is new to me. Can you please help?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RAID controller monitoring

Post by scottwilkerson »

Looks like your controller ID is 0, or it is saying there are no controllers found; not sure.

Try

/usr/local/nagios/libexec/check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64
Full disclosure: we do not have this equipment in house, so I cannot test it.
Former Nagios employee
bomahony
Posts: 133
Joined: Wed Jul 04, 2018 10:46 am

Re: RAID controller monitoring

Post by bomahony »

Try this to get your controller details:

bommer@XX-001 0 11:53:10 ~ $ sudo /opt/MegaRAID/storcli/storcli64 /call show
[sudo] password for bommer:
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.0606.0000.0000 Mar 20, 2018
Operating system = Linux 3.10.0-693.43.1.el7.x86_64
Controller = 0
Status = Success
Description = None

Product Name = Cisco 12G SAS Modular Raid Controller
Serial Number = SK74581808
SAS Address = 570708bff885a180
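
For anyone scripting this step, here is a minimal sketch that pulls the Controller, Status and Description fields out of storcli output like the above. The function name storcli_summary is made up for illustration and is not part of storcli or check_lsi_raid; it only assumes the "Key = Value" layout shown in this thread, so treat it as a starting point rather than a tested tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of storcli): read storcli output on stdin
# and print one "controller=<n> status=<s> description=<d>" line for each
# Controller/Status/Description group it sees.
# Assumes the "Key = Value" layout shown in this thread.
storcli_summary() {
    awk -F' = ' '
        $1 == "Controller"  { ctrl = $2 }
        $1 == "Status"      { status = $2 }
        $1 == "Description" {
            printf "controller=%s status=%s description=%s\n", ctrl, status, $2
        }
    '
}

# Usage (path taken from the posts above):
#   sudo /opt/MegaRAID/storcli/storcli64 /call show | storcli_summary
```

A "status=Failure" line with a "not found" description, as in the earlier post, would suggest the controller index is wrong or that storcli cannot see the controller at all.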
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RAID controller monitoring

Post by scottwilkerson »

Thanks for helping @bomahony
vy3734
Posts: 109
Joined: Tue Sep 29, 2015 4:48 pm

Re: RAID controller monitoring

Post by vy3734 »

@scottwilkerson
@bomahony
This is what I see after running the command:

# ./check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64
OK (CTR, LD, PD, CV)|CV_Temperature=24;70;85 ROC_Temperature=56;85;95 c0/e8/s0_Drive_Temperature=24;40;45 c0/e8/s1_Drive_Temperature=25;40;45 c0/e8/s2_Drive_Temperature=24;40;45 c0/e8/s3_Drive_Temperature=24;40;45 c0/e8/s4_Drive_Temperature=24;40;45 c0/e8/s5_Drive_Temperature=24;40;45

It seems to be giving out a set of temperature readings.

I attached a file (Raid_info.txt) that has the output of the command
/opt/MegaRAID/storcli/storcli64 /call show
It seems to me that the plugin is working. I need to know if any of the disks has any issues and alert based on that.
Any inputs as to how to tweak this plugin to get the info I need?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RAID controller monitoring

Post by scottwilkerson »

vy3734 wrote: it seems to me that the plugin is working.
I agree.
vy3734 wrote: I need to know if any of the disks has any issues and alert based on that.
Any inputs as to how to tweak this plugin to get the info i need?
I looked over the plugin and it checks for a whole bunch of issues automatically (you just don't have any, so it is reporting OK).
It looks for the following on the controller:

Degraded
Offline
Critical Disks
Failed Disks
Memory Correctable Errors
Memory Uncorrectable Errors
and a bunch of other items on the physical disks, virtual disks and the CV module. You can add -vvv to the plugin when running it from the command line to see more verbose output.
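
On the temperature readings: everything after the "|" in the plugin output is Nagios performance data, where each item has the form label=value;warn;crit. So CV_Temperature=24;70;85 is a reading of 24 with a warning threshold of 70 and a critical threshold of 85. Below is a rough sketch for listing those fields from one line of output. The function name perfdata_table is my own invention, and it assumes labels without spaces or quotes, which holds for the output posted above but not for every plugin.

```shell
#!/bin/sh
# Hypothetical helper: take one line of plugin output as $1 and print
# "label value warn crit" for every perfdata item after the "|".
# Assumes labels contain no spaces or quotes.
perfdata_table() {
    case "$1" in
        *"|"*) ;;            # perfdata present, carry on
        *) return 0 ;;       # no "|" means no perfdata to list
    esac
    # Strip everything up to the first "|", put each item on its own line,
    # then split each item on "=" and ";".
    printf '%s\n' "${1#*|}" | tr ' ' '\n' | awk -F'[=;]' '
        NF >= 2 { print $1, $2, $3, $4 }
    '
}

# Usage:
#   perfdata_table "$(./check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64)"
```

The text before the "|" is the human-readable status, which is the part that changes when the plugin finds a problem.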
vy3734
Posts: 109
Joined: Tue Sep 29, 2015 4:48 pm

Re: RAID controller monitoring

Post by vy3734 »

Ahh, so if any of the parameters are in a critical state, it would return a critical status and tell what exactly the issue is.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: RAID controller monitoring

Post by scottwilkerson »

vy3734 wrote: ahh, so if any of the parameters are in critical state it would return critical status and tell what exactly is the issue.
Correct, at least that is what I deduced from looking over the plugin code.
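
For reference, Nagios decides the service state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the first line of output is what shows up in the alert. Here is a small sketch to see this from the command line; nagios_state is a hypothetical helper name, not something shipped with Nagios or the plugin.

```shell
#!/bin/sh
# Hypothetical helper: map a plugin exit code to the Nagios state name.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 is UNKNOWN; any other code is treated the same here
    esac
}

# Usage (plugin path and options taken from the posts above):
#   /usr/local/nagios/libexec/check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64
#   nagios_state $?
```

Running the plugin by hand and checking $? this way is a quick check of what Nagios will see before wiring it into a service definition.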