RAID controller monitoring
Hi,
We have a new requirement to monitor RAID controller health. Does Nagios provide any plugins that can do this?
I googled for third-party Nagios plugins to monitor RAID controllers, but I'm not sure which one would work for my server.
This is the RAID info I found:
$ lspci | grep RAID
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
I want to know if I can use Nagios to monitor this RAID device.
Can anyone help?
Re: RAID controller monitoring
I am monitoring about 2000 RAID controllers using check_lsi_raid (it requires storcli, the successor to MegaCli) from here:
https://github.com/thomas-krenn/check_lsi_raid
As long as it's not Cisco hardware. That crap is a nightmare due to their non-standard logs / ID codes.
Re: RAID controller monitoring
Hi,
Thank you so much for responding. I was able to download the plugin and install storcli and the Perl dependencies.
$ lspci | grep RAID
02:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)
This means that I have a RAID device on my server, correct?
When I execute the following command I see:
$ /opt/MegaRAID/storcli/storcli64 /c0/eall/sall show all
Controller = 0
Status = Failure
Description = Controller 0 not found
I tried executing the plugin:
$ /usr/local/nagios/libexec/check_lsi_raid -C 1 -p /opt/MegaRAID/storcli/storcli64
Error: invalid controller number, controller not found!
Am I missing something here? This stuff is new to me. Can you please help?
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: RAID controller monitoring
It looks like either your controller ID is 0, or storcli is saying there are no controllers found; I'm not sure which.
Full disclosure: we do not have this equipment in house, so I cannot test it.
Try
/usr/local/nagios/libexec/check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64
Re: RAID controller monitoring
Try this to get your controller details:
bommer@XX-001 0 11:53:10 ~ $ sudo /opt/MegaRAID/storcli/storcli64 /call show
[sudo] password for bommer:
Generating detailed summary of the adapter, it may take a while to complete.
CLI Version = 007.0606.0000.0000 Mar 20, 2018
Operating system = Linux 3.10.0-693.43.1.el7.x86_64
Controller = 0
Status = Success
Description = None
Product Name = Cisco 12G SAS Modular Raid Controller
Serial Number = SK74581808
SAS Address = 570708bff885a180
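If you are unsure which number to pass to -C, one option is to read it out of the storcli output rather than guessing. Below is a minimal Python sketch of that idea. It assumes the output keeps the "Controller = N" / "Status = ..." layout shown in this thread; find_controllers is a hypothetical helper name, not part of storcli or check_lsi_raid.

```python
import re

def find_controllers(storcli_output):
    """Return the controller indices that storcli reports with Status = Success."""
    found = []
    current = None
    for line in storcli_output.splitlines():
        ctrl = re.match(r"\s*Controller\s*=\s*(\d+)\s*$", line)
        if ctrl:
            current = int(ctrl.group(1))
            continue
        status = re.match(r"\s*Status\s*=\s*(\S+)", line)
        if status and current is not None:
            if status.group(1) == "Success":
                found.append(current)
            # Only the first Status line after a Controller line is considered
            current = None
    return found

# Samples copied from the outputs posted in this thread
working = """\
Controller = 0
Status = Success
Description = None
Product Name = Cisco 12G SAS Modular Raid Controller
"""
missing = """\
Controller = 0
Status = Failure
Description = Controller 0 not found
"""

print(find_controllers(working))  # [0]
print(find_controllers(missing))  # []
```

An empty list here would suggest that storcli itself cannot see a controller, in which case changing the -C value on the plugin is unlikely to help.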
scottwilkerson
Re: RAID controller monitoring
Thanks for helping @bomahony
Re: RAID controller monitoring
@scottwilkerson
@bomahony
This is what I see after running the command:
# ./check_lsi_raid -C 0 -p /opt/MegaRAID/storcli/storcli64
OK (CTR, LD, PD, CV)|CV_Temperature=24;70;85 ROC_Temperature=56;85;95 c0/e8/s0_Drive_Temperature=24;40;45 c0/e8/s1_Drive_Temperature=25;40;45 c0/e8/s2_Drive_Temperature=24;40;45 c0/e8/s3_Drive_Temperature=24;40;45 c0/e8/s4_Drive_Temperature=24;40;45 c0/e8/s5_Drive_Temperature=24;40;45
It seems to be giving out a set of temperature readings.
I attached a file that has the output of the command /opt/MegaRAID/storcli/storcli64 /call show.
It seems to me that the plugin is working. I need to know if any of the disks has any issues and alert based on that.
Any input on how to tweak this plugin to get the info I need?
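For reading the line the plugin printed: in standard Nagios plugin output, the text before the "|" is the status message and everything after it is performance data in label=value;warn;crit form. So the temperatures are extra metrics, and the "OK (CTR, LD, PD, CV)" part is the actual health result. Here is a rough Python sketch that splits such a line apart. It assumes plain numeric values with no unit suffix and labels without spaces, as in the output above; parse_plugin_output is a hypothetical helper name.

```python
def _field(fields, index):
    """Return fields[index] as a float, or None when it is absent or empty."""
    if len(fields) > index and fields[index]:
        return float(fields[index])
    return None

def parse_plugin_output(line):
    """Split one line of plugin output into (status_text, metrics)."""
    status_text, _, perfdata = line.partition("|")
    metrics = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        metrics[label] = {
            "value": _field(fields, 0),
            "warn": _field(fields, 1),
            "crit": _field(fields, 2),
        }
    return status_text.strip(), metrics

# Shortened copy of the plugin output posted above
sample = ("OK (CTR, LD, PD, CV)|CV_Temperature=24;70;85 ROC_Temperature=56;85;95 "
          "c0/e8/s0_Drive_Temperature=24;40;45 c0/e8/s1_Drive_Temperature=25;40;45")

status, metrics = parse_plugin_output(sample)
print(status)
for label, m in metrics.items():
    print(label, m["value"], "warn at", m["warn"], "crit at", m["crit"])
```

In the sample every value sits below its warning threshold (for example ROC_Temperature is 56 against a warning level of 85), which is consistent with the overall OK.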
scottwilkerson
Re: RAID controller monitoring
vy3734 wrote: it seems to me that the plugin is working.
I agree.
vy3734 wrote: I need to know if any of the disks has any issues and alert based on that. Any inputs as to how to tweak this plugin to get the info i need?
I looked over the plugin and it does check for a whole bunch of issues automatically (you just don't have any, so it is reporting OK).
It looks for the following on the controller
Degraded
Offline
Critical Disks
Failed Disks
Memory Correctable Errors
Memory Uncorrectable Errors
Failed Disks
Re: RAID controller monitoring
Ahh, so if any of these parameters is in a critical state, the plugin would return a critical status and say exactly what the issue is.
scottwilkerson
Re: RAID controller monitoring
vy3734 wrote: ahh, so if any of the parameters are in critical state it would return critical status and tell what exactly is the issue.
Correct, at least that is what I deduced looking over the plugin code.
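For background on how "return critical status" works: Nagios decides the state from the plugin's exit code, using the standard convention 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, and shows the first line of output as the message. A small Python sketch of that convention follows. The run_check helper and the fake_plugin stand-in (including its "CRITICAL (PD)" text) are made up for illustration so the sketch runs without RAID hardware; they are not taken from check_lsi_raid.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_check(command, timeout=60):
    """Run a plugin command and return (state_name, first_output_line)."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    # Any exit code outside 0-3 is treated as UNKNOWN
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.splitlines()
    return state, (lines[0] if lines else "")

# Hypothetical stand-in for the real plugin: prints a message and exits with code 2
fake_plugin = [sys.executable, "-c", "print('CRITICAL (PD)'); raise SystemExit(2)"]

state, message = run_check(fake_plugin)
print(state, message)
```

To try it against the real plugin, the command list from earlier in the thread could be passed instead, e.g. ["/usr/local/nagios/libexec/check_lsi_raid", "-C", "0", "-p", "/opt/MegaRAID/storcli/storcli64"].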