Software Raid Check
-
Shivaramakrishnan
- Posts: 71
- Joined: Tue May 15, 2012 10:11 pm
Software Raid Check
Hi
I have a bash script that checks the software RAID.
I also want it to check the hard drive status and output OK only if every drive in the software RAID has passed the smartctl test. Can anyone help me with this?
The code below only checks the software RAID and does not check the hard drive status. I want to include that check in my script.
The code is below.
#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`
# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`
# Is an array currently recovering, get percentage of recovery
RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $4}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $4}'`
# Check raid status
# RAID recovers --> Warning
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
EXIT=1
elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
EXIT=1
# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
EXIT=0
# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS have FAILED"
EXIT=2
fi
# Status and quit
echo $STATUS
exit $EXIT
Re: Software Raid Check
There are already plugins for this here and here.
If those don't help, could you give an example of how you would run smartctl from the command line to manually check your disks? Also, include some sample output. I don't use smartctl myself, but I'm fairly competent at shell scripting and could probably get it working.
-
Shivaramakrishnan
- Posts: 71
- Joined: Tue May 15, 2012 10:11 pm
Re: Software Raid Check
The usage for smartctl is shown below.
I tried to get the script working and would like your help with it:
SWRAID:/usr/lib/nagios/plugins# smartctl -H /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Now I want to extract the result ("PASSED" in this case) and incorporate it into the script alongside the mdstat check. Currently I am not able to extract just the result ("PASSED") in this script.
Present Output:
Hard drive status for /dev/sda is
Hard drive status for /dev/sdb is
Software Raid WARNING - Checked 6 arrays, resync : 93.6%
CODE
#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`
#Get count of Physical Disks
DISKS="`blkid | grep sd | cut -c 1-8 | uniq`"
SMART=/usr/sbin/smartctl
ARGS="-H"
# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`
# Is an array currently recovering, get percentage of recovery
RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $4}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $4}'`
# Check raid status
# RAID recovers --> Warning
for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1
elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1
# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
STATUS= "Hard drive status for $disk is $HD_STAT"
EXIT=0
# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS have FAILED"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
echo "Hard drive status for $disk is $HD_STAT"
EXIT=2
fi
done
# Status and quit
echo $STATUS
exit $EXIT
-
Shivaramakrishnan
- Posts: 71
- Joined: Tue May 15, 2012 10:11 pm
Re: Software Raid Check
I am not able to extract the smartctl result in the above script. In addition,
I made some changes to the above code:
for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1
elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1
# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=0
# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS have FAILED"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=2
fi
done
Output:
As root user:
SWRAID:/usr/lib/nagios/plugins# ./check_sw_raid
Hard drive status for /dev/sda is (C)
DATA
PASSED
Hard drive status for /dev/sdb is (C)
DATA
PASSED
Software Raid WARNING - Checked 6 arrays, resync : 37.3%
AS NAGIOS:
(for 0d 0h 21m 4s)
Status Information: Hard drive status for /dev/sda is (C)
Permission
Hard drive status for /dev/sdb is (C)
Permission
Software Raid WARNING - Checked 6 arrays, resync : 37.3%
Can you help me out as soon as possible?
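A likely explanation for the output above, offered as a sketch rather than a confirmed diagnosis: awk '{print $6}' prints the sixth field of every line that smartctl emits, so the banner line contributes "(C)", the section header contributes "DATA", and only the last line contributes "PASSED". Under the nagios user the sixth field is "Permission", which suggests smartctl is printing a permission-denied message instead of a result. The helper name smart_health and the canned sample_output function are illustrative assumptions; the sample text is the smartctl -H output quoted earlier in the thread.

```shell
#!/bin/sh
# sample_output (illustrative): the smartctl -H output quoted in this thread.
sample_output() {
cat <<'EOF'
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
EOF
}

# smart_health (hypothetical helper): read smartctl -H output on stdin and
# print only the text that follows "test result: " on the result line.
smart_health() {
    awk -F': ' '/overall-health self-assessment test result/ {print $2}'
}

# Sixth field of every line: "(C)", an empty line, "DATA", "PASSED".
sample_output | awk '{print $6}'

# Only the health result: "PASSED".
sample_output | smart_health
```

In the real script the input would come from the smartctl call instead of sample_output, for example HD_STAT=$($SMART $ARGS $disk | smart_health). If smartctl cannot open the device, smart_health prints nothing, so an empty HD_STAT can be treated as a failure rather than silently ignored.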
Re: Software Raid Check
This seems to work for me:
smartctl -H /dev/sda | awk -F ': ' 'NR > 3 {print $2}'
EDIT: This will probably work slightly better:
smartctl -H /dev/sda | sed -n '4,$ s/^[^:]*: //p'
It skips the first 3 lines of output from smartctl, and with the remaining lines (which has always just been 1 for me) it prints everything after the first occurrence of ": ".
-
Shivaramakrishnan
- Posts: 71
- Joined: Tue May 15, 2012 10:11 pm
Re: Software Raid Check
Thanks for the reply.
smartctl -H /dev/sda | sed -n '4,$ s/^[^:]*: //p'
I managed to fix that with:
HD_STAT=`$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)`
if [[ $HD_STAT == "PASSED" ]]; then
RESULT="Hard drive $disk is good"
echo "Hard drive $disk is $HD_STAT"
fi
But the problem here is that you are hard-coding the physical devices in the code.
I wanted to first detect the physical drives that are present and then check their status.
So I used blkid to determine the unique devices, but I am only able to run it as root.
As Root user:
SWRAID:~# /usr/lib/nagios/plugins/check_sw_raid
Hard drive /dev/sda is good
Hard drive /dev/sdb is good
Software Raid CRITICAL - Checked 6 arrays, 5 have FAILED
As Nagios or any other user:
SWRAID:~# su - nagios -s /bin/bash -c /usr/lib/nagios/plugins/check_sw_raid
No directory, logging in with HOME=/
/usr/lib/nagios/plugins/check_sw_raid: line 6: blkid: command not found
I fixed this by including the full path to blkid.
But now I guess the problem is with smartctl: the script is able to read the unique devices but is not able to run the test on them.
OUTPUT:
As ROOT:
/dev/sda
/dev/sdb
Hard drive /dev/sda is good
Hard drive /dev/sdb is good
Software Raid CRITICAL - Checked 6 arrays, 5 have FAILED
AS ANY OTHER USER:
/dev/sda
/dev/sdb
Software Raid CRITICAL - Checked 6 arrays, 5 have FAILED
My sudoers file:
nagios ALL = (ALL) NOPASSWD: ALL
Last edited by Shivaramakrishnan on Tue Jun 26, 2012 12:06 pm, edited 1 time in total.
Re: Software Raid Check
Your command will only work if the disk passed the test. You may wish to restrict nagios' sudo permissions to a few selected programs rather than giving it root access to everything. As for listing the disks, lshw may work better, as in 'lshw -short -C disk'.
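Since the goal stated at the top of the thread is to check the drives that belong to the software RAID, another option besides blkid or lshw is to read the member devices out of /proc/mdstat itself. The following is a minimal sketch under assumptions: the helper name md_member_disks is hypothetical, and it assumes sdXN-style partition names as seen in this thread (NVMe-style names such as nvme0n1p1 would need different handling).

```shell
#!/bin/sh
# md_member_disks (hypothetical helper): print the whole-disk device behind
# every partition listed as an md array member in an mdstat-format file.
# The file defaults to /proc/mdstat; a path can be passed for testing.
md_member_disks() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '/^md/ {
             for (i = 1; i <= NF; i++) {
                 if ($i ~ /\[[0-9]+\]/) {       # member fields look like sda9[0]
                     dev = $i
                     sub(/\[.*/, "", dev)       # sda9[0] -> sda9
                     sub(/[0-9]+$/, "", dev)    # sda9    -> sda
                     print "/dev/" dev
                 }
             }
         }' "$mdstat_file" | sort -u
}
```

With this, the disk loop could be written as for disk in $(md_member_disks); do ... done, so only drives that actually back an array are passed to smartctl.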
-
Shivaramakrishnan
- Posts: 71
- Joined: Tue May 15, 2012 10:11 pm
Re: Software Raid Check
Can you please look at my previous edited post? I guess the problem is with the permissions for smartctl.
Can you advise?
-
Shivaramakrishnan
- Posts: 71
- Joined: Tue May 15, 2012 10:11 pm
Re: Software Raid Check
I managed to get the script working, and I wanted to check the exit status with you.
I first check the software RAID status. If it is critical, I do not check the physical drives; if it is a warning, I check the hard drives as well.
But I want to display the degraded arrays instead of just counting them. Can you help me with that?
#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`
#Get count of Physical Disks
DISKS=`/sbin/blkid | grep sd | cut -c 1-8 | uniq`
SMART=/usr/sbin/smartctl
ARGS="-H"
# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`
# Is an array currently recovering, get percentage of recovery
RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $4}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $4}'`
# Check raid status
# RAID recovers --> Warning
for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
HARD_DRIVE="$disk $HD_STAT"
EXIT=2
fi
EXIT=1
elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
HARD_DRIVE="$disk $HD_STAT"
EXIT=2
fi
EXIT=1
# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT == "PASSED" ]]; then
HARD_DRIVE="$disk $HD_STAT"
fi
EXIT=0
# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS are DEGRADED"
EXIT=2
fi
done
# Status and quit
echo $STATUS
exit $EXIT
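On the exit status question, a few things stand out in the script as posted: each EXIT=2 set for a non-PASSED drive is immediately overwritten by the EXIT=1 that follows the inner if, HARD_DRIVE is never echoed, and the RAID state is re-evaluated once per disk. One possible way to keep the worst result is sketched below. The helper names worst_status and combine_status are hypothetical, and treating any result other than "PASSED" (including an empty one) as CRITICAL is an assumption, not something the thread settles.

```shell
#!/bin/sh
# worst_status (hypothetical helper): print the larger of two Nagios exit
# codes (0 = OK, 1 = WARNING, 2 = CRITICAL), so that a later check can raise
# the severity but never lower it.
worst_status() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# combine_status (hypothetical helper): the first argument is the exit code
# from the RAID check; the remaining arguments are "disk=RESULT" pairs, where
# RESULT is the text extracted from smartctl -H for that disk.
combine_status() {
    exit_code=$1
    shift
    for pair in "$@"; do
        result=${pair#*=}                  # strip everything up to the "="
        if [ "$result" != "PASSED" ]; then
            exit_code=$(worst_status "$exit_code" 2)
        fi
    done
    echo "$exit_code"
}
```

For example, combine_status 1 /dev/sda=PASSED /dev/sdb=PASSED prints 1, while replacing either result with anything else prints 2. The RAID exit code would be computed once, before the disk loop, and the final value passed to exit.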
-
Shivaramakrishnan
- Posts: 71
- Joined: Tue May 15, 2012 10:11 pm
Re: Software Raid Check
md5 : active raid1 sda9[0]
200242048 blocks [2/1] [U_]
md4 : active raid1 sda8[0]
4883648 blocks [2/1] [U_]
md3 : active raid1 sda7[0]
20506816 blocks [2/1] [U_]
md2 : active raid1 sda6[0]
979840 blocks [2/1] [U_]
md1 : active (auto-read-only) raid1 sda5[0] sdb5[1]
7815488 blocks [2/2] [UU]
resync=PENDING
md0 : active raid1 sda1[0]
9767424 blocks [2/1] [U_]
unused devices: <none>
---------------------------------------------------------------------------
SWRAID:~# grep "\[.*_.*\]" /proc/mdstat
200242048 blocks [2/1] [U_]
4883648 blocks [2/1] [U_]
20506816 blocks [2/1] [U_]
979840 blocks [2/1] [U_]
9767424 blocks [2/1] [U_]
I need the array names to be displayed. So in this case: md0, md2, md3, md4, md5.
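One way to get the names, sketched here under assumptions: in /proc/mdstat the array name is on the line before the status brackets, so awk can remember the most recent mdN name and print it whenever a status such as [U_] contains an underscore. The helper name degraded_arrays is hypothetical, and the numeric sort assumes plain mdN names.

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): print a comma-separated list of md
# arrays whose status brackets contain an underscore (a missing member).
# The file defaults to /proc/mdstat; a path can be passed for testing.
degraded_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '/^md/ { name = $1 }                 # remember the current array name
         /\[[U_]*_[U_]*\]/ { print name }    # status line with a missing member
        ' "$mdstat_file" | sort -t d -k 2n | paste -s -d , -
}
```

In the script, the CRITICAL branch could then use something like STATUS="Software Raid CRITICAL - degraded arrays: $(degraded_arrays)" in place of the count.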