Software Raid Check

Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Software Raid Check

Post by Shivaramakrishnan »

Hi,
I have a bash script that checks the software RAID.
I also want it to check the hard drive status and report OK only if each drive has passed the smartctl health test. It should check all the drives in the software RAID. Can anyone help me with this?
The code below only checks the software RAID and does not check the hard drive status, which I want to include in my script.
The code is below.


#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`

# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`

# Is an array currently recovering, get percentage of recovery
RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $4}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $4}'`

# Check raid status
# RAID recovers --> Warning
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
EXIT=1
elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
EXIT=1

# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
EXIT=0

# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS have FAILED"
EXIT=2
fi

# Status and quit
echo $STATUS
exit $EXIT
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Software Raid Check

Post by agriffin »

There are already plugins for this here and here.

If those don't help, could you give an example of how you would run smartctl from the command line to manually check your disks? Also, include some sample output. I don't use smartctl myself, but I'm fairly competent at shell scripting and could probably get it working.
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

The usage of smartctl is shown below.
I tried to get the script working and would like your help with it:

SWRAID:/usr/lib/nagios/plugins# smartctl -H /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Now I want to extract the result ("PASSED" in this case) and incorporate it into the script alongside the mdstat check. Currently I am not able to extract just the result, "PASSED", in this script.

Present Output:
Hard drive status for /dev/sda is
Hard drive status for /dev/sdb is
Software Raid WARNING - Checked 6 arrays, resync : 93.6%


CODE
#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`

#Get count of Physical Disks
DISKS="`blkid | grep sd | cut -c 1-8 | uniq`"
SMART=/usr/sbin/smartctl
ARGS="-H"

# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`
# Is an array currently recovering, get percentage of recovery
RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $4}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $4}'`

# Check raid status
# RAID recovers --> Warning

for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1

elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1

# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
STATUS= "Hard drive status for $disk is $HD_STAT"
EXIT=0

# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS have FAILED"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
echo "Hard drive status for $disk is $HD_STAT"
EXIT=2
fi
done

# Status and quit
echo $STATUS
exit $EXIT
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

I am not able to extract the smartctl result in the above script. In addition,
I made some changes to the above code:

for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1

elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=1

# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=0

# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS have FAILED"
HD_STAT=$($SMART $ARGS $disk | awk '{print $6}')
echo "Hard drive status for $disk is $HD_STAT"
EXIT=2
fi
done


Output:
As root user:
SWRAID:/usr/lib/nagios/plugins# ./check_sw_raid
Hard drive status for /dev/sda is (C)


DATA
PASSED
Hard drive status for /dev/sdb is (C)


DATA
PASSED
Software Raid WARNING - Checked 6 arrays, resync : 37.3%

AS NAGIOS:
(for 0d 0h 21m 4s)
Status Information: Hard drive status for /dev/sda is (C)


Permission
Hard drive status for /dev/sdb is (C)


Permission
Software Raid WARNING - Checked 6 arrays, resync : 37.3%
Can you help me out as soon as possible?
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Software Raid Check

Post by agriffin »

This seems to work for me:

smartctl -H /dev/sda | awk -F ': ' 'NR > 3 {print $2}'
EDIT: This will probably work slightly better:

smartctl -H /dev/sda | sed -n '4,$ s/^[^:]*: //p'
It skips the first 3 lines of output from smartctl, and for each remaining line that contains ": " (there has only ever been one for me) it prints everything after the first occurrence of ": ".
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

Thanks for the reply.
smartctl -H /dev/sda | sed -n '4,$ s/^[^:]*: //p'
I managed to fix that with:

HD_STAT=$($SMART $ARGS $disk | awk '{print $6}' | grep -i passed)
if [[ $HD_STAT == "PASSED" ]]; then
RESULT="Hard drive $disk is good"
echo "Hard drive $disk is $HD_STAT"
fi

But the problem here is that you are hard-coding the physical devices in the code.
I wanted to detect the physical drives present first and then check their status.
So I have used blkid to determine the unique device names, but I am only able to run the script as root.
As Root user:
SWRAID:~# /usr/lib/nagios/plugins/check_sw_raid
Hard drive /dev/sda is good
Hard drive /dev/sdb is good
Software Raid CRITICAL - Checked 6 arrays, 5 have FAILED

As Nagios or any other user:
SWRAID:~# su - nagios -s /bin/bash -c /usr/lib/nagios/plugins/check_sw_raid
No directory, logging in with HOME=/
/usr/lib/nagios/plugins/check_sw_raid: line 6: blkid: command not found

I fixed this by including the full path to blkid.
But now I guess the problem is with smartctl: it is able to read the unique devices but is not able to run the test.

OUTPUT:
As ROOT:
/dev/sda
/dev/sdb
Hard drive /dev/sda is good
Hard drive /dev/sdb is good
Software Raid CRITICAL - Checked 6 arrays, 5 have FAILED

AS ANY OTHER USER:
/dev/sda
/dev/sdb
Software Raid CRITICAL - Checked 6 arrays, 5 have FAILED




My sudoers file:
nagios ALL = (ALL) NOPASSWD: ALL
Last edited by Shivaramakrishnan on Tue Jun 26, 2012 12:06 pm, edited 1 time in total.
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Software Raid Check

Post by agriffin »

Your command will only work if the disk passed the test. You may wish to restrict nagios' sudo permissions to a few selected programs rather than giving it root access to everything. As for listing the disks, lshw may work better, as in 'lshw -short -C disk'.
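
To illustrate the point about a failed disk, here is a minimal sketch that classifies the `smartctl -H` text instead of relying on a fixed field position. The function name `smart_health` is hypothetical, and the sketch assumes the ATA-style "self-assessment test result" line shown earlier in this thread; anything else (for example a permission error when run without root) is reported as UNKNOWN.

```shell
#!/bin/bash
# Hypothetical helper: classify the text printed by `smartctl -H` (read from stdin).
# Prints PASSED, FAILED, or UNKNOWN.
smart_health() {
    local line result="UNKNOWN"
    while IFS= read -r line; do
        case "$line" in
            # The exact line seen in the thread's sample output.
            *"self-assessment test result: PASSED"*) result="PASSED" ;;
            # Assumption: any other value on that line means the disk did not pass.
            *"self-assessment test result:"*)        result="FAILED" ;;
        esac
    done
    echo "$result"
}

# Sample taken from the smartctl 5.38 output posted earlier in this thread.
sample='=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED'

printf '%s\n' "$sample" | smart_health
```

In the plugin this would presumably be fed from the real command, for example `sudo /usr/sbin/smartctl -H /dev/sda | smart_health`, with sudo limited to smartctl rather than ALL.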
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

Can you please look at my previous, edited post? I guess the problem is with the permissions for smartctl.
Can you advise?
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

I managed to get the script working, and I wanted to check the exit status with you.
I first check the software RAID status. If it is critical, I do not check the physical drives; if the software RAID is in a warning state, I check the hard drives as well.
But I want to display the degraded arrays instead of just counting them. Can you help me with that?


#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`

#Get count of Physical Disks

DISKS=`/sbin/blkid | grep sd | cut -c 1-8 | uniq`
SMART=/usr/sbin/smartctl
ARGS="-H"

# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`

# Is an array currently recovering, get percentage of recovery
RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $4}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $4}'`


# Check raid status
# RAID recovers --> Warning

for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
HARD_DRIVE="$disk $HD_STAT"
EXIT=2
fi
EXIT=1

elif [[ $RAID_RESYNC ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, resync : $RAID_RESYNC"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
HARD_DRIVE="$disk $HD_STAT"
EXIT=2
fi
EXIT=1

# RAID ok
elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Software Raid OK - Checked $RAID_DEVICES arrays."
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT == "PASSED" ]]; then
HARD_DRIVE="$disk $HD_STAT"
fi
EXIT=0

# All else critical, better safe than sorry
else
STATUS="Software Raid CRITICAL - Checked $RAID_DEVICES arrays, $RAID_STATUS are DEGRADED"
EXIT=2

fi
done

# Status and quit

echo $STATUS
exit $EXIT
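
On the exit status: in the script above, `EXIT=2` for a failed drive is immediately overwritten by the `EXIT=1` that follows the inner `if`, and a later loop iteration can also replace an earlier result. One possible way around this, sketched below, is to keep only the most severe code seen so far. The helper name `worst` is hypothetical, and the sketch considers only the Nagios codes 0 (OK), 1 (WARNING) and 2 (CRITICAL).

```shell
#!/bin/bash
# Hypothetical helper: return the more severe of two Nagios exit codes
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
worst() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

EXIT=0
# Pretend results: the array state is WARNING (resync), one disk is CRITICAL,
# and another disk is OK. A later, milder result must not lower EXIT.
for code in 1 2 0; do
    EXIT=$(worst "$EXIT" "$code")
done
echo "final exit code: $EXIT"
```

In the real plugin, each check would call `EXIT=$(worst "$EXIT" <code>)` instead of assigning `EXIT` directly, and the script would end with `exit $EXIT` as before.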
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

md5 : active raid1 sda9[0]
200242048 blocks [2/1] [U_]

md4 : active raid1 sda8[0]
4883648 blocks [2/1] [U_]

md3 : active raid1 sda7[0]
20506816 blocks [2/1] [U_]

md2 : active raid1 sda6[0]
979840 blocks [2/1] [U_]

md1 : active (auto-read-only) raid1 sda5[0] sdb5[1]
7815488 blocks [2/2] [UU]
resync=PENDING

md0 : active raid1 sda1[0]
9767424 blocks [2/1] [U_]

unused devices: <none>
---------------------------------------------------------------------------

SWRAID:~# grep "\[.*_.*\]" /proc/mdstat
200242048 blocks [2/1] [U_]
4883648 blocks [2/1] [U_]
20506816 blocks [2/1] [U_]
979840 blocks [2/1] [U_]
9767424 blocks [2/1] [U_]

I need the array names to be displayed. So in this case: md0, md2, md3, md4, md5.
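
A minimal sketch of one way to get the names, assuming the mdstat layout shown above (an "mdN : ..." line followed by a status line such as [U_]). The function name `degraded_arrays` is hypothetical; it remembers the most recent array name and prints it when a status field containing an underscore appears.

```shell
#!/bin/bash
# Hypothetical helper: print the names of degraded md arrays, comma-separated.
# Reads mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md5 : active raid1 ..."
        /^md/ { name = $1 }
        # A status field like [U_] or [_U] means a member is missing
        /\[[U_]*_[U_]*\]/ { list = list sep name; sep = "," }
        END { print list }
    '
}

# Shortened sample based on the /proc/mdstat output posted above.
degraded_arrays <<'EOF'
md5 : active raid1 sda9[0]
      200242048 blocks [2/1] [U_]

md1 : active (auto-read-only) raid1 sda5[0] sdb5[1]
      7815488 blocks [2/2] [UU]
        resync=PENDING

md0 : active raid1 sda1[0]
      9767424 blocks [2/1] [U_]
EOF
```

With this sample it prints `md5,md0`. In the plugin it could be called as `degraded_arrays < /proc/mdstat`; the names come out in the order they appear in the file.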