Software Raid Check

agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Software Raid Check

Post by agriffin »


awk 'BEGIN{RS="\n\n"} $NF ~ /_/ {print $1}' /proc/mdstat
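A note on the one-liner above: it treats each blank-line-separated block of /proc/mdstat as one record, so it relies on the member map (such as [U_]) being the last field of the block, and a multi-character RS is not handled the same way by every awk. On systems where the Personalities line sits directly above the first array, $1 of the first record may also not be the array name. A line-oriented variant that should be less sensitive to the layout can be sketched as follows. The function name list_degraded and the sample file are made up for illustration, and the sample only approximates real /proc/mdstat output.

```shell
#!/bin/bash
# list_degraded (hypothetical helper): print the name of every md array whose
# member map contains "_" (a missing member), e.g. [U_] or [_U].
# Reads mdstat-style text from the file in $1, defaulting to /proc/mdstat.
list_degraded() {
    awk '
        /^md/ { name = $1 }                              # start of an array block
        /\[[U_]*_[U_]*\]/ && name != "" { print name }   # member map with a gap
    ' "${1:-/proc/mdstat}"
}

# Demo on a made-up sample instead of the live /proc/mdstat.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      524288 blocks [2/1] [U_]

unused devices: <none>
EOF

list_degraded "$sample"    # prints: md1
rm -f "$sample"
```

Called without an argument, the function reads /proc/mdstat itself, so the output could be counted or embedded in the status line of a check script.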
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

I managed to tweak it a bit and got the script working,
but I need one last piece of advice.

Current Output:
/dev/md0 State:clean,degraded
/dev/md2 State:clean,degraded
/dev/md3 State:clean,degraded
/dev/md4 State:clean,degraded
/dev/md5 State:clean,degraded
Raid WARNING - Checked 6 arrays,resync=PENDING

I wanted something like
Raid WARNING - Checked 6 arrays,resync=PENDING
/dev/md0 State:clean,degraded
/dev/md2 State:clean,degraded
/dev/md3 State:clean,degraded
/dev/md4 State:clean,degraded
/dev/md5 State:clean,degraded

Is it possible to modify the script below a bit to obtain this result?


#!/bin/bash
# Get count of raid arrays
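RAID_DEVICES=`grep ^md -c /proc/mdstat`

# Get count of degraded arrays (used by the RAID_STATUS tests below)
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`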

#echo "$RAID_DEVICES"
#Get count of Physical Disks

DISKS=`/sbin/blkid | grep sd | cut -c 1-8 | uniq`
MDISKS=`/sbin/blkid | grep /dev/md | cut -c 1-8 | uniq`

SMART=/usr/sbin/smartctl
ARGS="-H"

MARGS="--detail"
MDADM=/sbin/mdadm

RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $1}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $1}'`


for disk in $DISKS
do
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
echo "$disk $HD_STAT"
EXIT=2
fi
done

for mpart in $MDISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Raid WARNING - Checked $RAID_DEVICES arrays,$RAID_RECOVER"
ARRAY=`sudo $MDADM $MARGS $mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
if [[ $ARRAY != "State:clean" ]]; then
echo "$mpart $ARRAY"
fi
EXIT=1

elif [[ $RAID_RESYNC ]]; then
STATUS="Raid WARNING - Checked $RAID_DEVICES arrays,$RAID_RESYNC"
ARRAY=`sudo $MDADM $MARGS $mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
if [[ $ARRAY != "State:clean" ]]; then
echo "$mpart $ARRAY"
fi
EXIT=1

elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Raid OK - Checked $RAID_DEVICES arrays."
ARRAY=`sudo $MDADM $MARGS $mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
if [[ $ARRAY != "State:clean" ]]; then
echo "$mpart $ARRAY"
fi
EXIT=0
else
STATUS="Raid CRITICAL - Checked $RAID_DEVICES arrays,$RAID_STATUS are DEGRADED"
ARRAY=`sudo $MDADM $MARGS $mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
if [[ $ARRAY != "State:clean" ]]; then
echo "$mpart $ARRAY"
fi
EXIT=2
fi

done

# quit
echo $STATUS
exit $EXIT
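
Since Nagios takes the first line of plugin output as the status text, one way to get the summary ahead of the per-array lines is to collect the details in a variable and print them after the status. A minimal sketch of that idea, with a stand-in function replacing the mdadm calls; add_detail, report and array_state are hypothetical names, not part of the script above:

```shell
#!/bin/bash
DETAILS=""      # buffered detail lines, printed after the status line

# Append one detail line to the buffer instead of echoing it right away.
add_detail() {
    if [ -z "$DETAILS" ]; then
        DETAILS=$1
    else
        DETAILS=$(printf '%s\n%s' "$DETAILS" "$1")
    fi
}

# Print the status line first, then whatever was buffered.
report() {
    printf '%s\n' "$1"
    if [ -n "$DETAILS" ]; then
        printf '%s\n' "$DETAILS"
    fi
}

# Stand-in for the "mdadm --detail" pipeline; returns made-up states.
array_state() {
    case $1 in
        /dev/md0) echo "State:clean,degraded" ;;
        *)        echo "State:clean" ;;
    esac
}

for mpart in /dev/md0 /dev/md1; do
    ARRAY=$(array_state "$mpart")
    if [ "$ARRAY" != "State:clean" ]; then
        add_detail "$mpart $ARRAY"
    fi
done

report "Raid WARNING - Checked 2 arrays,resync=PENDING"
```

With the stand-in data this prints the WARNING line first and the /dev/md0 line second, which is the ordering asked for above.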
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

I changed the code and got it working.

for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS="Software Raid WARNING - Checked $RAID_DEVICES arrays, recovering : $RAID_RECOVER"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
echo "$disk $HD_STAT"
EXIT=2
fi
EXIT=1
elif [[ $RAID_RESYNC ]]; then
STATUS="Raid WARNING - Checked $RAID_DEVICES arrays,$RAID_RESYNC"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
echo "$disk $HD_STAT"
EXIT=2
fi
EXIT=1

elif [[ $RAID_STATUS == "0" ]]; then
STATUS="Raid OK - Checked $RAID_DEVICES arrays."
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
echo "$disk $HD_STAT"
EXIT=2
fi
EXIT=0
else
STATUS="Raid CRITICAL - Checked $RAID_DEVICES arrays,$RAID_STATUS are DEGRADED"
EXIT=2
fi
done

echo $STATUS
for mpart in $MDISKS
do
ARRAY=`sudo $MDADM $MARGS $mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
if [[ $ARRAY != "State:clean" ]]; then
echo "$mpart $ARRAY"
fi
done


It's giving me the output I wanted, but I'd like to confirm that the exit statuses are fine here.
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Software Raid Check

Post by agriffin »

They look fine to me. Is there something specific you were confused about or did you just need confirmation that you did it right?
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

Thanks for the response. I just wanted to cross-check.

I wanted the following for the OK Status:
Status OK - State Clean: md0 md1 md2 md3 md4 md5

Currently my script gives something like this, which looks odd in the GUI:
OK-6
/dev/md0 Clean
/dev/md1 Clean
/dev/md2 Clean
/dev/md3 Clean
/dev/md4 Clean
/dev/md5 Clean

Is it possible?
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Software Raid Check

Post by agriffin »

Yeah, I could make that work for you pretty easily, but I'm confused about what your script looks like right now. Could you post it in full again?
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

Below is the complete script. I managed to put the names on a single line using the tr command, but I can't get it to display something like:
RAID OK - State Clean: md0 md1 md2 md3 md4 md5

Current Output of the below script:
For OK STATUS:
OK - 6
md0md1md2md3md4md5

For the other states the output is fine, but I'd like a space between the array name and its state, so that "md0State:clean,degraded" becomes "md0 State:clean,degraded":
WARNING - Checked 6 arrays,resync=PENDING
md0State:clean,degraded
md2State:clean,degraded
md3State:clean,degraded
md4State:clean,degraded
md5State:clean,degraded
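
The missing space follows from how the lines are assembled: `echo $mpart | cut -c6-8 | tr -d '\n'` prints the name with no trailing space or newline, and the next echo appends the state directly. A small sketch of one alternative, assuming $mpart holds a path such as /dev/md0; format_array is a hypothetical helper name:

```shell
#!/bin/bash
# format_array (hypothetical helper): print "md0 State:clean,degraded" from a
# device path and a state string, with exactly one space between the two.
format_array() {
    name=${1#/dev/}                 # strip the leading "/dev/" without cut
    printf '%s %s\n' "$name" "$2"
}

format_array /dev/md0 "State:clean,degraded"   # prints: md0 State:clean,degraded
```

Unlike cut -c6-8, the parameter expansion keeps names longer than three characters, such as md10.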


#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`

# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`

#echo "$RAID_DEVICES"
#Get count of Physical Disks

DISKS=`/sbin/blkid | grep sd | cut -c 1-8 | uniq`
MDISKS=`/sbin/blkid | grep /dev/md | cut -c 1-8 | uniq`
SMART=/usr/sbin/smartctl
ARGS="-H"

MARGS="--detail"
MDADM=/sbin/mdadm

RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $1}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $1}'`


for disk in $DISKS
do
if [[ $RAID_RECOVER ]]; then
STATUS=" WARNING - $RAID_DEVICES ,recovering : $RAID_RECOVER"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
echo "$disk $HD_STAT"
EXIT=2
fi
EXIT=1
elif [[ $RAID_RESYNC ]]; then
STATUS=" WARNING - $RAID_DEVICES,$RAID_RESYNC"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
echo "$disk $HD_STAT"
EXIT=2
fi
EXIT=1

elif [[ $RAID_STATUS == "0" ]]; then
STATUS=" OK - $RAID_DEVICES"
HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
if [[ $HD_STAT != "PASSED" ]]; then
echo "$disk $HD_STAT"
fi
EXIT=0
else
STATUS=" CRITICAL - Checked $RAID_DEVICES arrays,$RAID_STATUS are DEGRADED"
EXIT=2
fi
done

echo $STATUS
for mpart in $MDISKS
do
ARRAY=`sudo $MDADM $MARGS $mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
if [[ $ARRAY == "State:clean" ]]; then
echo $mpart | cut -c6-8 |tr -d '\n'
elif [[ $ARRAY != "State:clean" ]]; then
echo $mpart | cut -c6-8 |tr -d '\n'
echo $ARRAY
fi
done

# quit
exit $EXIT
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Software Raid Check

Post by agriffin »

I only changed enough to give the correct output for OK states, but it should be obvious how to fix the rest of them too.


#!/bin/bash
# Get count of raid arrays
RAID_DEVICES=`grep ^md -c /proc/mdstat`

# Get count of degraded arrays
RAID_STATUS=`grep "\[.*_.*\]" /proc/mdstat -c`

#echo "$RAID_DEVICES"
#Get count of Physical Disks

DISKS=`/sbin/blkid | grep sd | cut -c 1-8 | uniq`
MDISKS=`/sbin/blkid | grep /dev/md | cut -c 6-8 | uniq`
SMART=/usr/sbin/smartctl
ARGS="-H"

MARGS="--detail"
MDADM=/sbin/mdadm

RAID_RECOVER=`grep recovery /proc/mdstat | awk '{print $1}'`
RAID_RESYNC=`grep resync /proc/mdstat | awk '{print $1}'`


for disk in $DISKS
do
    if [[ $RAID_RECOVER ]]; then
        STATUS=" WARNING - $RAID_DEVICES ,recovering : $RAID_RECOVER"
        HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
        if [[ $HD_STAT != "PASSED" ]]; then
                echo "$disk $HD_STAT"
                EXIT=2
        fi
   EXIT=1
   elif [[ $RAID_RESYNC ]]; then
        STATUS=" WARNING - $RAID_DEVICES,$RAID_RESYNC"
        HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
        if [[ $HD_STAT != "PASSED" ]]; then
                echo "$disk $HD_STAT"
                EXIT=2
        fi
   EXIT=1

   elif [[ $RAID_STATUS == "0" ]]; then
        STATUS=" OK - State Clean:"
        HD_STAT=`sudo $SMART $ARGS $disk | sed -n '4,$ s/^[^:]*: //p'`
        if [[ $HD_STAT != "PASSED" ]]; then
                echo "$disk $HD_STAT"
        fi
   EXIT=0
   else
        STATUS=" CRITICAL - Checked $RAID_DEVICES arrays,$RAID_STATUS are DEGRADED"
   EXIT=2
   fi
done

echo -n "$STATUS"
for mpart in $MDISKS
do
        ARRAY=`sudo $MDADM $MARGS /dev/$mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
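        if [[ $ARRAY == "State:clean" ]]; then
        # clean arrays are appended to the status line: " md0 md1 ..."
        echo -n " $mpart"
        elif  [[ $ARRAY != "State:clean" ]]; then
        # other arrays are printed with their state, separated by a space
        echo -n " $mpart "
        echo $ARRAY
        fi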
done
echo

# quit
exit $EXIT
Notice that I changed the MDISKS and mpart variables, and made use of echo's -n flag.
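
The same single-line result can also be built by passing all the names to one helper and printing once. A sketch under the assumption that the clean array names are already known; ok_line is a hypothetical helper and the sample names are made up. It uses printf rather than echo -n, which behaves the same across shells:

```shell
#!/bin/bash
# ok_line (hypothetical helper): join the given array names into the
# single-line form requested earlier in the thread.
ok_line() {
    printf 'RAID OK - State Clean:'
    for name in "$@"; do
        printf ' %s' "$name"
    done
    printf '\n'
}

ok_line md0 md1 md2    # prints: RAID OK - State Clean: md0 md1 md2
```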
Shivaramakrishnan
Posts: 71
Joined: Tue May 15, 2012 10:11 pm

Re: Software Raid Check

Post by Shivaramakrishnan »

A question here:
If the physical drives are good but the arrays are bad, will the above script handle that scenario, or will it still report State Clean?
Should I set an exit status in the second loop as well? Could you please check the second loop and suggest appropriate values?

I added EXIT=1 for the case where $ARRAY != "State:clean", to be on the safe side.

The modified second loop:


for mpart in $MDISKS
do
ARRAY=`sudo $MDADM $MARGS $mpart |grep -i "State :" |awk '{print $1 $2 $3 $4}'`
if [[ $ARRAY == "State:clean" ]]; then
echo $mpart | cut -c6-8 |tr -d '\n'|sed 's/\(.\{3\}\)/\1 /g'
EXIT=0
elif [[ $ARRAY != "State:clean" ]]; then
echo $mpart | cut -c6-8 |tr -d '\n'
echo $ARRAY
EXIT=1
fi

done


Is this fine with respect to the exit status?
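
On the exit status question: in the loop above EXIT is assigned on every pass, so the value left at the end is that of the last array checked, and a degraded md0 followed by a clean md5 would still exit 0. The first loop of the full script has a similar pattern, where EXIT=2 for a failed SMART check is followed immediately by EXIT=1. One common way around both is to only ever raise the status. A minimal sketch, assuming the usual Nagios codes 0 (OK), 1 (WARNING) and 2 (CRITICAL); raise_exit and the sample states are hypothetical:

```shell
#!/bin/bash
EXIT=0    # start from OK and only ever move towards a worse state

# raise_exit (hypothetical helper): keep the highest status seen so far.
raise_exit() {
    if [ "$1" -gt "$EXIT" ]; then
        EXIT=$1
    fi
}

# Stand-in states; the real script would read them from "mdadm --detail".
for state in "State:clean,degraded" "State:clean" "State:clean"; do
    if [ "$state" = "State:clean" ]; then
        raise_exit 0     # never lowers a WARNING or CRITICAL already recorded
    else
        raise_exit 1
    fi
done

echo "exit status would be $EXIT"    # prints: exit status would be 1
```

In the real script both loops could call raise_exit in place of the plain EXIT= assignments, with a single exit $EXIT at the end.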