Okay I ran the "nohup tcpdump -Z root -s 0 -i any port 161 and host a.b.c.d -C 10 -W 5 -w output.pcap &" command, the response I got was nohup:ignoring input and appending output to 'nohup.out'cdienger wrote:Just long enough to see a timeout message like this in the logs:
[12-26-2018 ********] SERVICE ALERT: DC_*****;Global Health Status;CRITICAL;SOFT;1;(Service check timed out after 180.04 seconds)
It should be a small file and I wouldn't be too concerned with it growing large. Feel free to stop and restart it though if the problem doesn't occur. You can also set up a rotating capture:
nohup tcpdump -Z root -s 0 -i any port 161 and host a.b.c.d -C 10 -W 5 -w output.pcap &
The above will start tcpdump and run it in the background - only storing the last 50 megs of data captured in 5 10 meg files(output.pcap0, output.pcap1, etc...). To stop the trace:
pkill tcpdump
check_snmp_synology - False Positives
-
- Posts: 75
- Joined: Wed Dec 26, 2018 2:31 pm
Re: check_snmp_synology - False Positives
Re: check_snmp_synology - False Positives
It would be in the location where the command was run. If you logged in as root and ran it without changing directories, it would likely be under /root. You can also run "pwd" on the command line and "ls" to get the present working directory and a listing of its contents.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: check_snmp_synology - False Positives
The "nohup:ignoring input and appending output to 'nohup.out'" message is expected and you can just hit enter to get back to a command line and run "ps aux | grep tcpdump" to verify that it is running.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
- Posts: 75
- Joined: Wed Dec 26, 2018 2:31 pm
Re: check_snmp_synology - False Positives
Excellent, I will run this and start the capturing now! I didn't want to let it run without validating the output was okay from you. I tried googling it but didn't get a great understanding of what it meant.
-
- Posts: 75
- Joined: Wed Dec 26, 2018 2:31 pm
Re: check_snmp_synology - False Positives
I also commented out some useless aspects of the plugin like HD temp, DSM version, etc to try and make it as light as possible. Not sure how this will all work out but its progress.
-
- Posts: 75
- Joined: Wed Dec 26, 2018 2:31 pm
Re: check_snmp_synology - False Positives
(No output on stdout) stderr: /usr/local/nagios/libexec/check_snmp_synology: line 410: syntax error: unexpected end of file
Got an error, I removed the comment from the last "fi" in the file. Here is what I have currently:
Got an error, I removed the comment from the last "fi" in the file. Here is what I have currently:
Code: Select all
SNMPWALK=$(which snmpwalk)
SNMPGET=$(which snmpget)
SNMPVersion="3"
SNMPV2Community="public"
SNMPTimeout="15"
warningTemperature="50"
criticalTemperature="60"
warningStorage="80"
criticalStorage="95"
hostname=""
healthWarningStatus=0
healthCriticalStatus=0
healthString=""
verbose="no"
checkDSMUpdate="yes"
ups="no"
#OID declarations
OID_syno="1.3.6.1.4.1.6574"
OID_model="1.3.6.1.4.1.6574.1.5.1.0"
OID_serialNumber="1.3.6.1.4.1.6574.1.5.2.0"
OID_DSMVersion="1.3.6.1.4.1.6574.1.5.3.0"
OID_DSMUpgradeAvailable="1.3.6.1.4.1.6574.1.5.4.0"
OID_systemStatus="1.3.6.1.4.1.6574.1.1.0"
OID_temperature="1.3.6.1.4.1.6574.1.2.0"
OID_powerStatus="1.3.6.1.4.1.6574.1.3.0"
OID_systemFanStatus="1.3.6.1.4.1.6574.1.4.1.0"
OID_CPUFanStatus="1.3.6.1.4.1.6574.1.4.2.0"
OID_disk=""
OID_disk2=""
OID_diskID="1.3.6.1.4.1.6574.2.1.1.2"
OID_diskModel="1.3.6.1.4.1.6574.2.1.1.3"
OID_diskStatus="1.3.6.1.4.1.6574.2.1.1.5"
OID_diskTemp="1.3.6.1.4.1.6574.2.1.1.6"
OID_RAID=""
OID_RAIDName="1.3.6.1.4.1.6574.3.1.1.2"
OID_RAIDStatus="1.3.6.1.4.1.6574.3.1.1.3"
OID_Storage="1.3.6.1.2.1.25.2.3.1"
OID_StorageDesc="1.3.6.1.2.1.25.2.3.1.3"
OID_StorageAllocationUnits="1.3.6.1.2.1.25.2.3.1.4"
OID_StorageSize="1.3.6.1.2.1.25.2.3.1.5"
OID_StorageSizeUsed="1.3.6.1.2.1.25.2.3.1.6"
OID_UpsModel="1.3.6.1.4.1.6574.4.1.1.0"
OID_UpsSN="1.3.6.1.4.1.6574.4.1.3.0"
OID_UpsStatus="1.3.6.1.4.1.6574.4.2.1.0"
OID_UpsLoad="1.3.6.1.4.1.6574.4.2.12.1.0"
OID_UpsBatteryCharge="1.3.6.1.4.1.6574.4.3.1.1.0"
OID_UpsBatteryChargeWarning="1.3.6.1.4.1.6574.4.3.1.4.0"
usage()
{
echo "usage: ./check_snmp_synology [OPTIONS] -u [user] -p [pass] -h [hostname]"
echo "options:"
echo " -u [snmp username] Username for SNMPv3"
echo " -p [snmp password] Password for SNMPv3"
echo ""
echo " -2 [community name] Use SNMPv2 (no need user/password) & define community name (ex: public)"
echo ""
echo " -h [hostname or IP](:port) Hostname or IP. You can also define a different port"
echo ""
echo " -W [warning temp] Warning temperature (for disks & synology) (default $warningTemperature)"
echo " -C [critical temp] Critical temperature (for disks & synology) (default $criticalTemperature)"
echo ""
echo " -w [warning %] Warning storage usage percentage (default $warningStorage)"
echo " -c [critical %] Critical storage usage percentage (default $criticalStorage)"
echo ""
echo " -i Ignore DSM updates"
echo " -U Show informations about the connected UPS (only information, no control)"
echo " -v Verbose - print all informations about your Synology"
echo ""
echo ""
echo "examples: ./check_snmp_synology -u admin -p 1234 -h nas.intranet"
echo " ./check_snmp_synology -u admin -p 1234 -h nas.intranet -v"
echo " ./check_snmp_synology -2 public -h nas.intranet"
echo " ./check_snmp_synology -2 public -h nas.intranet:10161"
exit 3
}
if [ "$1" == "--help" ]; then
usage; exit 0
fi
while getopts 2:W:C:w:c:u:p:h:iUv OPTNAME; do
case "$OPTNAME" in
u) SNMPUser="$OPTARG";;
p) SNMPPassword="$OPTARG";;
h) hostname="$OPTARG";;
v) verbose="yes";;
2) SNMPVersion="2"
SNMPV2Community="$OPTARG";;
w) warningStorage="$OPTARG";;
c) criticalStorage="$OPTARG";;
W) warningTemperature="$OPTARG";;
C) criticalTemperature="$OPTARG";;
i) checkDSMUpdate="no";;
U) ups="yes";;
*) usage;;
esac
done
if [ "$warningTemperature" -gt "$criticalTemperature" ] ; then
echo "Critical temperature must be higher than warning temperature"
echo "Warning temperature: $warningTemperature"
echo "Critical temperature: $criticalTemperature"
echo ""
echo "For more information: ./${0##*/} --help"
exit 1
fi
if [ "$warningStorage" -gt "$criticalStorage" ] ; then
echo "The Critical storage usage percentage must be higher than the warning storage usage percentage"
echo "Warning: $warningStorage"
echo "Critical: $criticalStorage"
echo ""
echo "For more information: ./${0##*/} --help"
exit 1
fi
if [ "$hostname" = "" ] || ([ "$SNMPVersion" = "3" ] && [ "$SNMPUser" = "" ]) || ([ "$SNMPVersion" = "3" ] && [ "$SNMPPassword" = "" ]) ; then
usage
else
if [ "$SNMPVersion" = "2" ] ; then
SNMPArgs=" -OQne -v 2c -c $SNMPV2Community -t $SNMPTimeout"
else
SNMPArgs=" -OQne -v 3 -u $SNMPUser -A $SNMPPassword -l authNoPriv -a MD5 -t $SNMPTimeout"
if [ ${#SNMPPassword} -lt "8" ] ; then
echo "snmpwalk: (The supplied password is too short.)"
exit 1
fi
fi
tmpRequest=`$SNMPWALK $SNMPArgs $hostname $OID_syno 2> /dev/null`
if [ "$?" != "0" ] ; then
echo "CRITICAL - Problem with SNMP request, check user/password/host"
exit 2
fi
nbDisk=$(echo "$tmpRequest" | grep $OID_diskID | wc -l)
nbRAID=$(echo "$tmpRequest" | grep $OID_RAIDName | wc -l)
for i in `seq 1 $nbDisk`;
do
if [ $i -lt 25 ] ; then
OID_disk="$OID_disk $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) "
else
OID_disk2="$OID_disk2 $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) "
fi
done
for i in `seq 1 $nbRAID`;
do
OID_RAID="$OID_RAID $OID_RAIDName.$(($i-1)) $OID_RAIDStatus.$(($i-1))"
done
if [ "$ups" = "yes" ] ; then
syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable $OID_UpsModel $OID_UpsSN $OID_UpsStatus $OID_UpsLoad $OID_UpsBatteryCharge $OID_UpsBatteryChargeWarning 2> /dev/null`
else
syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable 2> /dev/null`
fi
if [ "$OID_disk2" != "" ]; then
syno2=`$SNMPGET $SNMPArgs $hostname $OID_disk2 2> /dev/null`
syno=$(echo "$syno";echo "$syno2";)
fi
model=$(echo "$syno" | grep $OID_model | cut -d "=" -f2)
serialNumber=$(echo "$syno" | grep $OID_serialNumber | cut -d "=" -f2)
DSMVersion=$(echo "$syno" | grep $OID_DSMVersion | cut -d "=" -f2)
healthString="Synology $model (s/n:$serialNumber, $DSMVersion)"
#DSMUpgradeAvailable=$(echo "$syno" | grep $OID_DSMUpgradeAvailable | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
#case $DSMUpgradeAvailable in
#"1") DSMUpgradeAvailable="Available"; if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1; healthString="$healthString, DSM update available"; fi ;;
#"2") DSMUpgradeAvailable="Unavailable";;
#"3") DSMUpgradeAvailable="Connecting";;
#"4") DSMUpgradeAvailable="Disconnected"; if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1; healthString="$healthString, DSM Update Disconnected"; fi ;;
#"5") DSMUpgradeAvailable="Others"; if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1; healthString="$healthString, Check DSM Update"; fi ;;
#esac
RAIDName=$(echo "$syno" | grep $OID_RAIDName | cut -d "=" -f2)
RAIDStatus=$(echo "$syno" | grep $OID_RAIDStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
systemStatus=$(echo "$syno" | grep $OID_systemStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
#temperature=$(echo "$syno" | grep $OID_temperature | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
powerStatus=$(echo "$syno" | grep $OID_powerStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
systemFanStatus=$(echo "$syno" | grep $OID_systemFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
CPUFanStatus=$(echo "$syno" | grep $OID_CPUFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
#Check system status
if [ "$systemStatus" = "1" ] ; then
systemStatus="Normal"
else
systemStatus="Failed"
healthCriticalStatus=1
healthString="$healthString, System status: $systemStatus "
fi
#Check system temperature
#if [ "$temperature" -gt "$warningTemperature" ] ; then
#if [ "$temperature" -gt "$criticalTemperature" ] ; then
#temperature="$temperature (CRITICAL)"
# healthCriticalStatus=1
# healthString="$healthString, temperature: $temperature "
#else
#temperature="$temperature (WARNING)"
#healthWarningStatus=1
# healthString="$healthString, temperature: $temperature "
#fi
#else
#temperature="$temperature (Normal)"
# fi
#Check power status
if [ "$powerStatus" = "1" ] ; then
powerStatus="Normal"
else
powerStatus="Failed";
healthCriticalStatus=1
healthString="$healthString, Power status: $powerStatus "
fi
#Check system fan status
if [ "$systemFanStatus" = "1" ] ; then
systemFanStatus="Normal"
else
systemFanStatus="Failed";
healthCriticalStatus=1
healthString="$healthString, System fan status: $systemFanStatus "
fi
#Check CPU fan status
if [ "$CPUFanStatus" = "1" ] ; then
CPUFanStatus="Normal"
else
CPUFanStatus="Failed";
healthCriticalStatus=1
healthString="$healthString, CPU fan status: $CPUFanStatus "
fi
#Check all disk status
for i in `seq 1 $nbDisk`;
do
diskID[$i]=$(echo "$syno" | grep "$OID_diskID.$(($i-1)) " | cut -d "=" -f2)
diskModel[$i]=$(echo "$syno" | grep "$OID_diskModel.$(($i-1)) " | cut -d "=" -f2 )
diskStatus[$i]=$(echo "$syno" | grep "$OID_diskStatus.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
diskTemp[$i]=$(echo "$syno" | grep "$OID_diskTemp.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
case ${diskStatus[$i]} in
"1") diskStatus[$i]="Normal"; ;;
"2") diskStatus[$i]="Initialized"; ;;
"3") diskStatus[$i]="NotInitialized"; ;;
"4") diskStatus[$i]="SystemPartitionFailed"; healthCriticalStatus=1; healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
"5") diskStatus[$i]="Crashed"; healthCriticalStatus=1; healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
esac
if [ "${diskTemp[$i]}" -gt "$warningTemperature" ] ; then
if [ "${diskTemp[$i]}" -gt "$criticalTemperature" ] ; then
diskTemp[$i]="${diskTemp[$i]} (CRITICAL)"
healthCriticalStatus=1;
healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
else
diskTemp[$i]="${diskTemp[$i]} (WARNING)"
healthWarningStatus=1;
healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
fi
fi
done
syno_diskspace=`$SNMPWALK $SNMPArgs $hostname $OID_Storage 2> /dev/null`
#Check all RAID volume status
for i in `seq 1 $nbRAID`;
do
RAIDName[$i]=$(echo "$syno" | grep $OID_RAIDName.$(($i-1)) | cut -d "=" -f2)
RAIDStatus[$i]=$(echo "$syno" | grep $OID_RAIDStatus.$(($i-1)) | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
storageName[$i]=$(echo "${RAIDName[$i]}" | sed -e 's/[[:blank:]]//g' | sed -e 's/\"//g' | sed 's/.*/\L&/')
# modified by Tobias Schenke
# "timebackup" (when backup-job runs) and the "docker-feature" (since dsm 6.0, and if installed) mount volumes as a substructure of /"volume1/..." or "/.../volume1/..."
# in this case the former grep failed with more then one result.
# modified script to look for a line with '= "/volume1"' instead of 'volume1'
#storageID[$i]=$(echo "$syno_diskspace" | grep ${storageName[$i]} | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)
storageID[$i]=$(echo "$syno_diskspace" | grep "= \"\?/${storageName[$i]}\"\?" | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)
if [ "${storageID[$i]}" != "" ] ; then
storageSize[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSize.${storageID[$i]}" | cut -d "=" -f2 )
storageSizeUsed[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSizeUsed.${storageID[$i]}" | cut -d "=" -f2 )
storageAllocationUnits[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageAllocationUnits.${storageID[$i]}" | cut -d "=" -f2 )
storagePercentUsed[$i]=$((${storageSizeUsed[$i]} * 100 / ${storageSize[$i]}))
storagePercentUsedString[$i]="${storagePercentUsed[$i]}% used"
if [ "${storagePercentUsed[$i]}" -gt "$warningStorage" ] ; then
if [ "${storagePercentUsed[$i]}" -gt "$criticalStorage" ] ; then
healthCriticalStatus=1;
storagePercentUsedString[$i]="${storagePercentUsedString[$i]} CRITICAL"
healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
else
healthWarningStatus=1;
storagePercentUsedString[$i]="${storagePercentUsedString[$i]} WARNING"
healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
fi
fi
fi
case ${RAIDStatus[$i]} in
"1") RAIDStatus[$i]="Normal"; ;;
"2") RAIDStatus[$i]="Repairing"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"3") RAIDStatus[$i]="Migrating"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"4") RAIDStatus[$i]="Expanding"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"5") RAIDStatus[$i]="Deleting"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"6") RAIDStatus[$i]="Creating"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"7") RAIDStatus[$i]="RaidSyncing"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"8") RAIDStatus[$i]="RaidParityChecking"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"9") RAIDStatus[$i]="RaidAssembling"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"10") RAIDStatus[$i]="Canceling"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"11") RAIDStatus[$i]="Degrade"; healthCriticalStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"12") RAIDStatus[$i]="Crashed"; healthCriticalStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
esac
done
if [ "$verbose" = "yes" ] ; then
echo "Synology model: $model" ;
echo "Synology s/n: $serialNumber" ;
echo "DSM Version: $DSMVersion" ;
echo "DSM update: $DSMUpgradeAvailable" ;
echo "System Status: $systemStatus" ;
echo "Temperature: $temperature" ;
echo "Power Status: $powerStatus" ;
echo "System Fan Status: $systemFanStatus" ;
echo "CPU Fan Status: $CPUFanStatus" ;
echo "Number of disks: $nbDisk" ;
for i in `seq 1 $nbDisk`;
do
echo " ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}" ;
done
echo "Number of RAID volume: $nbRAID" ;
for i in `seq 1 $nbRAID`;
do
echo " ${RAIDName[$i]} status:${RAIDStatus[$i]} ${storagePercentUsedString[$i]}" ;
done
# Display UPS information
# if [ "$ups" = "yes" ] ; then
# upsModel=$(echo "$syno" | grep $OID_UpsModel | cut -d "=" -f2)
# upsSN=$(echo "$syno" | grep $OID_UpsSN | cut -d "=" -f2)
# upsStatus=$(echo "$syno" | grep $OID_UpsStatus | cut -d "=" -f2)
# upsLoad=$(echo "$syno" | grep $OID_UpsLoad | cut -d "=" -f2)
# upsBatteryCharge=$(echo "$syno" | grep $OID_UpsBatteryCharge | cut -d "=" -f2)
# upsBatteryChargeWarning=$(echo "$syno" | grep $OID_UpsBatteryChargeWarning | cut -d "=" -f2)
# echo "UPS:"
# echo " Model: $upsModel"
# echo " s/n: $upsSN"
# echo " Status: $upsStatus"
# echo " Load: $upsLoad"
# echo " Battery charge: $upsBatteryCharge"
# echo " Battery charge warning:$upsBatteryChargeWarning"
# fi
# echo "";
# fi
# if [ "$healthCriticalStatus" = "1" ] ; then
# echo "CRITICAL - $healthString"
# exit 2
# fi
# if [ "$healthWarningStatus" = "1" ] ; then
# echo "WARNING - $healthString"
# exit 1
# fi
# if [ "$healthCriticalStatus" = "0" ] && [ "$healthWarningStatus" = "0" ] ; then
# echo "OK - $healthString is in good health"
# exit 0
# fi
fi
-
- Posts: 75
- Joined: Wed Dec 26, 2018 2:31 pm
Re: check_snmp_synology - False Positives
Here is the original file any suggestions would be greatly appreciated:
Code: Select all
SNMPWALK=$(which snmpwalk)
SNMPGET=$(which snmpget)
SNMPVersion="3"
SNMPV2Community="public"
SNMPTimeout="15"
warningTemperature="50"
criticalTemperature="60"
warningStorage="80"
criticalStorage="95"
hostname=""
healthWarningStatus=0
healthCriticalStatus=0
healthString=""
verbose="no"
checkDSMUpdate="yes"
ups="no"
#OID declarations
OID_syno="1.3.6.1.4.1.6574"
OID_model="1.3.6.1.4.1.6574.1.5.1.0"
OID_serialNumber="1.3.6.1.4.1.6574.1.5.2.0"
OID_DSMVersion="1.3.6.1.4.1.6574.1.5.3.0"
OID_DSMUpgradeAvailable="1.3.6.1.4.1.6574.1.5.4.0"
OID_systemStatus="1.3.6.1.4.1.6574.1.1.0"
OID_temperature="1.3.6.1.4.1.6574.1.2.0"
OID_powerStatus="1.3.6.1.4.1.6574.1.3.0"
OID_systemFanStatus="1.3.6.1.4.1.6574.1.4.1.0"
OID_CPUFanStatus="1.3.6.1.4.1.6574.1.4.2.0"
OID_disk=""
OID_disk2=""
OID_diskID="1.3.6.1.4.1.6574.2.1.1.2"
OID_diskModel="1.3.6.1.4.1.6574.2.1.1.3"
OID_diskStatus="1.3.6.1.4.1.6574.2.1.1.5"
OID_diskTemp="1.3.6.1.4.1.6574.2.1.1.6"
OID_RAID=""
OID_RAIDName="1.3.6.1.4.1.6574.3.1.1.2"
OID_RAIDStatus="1.3.6.1.4.1.6574.3.1.1.3"
OID_Storage="1.3.6.1.2.1.25.2.3.1"
OID_StorageDesc="1.3.6.1.2.1.25.2.3.1.3"
OID_StorageAllocationUnits="1.3.6.1.2.1.25.2.3.1.4"
OID_StorageSize="1.3.6.1.2.1.25.2.3.1.5"
OID_StorageSizeUsed="1.3.6.1.2.1.25.2.3.1.6"
OID_UpsModel="1.3.6.1.4.1.6574.4.1.1.0"
OID_UpsSN="1.3.6.1.4.1.6574.4.1.3.0"
OID_UpsStatus="1.3.6.1.4.1.6574.4.2.1.0"
OID_UpsLoad="1.3.6.1.4.1.6574.4.2.12.1.0"
OID_UpsBatteryCharge="1.3.6.1.4.1.6574.4.3.1.1.0"
OID_UpsBatteryChargeWarning="1.3.6.1.4.1.6574.4.3.1.4.0"
usage()
{
echo "usage: ./check_snmp_synology [OPTIONS] -u [user] -p [pass] -h [hostname]"
echo "options:"
echo " -u [snmp username] Username for SNMPv3"
echo " -p [snmp password] Password for SNMPv3"
echo ""
echo " -2 [community name] Use SNMPv2 (no need user/password) & define community name (ex: public)"
echo ""
echo " -h [hostname or IP](:port) Hostname or IP. You can also define a different port"
echo ""
echo " -W [warning temp] Warning temperature (for disks & synology) (default $warningTemperature)"
echo " -C [critical temp] Critical temperature (for disks & synology) (default $criticalTemperature)"
echo ""
echo " -w [warning %] Warning storage usage percentage (default $warningStorage)"
echo " -c [critical %] Critical storage usage percentage (default $criticalStorage)"
echo ""
echo " -i Ignore DSM updates"
echo " -U Show informations about the connected UPS (only information, no control)"
echo " -v Verbose - print all informations about your Synology"
echo ""
echo ""
echo "examples: ./check_snmp_synology -u admin -p 1234 -h nas.intranet"
echo " ./check_snmp_synology -u admin -p 1234 -h nas.intranet -v"
echo " ./check_snmp_synology -2 public -h nas.intranet"
echo " ./check_snmp_synology -2 public -h nas.intranet:10161"
exit 3
}
if [ "$1" == "--help" ]; then
usage; exit 0
fi
while getopts 2:W:C:w:c:u:p:h:iUv OPTNAME; do
case "$OPTNAME" in
u) SNMPUser="$OPTARG";;
p) SNMPPassword="$OPTARG";;
h) hostname="$OPTARG";;
v) verbose="yes";;
2) SNMPVersion="2"
SNMPV2Community="$OPTARG";;
w) warningStorage="$OPTARG";;
c) criticalStorage="$OPTARG";;
W) warningTemperature="$OPTARG";;
C) criticalTemperature="$OPTARG";;
i) checkDSMUpdate="no";;
U) ups="yes";;
*) usage;;
esac
done
if [ "$warningTemperature" -gt "$criticalTemperature" ] ; then
echo "Critical temperature must be higher than warning temperature"
echo "Warning temperature: $warningTemperature"
echo "Critical temperature: $criticalTemperature"
echo ""
echo "For more information: ./${0##*/} --help"
exit 1
fi
if [ "$warningStorage" -gt "$criticalStorage" ] ; then
echo "The Critical storage usage percentage must be higher than the warning storage usage percentage"
echo "Warning: $warningStorage"
echo "Critical: $criticalStorage"
echo ""
echo "For more information: ./${0##*/} --help"
exit 1
fi
if [ "$hostname" = "" ] || ([ "$SNMPVersion" = "3" ] && [ "$SNMPUser" = "" ]) || ([ "$SNMPVersion" = "3" ] && [ "$SNMPPassword" = "" ]) ; then
usage
else
if [ "$SNMPVersion" = "2" ] ; then
SNMPArgs=" -OQne -v 2c -c $SNMPV2Community -t $SNMPTimeout"
else
SNMPArgs=" -OQne -v 3 -u $SNMPUser -A $SNMPPassword -l authNoPriv -a MD5 -t $SNMPTimeout"
if [ ${#SNMPPassword} -lt "8" ] ; then
echo "snmpwalk: (The supplied password is too short.)"
exit 1
fi
fi
tmpRequest=`$SNMPWALK $SNMPArgs $hostname $OID_syno 2> /dev/null`
if [ "$?" != "0" ] ; then
echo "CRITICAL - Problem with SNMP request, check user/password/host"
exit 2
fi
nbDisk=$(echo "$tmpRequest" | grep $OID_diskID | wc -l)
nbRAID=$(echo "$tmpRequest" | grep $OID_RAIDName | wc -l)
for i in `seq 1 $nbDisk`;
do
if [ $i -lt 25 ] ; then
OID_disk="$OID_disk $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) "
else
OID_disk2="$OID_disk2 $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) "
fi
done
for i in `seq 1 $nbRAID`;
do
OID_RAID="$OID_RAID $OID_RAIDName.$(($i-1)) $OID_RAIDStatus.$(($i-1))"
done
if [ "$ups" = "yes" ] ; then
syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable $OID_UpsModel $OID_UpsSN $OID_UpsStatus $OID_UpsLoad $OID_UpsBatteryCharge $OID_UpsBatteryChargeWarning 2> /dev/null`
else
syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable 2> /dev/null`
fi
if [ "$OID_disk2" != "" ]; then
syno2=`$SNMPGET $SNMPArgs $hostname $OID_disk2 2> /dev/null`
syno=$(echo "$syno";echo "$syno2";)
fi
model=$(echo "$syno" | grep $OID_model | cut -d "=" -f2)
serialNumber=$(echo "$syno" | grep $OID_serialNumber | cut -d "=" -f2)
DSMVersion=$(echo "$syno" | grep $OID_DSMVersion | cut -d "=" -f2)
healthString="Synology $model (s/n:$serialNumber, $DSMVersion)"
DSMUpgradeAvailable=$(echo "$syno" | grep $OID_DSMUpgradeAvailable | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
case $DSMUpgradeAvailable in
"1") DSMUpgradeAvailable="Available"; if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1; healthString="$healthString, DSM update available"; fi ;;
"2") DSMUpgradeAvailable="Unavailable";;
"3") DSMUpgradeAvailable="Connecting";;
"4") DSMUpgradeAvailable="Disconnected"; if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1; healthString="$healthString, DSM Update Disconnected"; fi ;;
"5") DSMUpgradeAvailable="Others"; if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1; healthString="$healthString, Check DSM Update"; fi ;;
esac
RAIDName=$(echo "$syno" | grep $OID_RAIDName | cut -d "=" -f2)
RAIDStatus=$(echo "$syno" | grep $OID_RAIDStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
systemStatus=$(echo "$syno" | grep $OID_systemStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
temperature=$(echo "$syno" | grep $OID_temperature | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
powerStatus=$(echo "$syno" | grep $OID_powerStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
systemFanStatus=$(echo "$syno" | grep $OID_systemFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
CPUFanStatus=$(echo "$syno" | grep $OID_CPUFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
#Check system status
if [ "$systemStatus" = "1" ] ; then
systemStatus="Normal"
else
systemStatus="Failed"
healthCriticalStatus=1
healthString="$healthString, System status: $systemStatus "
fi
#Check system temperature
if [ "$temperature" -gt "$warningTemperature" ] ; then
if [ "$temperature" -gt "$criticalTemperature" ] ; then
temperature="$temperature (CRITICAL)"
healthCriticalStatus=1
healthString="$healthString, temperature: $temperature "
else
temperature="$temperature (WARNING)"
healthWarningStatus=1
healthString="$healthString, temperature: $temperature "
fi
else
temperature="$temperature (Normal)"
fi
#Check power status
if [ "$powerStatus" = "1" ] ; then
powerStatus="Normal"
else
powerStatus="Failed";
healthCriticalStatus=1
healthString="$healthString, Power status: $powerStatus "
fi
#Check system fan status
if [ "$systemFanStatus" = "1" ] ; then
systemFanStatus="Normal"
else
systemFanStatus="Failed";
healthCriticalStatus=1
healthString="$healthString, System fan status: $systemFanStatus "
fi
#Check CPU fan status
if [ "$CPUFanStatus" = "1" ] ; then
CPUFanStatus="Normal"
else
CPUFanStatus="Failed";
healthCriticalStatus=1
healthString="$healthString, CPU fan status: $CPUFanStatus "
fi
#Check all disk status
for i in `seq 1 $nbDisk`;
do
diskID[$i]=$(echo "$syno" | grep "$OID_diskID.$(($i-1)) " | cut -d "=" -f2)
diskModel[$i]=$(echo "$syno" | grep "$OID_diskModel.$(($i-1)) " | cut -d "=" -f2 )
diskStatus[$i]=$(echo "$syno" | grep "$OID_diskStatus.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
diskTemp[$i]=$(echo "$syno" | grep "$OID_diskTemp.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
case ${diskStatus[$i]} in
"1") diskStatus[$i]="Normal"; ;;
"2") diskStatus[$i]="Initialized"; ;;
"3") diskStatus[$i]="NotInitialized"; ;;
"4") diskStatus[$i]="SystemPartitionFailed"; healthCriticalStatus=1; healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
"5") diskStatus[$i]="Crashed"; healthCriticalStatus=1; healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
esac
if [ "${diskTemp[$i]}" -gt "$warningTemperature" ] ; then
if [ "${diskTemp[$i]}" -gt "$criticalTemperature" ] ; then
diskTemp[$i]="${diskTemp[$i]} (CRITICAL)"
healthCriticalStatus=1;
healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
else
diskTemp[$i]="${diskTemp[$i]} (WARNING)"
healthWarningStatus=1;
healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
fi
fi
done
syno_diskspace=`$SNMPWALK $SNMPArgs $hostname $OID_Storage 2> /dev/null`
#Check all RAID volume status
for i in `seq 1 $nbRAID`;
do
RAIDName[$i]=$(echo "$syno" | grep $OID_RAIDName.$(($i-1)) | cut -d "=" -f2)
RAIDStatus[$i]=$(echo "$syno" | grep $OID_RAIDStatus.$(($i-1)) | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
storageName[$i]=$(echo "${RAIDName[$i]}" | sed -e 's/[[:blank:]]//g' | sed -e 's/\"//g' | sed 's/.*/\L&/')
# modified by Tobias Schenke
# "timebackup" (when backup-job runs) and the "docker-feature" (since dsm 6.0, and if installed) mount volumes as a substructure of /"volume1/..." or "/.../volume1/..."
# in this case the former grep failed with more then one result.
# modified script to look for a line with '= "/volume1"' instead of 'volume1'
#storageID[$i]=$(echo "$syno_diskspace" | grep ${storageName[$i]} | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)
storageID[$i]=$(echo "$syno_diskspace" | grep "= \"\?/${storageName[$i]}\"\?" | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)
if [ "${storageID[$i]}" != "" ] ; then
storageSize[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSize.${storageID[$i]}" | cut -d "=" -f2 )
storageSizeUsed[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSizeUsed.${storageID[$i]}" | cut -d "=" -f2 )
storageAllocationUnits[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageAllocationUnits.${storageID[$i]}" | cut -d "=" -f2 )
storagePercentUsed[$i]=$((${storageSizeUsed[$i]} * 100 / ${storageSize[$i]}))
storagePercentUsedString[$i]="${storagePercentUsed[$i]}% used"
if [ "${storagePercentUsed[$i]}" -gt "$warningStorage" ] ; then
if [ "${storagePercentUsed[$i]}" -gt "$criticalStorage" ] ; then
healthCriticalStatus=1;
storagePercentUsedString[$i]="${storagePercentUsedString[$i]} CRITICAL"
healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
else
healthWarningStatus=1;
storagePercentUsedString[$i]="${storagePercentUsedString[$i]} WARNING"
healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
fi
fi
fi
case ${RAIDStatus[$i]} in
"1") RAIDStatus[$i]="Normal"; ;;
"2") RAIDStatus[$i]="Repairing"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"3") RAIDStatus[$i]="Migrating"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"4") RAIDStatus[$i]="Expanding"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"5") RAIDStatus[$i]="Deleting"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"6") RAIDStatus[$i]="Creating"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"7") RAIDStatus[$i]="RaidSyncing"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"8") RAIDStatus[$i]="RaidParityChecking"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"9") RAIDStatus[$i]="RaidAssembling"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"10") RAIDStatus[$i]="Canceling"; healthWarningStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"11") RAIDStatus[$i]="Degrade"; healthCriticalStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
"12") RAIDStatus[$i]="Crashed"; healthCriticalStatus=1; healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
esac
done
if [ "$verbose" = "yes" ] ; then
echo "Synology model: $model" ;
echo "Synology s/n: $serialNumber" ;
echo "DSM Version: $DSMVersion" ;
echo "DSM update: $DSMUpgradeAvailable" ;
echo "System Status: $systemStatus" ;
echo "Temperature: $temperature" ;
echo "Power Status: $powerStatus" ;
echo "System Fan Status: $systemFanStatus" ;
echo "CPU Fan Status: $CPUFanStatus" ;
echo "Number of disks: $nbDisk" ;
for i in `seq 1 $nbDisk`;
do
echo " ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}" ;
done
echo "Number of RAID volume: $nbRAID" ;
for i in `seq 1 $nbRAID`;
do
echo " ${RAIDName[$i]} status:${RAIDStatus[$i]} ${storagePercentUsedString[$i]}" ;
done
# Display UPS information
if [ "$ups" = "yes" ] ; then
upsModel=$(echo "$syno" | grep $OID_UpsModel | cut -d "=" -f2)
upsSN=$(echo "$syno" | grep $OID_UpsSN | cut -d "=" -f2)
upsStatus=$(echo "$syno" | grep $OID_UpsStatus | cut -d "=" -f2)
upsLoad=$(echo "$syno" | grep $OID_UpsLoad | cut -d "=" -f2)
upsBatteryCharge=$(echo "$syno" | grep $OID_UpsBatteryCharge | cut -d "=" -f2)
upsBatteryChargeWarning=$(echo "$syno" | grep $OID_UpsBatteryChargeWarning | cut -d "=" -f2)
echo "UPS:"
echo " Model: $upsModel"
echo " s/n: $upsSN"
echo " Status: $upsStatus"
echo " Load: $upsLoad"
echo " Battery charge: $upsBatteryCharge"
echo " Battery charge warning:$upsBatteryChargeWarning"
fi
echo "";
fi
if [ "$healthCriticalStatus" = "1" ] ; then
echo "CRITICAL - $healthString"
exit 2
fi
if [ "$healthWarningStatus" = "1" ] ; then
echo "WARNING - $healthString"
exit 1
fi
if [ "$healthCriticalStatus" = "0" ] && [ "$healthWarningStatus" = "0" ] ; then
echo "OK - $healthString is in good health"
exit 0
fi
fi
Re: check_snmp_synology - False Positives
I would keep the original file as is. It shouldn't add any additional traffic.
Where were you seeing the timeout messages? You can keep an eye out for them by running:
tail -f /usr/local/nagios/var/nagios.log | grep "DC_****"
Where were you seeing the timeout messages? You can keep an eye out for them by running:
tail -f /usr/local/nagios/var/nagios.log | grep "DC_****"
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
- Posts: 75
- Joined: Wed Dec 26, 2018 2:31 pm
Re: check_snmp_synology - False Positives
The global timeout is 180s. Consistently the synology throws false positives for example:
This morning
[12-28-2018 02:52:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;HARD;3;(Service check timed out after 180.02 seconds)
Service Notification[12-28-2018 02:52:52] SERVICE NOTIFICATION: ccichon;DC_SAN;Global Health Status;CRITICAL;notify-service-by-email;(Service check timed out after 180.02 seconds)
Service Notification[12-28-2018 02:52:52] SERVICE NOTIFICATION: nagiosadmin;DC_SAN;Global Health Status;CRITICAL;notify-service-by-email;(Service check timed out after 180.02 seconds)
Informational Message[12-28-2018 02:52:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.019s!
Informational Message[12-28-2018 02:52:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:52:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:52:52] wproc: CHECK job 137580 from worker Core Worker 14080 timed out after 180.02s
Informational Message[12-28-2018 02:52:52] wproc: Core Worker 14080: job 137580 (pid=58305) timed out. Killing it
Informational Message[12-28-2018 02:47:52] wproc: Core Worker 14081: job 137537 (pid=52055): Dormant child reaped
Service Critical[12-28-2018 02:47:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;SOFT;2;(Service check timed out after 180.04 seconds)
Informational Message[12-28-2018 02:47:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.035s!
Informational Message[12-28-2018 02:47:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:47:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:47:52] wproc: CHECK job 137537 from worker Core Worker 14081 timed out after 180.04s
Informational Message[12-28-2018 02:47:52] wproc: Core Worker 14081: job 137537 (pid=52055) timed out. Killing it
Informational Message[12-28-2018 02:46:18] Auto-save of retention data completed successfully.
Informational Message[12-28-2018 02:42:52] wproc: Core Worker 14078: job 137492 (pid=45830): Dormant child reaped
Service Critical[12-28-2018 02:42:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;SOFT;1;(Service check timed out after 180.02 seconds)
Informational Message[12-28-2018 02:42:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.017s!
Informational Message[12-28-2018 02:42:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:42:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:42:52] wproc: CHECK job 137492 from worker Core Worker 14078 timed out after 180.02s
Informational Message[12-28-2018 02:42:52] wproc: Core Worker 14078: job 137492 (pid=45830) timed out. Killing it
This morning
[12-28-2018 02:52:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;HARD;3;(Service check timed out after 180.02 seconds)
Service Notification[12-28-2018 02:52:52] SERVICE NOTIFICATION: ccichon;DC_SAN;Global Health Status;CRITICAL;notify-service-by-email;(Service check timed out after 180.02 seconds)
Service Notification[12-28-2018 02:52:52] SERVICE NOTIFICATION: nagiosadmin;DC_SAN;Global Health Status;CRITICAL;notify-service-by-email;(Service check timed out after 180.02 seconds)
Informational Message[12-28-2018 02:52:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.019s!
Informational Message[12-28-2018 02:52:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:52:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:52:52] wproc: CHECK job 137580 from worker Core Worker 14080 timed out after 180.02s
Informational Message[12-28-2018 02:52:52] wproc: Core Worker 14080: job 137580 (pid=58305) timed out. Killing it
Informational Message[12-28-2018 02:47:52] wproc: Core Worker 14081: job 137537 (pid=52055): Dormant child reaped
Service Critical[12-28-2018 02:47:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;SOFT;2;(Service check timed out after 180.04 seconds)
Informational Message[12-28-2018 02:47:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.035s!
Informational Message[12-28-2018 02:47:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:47:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:47:52] wproc: CHECK job 137537 from worker Core Worker 14081 timed out after 180.04s
Informational Message[12-28-2018 02:47:52] wproc: Core Worker 14081: job 137537 (pid=52055) timed out. Killing it
Informational Message[12-28-2018 02:46:18] Auto-save of retention data completed successfully.
Informational Message[12-28-2018 02:42:52] wproc: Core Worker 14078: job 137492 (pid=45830): Dormant child reaped
Service Critical[12-28-2018 02:42:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;SOFT;1;(Service check timed out after 180.02 seconds)
Informational Message[12-28-2018 02:42:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.017s!
Informational Message[12-28-2018 02:42:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:42:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:42:52] wproc: CHECK job 137492 from worker Core Worker 14078 timed out after 180.02s
Informational Message[12-28-2018 02:42:52] wproc: Core Worker 14078: job 137492 (pid=45830) timed out. Killing it
-
- Posts: 75
- Joined: Wed Dec 26, 2018 2:31 pm
Re: check_snmp_synology - False Positives
I will stop the TCPDump on Monday and compare to the logs, I will probably require some assistance to find the root cause. I am currently checking all of the backup schedules to try and space them out, I wonder if the Synology just cannot handle the backup/replication load and causes the snmp timeouts due to being overloaded with backups? Or is this not possible the way Snmp works.