check_snmp_synology - False Positives

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
chris1337c
Posts: 75
Joined: Wed Dec 26, 2018 2:31 pm

Re: check_snmp_synology - False Positives

Post by chris1337c »

cdienger wrote:Just long enough to see a timeout message like this in the logs:

[12-26-2018 ********] SERVICE ALERT: DC_*****;Global Health Status;CRITICAL;SOFT;1;(Service check timed out after 180.04 seconds)

It should be a small file and I wouldn't be too concerned with it growing large. Feel free to stop and restart it though if the problem doesn't occur. You can also set up a rotating capture:

nohup tcpdump -Z root -s 0 -i any port 161 and host a.b.c.d -C 10 -W 5 -w output.pcap &

The above will start tcpdump and run it in the background - only storing the last 50 megs of data captured in 5 10 meg files(output.pcap0, output.pcap1, etc...). To stop the trace:

pkill tcpdump
Okay I ran the "nohup tcpdump -Z root -s 0 -i any port 161 and host a.b.c.d -C 10 -W 5 -w output.pcap &" command, the response I got was nohup:ignoring input and appending output to 'nohup.out'
User avatar
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: check_snmp_synology - False Positives

Post by cdienger »

It would be in the location where the command was run. If you logged in as root and ran it without changing directories, it would likely be under /root. You can also run "pwd" on the command line and "ls" to get the present working directory and a listing of its contents.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
User avatar
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: check_snmp_synology - False Positives

Post by cdienger »

The "nohup:ignoring input and appending output to 'nohup.out'" message is expected and you can just hit enter to get back to a command line and run "ps aux | grep tcpdump" to verify that it is running.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
chris1337c
Posts: 75
Joined: Wed Dec 26, 2018 2:31 pm

Re: check_snmp_synology - False Positives

Post by chris1337c »

Excellent, I will run this and start the capturing now! I didn't want to let it run without validating the output was okay from you. I tried googling it but didn't get a great understanding of what it meant.
chris1337c
Posts: 75
Joined: Wed Dec 26, 2018 2:31 pm

Re: check_snmp_synology - False Positives

Post by chris1337c »

I also commented out some useless aspects of the plugin like HD temp, DSM version, etc to try and make it as light as possible. Not sure how this will all work out but its progress.
chris1337c
Posts: 75
Joined: Wed Dec 26, 2018 2:31 pm

Re: check_snmp_synology - False Positives

Post by chris1337c »

(No output on stdout) stderr: /usr/local/nagios/libexec/check_snmp_synology: line 410: syntax error: unexpected end of file

Got an error, I removed the comment from the last "fi" in the file. Here is what I have currently:

Code: Select all

SNMPWALK=$(which snmpwalk)
SNMPGET=$(which snmpget)

SNMPVersion="3"
SNMPV2Community="public"
SNMPTimeout="15"
warningTemperature="50"
criticalTemperature="60"
warningStorage="80"
criticalStorage="95"
hostname=""
healthWarningStatus=0
healthCriticalStatus=0
healthString=""
verbose="no"
checkDSMUpdate="yes"
ups="no"

#OID declarations
OID_syno="1.3.6.1.4.1.6574"
OID_model="1.3.6.1.4.1.6574.1.5.1.0"
OID_serialNumber="1.3.6.1.4.1.6574.1.5.2.0"
OID_DSMVersion="1.3.6.1.4.1.6574.1.5.3.0"
OID_DSMUpgradeAvailable="1.3.6.1.4.1.6574.1.5.4.0"
OID_systemStatus="1.3.6.1.4.1.6574.1.1.0"
OID_temperature="1.3.6.1.4.1.6574.1.2.0"
OID_powerStatus="1.3.6.1.4.1.6574.1.3.0"
OID_systemFanStatus="1.3.6.1.4.1.6574.1.4.1.0"
OID_CPUFanStatus="1.3.6.1.4.1.6574.1.4.2.0"

OID_disk=""
OID_disk2=""
OID_diskID="1.3.6.1.4.1.6574.2.1.1.2"
OID_diskModel="1.3.6.1.4.1.6574.2.1.1.3"
OID_diskStatus="1.3.6.1.4.1.6574.2.1.1.5"
OID_diskTemp="1.3.6.1.4.1.6574.2.1.1.6"

OID_RAID=""
OID_RAIDName="1.3.6.1.4.1.6574.3.1.1.2"
OID_RAIDStatus="1.3.6.1.4.1.6574.3.1.1.3"

OID_Storage="1.3.6.1.2.1.25.2.3.1"
OID_StorageDesc="1.3.6.1.2.1.25.2.3.1.3"
OID_StorageAllocationUnits="1.3.6.1.2.1.25.2.3.1.4"
OID_StorageSize="1.3.6.1.2.1.25.2.3.1.5"
OID_StorageSizeUsed="1.3.6.1.2.1.25.2.3.1.6"

OID_UpsModel="1.3.6.1.4.1.6574.4.1.1.0"
OID_UpsSN="1.3.6.1.4.1.6574.4.1.3.0"
OID_UpsStatus="1.3.6.1.4.1.6574.4.2.1.0"
OID_UpsLoad="1.3.6.1.4.1.6574.4.2.12.1.0"
OID_UpsBatteryCharge="1.3.6.1.4.1.6574.4.3.1.1.0"
OID_UpsBatteryChargeWarning="1.3.6.1.4.1.6574.4.3.1.4.0"

usage()
{
        echo "usage: ./check_snmp_synology [OPTIONS] -u [user] -p [pass] -h [hostname]"
        echo "options:"
        echo "            -u [snmp username]   	Username for SNMPv3"
        echo "            -p [snmp password]   	Password for SNMPv3"
        echo ""
        echo "            -2 [community name]	  	Use SNMPv2 (no need user/password) & define community name (ex: public)"
        echo ""
        echo "            -h [hostname or IP](:port)	Hostname or IP. You can also define a different port"
        echo ""
        echo "            -W [warning temp]		Warning temperature (for disks & synology) (default $warningTemperature)"
        echo "            -C [critical temp]		Critical temperature (for disks & synology) (default $criticalTemperature)"
        echo ""
        echo "            -w [warning %]		Warning storage usage percentage (default $warningStorage)"
        echo "            -c [critical %]		Critical storage usage percentage (default $criticalStorage)"
        echo ""
        echo "            -i   			Ignore DSM updates"
        echo "            -U   			Show informations about the connected UPS (only information, no control)"
        echo "            -v   			Verbose - print all informations about your Synology"
        echo ""
        echo ""
        echo "examples:	./check_snmp_synology -u admin -p 1234 -h nas.intranet"	
        echo "	     	./check_snmp_synology -u admin -p 1234 -h nas.intranet -v"	
        echo "		./check_snmp_synology -2 public -h nas.intranet"	
        echo "		./check_snmp_synology -2 public -h nas.intranet:10161"
        exit 3
}

if [ "$1" == "--help" ]; then
    usage; exit 0
fi

while getopts 2:W:C:w:c:u:p:h:iUv OPTNAME; do
        case "$OPTNAME" in
	u)	SNMPUser="$OPTARG";;
        p)	SNMPPassword="$OPTARG";;
        h)	hostname="$OPTARG";;
        v)	verbose="yes";;
	2)	SNMPVersion="2"
		SNMPV2Community="$OPTARG";;
	w)	warningStorage="$OPTARG";;
        c)      criticalStorage="$OPTARG";;
	W)	warningTemperature="$OPTARG";;
	C)	criticalTemperature="$OPTARG";;
	i)	checkDSMUpdate="no";;
	U)	ups="yes";;
        *)	usage;;
        esac
done

if [ "$warningTemperature" -gt "$criticalTemperature" ] ; then
    echo "Critical temperature must be higher than warning temperature"
    echo "Warning temperature: $warningTemperature"
    echo "Critical temperature: $criticalTemperature"
    echo ""
    echo "For more information:  ./${0##*/} --help" 
    exit 1 
fi

if [ "$warningStorage" -gt "$criticalStorage" ] ; then
    echo "The Critical storage usage percentage  must be higher than the warning storage usage percentage"
    echo "Warning: $warningStorage"
    echo "Critical: $criticalStorage"
    echo ""
    echo "For more information:  ./${0##*/} --help"
    exit 1
fi

if [ "$hostname" = "" ] || ([ "$SNMPVersion" = "3" ] && [ "$SNMPUser" = "" ]) || ([ "$SNMPVersion" = "3" ] && [ "$SNMPPassword" = "" ]) ; then
    usage
else
    if [ "$SNMPVersion" = "2" ] ; then
	SNMPArgs=" -OQne -v 2c -c $SNMPV2Community -t $SNMPTimeout"
    else
	SNMPArgs=" -OQne -v 3 -u $SNMPUser -A $SNMPPassword -l authNoPriv -a MD5 -t $SNMPTimeout"
	if [ ${#SNMPPassword} -lt "8" ] ; then
	    echo "snmpwalk:  (The supplied password is too short.)"
	    exit 1
	fi
    fi
    tmpRequest=`$SNMPWALK $SNMPArgs $hostname $OID_syno 2> /dev/null`
    if [ "$?" != "0" ] ; then
        echo "CRITICAL - Problem with SNMP request, check user/password/host"
        exit 2
    fi 
    nbDisk=$(echo "$tmpRequest" | grep $OID_diskID | wc -l)
    nbRAID=$(echo "$tmpRequest" | grep $OID_RAIDName | wc -l)

    for i in `seq 1 $nbDisk`;
    do
	if [ $i -lt 25 ] ; then
	    OID_disk="$OID_disk $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) " 
	else
	    OID_disk2="$OID_disk2 $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) "
	fi   
    done

    for i in `seq 1 $nbRAID`;
    do
      OID_RAID="$OID_RAID $OID_RAIDName.$(($i-1)) $OID_RAIDStatus.$(($i-1))" 
    done

    if [ "$ups" = "yes" ] ; then
	syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable $OID_UpsModel $OID_UpsSN $OID_UpsStatus $OID_UpsLoad $OID_UpsBatteryCharge $OID_UpsBatteryChargeWarning 2> /dev/null`
    else
	syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable 2> /dev/null`
    fi

    if [ "$OID_disk2" != "" ]; then
	syno2=`$SNMPGET $SNMPArgs $hostname $OID_disk2 2> /dev/null`
	syno=$(echo "$syno";echo "$syno2";)
    fi

    model=$(echo "$syno" | grep $OID_model | cut -d "=" -f2)
    serialNumber=$(echo "$syno" | grep $OID_serialNumber | cut -d "=" -f2)
    DSMVersion=$(echo "$syno" | grep $OID_DSMVersion | cut -d "=" -f2)

    healthString="Synology $model (s/n:$serialNumber, $DSMVersion)"

    #DSMUpgradeAvailable=$(echo "$syno" | grep $OID_DSMUpgradeAvailable | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    #case $DSMUpgradeAvailable in
	#"1")	DSMUpgradeAvailable="Available";	if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1;		healthString="$healthString, DSM update available"; fi ;;
	#"2")	DSMUpgradeAvailable="Unavailable";;
	#"3")	DSMUpgradeAvailable="Connecting";;					
	#"4")	DSMUpgradeAvailable="Disconnected";	if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1;		healthString="$healthString, DSM Update Disconnected"; fi ;;
	#"5")	DSMUpgradeAvailable="Others";		if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1;		healthString="$healthString, Check DSM Update"; fi ;;
    #esac

    RAIDName=$(echo "$syno" | grep $OID_RAIDName | cut -d "=" -f2)
    RAIDStatus=$(echo "$syno" | grep $OID_RAIDStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    systemStatus=$(echo "$syno" | grep $OID_systemStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    #temperature=$(echo "$syno" | grep $OID_temperature | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    powerStatus=$(echo "$syno" | grep $OID_powerStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    systemFanStatus=$(echo "$syno" | grep $OID_systemFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    CPUFanStatus=$(echo "$syno" | grep $OID_CPUFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')


    #Check system status
    if [ "$systemStatus" = "1" ] ; then
	systemStatus="Normal"
    else
 	systemStatus="Failed"
        healthCriticalStatus=1
        healthString="$healthString, System status: $systemStatus "
    fi

    #Check system temperature
    #if [ "$temperature" -gt "$warningTemperature" ] ; then
    	#if [ "$temperature" -gt "$criticalTemperature" ] ; then
        	#temperature="$temperature (CRITICAL)"
	       # healthCriticalStatus=1
	       # healthString="$healthString, temperature: $temperature "
	#else
	        #temperature="$temperature (WARNING)"
        	#healthWarningStatus=1
	       # healthString="$healthString, temperature: $temperature "
	#fi
    #else
	#temperature="$temperature (Normal)"
   # fi


    #Check power status
    if [ "$powerStatus" = "1" ] ; then
        powerStatus="Normal"
    else
       	powerStatus="Failed";
        healthCriticalStatus=1
        healthString="$healthString, Power status: $powerStatus "
    fi


    #Check system fan status
    if [ "$systemFanStatus" = "1" ] ; then
        systemFanStatus="Normal"
    else
        systemFanStatus="Failed";		
        healthCriticalStatus=1
        healthString="$healthString, System fan status: $systemFanStatus "
    fi
    

    #Check CPU fan status
    if [ "$CPUFanStatus" = "1" ] ; then
	CPUFanStatus="Normal"
    else
        CPUFanStatus="Failed";		
        healthCriticalStatus=1
        healthString="$healthString, CPU fan status: $CPUFanStatus "
    fi
 

    #Check all disk status
    for i in `seq 1 $nbDisk`;
    do
    	diskID[$i]=$(echo "$syno" | grep "$OID_diskID.$(($i-1)) " | cut -d "=" -f2)
    	diskModel[$i]=$(echo "$syno" | grep "$OID_diskModel.$(($i-1)) " | cut -d "=" -f2 )
    	diskStatus[$i]=$(echo "$syno" | grep "$OID_diskStatus.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    	diskTemp[$i]=$(echo "$syno" | grep "$OID_diskTemp.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')

	case ${diskStatus[$i]} in
		"1")	diskStatus[$i]="Normal";		;;
		"2")	diskStatus[$i]="Initialized";		;;
		"3")	diskStatus[$i]="NotInitialized";	;;
		"4")	diskStatus[$i]="SystemPartitionFailed";	healthCriticalStatus=1; healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
		"5")	diskStatus[$i]="Crashed";		healthCriticalStatus=1;	healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
	esac

	if [ "${diskTemp[$i]}" -gt "$warningTemperature" ] ; then
	    if [ "${diskTemp[$i]}" -gt "$criticalTemperature" ] ; then
		diskTemp[$i]="${diskTemp[$i]} (CRITICAL)"
		healthCriticalStatus=1;
		healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
	    else
		diskTemp[$i]="${diskTemp[$i]} (WARNING)"
		healthWarningStatus=1;
		healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
	    fi
	fi

     done  

    syno_diskspace=`$SNMPWALK $SNMPArgs $hostname $OID_Storage 2> /dev/null`

    #Check all RAID volume status
    for i in `seq 1 $nbRAID`;
    do
	RAIDName[$i]=$(echo "$syno" | grep $OID_RAIDName.$(($i-1)) | cut -d "=" -f2)
	RAIDStatus[$i]=$(echo "$syno" | grep $OID_RAIDStatus.$(($i-1)) | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')

	storageName[$i]=$(echo "${RAIDName[$i]}" | sed -e 's/[[:blank:]]//g' | sed -e 's/\"//g' | sed 's/.*/\L&/')

	# modified by Tobias Schenke
	# "timebackup" (when backup-job runs) and the "docker-feature" (since dsm 6.0, and if installed) mount volumes as a substructure of /"volume1/..." or "/.../volume1/..."
	# in this case the former grep failed with more then one result.
	# modified script to look for a line with '= "/volume1"' instead of 'volume1'
	#storageID[$i]=$(echo "$syno_diskspace" | grep ${storageName[$i]} | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)
	storageID[$i]=$(echo "$syno_diskspace" | grep "= \"\?/${storageName[$i]}\"\?" | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)

	if [ "${storageID[$i]}" != "" ] ; then
	    storageSize[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSize.${storageID[$i]}" | cut -d "=" -f2 )
	    storageSizeUsed[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSizeUsed.${storageID[$i]}" | cut -d "=" -f2 )
	    storageAllocationUnits[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageAllocationUnits.${storageID[$i]}" | cut -d "=" -f2 )
	    storagePercentUsed[$i]=$((${storageSizeUsed[$i]} * 100 / ${storageSize[$i]}))
	    storagePercentUsedString[$i]="${storagePercentUsed[$i]}% used"

	    if [ "${storagePercentUsed[$i]}" -gt "$warningStorage" ] ; then
	    	if [ "${storagePercentUsed[$i]}" -gt "$criticalStorage" ] ; then
                    healthCriticalStatus=1;
		    storagePercentUsedString[$i]="${storagePercentUsedString[$i]} CRITICAL"
                    healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
            	else
                    healthWarningStatus=1;
		    storagePercentUsedString[$i]="${storagePercentUsedString[$i]} WARNING"
                    healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
            	fi
            fi
	fi

        case ${RAIDStatus[$i]} in
		"1")	RAIDStatus[$i]="Normal";		;;
		"2")	RAIDStatus[$i]="Repairing";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"3")	RAIDStatus[$i]="Migrating";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"4")	RAIDStatus[$i]="Expanding";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"5")	RAIDStatus[$i]="Deleting";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"6")	RAIDStatus[$i]="Creating";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"7")	RAIDStatus[$i]="RaidSyncing";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"8")	RAIDStatus[$i]="RaidParityChecking";	healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"9")	RAIDStatus[$i]="RaidAssembling";	healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"10")	RAIDStatus[$i]="Canceling";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"11")	RAIDStatus[$i]="Degrade";		healthCriticalStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"12")	RAIDStatus[$i]="Crashed";		healthCriticalStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
        esac
    done

    if [ "$verbose" = "yes" ] ; then    
	echo "Synology model:		$model" ; 
	echo "Synology s/n:		$serialNumber" ;
	echo "DSM Version:		$DSMVersion" ; 
	echo "DSM update:		 $DSMUpgradeAvailable" ; 
	echo "System Status:		 $systemStatus" ;
	echo "Temperature:		 $temperature" ;
	echo "Power Status:		 $powerStatus" ; 
	echo "System Fan Status:	 $systemFanStatus" ;
	echo "CPU Fan Status:		 $CPUFanStatus" ; 
	echo "Number of disks:         $nbDisk" ;
 	for i in `seq 1 $nbDisk`;
    	do
		echo " ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}" ;
	done 
	echo "Number of RAID volume:   $nbRAID" ;
 	for i in `seq 1 $nbRAID`;
    	do
		echo " ${RAIDName[$i]} status:${RAIDStatus[$i]} ${storagePercentUsedString[$i]}" ;
	done 

	# Display UPS information
    	# if [ "$ups" = "yes" ] ; then
           #  upsModel=$(echo "$syno" | grep $OID_UpsModel | cut -d "=" -f2)
            # upsSN=$(echo "$syno" | grep $OID_UpsSN | cut -d "=" -f2)
           #  upsStatus=$(echo "$syno" | grep $OID_UpsStatus | cut -d "=" -f2)
           #  upsLoad=$(echo "$syno" | grep $OID_UpsLoad | cut -d "=" -f2)
         #    upsBatteryCharge=$(echo "$syno" | grep $OID_UpsBatteryCharge | cut -d "=" -f2)
         #    upsBatteryChargeWarning=$(echo "$syno" | grep $OID_UpsBatteryChargeWarning | cut -d "=" -f2)

	   #  echo "UPS:"
	   #  echo "  Model:		$upsModel"
	   #  echo "  s/n:			$upsSN"
	  #   echo "  Status:		$upsStatus"
	   #  echo "  Load:			$upsLoad"
	  #   echo "  Battery charge:	$upsBatteryCharge"
	   #  echo "  Battery charge warning:$upsBatteryChargeWarning"
	# fi

	# echo "";
   #  fi

   #  if [ "$healthCriticalStatus" = "1" ] ; then
	# echo "CRITICAL - $healthString"
	# exit 2
   #  fi
   #  if [ "$healthWarningStatus" = "1" ] ; then
	# echo "WARNING - $healthString"
	# exit 1
   #  fi
   #  if [ "$healthCriticalStatus" = "0" ] && [ "$healthWarningStatus" = "0" ] ; then
	# echo "OK - $healthString is in good health"
	# exit 0
  #   fi
 fi
chris1337c
Posts: 75
Joined: Wed Dec 26, 2018 2:31 pm

Re: check_snmp_synology - False Positives

Post by chris1337c »

Here is the original file any suggestions would be greatly appreciated:

Code: Select all

SNMPWALK=$(which snmpwalk)
SNMPGET=$(which snmpget)

SNMPVersion="3"
SNMPV2Community="public"
SNMPTimeout="15"
warningTemperature="50"
criticalTemperature="60"
warningStorage="80"
criticalStorage="95"
hostname=""
healthWarningStatus=0
healthCriticalStatus=0
healthString=""
verbose="no"
checkDSMUpdate="yes"
ups="no"

#OID declarations
OID_syno="1.3.6.1.4.1.6574"
OID_model="1.3.6.1.4.1.6574.1.5.1.0"
OID_serialNumber="1.3.6.1.4.1.6574.1.5.2.0"
OID_DSMVersion="1.3.6.1.4.1.6574.1.5.3.0"
OID_DSMUpgradeAvailable="1.3.6.1.4.1.6574.1.5.4.0"
OID_systemStatus="1.3.6.1.4.1.6574.1.1.0"
OID_temperature="1.3.6.1.4.1.6574.1.2.0"
OID_powerStatus="1.3.6.1.4.1.6574.1.3.0"
OID_systemFanStatus="1.3.6.1.4.1.6574.1.4.1.0"
OID_CPUFanStatus="1.3.6.1.4.1.6574.1.4.2.0"

OID_disk=""
OID_disk2=""
OID_diskID="1.3.6.1.4.1.6574.2.1.1.2"
OID_diskModel="1.3.6.1.4.1.6574.2.1.1.3"
OID_diskStatus="1.3.6.1.4.1.6574.2.1.1.5"
OID_diskTemp="1.3.6.1.4.1.6574.2.1.1.6"

OID_RAID=""
OID_RAIDName="1.3.6.1.4.1.6574.3.1.1.2"
OID_RAIDStatus="1.3.6.1.4.1.6574.3.1.1.3"

OID_Storage="1.3.6.1.2.1.25.2.3.1"
OID_StorageDesc="1.3.6.1.2.1.25.2.3.1.3"
OID_StorageAllocationUnits="1.3.6.1.2.1.25.2.3.1.4"
OID_StorageSize="1.3.6.1.2.1.25.2.3.1.5"
OID_StorageSizeUsed="1.3.6.1.2.1.25.2.3.1.6"

OID_UpsModel="1.3.6.1.4.1.6574.4.1.1.0"
OID_UpsSN="1.3.6.1.4.1.6574.4.1.3.0"
OID_UpsStatus="1.3.6.1.4.1.6574.4.2.1.0"
OID_UpsLoad="1.3.6.1.4.1.6574.4.2.12.1.0"
OID_UpsBatteryCharge="1.3.6.1.4.1.6574.4.3.1.1.0"
OID_UpsBatteryChargeWarning="1.3.6.1.4.1.6574.4.3.1.4.0"

usage()
{
        echo "usage: ./check_snmp_synology [OPTIONS] -u [user] -p [pass] -h [hostname]"
        echo "options:"
        echo "            -u [snmp username]   	Username for SNMPv3"
        echo "            -p [snmp password]   	Password for SNMPv3"
        echo ""
        echo "            -2 [community name]	  	Use SNMPv2 (no need user/password) & define community name (ex: public)"
        echo ""
        echo "            -h [hostname or IP](:port)	Hostname or IP. You can also define a different port"
        echo ""
        echo "            -W [warning temp]		Warning temperature (for disks & synology) (default $warningTemperature)"
        echo "            -C [critical temp]		Critical temperature (for disks & synology) (default $criticalTemperature)"
        echo ""
        echo "            -w [warning %]		Warning storage usage percentage (default $warningStorage)"
        echo "            -c [critical %]		Critical storage usage percentage (default $criticalStorage)"
        echo ""
        echo "            -i   			Ignore DSM updates"
        echo "            -U   			Show informations about the connected UPS (only information, no control)"
        echo "            -v   			Verbose - print all informations about your Synology"
        echo ""
        echo ""
        echo "examples:	./check_snmp_synology -u admin -p 1234 -h nas.intranet"	
        echo "	     	./check_snmp_synology -u admin -p 1234 -h nas.intranet -v"	
        echo "		./check_snmp_synology -2 public -h nas.intranet"	
        echo "		./check_snmp_synology -2 public -h nas.intranet:10161"
        exit 3
}

if [ "$1" == "--help" ]; then
    usage; exit 0
fi

while getopts 2:W:C:w:c:u:p:h:iUv OPTNAME; do
        case "$OPTNAME" in
	u)	SNMPUser="$OPTARG";;
        p)	SNMPPassword="$OPTARG";;
        h)	hostname="$OPTARG";;
        v)	verbose="yes";;
	2)	SNMPVersion="2"
		SNMPV2Community="$OPTARG";;
	w)	warningStorage="$OPTARG";;
        c)      criticalStorage="$OPTARG";;
	W)	warningTemperature="$OPTARG";;
	C)	criticalTemperature="$OPTARG";;
	i)	checkDSMUpdate="no";;
	U)	ups="yes";;
        *)	usage;;
        esac
done

if [ "$warningTemperature" -gt "$criticalTemperature" ] ; then
    echo "Critical temperature must be higher than warning temperature"
    echo "Warning temperature: $warningTemperature"
    echo "Critical temperature: $criticalTemperature"
    echo ""
    echo "For more information:  ./${0##*/} --help" 
    exit 1 
fi

if [ "$warningStorage" -gt "$criticalStorage" ] ; then
    echo "The Critical storage usage percentage  must be higher than the warning storage usage percentage"
    echo "Warning: $warningStorage"
    echo "Critical: $criticalStorage"
    echo ""
    echo "For more information:  ./${0##*/} --help"
    exit 1
fi

if [ "$hostname" = "" ] || ([ "$SNMPVersion" = "3" ] && [ "$SNMPUser" = "" ]) || ([ "$SNMPVersion" = "3" ] && [ "$SNMPPassword" = "" ]) ; then
    usage
else
    if [ "$SNMPVersion" = "2" ] ; then
	SNMPArgs=" -OQne -v 2c -c $SNMPV2Community -t $SNMPTimeout"
    else
	SNMPArgs=" -OQne -v 3 -u $SNMPUser -A $SNMPPassword -l authNoPriv -a MD5 -t $SNMPTimeout"
	if [ ${#SNMPPassword} -lt "8" ] ; then
	    echo "snmpwalk:  (The supplied password is too short.)"
	    exit 1
	fi
    fi
    tmpRequest=`$SNMPWALK $SNMPArgs $hostname $OID_syno 2> /dev/null`
    if [ "$?" != "0" ] ; then
        echo "CRITICAL - Problem with SNMP request, check user/password/host"
        exit 2
    fi 
    nbDisk=$(echo "$tmpRequest" | grep $OID_diskID | wc -l)
    nbRAID=$(echo "$tmpRequest" | grep $OID_RAIDName | wc -l)

    for i in `seq 1 $nbDisk`;
    do
	if [ $i -lt 25 ] ; then
	    OID_disk="$OID_disk $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) " 
	else
	    OID_disk2="$OID_disk2 $OID_diskID.$(($i-1)) $OID_diskModel.$(($i-1)) $OID_diskStatus.$(($i-1)) $OID_diskTemp.$(($i-1)) "
	fi   
    done

    for i in `seq 1 $nbRAID`;
    do
      OID_RAID="$OID_RAID $OID_RAIDName.$(($i-1)) $OID_RAIDStatus.$(($i-1))" 
    done

    if [ "$ups" = "yes" ] ; then
	syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable $OID_UpsModel $OID_UpsSN $OID_UpsStatus $OID_UpsLoad $OID_UpsBatteryCharge $OID_UpsBatteryChargeWarning 2> /dev/null`
    else
	syno=`$SNMPGET $SNMPArgs $hostname $OID_model $OID_serialNumber $OID_DSMVersion $OID_systemStatus $OID_temperature $OID_powerStatus $OID_systemFanStatus $OID_CPUFanStatus $OID_disk $OID_RAID $OID_DSMUpgradeAvailable 2> /dev/null`
    fi

    if [ "$OID_disk2" != "" ]; then
	syno2=`$SNMPGET $SNMPArgs $hostname $OID_disk2 2> /dev/null`
	syno=$(echo "$syno";echo "$syno2";)
    fi

    model=$(echo "$syno" | grep $OID_model | cut -d "=" -f2)
    serialNumber=$(echo "$syno" | grep $OID_serialNumber | cut -d "=" -f2)
    DSMVersion=$(echo "$syno" | grep $OID_DSMVersion | cut -d "=" -f2)

    healthString="Synology $model (s/n:$serialNumber, $DSMVersion)"

    DSMUpgradeAvailable=$(echo "$syno" | grep $OID_DSMUpgradeAvailable | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    case $DSMUpgradeAvailable in
	"1")	DSMUpgradeAvailable="Available";	if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1;		healthString="$healthString, DSM update available"; fi ;;
	"2")	DSMUpgradeAvailable="Unavailable";;
	"3")	DSMUpgradeAvailable="Connecting";;					
	"4")	DSMUpgradeAvailable="Disconnected";	if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1;		healthString="$healthString, DSM Update Disconnected"; fi ;;
	"5")	DSMUpgradeAvailable="Others";		if [ "$checkDSMUpdate" = "yes" ]; then healthWarningStatus=1;		healthString="$healthString, Check DSM Update"; fi ;;
    esac

    RAIDName=$(echo "$syno" | grep $OID_RAIDName | cut -d "=" -f2)
    RAIDStatus=$(echo "$syno" | grep $OID_RAIDStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    systemStatus=$(echo "$syno" | grep $OID_systemStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    temperature=$(echo "$syno" | grep $OID_temperature | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    powerStatus=$(echo "$syno" | grep $OID_powerStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    systemFanStatus=$(echo "$syno" | grep $OID_systemFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    CPUFanStatus=$(echo "$syno" | grep $OID_CPUFanStatus | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')


    #Check system status
    if [ "$systemStatus" = "1" ] ; then
	systemStatus="Normal"
    else
 	systemStatus="Failed"
        healthCriticalStatus=1
        healthString="$healthString, System status: $systemStatus "
    fi

    #Check system temperature
    if [ "$temperature" -gt "$warningTemperature" ] ; then
    	if [ "$temperature" -gt "$criticalTemperature" ] ; then
        	temperature="$temperature (CRITICAL)"
	        healthCriticalStatus=1
	        healthString="$healthString, temperature: $temperature "
	else
	        temperature="$temperature (WARNING)"
        	healthWarningStatus=1
	        healthString="$healthString, temperature: $temperature "
	fi
    else
	temperature="$temperature (Normal)"
    fi


    #Check power status
    if [ "$powerStatus" = "1" ] ; then
        powerStatus="Normal"
    else
       	powerStatus="Failed";
        healthCriticalStatus=1
        healthString="$healthString, Power status: $powerStatus "
    fi


    #Check system fan status
    if [ "$systemFanStatus" = "1" ] ; then
        systemFanStatus="Normal"
    else
        systemFanStatus="Failed";		
        healthCriticalStatus=1
        healthString="$healthString, System fan status: $systemFanStatus "
    fi
    

    #Check CPU fan status
    if [ "$CPUFanStatus" = "1" ] ; then
	CPUFanStatus="Normal"
    else
        CPUFanStatus="Failed";		
        healthCriticalStatus=1
        healthString="$healthString, CPU fan status: $CPUFanStatus "
    fi
 

    #Check all disk status
    for i in `seq 1 $nbDisk`;
    do
    	diskID[$i]=$(echo "$syno" | grep "$OID_diskID.$(($i-1)) " | cut -d "=" -f2)
    	diskModel[$i]=$(echo "$syno" | grep "$OID_diskModel.$(($i-1)) " | cut -d "=" -f2 )
    	diskStatus[$i]=$(echo "$syno" | grep "$OID_diskStatus.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')
    	diskTemp[$i]=$(echo "$syno" | grep "$OID_diskTemp.$(($i-1)) " | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')

	case ${diskStatus[$i]} in
		"1")	diskStatus[$i]="Normal";		;;
		"2")	diskStatus[$i]="Initialized";		;;
		"3")	diskStatus[$i]="NotInitialized";	;;
		"4")	diskStatus[$i]="SystemPartitionFailed";	healthCriticalStatus=1; healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
		"5")	diskStatus[$i]="Crashed";		healthCriticalStatus=1;	healthString="$healthString, problem with ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}";;
	esac

	if [ "${diskTemp[$i]}" -gt "$warningTemperature" ] ; then
	    if [ "${diskTemp[$i]}" -gt "$criticalTemperature" ] ; then
		diskTemp[$i]="${diskTemp[$i]} (CRITICAL)"
		healthCriticalStatus=1;
		healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
	    else
		diskTemp[$i]="${diskTemp[$i]} (WARNING)"
		healthWarningStatus=1;
		healthString="$healthString, ${diskID[$i]} temperature: ${diskTemp[$i]}"
	    fi
	fi

     done  

    syno_diskspace=`$SNMPWALK $SNMPArgs $hostname $OID_Storage 2> /dev/null`

    #Check all RAID volume status
    for i in `seq 1 $nbRAID`;
    do
	RAIDName[$i]=$(echo "$syno" | grep $OID_RAIDName.$(($i-1)) | cut -d "=" -f2)
	RAIDStatus[$i]=$(echo "$syno" | grep $OID_RAIDStatus.$(($i-1)) | cut -d "=" -f2 | sed 's/^[ \t]*//;s/[ \t]*$//')

	storageName[$i]=$(echo "${RAIDName[$i]}" | sed -e 's/[[:blank:]]//g' | sed -e 's/\"//g' | sed 's/.*/\L&/')

	# modified by Tobias Schenke
	# "timebackup" (when backup-job runs) and the "docker-feature" (since dsm 6.0, and if installed) mount volumes as a substructure of /"volume1/..." or "/.../volume1/..."
	# in this case the former grep failed with more then one result.
	# modified script to look for a line with '= "/volume1"' instead of 'volume1'
	#storageID[$i]=$(echo "$syno_diskspace" | grep ${storageName[$i]} | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)
	storageID[$i]=$(echo "$syno_diskspace" | grep "= \"\?/${storageName[$i]}\"\?" | cut -d "=" -f1 | rev | cut -d "." -f1 | rev)

	if [ "${storageID[$i]}" != "" ] ; then
	    storageSize[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSize.${storageID[$i]}" | cut -d "=" -f2 )
	    storageSizeUsed[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageSizeUsed.${storageID[$i]}" | cut -d "=" -f2 )
	    storageAllocationUnits[$i]=$(echo "$syno_diskspace" | grep "$OID_StorageAllocationUnits.${storageID[$i]}" | cut -d "=" -f2 )
	    storagePercentUsed[$i]=$((${storageSizeUsed[$i]} * 100 / ${storageSize[$i]}))
	    storagePercentUsedString[$i]="${storagePercentUsed[$i]}% used"

	    if [ "${storagePercentUsed[$i]}" -gt "$warningStorage" ] ; then
	    	if [ "${storagePercentUsed[$i]}" -gt "$criticalStorage" ] ; then
                    healthCriticalStatus=1;
		    storagePercentUsedString[$i]="${storagePercentUsedString[$i]} CRITICAL"
                    healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
            	else
                    healthWarningStatus=1;
		    storagePercentUsedString[$i]="${storagePercentUsedString[$i]} WARNING"
                    healthString="$healthString,${RAIDName[$i]}: ${storagePercentUsedString[$i]}"
            	fi
            fi
	fi

        case ${RAIDStatus[$i]} in
		"1")	RAIDStatus[$i]="Normal";		;;
		"2")	RAIDStatus[$i]="Repairing";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"3")	RAIDStatus[$i]="Migrating";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"4")	RAIDStatus[$i]="Expanding";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"5")	RAIDStatus[$i]="Deleting";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"6")	RAIDStatus[$i]="Creating";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"7")	RAIDStatus[$i]="RaidSyncing";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"8")	RAIDStatus[$i]="RaidParityChecking";	healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"9")	RAIDStatus[$i]="RaidAssembling";	healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"10")	RAIDStatus[$i]="Canceling";		healthWarningStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"11")	RAIDStatus[$i]="Degrade";		healthCriticalStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
		"12")	RAIDStatus[$i]="Crashed";		healthCriticalStatus=1;		healthString="$healthString, RAID status (${RAIDName[$i]}): ${RAIDStatus[$i]} ";;
        esac
    done

    if [ "$verbose" = "yes" ] ; then    
	echo "Synology model:		$model" ; 
	echo "Synology s/n:		$serialNumber" ;
	echo "DSM Version:		$DSMVersion" ; 
	echo "DSM update:		 $DSMUpgradeAvailable" ; 
	echo "System Status:		 $systemStatus" ;
	echo "Temperature:		 $temperature" ;
	echo "Power Status:		 $powerStatus" ; 
	echo "System Fan Status:	 $systemFanStatus" ;
	echo "CPU Fan Status:		 $CPUFanStatus" ; 
	echo "Number of disks:         $nbDisk" ;
 	for i in `seq 1 $nbDisk`;
    	do
		echo " ${diskID[$i]} (model:${diskModel[$i]}) status:${diskStatus[$i]} temperature:${diskTemp[$i]}" ;
	done 
	echo "Number of RAID volume:   $nbRAID" ;
 	for i in `seq 1 $nbRAID`;
    	do
		echo " ${RAIDName[$i]} status:${RAIDStatus[$i]} ${storagePercentUsedString[$i]}" ;
	done 

	# Display UPS information
    	if [ "$ups" = "yes" ] ; then
            upsModel=$(echo "$syno" | grep $OID_UpsModel | cut -d "=" -f2)
            upsSN=$(echo "$syno" | grep $OID_UpsSN | cut -d "=" -f2)
            upsStatus=$(echo "$syno" | grep $OID_UpsStatus | cut -d "=" -f2)
            upsLoad=$(echo "$syno" | grep $OID_UpsLoad | cut -d "=" -f2)
            upsBatteryCharge=$(echo "$syno" | grep $OID_UpsBatteryCharge | cut -d "=" -f2)
            upsBatteryChargeWarning=$(echo "$syno" | grep $OID_UpsBatteryChargeWarning | cut -d "=" -f2)

	    echo "UPS:"
	    echo "  Model:		$upsModel"
	    echo "  s/n:			$upsSN"
	    echo "  Status:		$upsStatus"
	    echo "  Load:			$upsLoad"
	    echo "  Battery charge:	$upsBatteryCharge"
	    echo "  Battery charge warning:$upsBatteryChargeWarning"
	fi

	echo "";
    fi

    if [ "$healthCriticalStatus" = "1" ] ; then
	echo "CRITICAL - $healthString"
	exit 2
    fi
    if [ "$healthWarningStatus" = "1" ] ; then
	echo "WARNING - $healthString"
	exit 1
    fi
    if [ "$healthCriticalStatus" = "0" ] && [ "$healthWarningStatus" = "0" ] ; then
	echo "OK - $healthString is in good health"
	exit 0
    fi
fi
User avatar
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: check_snmp_synology - False Positives

Post by cdienger »

I would keep the original file as is. It shouldn't add any additional traffic.

Where were you seeing the timeout messages? You can keep an eye out for them by running:

tail -f /usr/local/nagios/var/nagios.log | grep "DC_****"
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
chris1337c
Posts: 75
Joined: Wed Dec 26, 2018 2:31 pm

Re: check_snmp_synology - False Positives

Post by chris1337c »

The global timeout is 180s. Consistently the synology throws false positives for example:

This morning
[12-28-2018 02:52:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;HARD;3;(Service check timed out after 180.02 seconds)
Service Notification[12-28-2018 02:52:52] SERVICE NOTIFICATION: ccichon;DC_SAN;Global Health Status;CRITICAL;notify-service-by-email;(Service check timed out after 180.02 seconds)
Service Notification[12-28-2018 02:52:52] SERVICE NOTIFICATION: nagiosadmin;DC_SAN;Global Health Status;CRITICAL;notify-service-by-email;(Service check timed out after 180.02 seconds)
Informational Message[12-28-2018 02:52:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.019s!
Informational Message[12-28-2018 02:52:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:52:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:52:52] wproc: CHECK job 137580 from worker Core Worker 14080 timed out after 180.02s
Informational Message[12-28-2018 02:52:52] wproc: Core Worker 14080: job 137580 (pid=58305) timed out. Killing it
Informational Message[12-28-2018 02:47:52] wproc: Core Worker 14081: job 137537 (pid=52055): Dormant child reaped
Service Critical[12-28-2018 02:47:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;SOFT;2;(Service check timed out after 180.04 seconds)
Informational Message[12-28-2018 02:47:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.035s!
Informational Message[12-28-2018 02:47:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:47:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:47:52] wproc: CHECK job 137537 from worker Core Worker 14081 timed out after 180.04s
Informational Message[12-28-2018 02:47:52] wproc: Core Worker 14081: job 137537 (pid=52055) timed out. Killing it
Informational Message[12-28-2018 02:46:18] Auto-save of retention data completed successfully.
Informational Message[12-28-2018 02:42:52] wproc: Core Worker 14078: job 137492 (pid=45830): Dormant child reaped
Service Critical[12-28-2018 02:42:52] SERVICE ALERT: DC_SAN;Global Health Status;CRITICAL;SOFT;1;(Service check timed out after 180.02 seconds)
Informational Message[12-28-2018 02:42:52] Warning: Check of service 'Global Health Status' on host 'DC_SAN' timed out after 180.017s!
Informational Message[12-28-2018 02:42:52] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Informational Message[12-28-2018 02:42:52] wproc: host=DC_SAN; service=Global Health Status;
Informational Message[12-28-2018 02:42:52] wproc: CHECK job 137492 from worker Core Worker 14078 timed out after 180.02s
Informational Message[12-28-2018 02:42:52] wproc: Core Worker 14078: job 137492 (pid=45830) timed out. Killing it
chris1337c
Posts: 75
Joined: Wed Dec 26, 2018 2:31 pm

Re: check_snmp_synology - False Positives

Post by chris1337c »

I will stop the TCPDump on Monday and compare to the logs, I will probably require some assistance to find the root cause. I am currently checking all of the backup schedules to try and space them out, I wonder if the Synology just cannot handle the backup/replication load and causes the snmp timeouts due to being overloaded with backups? Or is this not possible the way Snmp works.
Locked