
check_snmp_synology - False Positives

Posted: Wed Dec 26, 2018 2:42 pm
by chris1337c
Hi there,

I recently started a new position with a company and am just getting started with Nagios. Essentially, the issue I am having is that the Synology DiskStation at our colo constantly throws false positives with this error:

[12-26-2018 ********] SERVICE ALERT: DC_*****;Global Health Status;CRITICAL;SOFT;1;(Service check timed out after 180.04 seconds)

I looked at the Synology box and there are no settings to adjust; it is using the SNMPv1/SNMPv2c service.

When I log in with PuTTY and run:

./check_snmp_synology -2 public -h ***********.1 -v

I get this output after about 30 seconds on average:

Synology model: "RS2414rp+"
Synology s/n: "*******************"
DSM Version: "DSM 6.0-8754"
DSM update: Unavailable
System Status: Normal
Temperature: 39 (Normal)
Power Status: Normal
System Fan Status: Normal
CPU Fan Status: Normal
Number of disks: 12
"Disk 1" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 2" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 3" (model: "ST4000VN003-1T5168 ") status:Normal temperature:25
"Disk 4" (model: "ST4000VN003-1T5168 ") status:Normal temperature:25
"Disk 5" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 6" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 7" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 8" (model: "ST4000VN008-2DR166 ") status:Normal temperature:24
"Disk 9" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 10" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 11" (model: "ST4000VN003-1T5168 ") status:Normal temperature:26
"Disk 12" (model: "ST4000VN003-1T5168 ") status:Normal temperature:25
Number of RAID volume: 2
"HYPERV-LUN-1" status:Normal
"Volume 1" status:Normal 79% used

Someone suggested I use SNMP tools to find out where it is timing out. I'm just not sure where to start. Has anyone encountered this? Should I start simple with a reboot of the Synology box? Are there any log locations that would show a more detailed readout of what is happening? I would love to troubleshoot this; I just don't know enough to know where to begin. I appreciate any and all help.

Chris

Re: check_snmp_synology - False Positives

Posted: Wed Dec 26, 2018 5:39 pm
by cdienger
A reboot of the Synology box may help. 30 seconds seems like a long time for a response, and a reboot may clear some things up and give faster response times.

What is the logging like on the Synology machine? Does it show SNMP requests?

If it is constantly timing out, I would run tcpdump on the Nagios machine and let it run long enough to capture a timeout alert. Assuming a CentOS install, this can be done with:

yum -y install tcpdump
tcpdump -s 0 -i any port 161 and host a.b.c.d -w output.pcap

Use Ctrl+C to stop tcpdump after you get a timeout alert. The output.pcap file can be opened in Wireshark and will show the communication between the Nagios and Synology machines.
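If Wireshark isn't handy, tcpdump can read the capture back itself with `tcpdump -nn -r output.pcap` (`-nn` keeps port 161 numeric instead of printing "snmp"). A rough first check is whether every SNMP request to the Synology got a reply. The helper below is a hypothetical sketch, assuming tcpdump's default one-line text format where each packet reads `src.port > dst.port: ...`; it counts lines sent to the box's port 161 versus lines coming back from it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SNMP requests to a host and replies from it.
#   $1 = IP address of the Synology
#   stdin = text output of: tcpdump -nn -r output.pcap
count_snmp() {
  awk -v host="$1" '
    # request: something > HOST.161:
    index($0, "> " host ".161:") { req++ }
    # reply: HOST.161 > something (leading space avoids matching e.g. 110.x)
    index($0, " " host ".161 > ") { resp++ }
    END { printf "requests=%d responses=%d unanswered=%d\n", req, resp, req - resp }
  '
}
```

Usage, with a.b.c.d standing in for the Synology's address: `tcpdump -nn -r output.pcap | count_snmp a.b.c.d`. A large "unanswered" count around the time of an alert would point at the Synology (or the network) not replying, rather than at Nagios itself; the timestamps on the request lines without replies show exactly where the check stalled.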

Re: check_snmp_synology - False Positives

Posted: Wed Dec 26, 2018 6:55 pm
by chris1337c
I rebooted the Synology box and ran the check again with the time command:

OK - Synology "RS2414rp+" (s/n: "******************", "DSM 6.0-8754") is in good health

real 0m49.916s
user 0m0.579s
sys 0m0.838s

Now the request takes about 50 seconds (I attribute the increase to the box running checks on itself after the reboot). Tomorrow, when I have time, I will run tcpdump, see what I can find, and post the results. Thank you for the insight!

Chris

Re: check_snmp_synology - False Positives

Posted: Thu Dec 27, 2018 8:59 am
by chris1337c
OK - Synology "RS2414rp+" (s/n: "**************, "DSM 6.0-8754") is in good health

real 0m23.054s
user 0m0.597s
sys 0m0.832s

Still seems awfully long.
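The two runs above differ a lot (about 50 s versus 23 s), and in both the user and sys times are under a second each, so nearly all of the wall-clock time is spent waiting on the Synology to answer. One way to see when it gets slow enough to hit the 180-second service check timeout is to time the plugin on a schedule and log the results. This is a hypothetical sketch; the function name and log path are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command several times and append one line
# per run to a log file, so slow periods can be matched to the time of day.
#   $1 = number of runs, $2 = seconds to sleep between runs, $3 = log file
#   remaining arguments = the command to time
log_check_times() {
  local runs=$1 interval=$2 logfile=$3
  shift 3
  local i stamp start end rc
  for ((i = 1; i <= runs; i++)); do
    stamp=$(date '+%F %T')
    start=$(date +%s)
    rc=0
    # keep the plugin's exit code (Nagios convention: 0=OK, 2=CRITICAL, ...)
    "$@" > /dev/null 2>&1 || rc=$?
    end=$(date +%s)
    printf '%s exit=%d elapsed=%ds\n' "$stamp" "$rc" "$((end - start))" >> "$logfile"
    if ((i < runs)); then
      sleep "$interval"
    fi
  done
}
```

For example, `log_check_times 96 900 /tmp/synology-times.log timeout 180 ./check_snmp_synology -2 public -h a.b.c.d` takes 96 samples 900 seconds apart, which is roughly 24 hours plus the time the checks themselves take. Prefixing the plugin with coreutils `timeout 180` mirrors the Nagios cutoff: a run that hangs past 180 seconds is killed and logged with exit=124.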

Re: check_snmp_synology - False Positives

Posted: Thu Dec 27, 2018 10:39 am
by chris1337c
How long should I let this run? Will it have any implications for hard drive space on the Nagios server, or will it generate a fairly small file?

I am trying this now.

Re: check_snmp_synology - False Positives

Posted: Thu Dec 27, 2018 11:47 am
by cdienger
Just long enough to see a timeout message like this in the logs:

[12-26-2018 ********] SERVICE ALERT: DC_*****;Global Health Status;CRITICAL;SOFT;1;(Service check timed out after 180.04 seconds)

It should be a small file, and I wouldn't be too concerned about it growing large. Feel free to stop and restart it if the problem doesn't occur, though. You can also set up a rotating capture:

nohup tcpdump -Z root -s 0 -i any port 161 and host a.b.c.d -C 10 -W 5 -w output.pcap &

The above starts tcpdump in the background, keeping only the last 50 MB of captured data in five 10 MB files (output.pcap0, output.pcap1, etc.; -C counts in millions of bytes). To stop the trace:

pkill tcpdump
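`pkill tcpdump` signals every tcpdump process on the machine, including any capture someone else started. A narrower option is to record the PID of the background capture when it starts (bash's `$!`) and stop only that process. The two helpers below are a hypothetical sketch; the function names and file paths are illustrative, not part of tcpdump.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: start a command in the background with nohup and
# remember its PID in a file, then later stop exactly that process.
start_bg() {
  local pidfile=$1
  shift
  # tcpdump's own messages (banner, errors) go to <pidfile>.out
  nohup "$@" > "$pidfile.out" 2>&1 &
  echo "$!" > "$pidfile"
}

stop_bg() {
  local pidfile=$1
  # send SIGTERM only to the recorded PID, then forget it
  kill "$(cat "$pidfile")" && rm -f "$pidfile"
}
```

Usage, as root: `start_bg /root/snmp-capture.pid tcpdump -Z root -s 0 -i any -C 10 -W 5 -w output.pcap port 161 and host a.b.c.d`, and later `stop_bg /root/snmp-capture.pid`. Both functions need to be defined in the shell (or script) you run them from.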

Re: check_snmp_synology - False Positives

Posted: Thu Dec 27, 2018 12:59 pm
by chris1337c
Logging on the Synology is useless and doesn't show SNMP. I am putting out another fire, then will give this a shot. Thank you very much.

Re: check_snmp_synology - False Positives

Posted: Thu Dec 27, 2018 2:33 pm
by cdienger
No problem. Keep us posted :)

Re: check_snmp_synology - False Positives

Posted: Thu Dec 27, 2018 5:27 pm
by chris1337c
Okay, I ran the command from your first post. It currently says "tcpdump: listening on any, link-type linux_sll, capture size 262144 bytes", and I'm not sure how long to let it run. I know you mentioned it will say timeout; will I see this in PuTTY or in the output log?

I suspect I will also have to use the background (rotating) tcpdump to capture the timeouts, since they consistently happen later at night. (I am assuming this is related to load, possibly when backups are replicating, but in this line of work I want to assume nothing.)
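The timeout will not show up in the PuTTY window running tcpdump: when writing to a file with -w, tcpdump prints only its "listening on" banner (and packet counts when stopped), not individual packets. The "Service check timed out" alert is written to the Nagios log. On a default Nagios Core or Nagios XI install that log is often /usr/local/nagios/var/nagios.log, but treat that path as an assumption and check your own install. A second PuTTY session can follow the log and print only timeout alerts; the filter below is a hypothetical sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Nagios log lines on stdin and print only service
# alerts caused by a check timeout. --line-buffered (GNU grep) makes each
# match appear immediately when reading from tail -F.
watch_timeouts() {
  grep --line-buffered 'SERVICE ALERT: .*timed out'
}
```

Usage: `tail -F /usr/local/nagios/var/nagios.log | watch_timeouts`. When a line appears, note the time, then stop the capture so the packets around that moment are in the pcap.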

Re: check_snmp_synology - False Positives

Posted: Thu Dec 27, 2018 5:34 pm
by chris1337c
Any idea where the default output.pcap location is? I am currently searching for it in WinSCP.
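Because `-w output.pcap` is a relative path, tcpdump creates the file in whatever directory the shell was in when tcpdump started; running `pwd` in that PuTTY session shows it (for a root login this is typically /root, which WinSCP may not open by default). If that session is gone, a search works too. The helper below is a hypothetical sketch; it also picks up the rotating files output.pcap0 through output.pcap4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every regular file under the given directory
# (default: /) whose name starts with output.pcap.
# Permission-denied messages are discarded and do not fail the function.
find_captures() {
  find "${1:-/}" -type f -name 'output.pcap*' 2>/dev/null || true
}
```

Usage: `find_captures /root` for a quick look, or `find_captures` to search the whole filesystem (slower).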