Uptime Alarming

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
acentek
Posts: 123
Joined: Thu Jul 27, 2017 2:00 pm

Uptime Alarming

Post by acentek

Hello all.

We're currently looking to implement uptime alarming on a number of different devices. As mentioned in numerous other posts, the problem with the sysUpTime OID is that it rolls over after ~497 days. We are a telco, so a lot of our networking gear isn't normally rebooted or upgraded within that time frame.
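
The rollover point follows from how sysUpTime is defined: it is a TimeTicks value, an unsigned 32-bit counter of hundredths of a second, so it wraps back to zero after 2^32 ticks. A quick Python check of that arithmetic (the constant and function names are just for illustration):

```python
# sysUpTime is a TimeTicks value: an unsigned 32-bit counter in hundredths of a second.
TIMETICKS_MODULUS = 2**32   # the counter wraps back to 0 when it reaches this value
TICKS_PER_SECOND = 100      # one tick = 1/100 s
SECONDS_PER_DAY = 86_400


def wrap_period_days() -> float:
    """Days of continuous uptime before sysUpTime rolls over to zero."""
    return TIMETICKS_MODULUS / TICKS_PER_SECOND / SECONDS_PER_DAY


print(f"{wrap_period_days():.1f} days")  # prints "497.1 days"
```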

I've done an OID search, and a lot of our devices don't implement the snmpEngineTime (.1.3.6.1.6.3.10.2.1.3) OID, which is often used as a sysUpTime alternative because it doesn't roll over for many years. What do other people do in this situation? Has anyone wrapped the sysUpTime OID in a script with logic along the lines of "if the previous value was near the maximum, don't alarm"? I'm not much of a Linux guy, so I don't really have the know-how to build something like this from scratch.

We'd like to avoid a Telnet/SSH script that logs into each device to check uptime, as that would get resource-intensive quickly. Our previous alarming platform would buckle under that kind of load.

Thoughts? Thanks!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Uptime Alarming

Post by scottwilkerson

Unfortunately, you cannot use the SNMP value directly since, as you noted, it rolls over at about 497.1 days.

If installing an agent on the systems is not an option, the only remaining approach is to write a wrapper plugin that stores the previous value and, whenever the uptime drops, checks whether that previous value was close to the TimeTicks limit of 4294967295 (2^32 - 1). If it was, assume the drop was a rollover rather than a reboot.
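
A minimal sketch of that wrapper logic in Python. It assumes the current sysUpTime value has already been fetched (for example with check_snmp or snmpget) and is passed in as an integer number of ticks. The function names, the per-host JSON state file, and the 10-minute default margin are hypothetical choices for illustration, not part of Nagios XI.

```python
import json
from pathlib import Path
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# sysUpTime is a 32-bit TimeTicks counter (hundredths of a second).
TIMETICKS_MAX = 2**32 - 1


def classify(previous: Optional[int], current: int, margin_ticks: int) -> Tuple[int, str]:
    """Decide whether a drop in sysUpTime is a reboot or a counter rollover.

    margin_ticks (hypothetical tuning knob) is how close to TIMETICKS_MAX the
    previous sample must be, and how small the new one must be, to count as a
    rollover. It should be at least the polling interval expressed in ticks.
    """
    if previous is None:
        return OK, f"OK - first sample, sysUpTime={current}"
    if current >= previous:
        return OK, f"OK - sysUpTime={current}"
    # The counter went backwards: treat it as a rollover only if the old value
    # was near the limit and the new value is near zero; otherwise a reboot.
    if previous >= TIMETICKS_MAX - margin_ticks and current <= margin_ticks:
        return OK, f"OK - sysUpTime rolled over ({previous} -> {current})"
    return CRITICAL, f"CRITICAL - device restarted ({previous} -> {current})"


def run_check(host: str, current: int, state_dir: Path,
              margin_ticks: int = 60_000) -> Tuple[int, str]:
    """Load the previous sample for host, classify, then save the new sample.

    The default margin of 60,000 ticks is 10 minutes (hypothetical value).
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    state_file = state_dir / f"{host}.json"  # hypothetical per-host state file
    previous = None
    if state_file.exists():
        previous = json.loads(state_file.read_text())["sysuptime"]
    code, message = classify(previous, current, margin_ticks)
    state_file.write_text(json.dumps({"sysuptime": current}))
    return code, message

# Hypothetical CLI entry point for a Nagios command definition:
#   code, msg = run_check(sys.argv[1], int(sys.argv[2]), Path("/var/tmp/uptime_wrap"))
#   print(msg); sys.exit(code)
```

Because the state file is overwritten on every run, the CRITICAL result only lasts for the one poll that observed the drop. With max_check_attempts above 1, the retry would return OK and the soft state would never become hard, so a service like this would likely need max_check_attempts set to 1 to notify.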
Former Nagios employee