5 minute power failure support

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
richie7tech
Posts: 14
Joined: Mon Sep 28, 2020 7:29 am

Post by richie7tech »

Hi Everyone

I am a noob with Nagios and I am struggling to write a trap or service for the following scenario.

I am monitoring remote sites with ping checks every minute and with passive traps. (This poll rate is a requirement, so it cannot be changed.)

The site will send a power alert passive trap (below) if AC power fails.
snmptrap -v 2c -c public 192.168.2.xxx '' netSnmpExampleHeartbeatNotification netSnmpExampleHeartbeatName s "SITE_POWER_CRITICAL."
The site then has a 5 minute battery backup. If power is restored, the following trap is sent:
snmptrap -v 2c -c public 192.168.2.xxx '' netSnmpExampleHeartbeatNotification netSnmpExampleHeartbeatName s "SITE_POWER_OK."

What I then need is a service that would flag an alert if the above trap has not been reset after a 5 minute period.

Or, potentially, another way:

When the ping check (every 1 minute) goes CRITICAL, again, if it has not reset to OK after 5 minutes, then raise a service alert.

Or if you know of any other way.

i.e. looking at the Service Status screen, if the Duration of a Power CRITICAL alarm is more than 5 minutes, I want to raise a permanent power failure alarm.


Any help would be greatly appreciated.

Regards
Richie
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

For the power trap reset thing, do you have all traps going to an SNMP Traps service on that host, or do you have a separate service just for those power traps?

EDIT: I just saw your other post that shows you have separate services (which helps in this case).

The only way that I can think of to do this would be for you to write an event handler on your service that checks the current state and its duration. If the problem state has lasted longer than 5 minutes, it would submit a passive result (or a notification) to send another notification:

https://assets.nagios.com/downloads/nag ... ios-XI.pdf
https://assets.nagios.com/downloads/nag ... dlers.html
https://assets.nagios.com/downloads/nag ... olist.html
https://assets.nagios.com/downloads/nag ... and_id=114
https://assets.nagios.com/downloads/nag ... and_id=135
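
As a rough illustration of the event handler idea, here is a minimal Python sketch of the decision it would make. It is not a tested Nagios XI handler. The function names, the "Permanent Power Failure" service name, and the command file path are assumptions for illustration; the only fixed part is the PROCESS_SERVICE_CHECK_RESULT external command format. Note that Nagios runs event handlers on state changes, so how this gets invoked once the 5 minutes have passed is left open here.

```python
from typing import Optional

# Assumption: a typical Nagios XI external command file location.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"
# Battery backup window described in the original post.
GRACE_SECONDS = 5 * 60


def build_escalation_command(state: str, last_state_change: int, now: int,
                             host: str, escalation_service: str) -> Optional[str]:
    """Return an external command line if the power service has been
    CRITICAL for more than GRACE_SECONDS, otherwise None."""
    if state != "CRITICAL":
        return None
    if now - last_state_change <= GRACE_SECONDS:
        return None
    # Return code 2 means CRITICAL in a passive check result.
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};"
            f"{escalation_service};2;"
            "Irrecoverable power loss: CRITICAL for more than 5 minutes")


def submit(command: str, command_file: str = COMMAND_FILE) -> None:
    """Append the command line to the Nagios external command file."""
    with open(command_file, "a") as fh:
        fh.write(command + "\n")
```

The state and timestamp could come from macros such as $SERVICESTATE$ and $LASTSERVICESTATECHANGE$ passed as arguments to the handler command, for example `build_escalation_command("CRITICAL", 1000, 1301, "site1", "Permanent Power Failure")`.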

For the ping check you would set it like this:

check_interval: 1
retry_interval: 1
max_check_attempts: 5

That way it will only alert if it's been down for 5 minutes. (1 minute check interval with 5 attempts every 1 minute equals 5 minutes and would do what you want).
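
A quick sketch of that arithmetic, assuming the first failed check counts as attempt 1 and that the intervals are in minutes (the default interval_length of 60 seconds). The function name is made up for illustration. Under those assumptions the alert comes somewhere between 4 and 5 minutes after the outage starts, depending on where in the check cycle the outage begins.

```python
from typing import Tuple


def alert_delay_range(check_interval: int, retry_interval: int,
                      max_check_attempts: int) -> Tuple[int, int]:
    """Return (soonest, latest) minutes from outage start to the check
    that reaches max_check_attempts."""
    # After the first failed check, the remaining attempts are retries.
    retries = (max_check_attempts - 1) * retry_interval
    # The outage may begin right before a check (soonest) or right
    # after one (latest), which adds up to one full check_interval.
    return retries, retries + check_interval
```

With the settings above, `alert_delay_range(1, 1, 5)` gives `(4, 5)`.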

You could technically set the first_notification_delay on the service to 5 minutes as an alternative.
richie7tech
Posts: 14
Joined: Mon Sep 28, 2020 7:29 am

Post by richie7tech »

Thanks ssax for the quick response.

I believe the event handler option is the most logical fit for my scenario, and I'll look through the docs you supplied with a view to writing an event handler: on the POWER service, if the CRITICAL state exceeds 5 minutes, it will submit a new passive trap for "irrecoverable power loss".

---
---

I don't think the PING option will work, as I have an added problem: we are using the SNMP sender to send SNMP traps back to the VMS to raise an ALARM within the VMS display.
With the lines below, which you suggested, I find that the ping service turns CRITICAL in the first minute, at which point the SNMP sender forwards the TRAP / ALERT to the VMS.
(I believe the lines below are only relevant for notifications via email.)
====
check_interval: 1
retry_interval: 1
max_check_attempts: 5
====

Again, thanks for the docs referencing the event handler, which definitely seems to be the way forward.

Cheers
Richie
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

Are those ping checks passive or active? Do you have is_volatile set on them? (I'm thinking you may have is_volatile set on there, which would impact it.)

Please PM me a copy of your profile.zip, you can download it from Admin > System Profile by clicking the Download Profile button.

PM me the host/service name in question for the ping check so I can see how it's set up.

Thank you!