Configure a delay to notif. until state is bad for x time

Posted: Wed Oct 03, 2018 5:01 pm
by alsoszaa
Here is what I am trying to do (I must preface this question by saying that I am an absolute Nagios beginner)...

I am trying to quiet alerts from defaults. In this process, I would like to configure notifications for certain services to not notify unless they have been in a warning/critical state for more than 10 min.

I get CPU utilization and memory threshold alerts constantly. I realize I still need to tune them, but while tuning them, I want to receive emails only when the CPU is at 90% for at least 5 min. Currently the emails are sent every time the CPU hits 90% and I am bogged down with alerts. Some servers just have one spike a day of 90% or more, and it lasts only 10 seconds or less. I don't want to receive these alerts in email. I only care if it's at 90% for 5 min or more.

How can I accomplish this?

Much appreciated in advance!

Re: Configure a delay to notif. until state is bad for x tim

Posted: Thu Oct 04, 2018 11:39 am
by jforcier
You can accomplish this by increasing the max check attempts.

Go to Configure > Core Config Manager > Services

Then open the relevant service and click on the Check Settings tab.

These are the relevant settings:

Check Interval: The number of minutes between regularly scheduled checks of the host/service.

Retry Interval: The number of minutes to wait before scheduling a re-check of the host/service. Hosts/services are rescheduled at the retry interval when they have changed to a non-UP/non-OK state.

Max Check Attempts: The number of times that Nagios will retry the host/service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check.



So for your specific example you could set the check interval to 5, the retry interval to 1, and the max check attempts to 10.
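
To see how long those values hold back an alert, here is a minimal Python sketch of the arithmetic. It assumes the first failing check counts as attempt 1 and that every later attempt is scheduled one retry interval after the previous one; the function name is made up for illustration and is not part of Nagios.

```python
def minutes_until_hard_state(retry_interval, max_check_attempts):
    """Minutes between the first failing check and the final retry.

    Assumption: the first failing check is attempt 1, and each further
    attempt runs retry_interval minutes after the one before it.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval


# Values suggested above: retry interval 1, max check attempts 10.
print(minutes_until_hard_state(retry_interval=1, max_check_attempts=10))  # 9
```

Under that assumption, the suggested values mean the problem has to persist for about 9 minutes after it is first seen before the last retry runs.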

Re: Configure a delay to notif. until state is bad for x tim

Posted: Thu Oct 04, 2018 4:15 pm
by alsoszaa
I just set this on a service, then tested it by placing it in alert. It sent me the initial "Down" alert immediately on the initial check. This is what I don't want. I want to ignore all alerts (do not send emails) initially until they have been a problem for x time. So unfortunately, this solution wasn't what I needed, but I really appreciate the info provided. Maybe I can use that elsewhere here.

You can achieve this in SCOM, but I want to achieve this in Nagios:

http://skaraaslan.blogspot.com/2012/03/ ... ption.html

Re: Configure a delay to notif. until state is bad for x tim

Posted: Fri Oct 05, 2018 10:16 am
by scottwilkerson
If you go to the CCM -> Services
Edit this service
Click Alert Settings Tab
Enter number of minutes you want it to be down before notifications are sent in the "First notification delay" field
Save
Apply Configuration
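
As a rough illustration of what a time-based delay does to the CPU example from the first post, here is a simplified Python model. It is only a sketch: the function name, the sample data, and the rule that the timer resets on recovery are assumptions for illustration, not a description of Nagios internals.

```python
def first_notification_minute(samples, threshold=90, delay_minutes=5):
    """Return the minute a first notification would go out, or None.

    samples: (minute, cpu_percent) pairs in chronological order.
    Assumption: the delay timer starts at the first sample at or above
    the threshold and resets whenever a sample falls below it.
    """
    problem_start = None
    for minute, cpu in samples:
        if cpu < threshold:
            problem_start = None  # recovered, so the timer resets
            continue
        if problem_start is None:
            problem_start = minute
        if minute - problem_start >= delay_minutes:
            return minute
    return None


spike = [(0, 95), (1, 40), (2, 35)]
sustained = [(0, 95), (1, 96), (2, 93), (3, 97), (4, 92), (5, 94)]
print(first_notification_minute(spike))      # None
print(first_notification_minute(sustained))  # 5
```

In this model a short spike never produces an email, while a problem that lasts the full delay does.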

Re: Configure a delay to notif. until state is bad for x tim

Posted: Wed Oct 10, 2018 12:12 pm
by alsoszaa
I believe this is what I need. This is kind of a strange time-based workaround. I was looking for a check-based delay. I can still work with it. We can close this post as resolved.

Here is what I was looking for:
With a normal check period of 5 min (default), and a retry of 1 min (default):
Nagios does an initial 5 min check and sees a service is in alert. Nagios logs are flagged that the service is in alert. When you look in the web-based GUI, you see the service is in a "non-UP" state. BUT, the notification is NOT sent to users via email.
The retry check occurs after 1 minute to see if the state is still "non-UP" or if it is in an OK state. The service is found to still be in alert, but still no notification is sent out.
The second retry attempt checks and sees the service is now in an OK state. There is no need to notify since the service has corrected itself.

2 days later, same scenario: 5 min check, service is non-UP, no notification. 1st retry after a minute, still non-UP. 2nd retry, still non-UP. This time, there is a notification, because it's still non-UP after the 2nd retry.

I could set the first notification delay to 2 min in this case, but in my opinion, it would be better to have the first notification delay be based on the number of retries. That's what I was trying to do.
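
The check-based behavior described in the two scenarios above can be sketched as a small counter in Python. This is a hypothetical model, not Nagios code: the function name and the boolean result lists are made up for illustration, and the limit of 3 stands for the initial check plus two retries.

```python
def notification_checks(results, max_check_attempts=3):
    """Return the indexes of checks at which a problem email would be sent.

    results: booleans in check order, True meaning the check came back OK.
    Assumption: consecutive failures are counted, an OK result resets the
    count, and one email is sent when the count reaches max_check_attempts.
    """
    attempt = 0
    notified = False
    sent = []
    for index, ok in enumerate(results):
        if ok:
            attempt = 0
            notified = False
            continue
        attempt = min(attempt + 1, max_check_attempts)
        if attempt == max_check_attempts and not notified:
            sent.append(index)
            notified = True
    return sent


# First scenario: fails, fails on 1st retry, OK on 2nd retry.
print(notification_checks([False, False, True]))   # []
# Second scenario: still failing on the 2nd retry.
print(notification_checks([False, False, False]))  # [2]
```

The first scenario produces no email because the service recovers before the count reaches the limit; the second sends one at the 2nd retry.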

Re: Configure a delay to notif. until state is bad for x tim

Posted: Wed Oct 10, 2018 12:14 pm
by scottwilkerson
Marking resolved