Reducing notifications
-
amprantino
- Posts: 140
- Joined: Thu Apr 18, 2013 8:25 am
- Location: libexec
Reducing notifications
Hello all,
I am trying to reduce the number of notifications our admins receive.
My problem is this:
Assume a host has about 10 service checks.
When the host goes down, I get one critical notification for the host being down, plus one notification for each service in a critical state.
How can I configure Nagios so that:
When a service goes down, Nagios automatically checks the host state.
If the host is down (ping), only one notification is sent, for the host, and none for the services.
When a service comes back online, Nagios checks the host state again.
If the host is up, only a host recovery/up notification is sent.
Later, once all services have been re-checked, a notification is sent for any service that remains in a critical state.
Any idea how I can achieve this?
Thank you
Re: Reducing notifications
If configured correctly, Nagios will only check the host if a service on the host comes back as non-OK. If the host responds non-OK, then all services on that host will be ignored because Nagios assumes that the host is down. So it sounds like your Nagios is not configured properly.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
-
chris.fixter
- Posts: 22
- Joined: Wed Jun 18, 2014 4:15 am
Re: Reducing notifications
Hi Eloyd,
Could you point out how this is done, or where I could read more about this behavior? In my experience, this isn't what my Nagios is doing.
Re: Reducing notifications
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
-
chris.fixter
- Posts: 22
- Joined: Wed Jun 18, 2014 4:15 am
Re: Reducing notifications
Sorry for my ignorance, but I can't find where it says that a host-down status suppresses service notifications or service checks. I have been looking for such a solution for a while, because we monitor a lot of sites over the Internet and get frequent false service alerts due to Internet interruptions.
As a workaround for now, I use a distributed approach: each site has its own Nagios running active checks and reports to a central Nagios that accepts passive checks.
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
- Contact:
Re: Reducing notifications
It all depends on your host and service directives for check_interval, max_check_attempts and retry_interval. For example:
Host
check_interval = 5
max_check_attempts = 3
retry_interval = 2
Service(s)
check_interval = 2
max_check_attempts = 3
retry_interval = 1
1.10pm - Host is checked and detected as UP, next check is 1.15pm
1.11pm - Host goes down, Nagios does not know about it yet
1.12pm - Service check fails, retry interval is 1 so next attempt is 1.13pm (soft state)
1.13pm - Service check retry fails, retry interval is 1 so next attempt is 1.14pm (soft state)
1.14pm - Service check fails, max_check_attempts reached so alert is sent (hard state)
1.15pm - Host check fails, retry interval is 2 so next attempt is 1.17pm (soft state)
more service checks happening / retrying / alerting
1.17pm - Host check fails, retry interval is 2 so next attempt is 1.19pm (soft state)
more service checks happening / retrying / alerting
1.19pm - Host check fails, max_check_attempts reached so alert is sent (hard state)
No more service alerts will be sent until the host recovers
Basically, service notifications will continue to be sent until their host goes into a hard down state.
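The timeline above can be reproduced with a small Python sketch. This is only an illustration of the arithmetic, not how Nagios itself schedules checks: it assumes scheduled checks only, that every check after 1.11pm fails, and that both the host and the service were last checked OK at 1.10pm. The names `CheckConfig` and `failure_timeline` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CheckConfig:
    check_interval: int      # minutes between checks while the state is OK
    max_check_attempts: int  # consecutive failed attempts needed for a hard state
    retry_interval: int      # minutes between retries while in a soft state


def failure_timeline(first_failed_check, cfg):
    """Return (minute, state_type) for each failed attempt up to the hard state."""
    events = []
    minute = first_failed_check
    for attempt in range(1, cfg.max_check_attempts + 1):
        state = "HARD" if attempt == cfg.max_check_attempts else "SOFT"
        events.append((minute, state))
        minute += cfg.retry_interval
    return events


host = CheckConfig(check_interval=5, max_check_attempts=3, retry_interval=2)
service = CheckConfig(check_interval=2, max_check_attempts=3, retry_interval=1)

last_ok_check = 10  # minutes past 1pm; assumed the same for host and service

service_events = failure_timeline(last_ok_check + service.check_interval, service)
host_events = failure_timeline(last_ok_check + host.check_interval, host)

for label, events in (("Service", service_events), ("Host", host_events)):
    for minute, state in events:
        print(f"1.{minute:02d}pm - {label} check fails ({state} state)")
```

With these values the service reaches a hard state at 1.14pm, five minutes before the host does at 1.19pm, which is the window in which service alerts are sent.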
Does this help?
This link has a lot of explanations on how notifications work:
http://nagios.sourceforge.net/docs/3_0/ ... tions.html
And some information about hard and soft states:
http://nagios.sourceforge.net/docs/3_0/statetypes.html
-
chris.fixter
- Posts: 22
- Joined: Wed Jun 18, 2014 4:15 am
Re: Reducing notifications
Interesting. Thanks for the explanation.
-
amprantino
- Posts: 140
- Joined: Thu Apr 18, 2013 8:25 am
- Location: libexec
Re: Reducing notifications
This is exactly the problem I have!
What I want to avoid is the service notification at 1.14pm.
When a service goes down (soft state), I would like to force a host check.
If the host is down, then send a notification only for the host and not for the services.
One solution is for the host to reach a hard non-OK state sooner than the service does, but this isn't always the case.
For example, a service might be critical and be checked every minute, while the host is checked every 5 minutes.
Re: Reducing notifications
You could set max_check_attempts to 1, which effectively eliminates soft states: a failed check goes straight to a hard state, with no intermediate retries.
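As a rough illustration of what that changes, here is a Python sketch of the delay between the first failed check and the hard state. It assumes the hard state is reached on the last of max_check_attempts consecutive failures, with retries spaced retry_interval minutes apart, as in the example timeline earlier in the thread. The helper name `minutes_to_hard_state` is hypothetical, and the sketch does not model when the first failed check happens.

```python
def minutes_to_hard_state(max_check_attempts, retry_interval):
    """Minutes from the first failed check until the hard state is reached."""
    # The first failed attempt costs no waiting time; each further attempt
    # waits one retry_interval.
    return (max_check_attempts - 1) * retry_interval


# Host values from the earlier example: 3 attempts, retries every 2 minutes.
with_retries = minutes_to_hard_state(max_check_attempts=3, retry_interval=2)

# Same host with max_check_attempts set to 1: no soft states at all.
without_retries = minutes_to_hard_state(max_check_attempts=1, retry_interval=2)

print(f"max_check_attempts=3: hard state {with_retries} minutes after the first failure")
print(f"max_check_attempts=1: hard state {without_retries} minutes after the first failure")
```

The trade-off is that a single failed check, including a transient one, is then treated as a hard state straight away.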
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
Re: Reducing notifications
Former Nagios employee