Server issues when multiple hosts were down

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Server issues when multiple hosts were down

Post by BanditBBS »

To anyone thinking about using this patch, here is a "gotcha" (and a workaround) to keep in mind.

Setup: a list of hosts/services is set as dependent on a host's ping check service, so no checks and/or notifications are performed if that ping check fails.
Problem: I left check_icmp in place as the host check when I added the ping service. With this patch (feature), that means the ping service would never go critical, because once the host's ICMP check failed the service would no longer be checked.
Fix: Change the host check to check_dummy and make it always return green/up.
Real cause: I was lazy when I added the ping service check for the dependency, lol

Just wanted to point this issue and resolution out to anyone that is going to use this and happens to use dependencies as well.
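
For anyone who wants to see the gotcha spelled out, here is a rough Python sketch of the logic. It is only a toy model, not Nagios code: `Service`, `run_cycle`, and the `skip_when_host_down` flag are made-up names, and the only behavior assumed is what is described in this thread (a DOWN host means its service checks are skipped and keep their last state).

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a Nagios service; not a real Nagios type."""
    name: str
    state: str = "OK"


def run_cycle(host_check_ok: bool, ping_ok: bool, ping: Service,
              skip_when_host_down: bool = True) -> str:
    """Simulate one check cycle and return the resulting host state."""
    host_state = "UP" if host_check_ok else "DOWN"
    if host_state == "DOWN" and skip_when_host_down:
        # Patched behavior as described in this thread: the service check
        # is not run, so the service keeps whatever state it had before.
        return host_state
    ping.state = "OK" if ping_ok else "CRITICAL"
    return host_state


# Case 1: host check is check_icmp, so it fails at the same time as the ping service.
icmp_ping = Service("PING")
icmp_host_state = run_cycle(host_check_ok=False, ping_ok=False, ping=icmp_ping)

# Case 2: host check is check_dummy (always up), so the ping service still gets checked.
dummy_ping = Service("PING")
dummy_host_state = run_cycle(host_check_ok=True, ping_ok=False, ping=dummy_ping)

print(icmp_host_state, icmp_ping.state)
print(dummy_host_state, dummy_ping.state)
```

In case 1 the ping service is stuck in its old state, so anything depending on it never sees a failure. In case 2 the ping service is the thing that goes critical, which is what the dependencies need.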
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

For what it's worth, I made a pull request to Nagios Core proper, so here's hoping:

https://github.com/NagiosEnterprises/nagioscore/pull/44

Update:

From the pull request:
tmcnag wrote:
I am considering extending the behavior, but I want to be careful with this. I am thinking of setting the option like so:

host_down_set_service_state=u/w/c/o/n

One option would be set. Selecting "n" (null, nothing) would be the same behavior that I am requesting be pulled - the service state and output remain the same, and the check is not performed but is rescheduled.

Selecting one of u/w/c/o would set the state to UNKNOWN / WARNING / CRITICAL / OK and set the output to "Host is down!" or something. The output would in all likelihood not be user-configurable in nagios.cfg. This is cool and all, but has the downside of not reducing the load as much as just not running the check (and reducing the load was the whole reason for this patch in the first place).

Thoughts?
Anyone have any opinion on this or how it might be better implemented?
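
To make the proposal a bit more concrete, here is a rough Python model of how the option could map to a service result. This is not the patch itself (that would live in the Nagios Core C source), and `ServiceStatus`, `STATE_BY_FLAG`, and `apply_host_down_option` are made-up names for illustration. The flag letters come from the quote above, and the "Host is down!" output is only the placeholder text suggested there.

```python
from typing import NamedTuple

# Flag letters as proposed in the pull request comment; "n" is handled separately.
STATE_BY_FLAG = {"u": "UNKNOWN", "w": "WARNING", "c": "CRITICAL", "o": "OK"}


class ServiceStatus(NamedTuple):
    """Hypothetical container for a service's state and plugin output."""
    state: str
    output: str


def apply_host_down_option(flag: str, current: ServiceStatus) -> ServiceStatus:
    """Return the status a service would show while its host is down."""
    if flag == "n":
        # "null, nothing": state and output stay exactly as they were.
        return current
    if flag not in STATE_BY_FLAG:
        raise ValueError(f"expected one of u/w/c/o/n, got {flag!r}")
    # Any other flag forces the chosen state and a fixed output string.
    return ServiceStatus(STATE_BY_FLAG[flag], "Host is down!")


last_result = ServiceStatus("OK", "PING OK")
print(apply_host_down_option("n", last_result))
print(apply_host_down_option("c", last_result))
```

Note that the u/w/c/o branch still has to write a new result for every service on the host, which is the extra load mentioned in the quote, while "n" touches nothing.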
Former Nagios employee
snapon_admin
Posts: 952
Joined: Mon Jun 10, 2013 10:39 am
Location: Kenosha, WI

Post by snapon_admin »

Having the option to set what the state would be would be helpful I suppose, but I agree that it would sort of defeat the purpose of reducing the load since, well, it wouldn't. If it were implemented with those options, I can say that I would be leaving it set to "do nothing" so that the service checks stay the same and just stop running until the host is back up.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

If it's a global thing, the current setting is all I'd use. I'd never want to set them all to a state other than the last checked state returned. I guess having the option would be good, but I'm with snapon, I would be using the "n" setting.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

I only bring it up because it was suggested for reporting purposes that the services have the option to go down with the host. If you run a report and all your hosts are down but all your services are up, that doesn't really line up.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

Yes, but in my view the host being down outweighs the service being down...and if both show as down, that's sort of like a reporting double-whammy when really it was one issue causing all the others. So, yes, some admins may choose to do so...but not me :)
snapon_admin
Posts: 952
Joined: Mon Jun 10, 2013 10:39 am
Location: Kenosha, WI

Post by snapon_admin »

Yep, agreed with Bandit on this. We would be more concerned with the host up/down time when looking at reports. I do see your point though, and still think having that option would make sense (more options are better than fewer options). As long as there's still an option for nothing, I'm happy with that.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

Alright, I will work on getting that in there. Not a top priority right now, and it will be a little more involved, but it's on my list.

And thanks for all the feedback!
Fred Kroeger
Posts: 588
Joined: Wed Oct 19, 2011 11:36 pm
Location: Perth, Western Australia

Post by Fred Kroeger »

Ummm... It appears that I don't follow instructions very well.....
I installed the patch and it all worked as I reported - Service Checks stop (retry count doesn't advance) when the host is down and they keep getting rescheduled.
However, I missed the step to add the host_down.... line to the nagios.cfg file.
So my installation is doing what I expected without the flag in the nagios.cfg file.
Can anyone confirm if this happens for them if they actually remove the line from the nagios.cfg file?

Fred
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Post by BanditBBS »

Fred - I can't confirm this. Mine didn't skip service checks until I added the config variable.