Re: [Nagios-devel] Why distinguish hosts from services?

Post by Guest » Sat Aug 09, 2008 6:22 am

* Andreas Ericsson [2008-08-09 14:35]:
> Holger Weiss wrote:
> > We use separate host definitions for separate interfaces (so for us, the
> > "host" keyword should really be named "interface" ;-]). For each host,
> > there's a "primary" interface which all other interfaces depend on using
> > host dependencies. Now, for example, if we upgrade a system, we'd like
> > to just specify a downtime for the primary interface to make sure that
> > no host or service notifications will be generated whatsoever. If we
> > just reboot the host, things work as expected. But during an upgrade,
> > some services will usually go into a hard problem state while the system
> > is still UP. In this case, only the notifications for the services
> > running on the primary interface will be suppressed, because Nagios does
> > suppress service notifications if the host the service runs on is in a
> > downtime, but not if only a host this host depends on is in a downtime.
> >
> > Similar problems can occur with parents: if a parent is in a downtime,
> > but the parent's host check returns UP because the parent still pings
> > although it has already stopped routing, notifications for the child(ren)
> > won't be suppressed. Or for service dependencies (though maybe less
> > likely): if the dependent-upon service is in a downtime and the
> > dependent service is stopped before the dependent-upon service is
> > stopped, notifications for the dependent service won't be suppressed.
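The parent case is the trickiest of these. A host with redundant parents should stay notifiable as long as one route to it is free of downtime. Below is a minimal C sketch of that rule. It assumes a hypothetical `struct host` with an `in_downtime` flag and a NULL-terminated `parents` array, which is not Nagios's actual data structure. It also assumes the parent graph is acyclic. The rule keys on the parents' downtime flags rather than their check results, so a parent that still answers pings after it has stopped routing makes no difference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed host node for illustration; not Nagios's real struct. */
struct host {
    const char *name;
    bool in_downtime;        /* downtime scheduled on this host itself */
    struct host **parents;   /* NULL-terminated; NULL or empty = directly reachable */
};

/*
 * True if every route from the monitoring server to `h` passes through at
 * least one parent in downtime. With redundant parents, a single
 * downtime-free path is enough to keep notifications enabled.
 * Assumes the parent graph is acyclic.
 */
bool shadowed_by_parent_downtime(const struct host *h)
{
    assert(h != NULL);
    if (h->parents == NULL || h->parents[0] == NULL)
        return false;                /* no parents: reached directly */
    for (struct host **p = h->parents; *p != NULL; p++) {
        const struct host *parent = *p;
        if (!parent->in_downtime && !shadowed_by_parent_downtime(parent))
            return false;            /* found a path with no downtime on it */
    }
    return true;
}
```

With two parents, putting only one of them in downtime leaves the child notifiable. Once the second parent is in downtime as well, the child's notifications are suppressed.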
> >
> > Apart from that, it would be nice if objects which directly or
> > indirectly depend on an object which is in a downtime would also have
> > some "downtime" status flag set, so that tools such as the web interface
> > could easily mark them as such. But that's just cosmetic.
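Setting such a display flag amounts to walking the dependency graph downward from every object in downtime. The sketch below assumes a hypothetical `struct node` that stores reverse edges (`dependents`) and a display-only `inherited_downtime` flag; neither exists in Nagios in this form. Recomputing from scratch whenever a downtime starts or ends keeps the flag correct when downtimes overlap or expire. The cost is one full pass over all objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical monitored object (host or service), for illustration only. */
struct node {
    const char *name;
    bool in_downtime;         /* downtime scheduled on this object itself */
    bool inherited_downtime;  /* display flag: something it depends on is in downtime */
    struct node **dependents; /* NULL-terminated list of objects depending on this one */
};

/* Flag everything that directly or indirectly depends on `n`. */
static void mark_dependents(struct node *n)
{
    if (n->dependents == NULL)
        return;
    for (struct node **d = n->dependents; *d != NULL; d++) {
        if ((*d)->inherited_downtime)
            continue;         /* already flagged: avoids rework and stops on cycles */
        (*d)->inherited_downtime = true;
        mark_dependents(*d);
    }
}

/* Recompute the flag for all objects, e.g. whenever any downtime starts or ends. */
void refresh_inherited_downtime(struct node **all, size_t count)
{
    assert(all != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        all[i]->inherited_downtime = false;
    for (size_t i = 0; i < count; i++)
        if (all[i]->in_downtime)
            mark_dependents(all[i]);
}
```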
> >
> > To fix such problems once and for all, I'd have to implement several pieces
> > of logic in different places in the code: (1) don't notify on a host if a
> > directly or indirectly dependent-upon host is in a downtime; (2) don't
> > notify on the services running on this host; (3) don't notify on a
> > service if a directly or indirectly dependent-upon service is in a
> > downtime; (4) don't notify on a host if a direct or indirect parent is
> > in a downtime (with redundant paths accounted for); (5) maybe don't
> > notify on the services running on this host, either, just to make sure.
> > My dream is that with generic object types and dependencies, I could
> > implement a recursive check for downtimes of dependent-upon objects at a
> > single place in the code and be done with it, which would be much
> > simpler and less error-prone.
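The single recursive check described above might look like this in C. The sketch assumes a hypothetical unified `struct object`. Its NULL-terminated `depends_on` list holds both configured dependencies and, for a service, the host it runs on; Nagios itself keeps hosts and services in separate structures. The rule never looks at `kind`, which is the point of the proposal. Parents with redundant paths (item 4) would still need a rule of their own, since one downtime-free path should be enough to keep notifications on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum object_kind { OBJ_HOST, OBJ_SERVICE };

/* Hypothetical unified object type; Nagios uses separate host and service structs. */
struct object {
    enum object_kind kind;      /* never consulted by the rule below */
    const char *name;
    bool in_downtime;           /* downtime scheduled on this object itself */
    struct object **depends_on; /* NULL-terminated: dependency targets, plus a service's host */
};

/* Crude guard against a misconfigured dependency cycle. */
#define MAX_DEPENDENCY_DEPTH 64

static bool downtime_upstream(const struct object *o, int depth)
{
    if (o->in_downtime)
        return true;
    if (depth >= MAX_DEPENDENCY_DEPTH || o->depends_on == NULL)
        return false;
    /* Any single dependent-upon object in downtime is enough. */
    for (struct object **d = o->depends_on; *d != NULL; d++)
        if (downtime_upstream(*d, depth + 1))
            return true;
    return false;
}

/* The one place that decides whether downtime suppresses a notification. */
bool suppress_for_downtime(const struct object *o)
{
    assert(o != NULL);
    return downtime_upstream(o, 0);
}
```

Consider a secondary interface that depends on a primary interface in downtime. A service running on the secondary interface is then suppressed as well, which covers the upgrade scenario from the first paragraph.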
>
> A much simpler way of doing it is to set the "notification_options" field
> in the host and service-objects to flags (well, everything that could be
> flags should be flags, really), then it becomes a matter of doing bitfield
> comparisons to see if a notification should be suppressed or not,
> regardless of which type of object it is.
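Below is a minimal sketch of the flag idea. The bit assignments and trimmed structs are hypothetical. The sketch also assumes that hosts and services share one set of states, which is exactly the issue Andreas raises next. Because the check only inspects the flag word, the same function serves any object type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments, one per state; not Nagios's actual constants. */
enum {
    NOTIFY_RECOVERY = 1u << 0,
    NOTIFY_WARNING  = 1u << 1,
    NOTIFY_CRITICAL = 1u << 2,
    NOTIFY_UNKNOWN  = 1u << 3,
};

/* Hypothetical, heavily trimmed object structs sharing the same flag field. */
struct host    { const char *name;        unsigned int notification_options; };
struct service { const char *description; unsigned int notification_options; };

/* One bitfield comparison, whatever kind of object the options came from. */
bool notification_suppressed(unsigned int notification_options, unsigned int event_bit)
{
    assert(event_bit != 0 && (event_bit & (event_bit - 1)) == 0); /* exactly one bit */
    return (notification_options & event_bit) == 0;
}
```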

If it were done this way, I'd still have to implement the various checks
I mentioned in order to set the "dependent-upon object is in a downtime"
flag. So, while your suggestion would save some memory and allow for
using generic macros to compare the current state of an object with the
configured notification_options, it wouldn't really solve my problem.

> One trouble is that to make this generic regardless of which type of object
> you're checking it against means both hosts and services would need to
> understand the same sort of check results

Yes, I just fail to see the trouble.

> as well as the same kind of notification options and everything that
> gets affected by such things

Same here.

> the data structs for both types of objects would need to be identical, which
> would waste memory on an O(n) scale, rather than the fixed-price overhead of
> almost duplicating some of the code.
>
> Now consider this instead:
> if ((host->notification_options & contact->notification_options) & (1 << status))
>     send_notification();
>
> And then imagine you've got a macro for it, which goes like this:
> #define sho

...[email truncated]...
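The bitmask test quoted above can be turned into a compilable sketch. This is not a reconstruction of the truncated macro. `WANTS_NOTIFICATION`, `STATE_BIT`, the trimmed `struct host` and `struct contact`, and the state numbering are all assumptions for illustration. Each state's bit is `1u << state`, so a notification goes out only when both the host and the contact have that bit set.

```c
#include <assert.h>
#include <limits.h>

/* Assumed host state numbering; each state owns bit (1u << state). */
enum host_state { HOST_UP = 0, HOST_DOWN = 1, HOST_UNREACHABLE = 2 };

/* Every state must map to a distinct bit of an unsigned int. */
static_assert(HOST_UNREACHABLE < (int)(sizeof(unsigned int) * CHAR_BIT),
              "host states must fit in the notification_options mask");

#define STATE_BIT(s) (1u << (unsigned int)(s))

/* Hypothetical trimmed structs carrying only the field the check needs. */
struct host    { const char *name; unsigned int notification_options; };
struct contact { const char *name; unsigned int notification_options; };

/* Hypothetical name: true only if host and contact both want this state. */
#define WANTS_NOTIFICATION(h, c, state) \
    ((((h)->notification_options & (c)->notification_options) & STATE_BIT(state)) != 0)
```

For example, a host configured for DOWN and UNREACHABLE, combined with a contact configured for DOWN and UP, only triggers notifications for DOWN.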

This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]