Re: [Nagios-devel] Why separate hosts and services


Chris Wilson wrote:
> Hi Andreas,
>
>
>>I can think of at least two good reasons.
>>
>>1) Problem localisation. When a service fails, someone has to fix it. If
>>they don't know what machine it's on, the purpose of a monitoring system
>>is soundly defeated.
>>
>>Of course, you could type in the host_address and host_alias in every
>>service-description, but keeping things the way they are really saves a
>>lot of typing compared to that.
>
>
> OK, that's a good point, but it could also be handled by inheriting
> hostname from service to dependent service, unless overridden by the
> dependent service.
>
Not a very good idea, since many service dependencies span several
hosts (a switch interface connects to a db load balancer, which in turn
connects to the database servers).

> Another way would be to report the "path" through the "service tree" to
> the failed service in the notification message. This might actually help
> fault diagnosis. For example, if you receive separate notifications that 4
> machines behind the same router have gone down at the same time, then you
> might assume that the router might be at fault.
>
Great idea. By simply adding the macro $PARENTS$, this can easily be
accomplished, while not modifying any core logic.
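
As a rough illustration of the "path" idea, the sketch below walks a host's
'parents' chain up to the root so it could be included in a notification.
The PARENTS mapping and the host names are made up for the example, and
$PARENTS$ is only the macro proposed above, not something assumed to exist
already. Hosts with several parents are simplified to following the first one.

```python
# Hypothetical topology: host name -> list of parent host names,
# mirroring the 'parents' variable of a host object definition.
PARENTS = {
    "router1": [],
    "switch1": ["router1"],
    "web1": ["switch1"],
    "db1": ["switch1"],
}


def path_to_root(host, parents=PARENTS):
    """Return the chain from host up to a parentless host (first parent only)."""
    path = [host]
    seen = {host}
    while parents.get(path[-1]):
        nxt = parents[path[-1]][0]
        if nxt in seen:  # guard against a misconfigured parent loop
            break
        path.append(nxt)
        seen.add(nxt)
    return path


def format_path(host, parents=PARENTS):
    """Render the chain root-first, e.g. for use in a notification text."""
    return " > ".join(reversed(path_to_root(host, parents)))
```

With the sample mapping, notifications for web1 and db1 would both show
router1 and switch1 in their path, which is the hint Chris is after.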

> At the moment, with the current notification architecture, I don't think
> you can have enough information to do that, without looking at the status
> CGIs or knowing from memory that the hosts are all behind the same router
> (which doesn't scale well :-)
>
In larger networks there are usually different people handling different
parts of it, and with a proper naming-standard (with a little help from
the 'alias' variable in the host object definition), this has never been
a problem for any of our customers. Some of them have really huge networks.

>
>>2) Notification suppression. If a service fails, nagios immediately
>>checks if the host is down. If it is, no more service checks will be
>>scheduled until the host pops back up.
>
>
> But we already do the same thing for dependent services, don't we? I don't
> understand why the logic is different, and why they can't be combined into
> a single, simple if-down-then-check-parent-service algorithm.
>
Check out the 'parents' variable in the host object definition.

>
>>Check out (host- and service-) dependencies. It's all properly documented.
>
>
> To my mind, service dependency is not the same as meta-services (which is
> what I'm talking about).
>
> For example, let's assume we have three services, A, B and C. A is a
> meta-service, and B and C "depend" on it. A does not have any check of its
> own; its state is entirely determined from the states of its dependent
> services. If B and C both fail, then A is determined to have failed, and
> not otherwise.
>
This can be done today, using service dependencies.

> This is not the same as B and C both depending on A, because if B and C
> both fail, then how does one make A fail automatically in Nagios? I don't
> think it's possible, do you?

Yes. What you're talking about is modifications to the core logic.
Having plugins checking this would be 'the long way around'.
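
For what it's worth, the 'long way around' could look something like the
sketch below: a plugin-style check that derives the state of meta-service A
purely from the states of its children. It only covers the aggregation rule
from the A/B/C example. How the child states are obtained is left out, and
passing them as integer arguments is an assumption made for illustration,
as is treating any non-OK state as "failed".

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def meta_state(child_states):
    """A fails only when every child has failed; otherwise A is OK."""
    if not child_states:
        return UNKNOWN  # nothing to aggregate
    # Assumption: any non-OK child state counts as "failed".
    if all(state != OK for state in child_states):
        return CRITICAL
    return OK


def main(argv):
    """Hypothetical entry point: child states given as integer arguments."""
    try:
        states = [int(arg) for arg in argv]
    except ValueError:
        print("META UNKNOWN - child states must be integers")
        return UNKNOWN
    result = meta_state(states)
    label = {OK: "OK", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}[result]
    print("META %s - %d child state(s) examined" % (label, len(states)))
    return result
```

To use it as a real plugin, the script would end with something like
sys.exit(main(sys.argv[1:])) so that Nagios picks up the return code.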

> I guess it might involve writing a plugin to
> check the status of all children, and I don't know if Nagios would update
> the status.sav quickly enough that we would be able to determine this
> reliably in the parent check. Do you know if it does?
>
status.sav is the default state retention file, so we can't even count
on it being there. status.log gets updated about 1 second after a state
changes and should be more interesting for something like this.

> Besides which, we would have to parse both the configuration files and
> status.sav to determine this, and neither of those is easy to do.

Not a problem, really. Especially considering the fact that all the code
to do both is right under the nose of anybody who cares to download the
sources.

>
> Cheers, Chris.

--
Mvh / Best Regards
Sourcerer / Andreas Ericsson
OP5 AB
+46 (0)733 709032
[email protected]





This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]