[Nagios-devel] Dependencies in redundant networks and services: a

Post by Guest »


Searching back through the archives, it seems that the issue of handling
service and host dependencies on redundant services or hosts comes up
from time to time (actually, far less often than I would have expected),
and nobody seems to have a really good solution to the problem.

Imagine a web service (call it W) which depends on two separate
databases (call them database A and database B), where both databases
have redundant backups and the web service can contact either the
primary or backup for each database and still do its job (A1, A2, B1,
B2). Doing this without a more flexible dependency system requires
either some very complicated combinatorial setup where we have W1
dependent on A1,B1, W2 dependent on A1,B2, W3 on A2,B1, etc., or one
very complicated custom check script which implements the dependencies
itself.
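
To make the combinatorial workaround concrete, here is a minimal Python sketch that enumerates the stand-in services it would need. The names A1, A2, B1, B2 and the W1, W2, ... numbering come from the example above; the function name is purely illustrative and not part of Nagios.

```python
from itertools import product

def combinatorial_checks(redundant_sets):
    """Return one (name, masters) pair per way of picking one member from each redundant set."""
    combos = product(*redundant_sets)
    return [(f"W{i}", masters) for i, masters in enumerate(combos, start=1)]

# Two databases, each with a primary and a backup.
checks = combinatorial_checks([["A1", "A2"], ["B1", "B2"]])
for name, masters in checks:
    print(name, "depends on", ",".join(masters))
```

The number of stand-in services is the product of the sizes of the redundant sets, so a third database with two instances would already double it to eight.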

I've been thinking about this a fair bit over the last couple of weeks
since I manage a network and suite of services where nearly everything
is redundant, and almost no single outage of any component results in an
'unreachable' state for any other component. I'd very much like to
avoid having to run all kinds of duplicate checks and train the rest of
my staff to ignore alerts unless they arrive in pairs.

I think I've hit upon an idea, but it's a fairly significant change to
the way service and host dependencies work today, and so I don't think
it's reasonable to pursue it any earlier than Nagios 4.0, but I'd like
to get some feedback to see if others think this might be the right way
to go (and I'm hoping I don't get too many TL;DRs).

In a nutshell, my idea is to separate the definition of the master
service/host from the association to it by the dependent service/host,
and make the association by reference from the dependent service or host
definition... much the same way as a service is associated to a host by
reference.

There are two big wins from doing this:
1) If the dependency is created by reference from the service or host
definition, that opens the door to using a boolean syntax in that
reference, allowing both simple *and* complex dependencies.
2) Moving the dependency association into the service or host definition
also allows the association to be applied to services or hosts by
servicegroup/hostgroup, which simplifies configuration file authoring.
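
As a rough illustration of what a boolean dependency reference could express, here is a small Python sketch. The nested-tuple representation and the function name are assumptions made up for this example; the post does not define a concrete syntax, and nothing here is an existing Nagios feature.

```python
def dependency_ok(expr, up):
    """Evaluate a nested ("and" | "or", operand, ...) expression against a map of name -> is-up."""
    if isinstance(expr, str):
        return up[expr]  # leaf: the name of a master service
    op, *operands = expr
    results = [dependency_ok(operand, up) for operand in operands]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")

# W needs at least one instance of database A and at least one of database B.
w_depends_on = ("and", ("or", "A1", "A2"), ("or", "B1", "B2"))

one_backup_down = {"A1": True, "A2": False, "B1": True, "B2": True}
database_a_down = {"A1": False, "A2": False, "B1": True, "B2": True}

print(dependency_ok(w_depends_on, one_backup_down))  # True: A1 still covers database A
print(dependency_ok(w_depends_on, database_a_down))  # False: both A instances are down
```

A single expression like this replaces the W1..W4 combinations: the redundancy is stated once, inside the "or" groups.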

Here's one example where using a hostgroup for the master service (or a
list of hosts) contains the implicit assumption that all of the services
referenced in a single servicedependency definition are redundancies of
each other. I don't like doing anything by implication, but this
provides a match to the current implication that all master services
referenced by a dependent are not redundancies of each other, and keeps
the configuration very simple.


# Dependent service: references its dependencies by name (proposed directive).
define service {
    host_name               web-host
    service_description     Web Service W
    dependencies            db-a-dependency,db-b-dependency
}

define hostgroup {
    hostgroup_name          database-hosts
    members                 db-host-1,db-host-2
}

define service {
    hostgroup_name          database-hosts
    service_description     Database A
}

define service {
    hostgroup_name          database-hosts
    service_description     Database B
}

# Master definitions: named, with no reference to the dependent service (proposed).
define servicedependency {
    servicedependency_name          db-a-dependency
    hostgroup_name                  database-hosts
    service_description             Database A
    notification_failure_criteria   w,u,c,p
    dependency_period               24x7
}

define servicedependency {
    servicedependency_name          db-b-dependency
    hostgroup_name                  database-hosts
    service_description             Database B
    notification_failure_criteria   w,u,c,p
    dependency_period               24x7
}

Since the implication of using a hostgroup_name or a list of hosts in
the servicedependency definition is that the referenced services are
redundant, the servicedependency doesn't 'fail' until all of the
referenced services meet *any* of the notification_failure_criteria (e.g.
one being w, and another being u means the servicedependency f

...[email truncated]...
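
The redundancy rule, as far as the surviving text states it, can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is hypothetical, states are given as the single-letter codes used in notification_failure_criteria (with "o" for OK), and treating an empty set of masters as "not failed" is a choice made here, not something the post specifies.

```python
def dependency_failed(states, failure_criteria):
    """True only when every redundant master is in one of the failure states.

    states: map of master host name -> current state letter (o, w, u, c, p).
    failure_criteria: comma-separated state letters, e.g. "w,u,c,p".
    """
    criteria = set(failure_criteria.split(","))
    # Assumption: with no masters at all, the dependency is not considered failed.
    return bool(states) and all(state in criteria for state in states.values())

criteria = "w,u,c,p"

# One master is WARNING and the other UNKNOWN: every master meets some criterion.
print(dependency_failed({"db-host-1": "w", "db-host-2": "u"}, criteria))  # True

# One master is CRITICAL but the other is OK: the redundancy still holds.
print(dependency_failed({"db-host-1": "c", "db-host-2": "o"}, criteria))  # False
```

The masters do not need to share the same failure state; each only has to match one of the listed criteria.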


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]