I'll try to get you on the same page without giving away too much about our internal private network. And let me preface this by saying that this is the first time since 2004 that we have not been able to get Nagios Core to do what we think it should do.
Hosts A, B, and C are in a hostgroup called "private." The Nagios host is X. Service checks 1, 2, and 3 are in a service group called NRPE, and are checked from X->A, X->B, and X->C via NRPE. In real life, add some zeros to the number of checks and hosts, but the idea is the same. Now, X also checks A, B, and C via a direct check (check_nrpe with no arguments) to ensure that NRPE is running. This check is called NRPE-service.
All the NRPE service checks on A need to be dependent on the NRPE-service check on A being OK. The same goes for all the NRPE services on B (which need NRPE-service on B to be OK) and for C. According to the documentation I quoted, we should be able to do something like this:
Code:
define hostgroup{
    hostgroup_name private
    alias Private Hosts
}
define host{
    use private-host-template
    host_name A
    hostgroups private
}
<REPEAT FOR B AND C>
define servicegroup{
    servicegroup_name NRPE
    alias Services dependent upon NRPE
}
define service{
    use nrpe-service
    service_description SERVICE CHECK 1
    hostgroups private
    servicegroups NRPE
    check_command check_nrpe!<command>!params|etc
}
<REPEAT FOR SERVICE CHECK 2 AND 3>
define servicedependency{
    hostgroup_name private
    service_description NRPE-service
    dependent_servicegroup_name NRPE
    inherits_parent 1
    execution_failure_criteria u,c
}
And if I parse the documentation correctly, we should end up with dependencies like:
Code:
S1 on A dependent upon NRPE-service on A being OK
S2 on A dependent upon NRPE-service on A being OK
S3 on A dependent upon NRPE-service on A being OK
S1 on B dependent upon NRPE-service on B being OK
S2 on B dependent upon NRPE-service on B being OK
S3 on B dependent upon NRPE-service on B being OK
S1 on C dependent upon NRPE-service on C being OK
S2 on C dependent upon NRPE-service on C being OK
S3 on C dependent upon NRPE-service on C being OK
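To spell out that expectation, here is a minimal Python sketch of the expansion as I read the documentation. It is only an illustration and not how Nagios parses anything; the function name `expected_dependencies` is made up for this post, and the host and service names are the placeholders used above.

```python
# Illustration only: models the expansion we expect, not the Nagios parser.
hosts = ["A", "B", "C"]        # members of hostgroup "private"
services = ["S1", "S2", "S3"]  # members of servicegroup "NRPE"
master = "NRPE-service"        # the check every NRPE service should depend on


def expected_dependencies(hosts, services, master):
    """Pair each dependent service with the master service on the SAME host.

    Returns tuples of (dependent service, dependent host, master service, master host).
    """
    return [(svc, host, master, host) for host in hosts for svc in services]


for svc, dep_host, master_svc, master_host in expected_dependencies(hosts, services, master):
    print(f"{svc} on {dep_host} dependent upon {master_svc} on {master_host} being OK")
```

The point is that the master host always equals the dependent host, so three hosts and three services should give nine same-host pairs.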
What we get is:
Code:
S1 on A dependent upon NRPE-service on C being OK
S2 on A dependent upon NRPE-service on C being OK
S3 on A dependent upon NRPE-service on C being OK
S1 on B dependent upon NRPE-service on C being OK
S2 on B dependent upon NRPE-service on C being OK
S3 on B dependent upon NRPE-service on C being OK
S1 on C dependent upon NRPE-service on C being OK
S2 on C dependent upon NRPE-service on C being OK
S3 on C dependent upon NRPE-service on C being OK
Note that host C was the last host defined. If we switch the order of the "define host" config snippets so that C is defined first, then A, then B, then everything in the previous CODE section will be dependent upon NRPE-service on B being OK.
My goal is to avoid listing specific services and just use the service group. So I figure, okay, this isn't parsing as I expect it to, so let's do each one host-by-host:
Code:
define servicedependency{
    host_name A
    dependent_host_name A
    service_description NRPE-service
    dependent_servicegroup_name NRPE
    inherits_parent 1
    execution_failure_criteria u,c
}
<REPEAT FOR HOST B AND C>
Now the parser complains:
Error: Could not expand dependent service(s) (at config file '<dependency file>', starting on line 1)
(line 1 is the first line of the dependencies file)
So, whew! Even trying to use just the service group on each individual host doesn't work. So we tried using the hostgroup on each individual service:
Code:
define servicedependency{
    hostgroup_name private
    service_description NRPE-service
    dependent_service_description 1,2,3
    inherits_parent 1
    execution_failure_criteria u,c
}
And that works as desired (yay!). Except, as much as we have our Nagios automated and integrated with things, we don't want to have to edit the dependencies every time we add a new service - that's the point of servicegroups! So I'd love to know if we're doing something wrong or if it's the parser.
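In the meantime, a possible stopgap is to have our automation regenerate the one form that does work. The sketch below is a rough Python illustration under the assumption that the automation already knows which service descriptions belong to the NRPE group; `render_dependency` and its arguments are hypothetical names, and the directives are just the ones from the working definition above.

```python
def render_dependency(hostgroup, master_service, dependent_services):
    """Return one servicedependency definition in the form that works for us.

    dependent_services is the list of service descriptions that should
    depend on master_service (assumed to come from our own automation).
    """
    lines = [
        "define servicedependency{",
        f"    hostgroup_name {hostgroup}",
        f"    service_description {master_service}",
        f"    dependent_service_description {','.join(dependent_services)}",
        "    inherits_parent 1",
        "    execution_failure_criteria u,c",
        "}",
    ]
    return "\n".join(lines)


# Same placeholder names as in the working example above.
print(render_dependency("private", "NRPE-service", ["1", "2", "3"]))
```

That only hides the explicit service list behind a script, though, so it does not answer the actual question about servicegroups.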