Service Dependency Management

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
simon.pytches
Posts: 13
Joined: Mon Feb 13, 2012 9:38 am

Service Dependency Management

Post by simon.pytches »

Help, my head's melting!

Here's what I've set up:

1. I have a generic host (it's a storage device). The generic host check is pinging it and it's up.
2. I've created a host group to contain all storage devices of this type.
3. I've created a number of scripts:
a. The first script ("EVA_Systems_S2") probes the storage device for its high-level status, i.e. everything is OK or something needs attention.
b. A number of deeper-dive status check scripts. The example below is "check_eva_disks".
4. These scripts are all assigned to the host group ("S2_HP_EVA").

What I'm trying to achieve: under normal "OK" circumstances, run just the first script, but when the first script goes to anything other than "OK", run the other checks as well.


Image

Here you can see my 3 scripts. As you will notice, the master script is in warning, i.e. no longer OK. However, neither of the other checks is running.

This is the service dependency for the first child script:

Image

Anyone got any ideas what's going wrong?

Cheers

Simon
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Service Dependency Management

Post by scottwilkerson »

This does look correct. What version of XI are you running?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
simon.pytches
Posts: 13
Joined: Mon Feb 13, 2012 9:38 am

Re: Service Dependency Management

Post by simon.pytches »

Nagios XI 2011R2.3
simon.pytches
Posts: 13
Joined: Mon Feb 13, 2012 9:38 am

Re: Service Dependency Management

Post by simon.pytches »

I've tried making a host template with check_systems associated with the template, then made check_disk_s2 and check_shelf_s2 dependent on the check_systems service template, in case it was something to do with the host being OK from the ping check. As the ping check is no longer associated, I thought it should run. :s

Even with check_systems returning a warning, the check_disks service doesn't seem to run. If I hit "Schedule an immediate check" on the disk service while the systems service is in warning, the timer for the next interval moves later in time and the disk service stays in pending, never checked.

Cheers
Si
simon.pytches
Posts: 13
Joined: Mon Feb 13, 2012 9:38 am

Re: Service Dependency Management

Post by simon.pytches »

I've even tried forcing the host to be down with the service also returning warning, but the check_disk service still doesn't run.

Cheers

Si
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Service Dependency Management

Post by scottwilkerson »

We have tested this and verified it is working properly. However, you do need to realize that it will only start the others when the master service is in a HARD problem state.

Verify on the service detail page -> Advanced tab that the service is in a HARD Warning or Critical state.
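
To illustrate the SOFT vs HARD point, here is a rough Python model of how a service moves from a SOFT to a HARD problem state and which state a dependency would look at. This is only a sketch of my understanding of the behaviour, not Nagios code: the `Service` class and its method names are made up for the example, and recovery is simplified so that any OK result counts as an immediate hard recovery.

```python
from dataclasses import dataclass

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


@dataclass
class Service:
    """Hypothetical, simplified model of a monitored service."""
    max_check_attempts: int = 3
    state: str = OK
    state_type: str = "HARD"
    attempt: int = 1
    last_hard_state: str = OK

    def process_result(self, result: str) -> None:
        if result == OK:
            # Simplification: treat every OK result as a hard recovery.
            self.state, self.state_type, self.attempt = OK, "HARD", 1
        elif self.state == OK:
            # First problem result after OK: SOFT unless only one attempt is allowed.
            self.state, self.attempt = result, 1
            self.state_type = "HARD" if self.max_check_attempts <= 1 else "SOFT"
        else:
            # Repeated problem result: count retries until the limit is reached.
            self.state = result
            if self.state_type == "SOFT":
                self.attempt += 1
                if self.attempt >= self.max_check_attempts:
                    self.state_type = "HARD"
        if self.state_type == "HARD":
            self.last_hard_state = self.state

    def state_for_dependencies(self) -> str:
        # Assumption: dependencies are evaluated against the last HARD state,
        # so a SOFT warning still looks like OK to the dependent services.
        return self.state if self.state_type == "HARD" else self.last_hard_state


master = Service(max_check_attempts=3)
for _ in range(3):
    master.process_result(WARNING)
    print(master.state, master.state_type, "->", master.state_for_dependencies())
```

In this model the dependent services only "see" the warning on the third consecutive problem result, which is why the State Type on the Advanced tab matters.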
simon.pytches
Posts: 13
Joined: Mon Feb 13, 2012 9:38 am

Re: Service Dependency Management

Post by simon.pytches »

Service State: Warning
Duration: 1d 18h 34m 20s
State Type: Hard

This is from the check_systems service Advanced page. I was wondering if it could be anything to do with the scheduling and retry times of the child services :s (I'm kinda clutching at straws), or a permissions problem. Though I have tried setting the dependency to n instead of o, and they all start running.
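
For reference, the n versus o behaviour can be sketched in Python as below. This is a model of how I understand the execution failure criteria letters (o, w, u, c, p, n) in a service dependency, not the actual Nagios implementation; the function name `dependent_check_allowed` and the `STATE_LETTERS` table are invented for the example.

```python
# Map a master service state to the option letter used in the dependency.
STATE_LETTERS = {
    "OK": "o",
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "PENDING": "p",
}


def dependent_check_allowed(master_hard_state: str, execution_failure_criteria: str) -> bool:
    """Return True if the dependent service check should be executed.

    The criteria list the master states in which the dependent check is
    NOT run; "n" means the dependency never blocks execution.
    """
    options = {opt.strip() for opt in execution_failure_criteria.split(",") if opt.strip()}
    if "n" in options:
        return True
    return STATE_LETTERS[master_hard_state] not in options


# With "o", the deep-dive checks are skipped while the master is OK
# and allowed once it reaches a hard Warning.
print(dependent_check_allowed("OK", "o"))
print(dependent_check_allowed("WARNING", "o"))
# With "n", the dependency never blocks anything, so everything runs.
print(dependent_check_allowed("OK", "n"))
```

Under this model, o is the setting that matches the goal (deep-dive checks only when the master is not OK), and n running everything is expected.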

Cheers

Si
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Service Dependency Management

Post by scottwilkerson »

I guess it is possibly due to the interval times.

On my machine, when I was testing it, as soon as the master service went down, the dependent services ran at their next scheduled check time. After the master service came back OK, they immediately started skipping their checks.