Service Dependency

ffolse · Post by **ffolse** » Tue Mar 24, 2015 1:32 pm

Hello,

I need some clarification when setting up service dependencies on the same host. My current config is as follows:

define servicedependency {
dependent_hostgroup_name HOSTGROUP1
dependent_service_description SWAP
hostgroup_name HOSTGROUP1
service_description SSH
inherits_parent 1
execution_failure_criteria c,
notification_failure_criteria c,
dependency_period xi_timeperiod_24x7

}

With this configuration, SWAP still sends notifications when the SSH service is down on any of the hosts that belong to HOSTGROUP1. We should only be getting SSH alerts.
The Nagios documentation says that if the dependent service is on the same host as the service it depends on, no dependent host/hostgroup needs to be provided. However, the Nagios XI Config Manager requires that input and will not save without hostgroups selected.
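
For reference, the same-host form the documentation describes would look roughly like the sketch below. It is untested here, and it assumes that leaving out dependent_hostgroup_name makes SWAP on each host depend on SSH on that same host. All other directives are copied from the definition above.

```cfg
define servicedependency {
    hostgroup_name                  HOSTGROUP1
    service_description             SSH
    # dependent_hostgroup_name is left out on purpose (assumed same-host behaviour)
    dependent_service_description   SWAP
    inherits_parent                 1
    execution_failure_criteria      c
    notification_failure_criteria   c
    dependency_period               xi_timeperiod_24x7
}
```

Since the Config Manager will not save this form, it would presumably have to live in a manually maintained config file.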

Thanks and if anyone can clarify the proper config that would be great.

--
Liz

tmcdonald · Post by **tmcdonald** » Tue Mar 24, 2015 2:32 pm

Are you trying to say

"If ANY of the SSH services on ANY of the hosts in HOSTGROUP1 go down, stop alerting for SWAP on ALL of those hosts"

or

"If the SSH service on ANY of the hosts in HOSTGROUP1 goes down, stop alerting for SWAP on THAT host"?

ffolse · Post by **ffolse** » Tue Mar 24, 2015 2:59 pm

"If the SSH service on ANY of the hosts in HOSTGROUP1 goes down, stop alerting for SWAP on THAT host."

I need the SWAP service to depend on the SSH service on that particular host.

Thanks,
ffolse

Post by **lmiltchev** » Tue Mar 24, 2015 3:28 pm

Can you show us HOSTGROUP1's definition and the actual email notification that you received for the SWAP service?

ffolse · Post by **ffolse** » Tue Mar 24, 2015 3:40 pm

HOSTGROUP1:

define hostgroup {
hostgroup_name HOSTGROUP1
alias DB Dev Servers
members host1,host2,host3
}

I've edited the IPs in this notification email:

From: wxnagios01@txxxxx
Sent: Tuesday, March 24, 2015 10:15 AM
To: ffolse
Subject: PROBLEM (CRITICAL) host1 SWAP
Importance: High

Nagios has detected a problem with this service.
Connection refused by host
(sanmateo) IP: xxx.xxx.xx.x
2015-03-24 10:14:39

Post by **Nagios Support** » Tue Mar 24, 2015 4:55 pm

We will try to recreate the issue in-house and will get back to you within the next 24 hours.

Post by **Box293** » Tue Mar 24, 2015 6:07 pm

ffolse wrote: With this configuration, SWAP still sends notifications when the SSH service is down on any of the hosts that belong to HOSTGROUP1. We should only be getting SSH alerts.

I would like to see the service definitions for both SSH and SWAP. Specifically I want to see the check_interval, retry_interval and max_check_attempts.

ffolse · Post by **ffolse** » Wed Mar 25, 2015 12:29 pm

define service {
service_description SWAP
hostgroup_name HOSTGROUP1
check_command check_nrpe!check_swap!!!!!!!
max_check_attempts 3
check_interval 3
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 30
first_notification_delay 0
notification_period work_hours
notification_options c,r,s,
notifications_enabled 0
contact_groups linuxteam
_xiwizard nrpe
register 1
}

define service {
service_description SSH
hostgroup_name HOSTGROUP1
check_command check_nrpe!check_ssh!!!!!!!
max_check_attempts 3
check_interval 3
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 30
first_notification_delay 0
notification_period xi_timeperiod_24x7
notification_options c,r,s,
notifications_enabled 1
contact_groups linuxteam
_xiwizard nrpe
register 1
}

Post by **lmiltchev** » Wed Mar 25, 2015 2:04 pm

I was not able to recreate this issue. Maybe we will need to move this to the email ticketing system.

BTW, here's what I tried:

define servicedependency {
       dependent_hostgroup_name      		HOSTGROUP1
       dependent_service_description 		Swap Usage HOSTGROUP1
       hostgroup_name                		HOSTGROUP1
       service_description           		SSH Server HOSTGROUP1
       inherits_parent               		1
       execution_failure_criteria    		c,
       notification_failure_criteria 		c,
       dependency_period             		24x7

}

define hostgroup {
	hostgroup_name                		HOSTGROUP1
	alias                         		HOSTGROUP1
	members                       		CentOS6-SNMP
	}

define service {
	service_description		SSH Server HOSTGROUP1
	use				xiwizard_nrpe_service
	hostgroup_name			HOSTGROUP1
	check_command			check_nrpe!check_init_service!-a 'sshd'!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	notification_interval		60
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}
	
define service {
	service_description		Swap Usage HOSTGROUP1
	use				xiwizard_nrpe_service
	hostgroup_name			HOSTGROUP1
	check_command			check_nrpe!check_swap!-a '-w 50 -c 20'!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	notification_interval		60
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}

Both services failed, but I got notified only about the SSH.

Attachments: example01.PNG, example02.PNG

Post by **Box293** » Wed Mar 25, 2015 4:49 pm

I think your problem is that the SSH service has the same intervals and retries as the services that depend on it.

Here's a scenario:

SSH
Check Interval: 3m
Retry Interval: 1m
Max Check Attempts: 3

SWAP
Check Interval: 3m
Retry Interval: 1m
Max Check Attempts: 3

1.00 Nagios checks SSH service, service is OK, next check is 1.03, attempt 1/3
1.01 SSH service breaks somehow, Nagios does not know about it yet
1.02 Nagios checks SWAP service, fails because of SSH service broken, service is w/c/u, SOFT state, next check is 1.03, attempt 1/3
1.03 Nagios checks SSH service, fails, service is w/c/u, SOFT state, next check is 1.04, attempt 1/3
1.03 Nagios checks SWAP service, fails because of SSH service broken, service is w/c/u, SOFT state, next check is 1.04, attempt 2/3
1.04 Nagios checks SSH service, fails, service is w/c/u, SOFT state, next check is 1.05, attempt 2/3
1.04 Nagios checks SWAP service, fails because of SSH service broken, service is w/c/u, HARD state, notifications sent, next check is 1.05, attempt 3/3
1.05 Nagios checks SSH service, fails, service is w/c/u, HARD state, notifications sent, service dependencies now apply, next check is 1.06, attempt 3/3
1.05 Nagios pushes back check of SWAP service because dependencies now apply, however remains in a critical state. service is w/c/u, HARD state, next check is 1.06, attempt 3/3

What happens in this scenario is that the SWAP service reaches a HARD problem state BEFORE the SSH service does, so it sends out notifications. To stop this from happening, set your SSH service to a check_interval of 1m AND max_check_attempts of 2. In the scenario above, the SSH service would then have entered the HARD state first, and the service dependencies would have taken effect before SWAP could notify.
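
As a rough sketch of that change, the SSH service definition posted earlier could be adjusted as shown below. Only check_interval and max_check_attempts differ from the posted definition. The values are the ones suggested above and have not been verified against this environment.

```cfg
define service {
    service_description         SSH
    hostgroup_name              HOSTGROUP1
    check_command               check_nrpe!check_ssh!!!!!!!
    max_check_attempts          2    ; was 3, so SSH goes HARD after two failed checks
    check_interval              1    ; was 3, so an SSH failure is noticed within a minute
    retry_interval              1
    check_period                xi_timeperiod_24x7
    notification_interval       30
    first_notification_delay    0
    notification_period         xi_timeperiod_24x7
    notification_options        c,r,s,
    notifications_enabled       1
    contact_groups              linuxteam
    _xiwizard                   nrpe
    register                    1
}
```

With these values SSH should reach the HARD state about two minutes after it breaks at the latest, while SWAP still needs three failed checks (its first failure plus two one-minute retries) before it can notify.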

Does this make sense?

Nagios Support Forum
