Service dependency problems
Posted: Tue Jul 06, 2010 10:10 am
I'm having a bit of a problem with how service dependencies work in Nagios. I'm using SNMP to perform things like process checks and disk availability checks, so I've made those services dependent on the SNMP service. The relevant definition from the /usr/local/nagios/etc/servicedependencies.cfg file is in the first code block below.
It seems pretty straightforward: atd, Cron, Disk Monitor, SSH, VAS, and VAS GP all depend on the SNMP service, since all of those services use SNMP to perform the check. What I expected was that when one of the "dependent_host_name/dependent_service_description" checks failed, Nagios would trigger an active check against the "host_name/service_description" service.
Am I missing some configuration item that would make the service dependencies behave the way I want? Or is this not something Nagios can do at this time?
Code:
define servicedependency {
    dependent_host_name             gb-doc-svb-0060
    dependent_service_description   atd,Cron,Disk Monitor,SSH,VAS,VAS GP
    host_name                       gb-doc-svb-0060
    service_description             SNMP
    inherits_parent                 1
    execution_failure_criteria      w
    notification_failure_criteria   w,u,c
}
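To spell out how I read the two criteria lines, here is a small Python sketch. It is my own model and not Nagios code; the function and dictionary names are made up for illustration. It assumes the documented meaning of the letters (o, w, u, c, p for OK, WARNING, UNKNOWN, CRITICAL, PENDING) and that the dependency is evaluated against whatever state is currently on record for SNMP.

```python
# Illustrative model only; this is not how Nagios is implemented internally.
# Maps the single-letter criteria used in servicedependency definitions
# to service state names. The "n" (none) option is simply ignored here.
STATE_LETTERS = {
    "o": "OK",
    "w": "WARNING",
    "u": "UNKNOWN",
    "c": "CRITICAL",
    "p": "PENDING",
}


def parse_criteria(criteria):
    """Turn a string like 'w,u,c' into a set of state names."""
    letters = (part.strip() for part in criteria.split(","))
    return {STATE_LETTERS[letter] for letter in letters if letter in STATE_LETTERS}


def dependency_fails(parent_state, criteria):
    """True if the parent's recorded state matches one of the failure criteria."""
    return parent_state in parse_criteria(criteria)


# Values copied from the servicedependency definition above.
EXECUTION_FAILURE_CRITERIA = "w"
NOTIFICATION_FAILURE_CRITERIA = "w,u,c"

for snmp_state in ("OK", "WARNING", "UNKNOWN", "CRITICAL"):
    skip_check = dependency_fails(snmp_state, EXECUTION_FAILURE_CRITERIA)
    suppress = dependency_fails(snmp_state, NOTIFICATION_FAILURE_CRITERIA)
    print(f"SNMP {snmp_state}: skip dependent checks={skip_check}, "
          f"suppress notifications={suppress}")
```

Read this way, a CRITICAL SNMP state suppresses notifications for the dependent services but does not stop their checks from running, because execution_failure_criteria only lists w.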
What I'm actually seeing is that the dependent service appears to be validated against the last recorded state of the parent service. As a result, notifications for the dependent services go through even though the parent service really is unresponsive; Nagios just hasn't run the parent's check yet.
To run through the chain of events with a timed example (all services are checked every 5 minutes; times are mm:ss):
Code:
00:00 - Cron OK
00:10 - atd OK
00:15 - SNMP OK
00:20 - Disk Monitor OK
00:30 - SSH OK
00:40 - VAS OK
00:50 - VAS GP OK
04:30 - SNMP service crashes
05:00 - Cron Critical - Notification Sent
05:10 - atd Critical - Notification Sent
05:15 - SNMP Critical - Notification Sent
05:20 - Disk Monitor - Notification Suppressed
05:30 - SSH - Notification Suppressed
05:40 - VAS - Notification Suppressed
05:50 - VAS GP - Notification Suppressed
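The timeline above can be replayed with a short Python sketch. This is only my model of what I'm observing, not Nagios code: it assumes the dependency test reads the parent's last recorded state and never triggers a fresh check of the parent. The names (replay, CHECKS_AFTER_CRASH, and so on) are hypothetical.

```python
# Illustrative replay of the timeline above; not Nagios code.
# Assumption: the dependency test reads the parent's last recorded state
# and does not trigger a fresh check of the parent.

# States listed in notification_failure_criteria (w,u,c).
FAILURE_STATES = {"WARNING", "UNKNOWN", "CRITICAL"}

# (time, service) in the order the checks run after SNMP crashes at 04:30.
CHECKS_AFTER_CRASH = [
    ("05:00", "Cron"),
    ("05:10", "atd"),
    ("05:15", "SNMP"),
    ("05:20", "Disk Monitor"),
    ("05:30", "SSH"),
    ("05:40", "VAS"),
    ("05:50", "VAS GP"),
]


def replay(checks, parent="SNMP"):
    """Return (time, service, outcome) tuples for the replayed checks."""
    recorded_parent_state = "OK"  # result of the 00:15 SNMP check
    results = []
    for time, service in checks:
        if service == parent:
            # The parent's own scheduled check finally notices the crash.
            recorded_parent_state = "CRITICAL"
            results.append((time, service, "notification sent"))
        elif recorded_parent_state in FAILURE_STATES:
            results.append((time, service, "notification suppressed"))
        else:
            # The parent still looks OK on record, so the dependency passes.
            results.append((time, service, "notification sent"))
    return results


for time, service, outcome in replay(CHECKS_AFTER_CRASH):
    print(f"{time} - {service}: {outcome}")
```

In this model Cron and atd notify because they happen to be checked before SNMP's own 05:15 check updates the recorded state. With the behavior I was hoping for, the Cron failure at 05:00 would first trigger a check of SNMP, and Cron and atd would be suppressed as well.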