References:
Nagios 4 announcement mentioning service parents
Relevant excerpt from Section 2:
"Services now support a parents attribute. A service parent performs a function similar to host parents and can be used in place of service dependencies in simple circumstances."
Nagios 4 docs on service definitions
"parents: This directive is used to define a comma-delimited list of short names of the "parent" services for this particular service. Parent services are typically other services that need to be available in order for a check of this service to occur. For example, if a service checks the status of a disk using SSH, the disk check service would have the SSH service as a parent. If the service has no parent services, simply omit the "parents" directive. More complex service dependencies may be specified with service dependency objects."
This sounded ideal for simplifying some of my earlier configs, which needed one or more service dependencies for what were simple parent-child relationships.
In testing, I haven't seen notifications for child service checks get suppressed (or the child get marked unreachable) when a parent service is in a critical state, as I would expect. My tests with the parents directive on hosts appear to work as they did in prior versions of Nagios.
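For comparison, the host-level relationship referred to here looks roughly like the sketch below. Both host names and addresses are hypothetical and are not part of the test config in this post; only the linux-server template name is taken from it.

```
# Illustration only: example-router and example-child are made-up hosts.
define host{
use linux-server
host_name example-router
alias example-router
address 192.0.2.1
}
define host{
use linux-server
host_name example-child
alias example-child
address 192.0.2.10
# When a check of example-child fails while example-router is DOWN,
# Nagios reports example-child as UNREACHABLE instead of DOWN.
parents example-router
}
```

Whether an UNREACHABLE host then notifies depends on whether 'u' is listed in the host's notification_options.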
I have a simplified test environment and would like someone else to test it, or to point out the error (or bad assumption) I'm making. The configs aren't necessarily appropriate for production; they are organized mainly to give a simplified example of the issue.
On the host 'nagiostestb', I have a web service running.
define host{
use linux-server
host_name nagiostestb
alias nagiostestb
address 192.168.176.42
}
define command{
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
define service{
use generic-service
host_name nagiostestb
service_description HTTP
check_command check_http
}
define command{
command_name check_http_index_for_hello
command_line $USER1$/check_http -I $HOSTADDRESS$ -w 5 -c 10 -t 45 -u/index.html -rhello
}
define service{
use generic-service
host_name nagiostestb
service_description HTTP-INDEX-HI
check_command check_http_index_for_hello
parents HTTP
}
define service{
register 0
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 2
normal_check_interval 1
retry_check_interval 1
contact_groups admins
check_interval 1
notification_options w,c,r
notification_interval 1
notification_period 24x7
}
If I stop the web service on the host, the HTTP service goes critical and alerts. The HTTP-INDEX-HI service also alerts, since the page cannot be retrieved. I expected the HTTP-INDEX-HI service to go into a 'u' state and, with notification_options set to w,c,r, not to notify. Instead, the HTTP-INDEX-HI service is in a critical state and notifies.
It does not appear to be due to the ordering of the checks: if I acknowledge the HTTP service problem and leave HTTP-INDEX-HI unacknowledged, HTTP-INDEX-HI continues to notify at every notification_interval. (In older versions I've had cases where child hosts alerted before Nagios noticed that the parent host was also down, but it suppressed later notifications once it did.)
The only similar reference I found to this issue was here.
However, that thread appeared to get directed away from the service parent issue (and some of the responses appeared to be unaware of the added service definition feature in 4.0.x).
Did I miss a config setting needed to enable the new feature or to make it suppress notifications? Or do service parents not work like the host parents described in the reachability doc?
I can still define service dependencies the traditional way, and they appear to suppress notifications as expected, but I would very much like to use the new parents feature. A large percentage of my config files end up being service dependencies for simple service parent-child relationships to prevent the text deluge when a primary service fails.
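For reference, the traditional equivalent for this test case would look roughly like the sketch below. The host and service names come from the test config in this post; the failure criteria are assumptions chosen for illustration and should be adjusted as needed.

```
# Sketch only: a servicedependency standing in for "parents HTTP"
# on the HTTP-INDEX-HI service.
define servicedependency{
host_name nagiostestb
service_description HTTP
dependent_host_name nagiostestb
dependent_service_description HTTP-INDEX-HI
# Assumed: keep running the dependent check regardless of HTTP's state.
execution_failure_criteria n
# Assumed: suppress HTTP-INDEX-HI notifications while HTTP is
# WARNING, UNKNOWN or CRITICAL.
notification_failure_criteria w,u,c
}
```

With soft_state_dependencies=0, as in the nagios.cfg in this post, the dependency is evaluated against the hard state of the HTTP service rather than its soft states.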
The Nagios server is running 4.0.2 on CentOS 6.5 x86_64.
I had the same issue with 4.0.1 on CentOS 6.4 x86_64. I did not test 4.0.0.
Relevant nagios.log entries:
[1389935318] SERVICE ALERT: nagiostestb;HTTP;CRITICAL;SOFT;1;Connection refused
[1389935318] SERVICE ALERT: nagiostestb;HTTP-INDEX-HI;CRITICAL;SOFT;1;Connection refused
[1389935378] SERVICE ALERT: nagiostestb;HTTP-INDEX-HI;CRITICAL;HARD;2;Connection refused
[1389935378] SERVICE NOTIFICATION: nagiosadmin;nagiostestb;HTTP-INDEX-HI;CRITICAL;notify-service-by-email;Connection refused
[1389935378] SERVICE ALERT: nagiostestb;HTTP;CRITICAL;HARD;2;Connection refused
[1389935378] SERVICE NOTIFICATION: nagiosadmin;nagiostestb;HTTP;CRITICAL;notify-service-by-email;Connection refused
[1389935402] EXTERNAL COMMAND: ACKNOWLEDGE_SVC_PROBLEM;nagiostestb;HTTP;2;1;0;Nagios Admin;ack - this should have been the only notification
[1389935402] SERVICE NOTIFICATION: nagiosadmin;nagiostestb;HTTP;ACKNOWLEDGEMENT (CRITICAL);notify-service-by-email;Connection refused;Nagios Admin;ack - this should have been the only notification
[1389935438] SERVICE NOTIFICATION: nagiosadmin;nagiostestb;HTTP-INDEX-HI;CRITICAL;notify-service-by-email;Connection refused
[1389935498] SERVICE NOTIFICATION: nagiosadmin;nagiostestb;HTTP-INDEX-HI;CRITICAL;notify-service-by-email;Connection refused
... (repeats at each notification_interval until recovery)
#full object definitions (includes the snippets excerpted above)
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
# host template - generic-host
define host{
register 0
name generic-host
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7
}
# host template - linux-server
define host{
register 0
name linux-server
use generic-host
check_period 24x7
check_interval 1
retry_interval 1
max_check_attempts 3
check_command check-host-alive
notification_period 24x7
notification_interval 5
notification_options d,u,r
contact_groups admins
}
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
define contact{
contact_name nagiosadmin
use generic-contact
alias Nagios Admin
email nagios@localhost
}
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
define hostgroup{
hostgroup_name linux-servers
alias Linux Servers
members nagiostestb
}
define service{
use generic-service
hostgroup linux-servers
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service
hostgroup linux-servers
service_description SSH
check_command check_ssh
notifications_enabled 0
parents PING
}
#contact template - generic-contact
define contact{
register 0
name generic-contact
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,c,r
host_notification_options d,u,r,f,s
service_notification_commands notify-service-by-email
host_notification_commands notify-host-by-email
}
#nagiostestb
define host{
use linux-server
host_name nagiostestb
alias nagiostestb
address 192.168.176.42
}
define command{
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
define command{
command_name check_http_index_for_hello
command_line $USER1$/check_http -I $HOSTADDRESS$ -w 5 -c 10 -t 45 -u/index.html -rhello
}
define service{
use generic-service
host_name nagiostestb
service_description HTTP
check_command check_http
}
define service{
use generic-service
host_name nagiostestb
service_description HTTP-INDEX-HI
check_command check_http_index_for_hello
parents HTTP
}
#template generic-service
define service{
register 0
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 2
normal_check_interval 1
retry_check_interval 1
contact_groups admins
check_interval 1
notification_options w,c,r
notification_interval 1
notification_period 24x7
}
#nagios.cfg (conf.d has a single file containing the above object definitions)
cfg_dir=/etc/nagios/conf.d
object_cache_file=/var/log/nagios/objects.cache
precached_object_file=/var/log/nagios/objects.precache
resource_file=/etc/nagios/private/resource.cfg
status_file=/var/log/nagios/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_file=/var/spool/nagios/cmd/nagios.cmd
lock_file=/var/run/nagios/nagios.pid
temp_file=/var/log/nagios/nagios.tmp
temp_path=/tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/var/log/nagios/archives
use_syslog=1
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_current_states=1
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/var/log/nagios/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/var/log/nagios/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
check_for_updates=1
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
service_check_timeout_state=c
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=us
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=nagios@localhost
admin_pager=pagenagios@localhost
daemon_dumps_core=0
use_large_installation_tweaks=0
enable_environment_macros=0
debug_level=0
debug_verbosity=1
debug_file=/var/log/nagios/nagios.debug
max_debug_file_size=1000000
allow_empty_hostgroup_assignment=0
#cgi.cfg
main_config_file=/etc/nagios/nagios.cfg
physical_html_path=/usr/share/nagios/html
url_html_path=/nagios
show_context_help=0
use_pending_states=1
use_authentication=1
use_ssl_authentication=0
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
default_statusmap_layout=5
default_statuswrl_layout=4
ping_syntax=/bin/ping -n -U -c 5 $HOSTADDRESS$
refresh_rate=90
result_limit=100
escape_html_tags=1
action_url_target=_blank
notes_url_target=_blank
lock_author_names=1
navbar_search_for_addresses=1
navbar_search_for_aliases=1
#resource.cfg
$USER1$=/usr/lib64/nagios/plugins