Passive Host Check goes immediately to HARD Host Down State
Posted: Mon Aug 07, 2017 2:34 am
by Fred Kroeger
I saw a similar thread on this forum, but it's locked with no resolution logged.
I have a remote Nagios XI server forwarding events to my main Nagios XI server via NRDP outbound/inbound transfers. All the monitors from the remote XI server are set up as passive monitors on the main Nagios XI server.
My problem is that as soon as a Host Down event reaches the main Nagios XI server, it generates an alert immediately. Both the host entry and the passive host template have 4 retries configured (max_check_attempts 4).
In fact, the host history shows the event as "HARD 1 of 4" instead of stepping through SOFT 1 of 4, SOFT 2 of 4 and SOFT 3 of 4 before going to "HARD 4 of 4".
Note: this only happens with passive host events. Passive service events step through the required number of retries before going HARD, as do active host and service events.
I'm running Nagios XI 5.4.3 on both the remote and main Nagios XI servers.
Date / Time         | Host | Service              | State    | State Type | Attempt | Information
2017-08-06 16:08:03 | x    | NTP Time             | OK       | SOFT       | 2 of 4  | NTP OK: Offset 0.001703500748 secs
2017-08-06 16:06:38 | x    | NTP Time             | CRITICAL | SOFT       | 1 of 4  | CRITICAL - Socket timeout after 10 seconds
2017-08-06 12:32:33 | x    | Swap Usage           | OK       | SOFT       | 2 of 4  | SWAP OK - 100% free (2047 MB out of 2047 MB)
2017-08-06 12:29:57 | x    | Memory Usage         | OK       | SOFT       | 2 of 4  | OK - 86.7% (16355192 kB) free.
2017-08-06 12:25:22 | x    | Linux Service - sshd | OK       | SOFT       | 2 of 4  | openssh-daemon (pid 2327) is running...
2017-08-06 12:25:22 | x    | Load                 | OK       | SOFT       | 2 of 4  | OK - load average: 0.49, 0.59, 0.72
2017-08-06 12:21:47 | x    |                      | UP       | HARD       | 1 of 4  | OK - 10.0.32.116: rta 3.747ms, lost 0%
2017-08-06 12:20:12 | x    |                      | DOWN     | HARD       | 1 of 4  | (Host Check Timed Out On Worker: z)
2017-08-06 12:19:57 | x    | Swap Usage           | CRITICAL | SOFT       | 1 of 4  | (Service Check Timed Out On Worker: z)
2017-08-06 12:14:13 | x    |                      | UP       | HARD       | 1 of 4  | OK - 10.0.32.116: rta 42.305ms, lost 0%
2017-08-06 12:10:47 | x    | Memory Usage         | CRITICAL | SOFT       | 1 of 4  | (Service Check Timed Out On Worker: z)
2017-08-06 12:09:12 | x    |                      | DOWN     | HARD       | 1 of 4  | (Host Check Timed Out On Worker: z)
2017-08-06 12:08:47 | x    | Linux Service - sshd | CRITICAL | SOFT       | 1 of 4  | (Service Check Timed Out On Worker: z)
2017-08-06 12:08:27 | x    | Load                 | CRITICAL | SOFT       | 1 of 4  | (Service Check Timed Out On Worker:z)
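On the receiving server, each forwarded host result is processed as a passive host check result. As a minimal sketch, assuming the result is expressed as the standard Nagios PROCESS_HOST_CHECK_RESULT external command (format [timestamp] PROCESS_HOST_CHECK_RESULT;<host_name>;<status_code>;<plugin_output>, with 0=UP, 1=DOWN, 2=UNREACHABLE), the Python below formats one such line. The function names are hypothetical and this is not how NRDP itself is implemented; the command_file default is the path from the nagios.cfg posted in this thread.

```python
import time
from typing import Optional

# Host status codes used in passive host check results.
HOST_STATUS_CODES = {"UP": 0, "DOWN": 1, "UNREACHABLE": 2}


def host_check_result_line(host_name: str, state: str, output: str,
                           timestamp: Optional[int] = None) -> str:
    """Format one PROCESS_HOST_CHECK_RESULT external command line (hypothetical helper)."""
    if state not in HOST_STATUS_CODES:
        raise ValueError(f"unknown host state: {state!r}")
    # Fields are ';'-separated and each command ends at a newline.
    if ";" in host_name or "\n" in host_name or "\n" in output:
        raise ValueError("host name may not contain ';' and no field may contain a newline")
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_HOST_CHECK_RESULT;"
            f"{host_name};{HOST_STATUS_CODES[state]};{output}\n")


def submit(line: str, command_file: str = "/usr/local/nagios/var/rw/nagios.cmd") -> None:
    """Write one command line to the Nagios command pipe (hypothetical helper)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Note that the command carries only the host state and plugin output, not a SOFT/HARD state type: the receiving Nagios instance decides the state type itself, which is why the behaviour above depends on the main server's configuration.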
Re: Passive Host Check goes immediately to HARD Host Down State
Posted: Mon Aug 07, 2017 2:44 pm
by tgriep
Can you post your nagios.cfg file and an example of the passive host configuration from the XI server that is receiving the inbound transfers so we can view them.
Re: Passive Host Check goes immediately to HARD Host Down State
Posted: Mon Aug 07, 2017 6:51 pm
by Fred Kroeger
# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=1
use_large_installation_tweaks=1
enable_environment_macros=0
host_down_disable_service_checks=1
# Mod-Gearman Integration
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf eventhandler=no
# BSM Configuration
# Removed 11/4/16 - FK
#broker_module=/opt/OV/HPBsmIntNagios/lib64/libbsmintneb4.so
# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
###object_cache_file=/var/nagiosramdisk/objects.cache
# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
#service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file=/var/nagiosramdisk/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
#host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file=/var/nagiosramdisk/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk
# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static
# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services
# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
#global_service_event_handler=xi_service_event_handler
# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
bare_update_check=0
cached_host_check_horizon=30
###cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
#check_for_updates=1
check_for_updates=0
#check_host_freshness=0
check_host_freshness=1
#check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_path=/var/nagiosramdisk/spool/checkresults
#check_result_reaper_frequency=10
###check_result_reaper_frequency=3
check_result_reaper_frequency=5
check_service_freshness=1
#command_check_interval=-1
command_check_interval=5
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
#debug_verbosity=1
debug_verbosity=0
#enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
#external_command_buffer_slots=4096
high_host_flap_threshold=20.0
#high_service_flap_threshold=20.0
high_service_flap_threshold=50.0
##host_check_timeout=30
host_check_timeout=60
host_freshness_check_interval=60
###host_freshness_check_interval=90
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=1
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
#low_service_flap_threshold=5.0
low_service_flap_threshold=25.0
max_check_result_file_age=3600
#max_check_result_reaper_time=20
#max_check_result_reaper_time=30
##max_check_result_reaper_time=15
#####max_check_result_reaper_time=10
max_check_result_reaper_time=20
##max_concurrent_checks=0
max_concurrent_checks=0
#max_concurrent_checks=90
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
#object_cache_file=/usr/local/nagios/var/objects.cache
object_cache_file=/var/nagiosramdisk/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
#p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
###perfdata_timeout=15
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
##service_check_timeout=60
service_check_timeout=90
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
#sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
#status_file=/usr/local/nagios/var/status.dat
status_file=/var/nagiosramdisk/status.dat
##status_update_interval=10
status_update_interval=20
temp_file=/usr/local/nagios/var/nagios.tmp
#temp_path=/tmp
temp_path=/var/nagiosramdisk/tmp
use_aggressive_host_checking=0
######use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
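To confirm which passive-check settings are actually in effect on the receiving server, a small script can read nagios.cfg and report the last uncommented value of each relevant directive. This is a sketch under the assumption that a later occurrence of a directive overrides an earlier one; the helper names are hypothetical. Run against the file above, it would report passive_host_checks_are_soft as 0, since that is the only uncommented occurrence.

```python
from typing import Dict

# Directives that affect how passive host results are handled.
PASSIVE_HOST_KEYS = (
    "accept_passive_host_checks",
    "translate_passive_host_checks",
    "passive_host_checks_are_soft",
    "check_host_freshness",
)


def effective_settings(cfg_text: str) -> Dict[str, str]:
    """Map each directive to its last uncommented value (assumes later lines win)."""
    settings: Dict[str, str] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def passive_host_report(cfg_text: str) -> Dict[str, str]:
    """Return the effective value of each passive-host directive, or '<unset>'."""
    settings = effective_settings(cfg_text)
    return {key: settings.get(key, "<unset>") for key in PASSIVE_HOST_KEYS}
```

For example, passive_host_report(open("/usr/local/nagios/etc/nagios.cfg").read()) on a default Nagios XI layout (the path is an assumption). Splitting only on the first "=" keeps values such as the broker_module line, which contains its own "config=...", intact.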
Sample host configuration file. Note that the xiwizard_passive_host template has also been changed to 4 retries, in case the template's value was being used instead of the retries defined for the host.
###############################################################################
#
# Host configuration file
#
# Created by: Nagios Core Config Manager 2.6.5
# Date: 2017-07-27 11:16:23
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################
define host {
host_name server.x
use xiwizard_passive_host
alias UNIX
display_name a19fc1404f181200cd111d801310c70d
address 10.x.x.x
parents parent.x
hostgroups CAT-1 Linux
check_command check-host-alive!!!!!!!!
max_check_attempts 4
check_interval 5
retry_interval 2
check_period 24x7
event_handler SNE_Host-Event_Handler
contact_groups SCADA-Unix-Cat1
notification_interval 0
notification_period 24x7
notification_options d,u,r,
icon_image redhat.png
statusmap_image redhat.gd2
register 1
}
###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################
Re: Passive Host Check goes immediately to HARD Host Down State
Posted: Tue Aug 08, 2017 11:01 am
by tgriep
Try changing passive_host_checks_are_soft in the nagios.cfg file from 0 to 1, then save the file and restart the nagios process.
This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.
0 = Passive host checks are HARD (default)
1 = Passive host checks are SOFT
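The effect of the two values above can be illustrated with a simplified simulation of how successive passive host results change the state type and attempt counter. This is a sketch, not Nagios's actual code: it models only the problem path (soft recoveries, flapping, freshness and dependency logic are ignored), the attempt counter advances only when a new passive result arrives, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HostStatus:
    state: str = "UP"
    state_type: str = "HARD"
    attempt: int = 1


def apply_passive_result(cur: HostStatus, new_state: str, max_attempts: int,
                         passive_checks_are_soft: bool) -> HostStatus:
    """Simplified model: how one passive host result changes state type and attempt."""
    if not passive_checks_are_soft:
        # passive_host_checks_are_soft=0: every passive result is treated as HARD.
        return HostStatus(new_state, "HARD", 1)
    if new_state == "UP":
        # Simplification: recoveries are modeled as an immediate HARD UP.
        return HostStatus("UP", "HARD", 1)
    if cur.state == "UP":
        attempt = 1  # first failure after being UP
    elif cur.state_type == "HARD":
        return HostStatus(new_state, "HARD", cur.attempt)  # already a HARD problem
    else:
        attempt = cur.attempt + 1  # another SOFT retry
    state_type = "HARD" if attempt >= max_attempts else "SOFT"
    return HostStatus(new_state, state_type, attempt)


def simulate(results: List[str], max_attempts: int,
             passive_checks_are_soft: bool) -> List[Tuple[str, str, int]]:
    """Feed a sequence of passive results and record (state, type, attempt) after each."""
    cur = HostStatus()
    history = []
    for result in results:
        cur = apply_passive_result(cur, result, max_attempts, passive_checks_are_soft)
        history.append((cur.state, cur.state_type, cur.attempt))
    return history
```

With max_check_attempts 4 as in the host definition, simulate(["DOWN"] * 4, 4, False) gives DOWN HARD 1 for every result, matching the "HARD 1 of 4" entries in the posted history, while simulate(["DOWN"] * 4, 4, True) steps through SOFT 1, 2 and 3 before reaching HARD 4.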
Re: Passive Host Check goes immediately to HARD Host Down State
Posted: Tue Aug 08, 2017 7:49 pm
by Fred Kroeger
Thanks - I'll give that a try.
A bit off-topic, but can you point me to a document that lists all the nagios.cfg settings and explains what they do?
Re: Passive Host Check goes immediately to HARD Host Down State
Posted: Wed Aug 09, 2017 9:45 am
by bolson
Re: Passive Host Check goes immediately to HARD Host Down State
Posted: Thu Aug 10, 2017 8:35 pm
by Fred Kroeger
Thanks for your help & the link
I can also confirm that changing passive_host_checks_are_soft to 1 worked as expected.
You can lock this one up.
Fred