
Passive Host Check goes immediately to HARD Host Down State

Posted: Mon Aug 07, 2017 2:34 am
by Fred Kroeger
I saw a similar thread on this forum - but it's locked with no resolution logged.
I have a remote Nagios XI server forwarding events to my main Nagios XI server via NRDP Outbound/Inbound transfers. All of the monitors from the remote XI server are set up as passive monitors on the main Nagios XI server.
My problem is that as soon as the Host Down event hits the main NagiosXI server, it generates an alert immediately. Both the Host entry and the Passive Host Template have 4 retries configured.
In fact, the Host History shows the event as "Hard 1 of 4" instead of stepping through Soft 1 of 4, Soft 2 of 4 and Soft 3 of 4 before going "Hard 4 of 4".
Note - this is only happening with Passive Host events. The Passive Service events step through the required number of retries before going Hard, as do the Active Host & Service events.

I'm running Nagios XI 5.4.3 on both the remote & main Nagios XI servers.

Code: Select all

Date / Time          Host  Service               State     State Type  Attempt  Information
2017-08-06 16:08:03  x     NTP Time              OK        SOFT        2 of 4   NTP OK: Offset 0.001703500748 secs
2017-08-06 16:06:38  x     NTP Time              CRITICAL  SOFT        1 of 4   CRITICAL - Socket timeout after 10 seconds
2017-08-06 12:32:33  x     Swap Usage            OK        SOFT        2 of 4   SWAP OK - 100% free (2047 MB out of 2047 MB)
2017-08-06 12:29:57  x     Memory Usage          OK        SOFT        2 of 4   OK - 86.7% (16355192 kB) free.
2017-08-06 12:25:22  x     Linux Service - sshd  OK        SOFT        2 of 4   openssh-daemon (pid 2327) is running...
2017-08-06 12:25:22  x     Load                  OK        SOFT        2 of 4   OK - load average: 0.49, 0.59, 0.72
2017-08-06 12:21:47  x                           UP        HARD        1 of 4   OK - 10.0.32.116: rta 3.747ms, lost 0%
2017-08-06 12:20:12  x                           DOWN      HARD        1 of 4   (Host Check Timed Out On Worker: z)
2017-08-06 12:19:57  x     Swap Usage            CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)
2017-08-06 12:14:13  x                           UP        HARD        1 of 4   OK - 10.0.32.116: rta 42.305ms, lost 0%
2017-08-06 12:10:47  x     Memory Usage          CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)
2017-08-06 12:09:12  x                           DOWN      HARD        1 of 4   (Host Check Timed Out On Worker: z)
2017-08-06 12:08:47  x     Linux Service - sshd  CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)
2017-08-06 12:08:27  x     Load                  CRITICAL  SOFT        1 of 4   (Service Check Timed Out On Worker: z)

Re: Passive Host Check goes immediately to HARD Host Down St

Posted: Mon Aug 07, 2017 2:44 pm
by tgriep
Can you post your nagios.cfg file and an example of the passive host configuration from the XI server that is receiving the inbound transfers so we can view them.

Re: Passive Host Check goes immediately to HARD Host Down St

Posted: Mon Aug 07, 2017 6:51 pm
by Fred Kroeger

Code: Select all

# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=1
use_large_installation_tweaks=1
enable_environment_macros=0
host_down_disable_service_checks=1

# Mod-Gearman Integration
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf eventhandler=no

# BSM Configuration
# Removed 11/4/16 - FK
#broker_module=/opt/OV/HPBsmIntNagios/lib64/libbsmintneb4.so


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg

###object_cache_file=/var/nagiosramdisk/objects.cache

# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
#service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file=/var/nagiosramdisk/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
#host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file=/var/nagiosramdisk/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
#global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
bare_update_check=0
cached_host_check_horizon=30
###cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
#check_for_updates=1
check_for_updates=0
#check_host_freshness=0
check_host_freshness=1
#check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_path=/var/nagiosramdisk/spool/checkresults
#check_result_reaper_frequency=10
###check_result_reaper_frequency=3
check_result_reaper_frequency=5
check_service_freshness=1
#command_check_interval=-1
command_check_interval=5
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
#debug_verbosity=1
debug_verbosity=0
#enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
#external_command_buffer_slots=4096
high_host_flap_threshold=20.0
#high_service_flap_threshold=20.0
high_service_flap_threshold=50.0
##host_check_timeout=30
host_check_timeout=60
host_freshness_check_interval=60
###host_freshness_check_interval=90
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=1
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
#low_service_flap_threshold=5.0
low_service_flap_threshold=25.0
max_check_result_file_age=3600
#max_check_result_reaper_time=20
#max_check_result_reaper_time=30
##max_check_result_reaper_time=15
#####max_check_result_reaper_time=10
max_check_result_reaper_time=20
##max_concurrent_checks=0
max_concurrent_checks=0
#max_concurrent_checks=90
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
#object_cache_file=/usr/local/nagios/var/objects.cache
object_cache_file=/var/nagiosramdisk/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
#p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
###perfdata_timeout=15
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
##service_check_timeout=60
service_check_timeout=90
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
#sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
#status_file=/usr/local/nagios/var/status.dat
status_file=/var/nagiosramdisk/status.dat
##status_update_interval=10
status_update_interval=20
temp_file=/usr/local/nagios/var/nagios.tmp
#temp_path=/tmp
temp_path=/var/nagiosramdisk/tmp
use_aggressive_host_checking=0
######use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
Sample Host config file - note that the xiwizard_passive_host template has also been changed to 4 retries, in case the template's value was being used instead of the retries defined for the host.

Code: Select all

###############################################################################
#
# Host configuration file
#
# Created by: Nagios Core Config Manager 2.6.5
# Date:       2017-07-27 11:16:23
# Version:    Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define host {
        host_name                       server.x
        use                             xiwizard_passive_host
        alias                           UNIX
        display_name                    a19fc1404f181200cd111d801310c70d
        address                         10.x.x.x
        parents                         parent.x
        hostgroups                      CAT-1 Linux
        check_command                   check-host-alive!!!!!!!!
        max_check_attempts              4
        check_interval                  5
        retry_interval                  2
        check_period                    24x7
        event_handler                   SNE_Host-Event_Handler
        contact_groups                  SCADA-Unix-Cat1
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r,
        icon_image                      redhat.png
        statusmap_image                 redhat.gd2
        register                        1
        }

###############################################################################
#
# Host configuration file
#
# END OF FILE
#
###############################################################################

Re: Passive Host Check goes immediately to HARD Host Down St

Posted: Tue Aug 08, 2017 11:01 am
by tgriep
Try changing this setting in the nagios.cfg file from

Code: Select all

passive_host_checks_are_soft=0
to

Code: Select all

passive_host_checks_are_soft=1
Save and restart the nagios process by running

Code: Select all

service nagios restart
This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.

0 = Passive host checks are HARD (default)
1 = Passive host checks are SOFT
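
To illustrate the difference between the two values, here is a minimal Python sketch of the state-type logic described above. It is a simplified model under stated assumptions, not Nagios Core source code: the names HostState, apply_passive_result and simulate are hypothetical, and recoveries are simplified to always reset the host to a HARD UP at attempt 1.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical, simplified view of a host's check state."""
    state: str = "UP"
    state_type: str = "HARD"
    attempt: int = 1


def apply_passive_result(host, new_state, max_check_attempts,
                         passive_host_checks_are_soft):
    """Return the host state after one passive result (simplified model)."""
    if new_state == "UP":
        # Simplification: every recovery resets to HARD UP, attempt 1.
        return HostState("UP", "HARD", 1)
    if not passive_host_checks_are_soft:
        # Option = 0: the passive result is treated as HARD immediately,
        # so max_check_attempts never gets a chance to count up.
        return HostState(new_state, "HARD", 1)
    # Option = 1: count consecutive non-UP results up to max_check_attempts.
    if host.state == "UP":
        attempt = 1
    else:
        attempt = min(host.attempt + 1, max_check_attempts)
    state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
    return HostState(new_state, state_type, attempt)


def simulate(results, max_check_attempts=4,
             passive_host_checks_are_soft=False):
    """Feed a list of passive results through the model, return history lines."""
    host = HostState()
    history = []
    for result in results:
        host = apply_passive_result(host, result, max_check_attempts,
                                    passive_host_checks_are_soft)
        history.append(
            f"{host.state} {host.state_type} "
            f"{host.attempt} of {max_check_attempts}"
        )
    return history


if __name__ == "__main__":
    for soft in (False, True):
        print(f"passive_host_checks_are_soft={int(soft)}")
        for line in simulate(["DOWN"] * 4, 4, soft):
            print("  " + line)
```

With the option at 0, every DOWN result in this model is reported as "HARD 1 of 4", which matches the host history posted earlier in the thread. With the option at 1, the model steps through the SOFT attempts before going HARD on the fourth result.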

Re: Passive Host Check goes immediately to HARD Host Down St

Posted: Tue Aug 08, 2017 7:49 pm
by Fred Kroeger
Thanks - I'll give that a try.
A bit off topic... but can you point me to a document that lists all the nagios.cfg settings and explains what they do?

Re: Passive Host Check goes immediately to HARD Host Down St

Posted: Wed Aug 09, 2017 9:45 am
by bolson

Re: Passive Host Check goes immediately to HARD Host Down St

Posted: Thu Aug 10, 2017 8:35 pm
by Fred Kroeger
Thanks for your help & the link.
I can also confirm that changing the passive_host_checks_are_soft setting worked as expected.
You can lock this one up.

Fred