
Freshness Checks

Posted: Wed Nov 02, 2011 3:15 am
by SDohmen
Hello,

I am currently running a CentOS 6 installation with NagiosXI 1.7, updated to 1.8.

All checks are received through NRDP; they are sent from several client Core installations.

Because of this passive distribution setup we have freshness checks enabled. For the host checks this works fine, but for the service checks it looks like Nagios executes the normal check command instead of the freshness command. All active checks have been disabled to make sure this wasn't the problem. According to another post on the forum you also need to have flap detection disabled, but that did not help either. The config files are posted below.

Nagios.cfg

Code:

# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg


# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=1
check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_reaper_frequency=10
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=0
external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/usr/local/nagios/var/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
use_aggressive_host_checking=0
use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0

The service templates are as follows:
Generic service (just the standard settings that determine when to check etc.; this is because the central server itself is actively checked)
Passive service (here the freshness check is forced and active checks are disabled)
Customer service (this only exists so customers can have their own contact groups for notifications; everything else is left unset)

The service itself uses the customer service template. All 3 are posted below.
generic.JPG
passive_1.JPG
customer_1.JPG

Re: Freshness Checks

Posted: Wed Nov 02, 2011 3:16 am
by SDohmen
passive_2.JPG
customer_2.JPG
These are the 2 common tab screens of the same templates.

Re: Freshness Checks

Posted: Wed Nov 02, 2011 3:26 am
by SDohmen
Host.jpg
bbmap.JPG
As you can see on the bbmap, there are 3 things.

The hosts are all children of the ars-osm host, which is the customer Core installation. For the hosts themselves everything works fine and they register as "host is stale". The services, however, give me a result that is the same as the normal service check, and to make matters even weirder, the checks from the Nagios installation itself are reported as up.

You might wonder why this is weird. It is easy to explain: the Nagios Core installation is actually shut down, so no checks are run or sent whatsoever.

My biggest question now is: how do I get the service freshness checks working, and how can I make the Core checks go stale as well?

Re: Freshness Checks

Posted: Thu Nov 03, 2011 11:13 am
by mguthrie
Ok, this might be worth a read:
http://nagios.sourceforge.net/docs/3_0/freshness.html

The check command that you've defined for these services is the command that will be actively run once Nagios detects that the results are stale. Even if active checks are disabled, if you've enabled freshness checking, Nagios will issue an active check using the check command you've defined once the freshness threshold has been exceeded.
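
As a rough illustration of the freshness settings described in that document, here is a minimal sketch of a passive service. The host name, service description, and the 900-second threshold are assumptions for the example, not values taken from the screenshots in this thread, and the sketch assumes a generic-service template and a check_dummy command already exist.

```
# Hypothetical passive service with freshness checking (names are placeholders)
define service {
    use                     generic-service       ; assumed existing template
    host_name               example-host          ; placeholder host
    service_description     Example Passive Check ; placeholder description
    active_checks_enabled   0                     ; no regularly scheduled active checks
    passive_checks_enabled  1                     ; accept results submitted via NRDP
    check_freshness         1                     ; enable freshness checking for this service
    freshness_threshold     900                   ; seconds without a result before it counts as stale
    check_command           check_dummy!2!"Results are stale" ; run only when the result is stale
}
```

With a definition like this, the check_command is never scheduled on its own; it is only forced when no passive result has arrived within the threshold.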

Re: Freshness Checks

Posted: Fri Nov 04, 2011 1:57 pm
by SDohmen
I am pretty sure the command and everything else are set, as you can see in the pictures above. The problem is that for the services the freshness command isn't run; for some reason the actual command that is set on that service is run, which of course fails. On top of that, the commands for the local Core installation are also run (and return data) even though that machine is completely down and there is no way the results are coming from that machine.

Re: Freshness Checks

Posted: Mon Nov 07, 2011 3:15 pm
by mguthrie
I'm not sure I'm completely following that last post, but let's troubleshoot this in the following order.

#1 - We need to verify that you're able to receive the passive results for these services. Turn off freshness checking and active checks for this service and see if the correct results come in.

#2 - After verifying that #1 is working correctly, I would suggest manually testing the active check (that will be run if the freshness_threshold is exceeded) from the command-line, to see if it gives the results you're expecting.

Re: Freshness Checks

Posted: Tue Nov 08, 2011 2:16 am
by SDohmen
Seeing as my description wasn't clear enough, it's perhaps better if I explain it in more detail.

Our main central server is the NagiosXI install. Our passive senders are Core installations.

The communication between the 2 works fine (see the NRDP post about that).

Now my co-workers want to have freshness checks enabled so we can spot when a Core installation goes down or stops sending data. I have added the freshness checks to both the host and service templates according to the link you posted before. So far all goes well and works fine.

When I test this by killing one of the Core installations, the following happens. As soon as I bring the Core installation down, no more results are sent to the central server. After 15 minutes, when the freshness checks are supposed to kick in, the hosts report "host is stale" results, so they work fine. The service checks are different: instead of "service is stale" results, I get results that would indicate that the service is not responding on the Core installation's end. It looks as if the Core installation is somehow still sending results, even though that is impossible when the Core install is completely shut down.

Apart from the services giving critical results for ping availability (as an example), the other weird thing is that the services of the Core install itself all show OK results as if they were working. I will add a picture of this later today. That might make it clearer.

Re: Freshness Checks

Posted: Tue Nov 08, 2011 2:37 am
by SDohmen
OK, here are the detailed host and service screens showing the problems with the freshness checks.
Host.jpg
service.JPG
As you can see on the first screen, the host freshness checks work fine. On the second screen you can see the results from the service freshness checks. On the same screen you can also see that the Core installation still has services shown as OK even though the host itself is down.

Re: Freshness Checks

Posted: Tue Nov 08, 2011 11:09 am
by mguthrie
For your services, I'd suggest defining the following as your check command:

check command = check_dummy

$ARG1$ 2
$ARG2$ "Results are stale"

Right now what you have defined for the service you've shown is that Nagios will run the check_nt command and attempt to contact NSClient++ once the results are stale. This is probably not what you want.
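
In object-configuration terms, the suggestion above could look roughly like the sketch below. It assumes the check_dummy plugin is installed in the directory that $USER1$ points to and that a command with this name is not already defined; on many installations an equivalent command definition already exists, in which case only the check_command line on the service or template is needed.

```
# Command wrapper around the check_dummy plugin (may already exist on your system)
define command {
    command_name    check_dummy
    command_line    $USER1$/check_dummy $ARG1$ $ARG2$
}

# On the passive service or its template:
#   $ARG1$ = 2 (CRITICAL), $ARG2$ = "Results are stale"
#   check_command   check_dummy!2!"Results are stale"
```

The plugin simply returns the state number it is given together with the text, so a stale service shows up as CRITICAL with the message "Results are stale" instead of the output of a failed check_nt call.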

Re: Freshness Checks

Posted: Tue Nov 08, 2011 2:32 pm
by SDohmen
That check_nt check screen is from one of the hosts. It's not the passive service template in which the freshness check is defined.
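
One thing that may be worth checking here, sketched below under assumptions: in Nagios object inheritance, a directive set directly on a service takes precedence over the same directive inherited from a template. If the individual service defines its own check_command, that command is what a freshness check will run, regardless of what the passive template specifies. The template name, host name, and check_nt arguments are hypothetical and not taken from the screenshots.

```
# Hypothetical template carrying the freshness settings
define service {
    name                    passive-service       ; placeholder template name
    active_checks_enabled   0
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     900                   ; seconds
    check_command           check_dummy!2!"Results are stale"
    register                0                     ; template only, not a real service
}

# Hypothetical service that inherits the template but sets its own check_command
define service {
    use                     passive-service
    host_name               example-host          ; placeholder host
    service_description     CPU Load
    check_command           check_nt!CPULOAD!-l 5,80,90 ; overrides the template's check_dummy
}
```

In this sketch the stale CPU Load service would run check_nt rather than check_dummy, which would match the symptom of getting "normal" check failures instead of stale messages.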