Re: Inconsistent Nagios Report
Posted: Fri Feb 22, 2013 2:34 am
by fran.pastor
slansing wrote: One issue I noticed right away was this:
Code: Select all
check_interval 300.000000
retry_interval 300.000000
You have your check_interval and retry_interval set to 300 minutes, as this is how Nagios interprets the numbers. Setting them each to 5, for example, would mean the host is checked at a 5 minute interval; if the state changes, it is then rechecked every 5 minutes, three times, before an alert is generated.
In this fashion it is entirely possible that it detected the state change but never checked again until 300 minutes later, and it would have had to do this three times before finding that the host was back up and switching to an OK state.
We need to run checks at intervals under one minute, so we changed "interval_length" from 60 to 1. As a result, we have to specify intervals in seconds instead of minutes.
These are the events in nagios.log:
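As a rough illustration of the arithmetic (a sketch, not Nagios code): the real time between checks is the configured interval multiplied by interval_length. The function name effective_seconds is made up for this example.

```python
def effective_seconds(interval, interval_length=60):
    """Return the real time in seconds for an interval given in time units.

    interval_length is the number of seconds per time unit (60 by default).
    """
    return interval * interval_length


# With the default interval_length of 60, 300 time units means 300 minutes.
print(effective_seconds(300))                     # 18000
# With interval_length=1, 300 time units means 300 seconds (5 minutes).
print(effective_seconds(300, interval_length=1))  # 300
# A 30 second check is then simply check_interval 30.
print(effective_seconds(30, interval_length=1))   # 30
```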
Code: Select all
[nagios@nagios archives]$ cat nagios-02-21-2013-00.log | grep "Watchmouse;Check Hotelopia"
[1361314800] CURRENT SERVICE STATE: Watchmouse;Check Hotelopia;OK;HARD;1;OK. Check Hotelopia is up
[1361316008] SERVICE ALERT: Watchmouse;Check Hotelopia;UNKNOWN;HARD;1;UNKNOWN! Problem connecting to Watchmouse. Verify login or connectivity
[1361316302] SERVICE ALERT: Watchmouse;Check Hotelopia;CRITICAL;SOFT;1;CRITICAL. Check Hotelopia is down (Not matched)
[1361316602] SERVICE ALERT: Watchmouse;Check Hotelopia;OK;SOFT;2;OK. Check Hotelopia is up
[nagios@nagios archives]$
We have not set any stalking_options values, so Nagios only logs errors and recoveries. As you can see, the only errors occurred at that time; the checks run every 5 minutes during the rest of the day did not generate any error.
Code: Select all
[nagios@nagios archives]$ cat ~/var/objects.cache | grep stalking | sort -u
stalking_options n
[nagios@nagios archives]$
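To make the timing in the nagios.log excerpt above easier to read, here is a small sketch that parses the four lines and prints the seconds between consecutive entries. The regular expression only covers the line shape seen in this excerpt; it is an assumption for illustration, not a general Nagios log parser.

```python
import re

# The four lines from the nagios.log excerpt above.
LOG_LINES = [
    "[1361314800] CURRENT SERVICE STATE: Watchmouse;Check Hotelopia;OK;HARD;1;OK. Check Hotelopia is up",
    "[1361316008] SERVICE ALERT: Watchmouse;Check Hotelopia;UNKNOWN;HARD;1;UNKNOWN! Problem connecting to Watchmouse. Verify login or connectivity",
    "[1361316302] SERVICE ALERT: Watchmouse;Check Hotelopia;CRITICAL;SOFT;1;CRITICAL. Check Hotelopia is down (Not matched)",
    "[1361316602] SERVICE ALERT: Watchmouse;Check Hotelopia;OK;SOFT;2;OK. Check Hotelopia is up",
]

# [timestamp] TYPE: host;service;state;state_type;attempt;plugin output
ENTRY = re.compile(
    r"^\[(\d+)\] [A-Z ]+: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def parse(line):
    """Return the fields of one service state line, or None if it does not match."""
    match = ENTRY.match(line)
    if match is None:
        return None
    ts, _host, _service, state, state_type, attempt, _output = match.groups()
    return {"ts": int(ts), "state": state, "type": state_type, "attempt": int(attempt)}


entries = [parse(line) for line in LOG_LINES]
gaps = [later["ts"] - earlier["ts"] for earlier, later in zip(entries, entries[1:])]
print(gaps)  # [1208, 294, 300]
```

The last two gaps are roughly 300 seconds, which matches a 300 second interval with interval_length set to 1.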
At the moment I cannot find any logic in this inconsistency.
I will report it in the tracker soon.
Thanks for the assistance.
http://tracker.nagios.org/view.php?id=425
Re: Inconsistent Nagios Report
Posted: Fri Feb 22, 2013 3:26 pm
by abrist
If you set your check_interval to "1", then this is indeed strange. Hopefully some answers will come from the tracker. You could also poke around the mailing list for possible solutions:
https://lists.sourceforge.net/lists/lis ... gios-users
Re: Inconsistent Nagios Report
Posted: Mon Feb 25, 2013 3:30 am
by fran.pastor
No abrist, I set check_interval to 300 because we changed interval_length to 1, so we must specify check intervals in seconds rather than minutes. We have about 20 or 30 services with a check interval of 30 seconds. If we did not change interval_length from 60 to 1, we would not be able to check at intervals below 60 seconds.
Thanks for the support, abrist.

Re: Inconsistent Nagios Report
Posted: Mon Feb 25, 2013 10:24 am
by abrist
Ahh. Yes, then this is not right. I tried to reproduce on a few of my core installs to no avail. Are other services experiencing this behavior, or just the host in question?
Re: Inconsistent Nagios Report
Posted: Tue Feb 26, 2013 3:10 am
by fran.pastor
I have only detected it on two services on the same host, but I cannot say whether it has affected more services; we currently have 550 hosts and 3250 services. I noticed it because we produce a monthly report on our business processes, and the availability report for this month was not accurate.
Curiously, everything is normal in the Nagios log, the alert history, and perfdata/pnp4nagios; the anomaly appears only in the Trends report (trends.cgi) and the Availability report (avail.cgi).
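One way a report could differ from the log, purely as an illustration: the sketch below replays the four state entries from the log excerpt earlier in the thread and sums the time spent in each state, once counting only HARD entries and once counting every entry. Whether trends.cgi or avail.cgi actually behaved like this here is an assumption and has not been confirmed; the end of the reporting window (END) is also a made-up value.

```python
from collections import defaultdict

# (timestamp, state, state type) taken from the nagios.log excerpt in this thread.
ENTRIES = [
    (1361314800, "OK", "HARD"),
    (1361316008, "UNKNOWN", "HARD"),
    (1361316302, "CRITICAL", "SOFT"),
    (1361316602, "OK", "SOFT"),
]


def time_in_state(entries, end_ts, hard_only):
    """Sum the seconds spent in each state between entries, up to end_ts.

    When hard_only is True, SOFT entries are ignored entirely.
    """
    used = [e for e in entries if e[2] == "HARD" or not hard_only]
    totals = defaultdict(int)
    for i, (ts, state, _state_type) in enumerate(used):
        next_ts = used[i + 1][0] if i + 1 < len(used) else end_ts
        totals[state] += next_ts - ts
    return dict(totals)


END = 1361316602 + 600  # hypothetical end of the report window

print(time_in_state(ENTRIES, END, hard_only=True))
# {'OK': 1208, 'UNKNOWN': 1194}
print(time_in_state(ENTRIES, END, hard_only=False))
# {'OK': 1808, 'UNKNOWN': 294, 'CRITICAL': 300}
```

In the HARD-only replay the service never returns to OK, because the recovery in the excerpt was logged as a SOFT state. That would be one possible source of a mismatch between the log and a report, but it is only a hypothesis.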
Re: Inconsistent Nagios Report
Posted: Tue Feb 26, 2013 12:09 pm
by abrist
I know you just tripped across the potential bug, but have you seen the behavior on any other hosts or found any way to reproduce it?
Re: Inconsistent Nagios Report
Posted: Tue Feb 26, 2013 12:44 pm
by fran.pastor
I have only detected it on two services on the same host. I can reproduce it if I request a report for that date.
I do not use reporting for anything else (only for the hostgroup called Business_Process_Report). I have run random availability reports and have not found any more cases.
Thanks for the assistance.
This is a trace of the configuration for one of the services affected by this report problem:
Code: Select all
[nagios@nagios etc]$ tree
.
|-- cgi.cfg
|-- htpasswd.users
|-- nagios.cfg
|-- nsca.cfg
|-- objects
|   |-- commands.cfg
|   |-- contactgroup.cfg
|   |-- contacts.cfg
|   |-- hostgroup.cfg
|   |-- hosts.cfg
|   |-- servicegroups.cfg
|   |-- services.cfg
|   |-- templates.cfg
|   `-- timeperiods.cfg
`-- resource.cfg
nagios.cfg
Code: Select all
##############################################################################
#
# NAGIOS.CFG - Sample Main Config File for Nagios 3.3.1
#
##############################################################################
log_file=/usr/local/nagios/var/nagios.log
cfg_dir=/usr/local/nagios/etc/objects
object_cache_file=/usr/local/nagios/var/objects.cache
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
event_broker_options=-1
broker_module=/usr/local/nagios/pnp4nagios/lib/npcdmod.o config_file=/usr/local/nagios/pnp4nagios/etc/npcd.cfg
broker_module=/usr/local/nagios/mk-livestatus/lib/mk-livestatus/livestatus.o /usr/local/nagios/var/rw/live
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=60
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=60
max_concurrent_checks=0
check_result_reaper_frequency=3 ;10
max_check_result_reaper_time=10 ;30
check_result_path=/usr/local/nagios/var/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=120
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=0
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=1
check_for_updates=1
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=0
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=euro
enable_embedded_perl=0
use_embedded_perl_implicitly=0
illegal_object_name_chars=`~!$%^&*|'"<>?,=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=[email protected]
admin_pager=pagenagios@localhost
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=1
#debug_level=-1
#debug_verbosity=1
#debug_file=/usr/local/nagios/var/nagios.debug
#max_debug_file_size=1000000
FOR HOST
Code: Select all
hosts.cfg
define host {
use datacenter-tic,datacenter-exten1-tic
host_name Watchmouse
alias Watchmouse
address api.watchmouse.com
parents FW-TIC
}
templates.cfg
define host {
name generic-host-datacenter
check_command check_fping
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
obsess_over_host 0
event_handler_enabled 1
flap_detection_enabled 1
flap_detection_options o,d,u
failure_prediction_enabled 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7
notifications_enabled 1
#stalking_options d,u
register 0
icon_image rack_linux.png
statusmap_image rack_linux.png
}
define host {
name generic-host-datacenter-exten1
max_check_attempts 3
check_interval 300
retry_interval 120
notification_interval 0
notification_options d,r,u
register 0
}
define host{
use generic-host-datacenter
name datacenter-tic
hostgroups Datacenter-TIC
contact_groups datacenter-administrators-tic
register 0
}
define host{
use generic-host-datacenter-exten1
name datacenter-exten1-tic
register 0
}
FOR SERVICE
Code: Select all
services.cfg
define service {
use datacenter-tic,datacenter-exten1-tic
host_name Watchmouse
service_description Check Hotelopia
check_command check_watchmouse!Check Hotelopia!
retry_interval 300
}
templates.cfg
define service{
name generic-service-datacenter
active_checks_enabled 1
passive_checks_enabled 1
check_period 24x7
flap_detection_enabled 1
flap_detection_options o,w,c,u
failure_prediction_enabled 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7
notifications_enabled 1
parallelize_check 1
#stalking_options w,c
register 0
}
define service{
name generic-service-datacenter-exten1
max_check_attempts 3
check_interval 300
retry_interval 120
notification_interval 0
notification_options w,u,c,r,s
register 0
}
define service{
use generic-service-datacenter
name datacenter-tic
contact_groups datacenter-administrators-tic
register 0
}
define service{
use generic-service-datacenter-exten1
name datacenter-exten1-tic
register 0
}
Resulting objects.cache
Code: Select all
define host {
host_name Watchmouse
alias Watchmouse
address api.watchmouse.com
parents FW-TIC
check_period 24x7
check_command check_fping
contact_groups datacenter-administrators-tic
notification_period 24x7
initial_state o
check_interval 300.000000
retry_interval 120.000000
max_check_attempts 3
active_checks_enabled 1
passive_checks_enabled 1
obsess_over_host 0
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options o,d,u
freshness_threshold 0
check_freshness 0
notification_options d,u,r
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
failure_prediction_enabled 1
icon_image rack_linux.png
statusmap_image rack_linux.png
retain_status_information 1
retain_nonstatus_information 1
}
define service {
host_name Watchmouse
service_description Check Hotelopia
check_period 24x7
check_command check_watchmouse!Check Hotelopia!
contact_groups datacenter-administrators-tic
notification_period 24x7
initial_state o
check_interval 300.000000
retry_interval 300.000000
max_check_attempts 3
is_volatile 0
parallelize_check 1
active_checks_enabled 1
passive_checks_enabled 1
obsess_over_service 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options o,w,u,c
freshness_threshold 0
check_freshness 0
notification_options u,w,c,r,s
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
failure_prediction_enabled 1
retain_status_information 1
retain_nonstatus_information 1
}
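The service values in objects.cache above (check_interval 300, retry_interval 300, max_check_attempts 3) are consistent with template inheritance where a value set on the object itself wins, and otherwise the first template listed in "use" that provides the value wins. The sketch below is a simplified model of that rule, not Nagios code: it ignores special cases such as "name" and "register", and only a few directives from the configuration above are included.

```python
# Simplified copies of the service definitions posted above.
OBJECTS = {
    "generic-service-datacenter": {"active_checks_enabled": "1"},
    "generic-service-datacenter-exten1": {
        "max_check_attempts": "3",
        "check_interval": "300",
        "retry_interval": "120",
    },
    "datacenter-tic": {"use": ["generic-service-datacenter"]},
    "datacenter-exten1-tic": {"use": ["generic-service-datacenter-exten1"]},
    "Check Hotelopia": {
        "use": ["datacenter-tic", "datacenter-exten1-tic"],
        "retry_interval": "300",
    },
}


def resolve(name, objects):
    """Return the directives of an object after applying its templates.

    Local values are kept; missing values are filled from each template
    in the order listed in "use", resolving every template recursively.
    """
    obj = objects[name]
    resolved = {key: value for key, value in obj.items() if key != "use"}
    for parent in obj.get("use", []):
        for key, value in resolve(parent, objects).items():
            resolved.setdefault(key, value)
    return resolved


service = resolve("Check Hotelopia", OBJECTS)
print(service["check_interval"], service["retry_interval"], service["max_check_attempts"])
# 300 300 3
```

The local retry_interval of 300 in services.cfg overrides the 120 from the template, which is why the service shows 300 while the host shows 120.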
Re: Inconsistent Nagios Report
Posted: Tue Feb 26, 2013 5:47 pm
by slansing
Please let us know if you see this again. Were you able to file a report for the potential bug?
Re: Inconsistent Nagios Report
Posted: Wed Feb 27, 2013 2:36 am
by fran.pastor
slansing wrote: Please let us know if you see this again. Were you able to file a report for the potential bug?
OK slansing, tonight we have a massive network outage, and we will check whether this situation happens again.
We have created a case in the Nagios tracker, but I have not received any response.
http://tracker.nagios.org/view.php?id=425
Re: Inconsistent Nagios Report
Posted: Wed Feb 27, 2013 11:53 am
by slansing
You may not get a reply on your tracker. If a developer determines that it was indeed a bug and can reproduce it, they will add it to their work list. If this is indeed a bug, it may be fixed in the next release, or in a release down the road, depending on its severity. You may see that it was fixed in a change log.