Inconsistent Nagios Report

fran.pastor
Posts: 24
Joined: Tue Nov 22, 2011 3:17 am

Re: Inconsistent Nagios Report

Post by fran.pastor »

slansing wrote: One issue I noticed right away was this:

Code:

check_interval 300.000000
retry_interval 300.000000
Your check_interval and retry_interval are both set to 300, which Nagios reads as 300 minutes under the default interval_length of 60. Setting each to 5, for example, would mean the host is checked every 5 minutes; if the state changes, it is rechecked every 5 minutes, up to three times, before an alert is generated.

In this fashion it is entirely possible that Nagios detected the state change but did not check again until 300 minutes later, and it would have had to do this three times before finding that the host was back up and switching to an OK state.

We need to run checks more often than once per minute, so we changed "interval_length" from 60 to 1. As a result, we have to specify intervals in seconds instead of minutes.
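
To make the arithmetic explicit: Nagios multiplies each interval value by interval_length to get a duration in seconds. A minimal Python sketch of that conversion follows; the helper name effective_seconds is made up for illustration and is not part of Nagios.

```python
def effective_seconds(interval, interval_length=60):
    """Convert a Nagios interval value into seconds.

    check_interval and retry_interval are multiplied by interval_length,
    which is 60 by default, so values are normally read as minutes.
    """
    return interval * interval_length


# check_interval 300 with interval_length=1, as configured in this thread.
print(effective_seconds(300, interval_length=1))  # 300 seconds (5 minutes)

# The same value under the default interval_length of 60.
print(effective_seconds(300))                     # 18000 seconds (300 minutes)

# A sub-minute check can only be expressed once interval_length is 1.
print(effective_seconds(30, interval_length=1))   # 30 seconds
```

With interval_length=1, a value of 300 therefore means the same 5 minutes that a value of 5 would mean under the default setting.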

These are the events in nagios.log:

Code:

    [nagios@nagios archives]$ cat nagios-02-21-2013-00.log | grep "Watchmouse;Check Hotelopia"
    [1361314800] CURRENT SERVICE STATE: Watchmouse;Check Hotelopia;OK;HARD;1;OK. Check Hotelopia is up
    [1361316008] SERVICE ALERT: Watchmouse;Check Hotelopia;UNKNOWN;HARD;1;UNKNOWN! Problem connecting to Watchmouse. Verify login or connectivity
    [1361316302] SERVICE ALERT: Watchmouse;Check Hotelopia;CRITICAL;SOFT;1;CRITICAL. Check Hotelopia is down (Not matched)
    [1361316602] SERVICE ALERT: Watchmouse;Check Hotelopia;OK;SOFT;2;OK. Check Hotelopia is up
    [nagios@nagios archives]$
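
The bracketed numbers are Unix epoch timestamps. The Python sketch below is one way to make them readable and to show the gap between consecutive entries. The sample lines are copied from the excerpt above; the regular expression and the parse_entry helper are illustrative assumptions about the line layout, and times are printed in UTC, so they may differ from the server's local time.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the nagios.log excerpt above.
LOG_LINES = [
    "[1361314800] CURRENT SERVICE STATE: Watchmouse;Check Hotelopia;OK;HARD;1;OK. Check Hotelopia is up",
    "[1361316008] SERVICE ALERT: Watchmouse;Check Hotelopia;UNKNOWN;HARD;1;UNKNOWN! Problem connecting to Watchmouse. Verify login or connectivity",
    "[1361316302] SERVICE ALERT: Watchmouse;Check Hotelopia;CRITICAL;SOFT;1;CRITICAL. Check Hotelopia is down (Not matched)",
    "[1361316602] SERVICE ALERT: Watchmouse;Check Hotelopia;OK;SOFT;2;OK. Check Hotelopia is up",
]

# [epoch] ENTRY TYPE: host;service;state;state_type;attempt;plugin output
ENTRY = re.compile(r"^\[(\d+)\] ([A-Z ]+): ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$")


def parse_entry(line):
    """Return a dict for a service state line, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    ts, kind, host, service, state, state_type, attempt, output = match.groups()
    return {"ts": int(ts), "kind": kind, "host": host, "service": service,
            "state": state, "state_type": state_type,
            "attempt": int(attempt), "output": output}


entries = [e for e in (parse_entry(line) for line in LOG_LINES) if e is not None]

# Seconds elapsed between each pair of consecutive entries.
gaps = [b["ts"] - a["ts"] for a, b in zip(entries, entries[1:])]

for entry, gap in zip(entries, [None] + gaps):
    when = datetime.fromtimestamp(entry["ts"], tz=timezone.utc)
    suffix = "" if gap is None else f"  (+{gap}s)"
    print(f"{when:%Y-%m-%d %H:%M:%S} UTC  {entry['state']:<8} {entry['state_type']}{suffix}")
```

For these four lines the gaps come out as 1208, 294 and 300 seconds, so the last two transitions are roughly one 300-second interval apart.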


We have not set any stalking_options values, so Nagios only logs errors and recoveries. As you can see, the only errors occurred at that time; for the rest of the day, the checks every 5 minutes did not generate any errors.

Code:

   [nagios@nagios archives]$ cat ~/var/objects.cache | grep stalking | sort -u
            stalking_options        n
    [nagios@nagios archives]$

At the moment I cannot find any logic in this inconsistency.
I will file it in the tracker soon.
Thanks for the assistance.


http://tracker.nagios.org/view.php?id=425
Last edited by slansing on Fri Feb 22, 2013 11:49 am, edited 1 time in total.
Reason: Please do not double post.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Inconsistent Nagios Report

Post by abrist »

If you set your check_interval to "1", then this is indeed strange. Hopefully some answers will come from the tracker. You could also poke around the mailing list for possible solutions: https://lists.sourceforge.net/lists/lis ... gios-users
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Re: Inconsistent Nagios Report

Post by fran.pastor »

abrist wrote: If you set your check_interval to "1", then this is indeed strange. Hopefully some answers will come from the tracker. You could also poke around the mailing list for possible solutions: https://lists.sourceforge.net/lists/lis ... gios-users

No, abrist, I set check_interval to 300 because we changed interval_length to 1. Therefore we must specify the check intervals in seconds rather than minutes. We have about 20 or 30 services with a check interval of 30 seconds. If we did not change interval_length from 60 to 1, we would not be able to check more often than every 60 seconds.
Thanks for the support, abrist :D

Re: Inconsistent Nagios Report

Post by abrist »

Ahh. Yes, then this is not right. I tried to reproduce on a few of my core installs to no avail. Are other services experiencing this behavior, or just the host in question?

Re: Inconsistent Nagios Report

Post by fran.pastor »

I have detected it in only two services on the same host, but I cannot say whether it has affected more services; we currently have 550 hosts and 3250 services. I noticed it because we produce a monthly report on our business processes, and the availability report for this month did not match reality.
Curiously, everything looks normal in the Nagios log, the alert history, and perfdata/pnp4nagios; the anomaly appears only in the Trends report (trends.cgi) and the Availability report (avail.cgi).
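
One rough way to cross-check the CGI figures is to add up the time spent in each state directly from the archived log. The Python sketch below does that for the four entries posted earlier in this thread. It assumes a 24-hour window starting at the first entry and counts SOFT and HARD states alike; avail.cgi has its own options for soft states and initial states, so the numbers are only a sanity check, not a reimplementation of the report. The time_per_state helper is hypothetical.

```python
from collections import defaultdict

# (epoch timestamp, state) pairs taken from the nagios.log excerpt
# posted earlier in this thread.
TRANSITIONS = [
    (1361314800, "OK"),
    (1361316008, "UNKNOWN"),
    (1361316302, "CRITICAL"),
    (1361316602, "OK"),
]


def time_per_state(transitions, window_end):
    """Sum the seconds spent in each state up to window_end."""
    totals = defaultdict(int)
    # Pair every entry with the one that follows it; the last entry
    # is assumed to last until the end of the window.
    following = transitions[1:] + [(window_end, None)]
    for (start, state), (end, _next_state) in zip(transitions, following):
        totals[state] += end - start
    return dict(totals)


# Assumption: a 24-hour report window starting at the first entry.
WINDOW_START = TRANSITIONS[0][0]
WINDOW_END = WINDOW_START + 24 * 3600

totals = time_per_state(TRANSITIONS, WINDOW_END)
window = WINDOW_END - WINDOW_START

for state, seconds in sorted(totals.items()):
    print(f"{state:<9}{seconds:>6}s  {100 * seconds / window:.4f}%")
```

For this excerpt the totals are 85806 seconds OK, 294 seconds UNKNOWN and 300 seconds CRITICAL, which is about 99.31% OK over the day. A report that shows something very different for the same date would point at the CGI side rather than at the logged data.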

Re: Inconsistent Nagios Report

Post by abrist »

I know you just tripped across the potential bug, but have you seen the behavior on any other hosts or found any way to reproduce it?

Re: Inconsistent Nagios Report

Post by fran.pastor »

I have only detected it in two services on the same host. I can reproduce it whenever I request a report for that date.
I do not use reporting for anything else (only for the hostgroup called Business_Process_Report). I have been running random availability reports and have not found any other cases.
Thanks for the assistance.


This is the configuration for one of the services affected by this report problem:

Code:

[nagios@nagios etc]$ tree 
.
|-- cgi.cfg
|-- htpasswd.users
|-- nagios.cfg
|-- nsca.cfg
|-- objects
|   |-- commands.cfg
|   |-- contactgroup.cfg
|   |-- contacts.cfg
|   |-- hostgroup.cfg
|   |-- hosts.cfg
|   |-- servicegroups.cfg
|   |-- services.cfg
|   |-- templates.cfg
|   `-- timeperiods.cfg
`-- resource.cfg

nagios.cfg

Code:

##############################################################################
#
# NAGIOS.CFG - Sample Main Config File for Nagios 3.3.1
#
##############################################################################

log_file=/usr/local/nagios/var/nagios.log
cfg_dir=/usr/local/nagios/etc/objects
object_cache_file=/usr/local/nagios/var/objects.cache
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
external_command_buffer_slots=4096
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
event_broker_options=-1
broker_module=/usr/local/nagios/pnp4nagios/lib/npcdmod.o config_file=/usr/local/nagios/pnp4nagios/etc/npcd.cfg
broker_module=/usr/local/nagios/mk-livestatus/lib/mk-livestatus/livestatus.o /usr/local/nagios/var/rw/live 
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=60
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=60
max_concurrent_checks=0
check_result_reaper_frequency=3 ;10
max_check_result_reaper_time=10 ;30
check_result_path=/usr/local/nagios/var/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=120
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=0
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=1
check_for_updates=1
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=0
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=euro
enable_embedded_perl=0
use_embedded_perl_implicitly=0
illegal_object_name_chars=`~!$%^&*|'"<>?,=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=[email protected]
admin_pager=pagenagios@localhost
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=1
#debug_level=-1
#debug_verbosity=1
#debug_file=/usr/local/nagios/var/nagios.debug
#max_debug_file_size=1000000

FOR HOST

Code:

# hosts.cfg
define host {
        use             datacenter-tic,datacenter-exten1-tic
        host_name       Watchmouse
        alias           Watchmouse
        address         api.watchmouse.com
        parents         FW-TIC
        }


# templates.cfg
define host {
        name                            generic-host-datacenter
        check_command                   check_fping
        active_checks_enabled           1
        passive_checks_enabled          1
        check_period                    24x7
        obsess_over_host                0
        event_handler_enabled           1
        flap_detection_enabled          1
        flap_detection_options          o,d,u
        failure_prediction_enabled      1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             24x7
        notifications_enabled           1
        #stalking_options               d,u
        register                        0
        icon_image                      rack_linux.png
        statusmap_image                 rack_linux.png
}

define host {
        name                            generic-host-datacenter-exten1
        max_check_attempts              3
        check_interval                  300
        retry_interval                  120
        notification_interval           0
        notification_options            d,r,u
        register                        0
}

define host{
        use                             generic-host-datacenter
        name                            datacenter-tic
        hostgroups                      Datacenter-TIC
        contact_groups                  datacenter-administrators-tic
        register                        0
        }

define host{
        use                             generic-host-datacenter-exten1
        name                            datacenter-exten1-tic
        register                        0
}

FOR SERVICE

Code:

# services.cfg
define service {
        use                     datacenter-tic,datacenter-exten1-tic
        host_name               Watchmouse
        service_description     Check Hotelopia
        check_command           check_watchmouse!Check Hotelopia!
        retry_interval          300
        }

# templates.cfg
define service{
        name                            generic-service-datacenter
        active_checks_enabled           1
        passive_checks_enabled          1
        check_period                    24x7
        flap_detection_enabled          1
        flap_detection_options          o,w,c,u
        failure_prediction_enabled      1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             24x7
        notifications_enabled           1
        parallelize_check               1
        #stalking_options               w,c
        register                        0
}

define service{
        name                    generic-service-datacenter-exten1
        max_check_attempts      3
        check_interval          300
        retry_interval          120
        notification_interval   0
        notification_options    w,u,c,r,s
        register                0
        }

define service{
        use                     generic-service-datacenter
        name                    datacenter-tic
        contact_groups          datacenter-administrators-tic
        register                0
        }

define service{
        use                     generic-service-datacenter-exten1
        name                    datacenter-exten1-tic
        register                0
        }


Resulting objects.cache:

Code:

define host {
        host_name       Watchmouse
        alias   Watchmouse
        address api.watchmouse.com
        parents FW-TIC
        check_period    24x7
        check_command   check_fping
        contact_groups  datacenter-administrators-tic
        notification_period     24x7
        initial_state   o
        check_interval  300.000000
        retry_interval  120.000000
        max_check_attempts      3
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess_over_host        0
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  o,d,u
        freshness_threshold     0
        check_freshness 0
        notification_options    d,u,r
        notifications_enabled   1
        notification_interval   0.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        failure_prediction_enabled      1
        icon_image      rack_linux.png
        statusmap_image rack_linux.png
        retain_status_information       1
        retain_nonstatus_information    1
        }

define service {
        host_name       Watchmouse
        service_description     Check Hotelopia
        check_period    24x7
        check_command   check_watchmouse!Check Hotelopia!
        contact_groups  datacenter-administrators-tic
        notification_period     24x7
        initial_state   o
        check_interval  300.000000
        retry_interval  300.000000
        max_check_attempts      3
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess_over_service     1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  o,w,u,c
        freshness_threshold     0
        check_freshness 0
        notification_options    u,w,c,r,s
        notifications_enabled   1
        notification_interval   0.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        failure_prediction_enabled      1
        retain_status_information       1
        retain_nonstatus_information    1
        }
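
The check_interval 300 / retry_interval 300 pair in the resulting service definition follows from how the templates stack up. The Python sketch below is a simplified model of that resolution, based on the documented inheritance rules: directives set on the object itself override templates, and when several templates are listed in use, the first one that supplies a directive wins. The dictionaries mirror a subset of the service definitions above, with the name and register directives left out; resolve is a hypothetical helper, and additive inheritance (+) and null values are ignored.

```python
# Subset of the service templates from templates.cfg above.
TEMPLATES = {
    "generic-service-datacenter": {
        "check_period": "24x7",
        "notification_period": "24x7",
    },
    "generic-service-datacenter-exten1": {
        "max_check_attempts": 3,
        "check_interval": 300,
        "retry_interval": 120,
    },
    "datacenter-tic": {
        "use": "generic-service-datacenter",
        "contact_groups": "datacenter-administrators-tic",
    },
    "datacenter-exten1-tic": {
        "use": "generic-service-datacenter-exten1",
    },
}

# The service from services.cfg above.
SERVICE = {
    "use": "datacenter-tic,datacenter-exten1-tic",
    "host_name": "Watchmouse",
    "service_description": "Check Hotelopia",
    "retry_interval": 300,
}


def resolve(obj, templates):
    """Flatten an object definition by walking its 'use' chain."""
    # Directives defined on the object itself always win.
    resolved = {key: value for key, value in obj.items() if key != "use"}
    parents = [name.strip() for name in obj.get("use", "").split(",") if name.strip()]
    # Templates are consulted in the order listed; earlier ones take precedence.
    for name in parents:
        for key, value in resolve(templates[name], templates).items():
            resolved.setdefault(key, value)
    return resolved


flat = resolve(SERVICE, TEMPLATES)
for key in ("check_interval", "retry_interval", "max_check_attempts"):
    print(key, flat[key])
```

Under this model retry_interval comes from the service itself (300), while check_interval and max_check_attempts come from generic-service-datacenter-exten1, which matches the values shown in the cached definition.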
Last edited by slansing on Tue Feb 26, 2013 1:54 pm, edited 1 time in total.
Reason: Please combine posts if there is not another user's post separating them.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Inconsistent Nagios Report

Post by slansing »

Please let us know if you see this again. Were you able to file a report for the potential bug?

Re: Inconsistent Nagios Report

Post by fran.pastor »

slansing wrote: Please let us know if you see this again. Were you able to file a report for the potential bug?

OK slansing, tonight we have a massive network outage, so we will check whether this situation occurs again.
We have created a case in the Nagios tracker, but I have not received any response: http://tracker.nagios.org/view.php?id=425

Re: Inconsistent Nagios Report

Post by slansing »

You may not get a reply on your tracker entry. If a developer determines that it was indeed a bug and can reproduce it, they will add it to their work list. If this is indeed a bug, it may be fixed in the next release or in a later one, depending on its severity. You may see the fix mentioned in a change log.