Current attempt for hosts stays on 1

Posted: Thu Jun 11, 2015 6:12 am
by WillemDH
Hello,

Just noticed some strange behaviour. In open host problems, I see four hosts down, which is not unusual in itself, but each of them is at attempt one of three. When I check these hosts' services, they seem normal. So I retrieved the current config of one of those hosts:

Code:

 cat /var/nagiosramdisk/objects.cache | sed -rn "/define host \{/{:a;N;/}/{/.host_name.srv2012/p;d};ba}"
define host {
        host_name       srv2012test.gentgrp.gent.be
        alias   srv2012test.gentgrp.gent.be
        address xx.xx.xx.xx
        check_period    xi_timeperiod_24x7
        check_command   check_xi_host_ping!3000.0!20%!5000.0!80%!!!!
        contacts        willem.dhaese,nagiosadmin,nagiosadmin
        contact_groups  cg_dummy
        notification_period     xi_timeperiod_24x7
        initial_state   o
        importance      0
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      3
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     0
        check_freshness 0
        notification_options    a
        notifications_enabled   1
        notification_interval   1440.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        icon_image      win_server.png
        statusmap_image win_server.png
        retain_status_information       1
        retain_nonstatus_information    1
        _XIWIZARD       windowsserver
        }
Please check the attached screenshot. I did receive an email, and after checking the state history, it seems to be doing two checks resulting in a soft state, and then after the third and final check it goes to a hard state. This does seem normal, but open host problems (and all) does not seem to correctly reflect which attempt the host is currently at. Thoughts?

Grtz

Willem

Re: Current attempt for hosts stays on 1

Posted: Thu Jun 11, 2015 4:42 pm
by tmcdonald
Does Core reflect the same information or is XI the odd one out?

Re: Current attempt for hosts stays on 1

Posted: Fri Jun 12, 2015 1:53 am
by WillemDH
Apparently Core does indeed reflect the same situation. See screenshot.

Re: Current attempt for hosts stays on 1

Posted: Fri Jun 12, 2015 1:54 pm
by tmcdonald
Just to dig a bit deeper down the chain, can you grep out that host's status from status.dat? I want to ensure that this is not just a display issue.

At some point we might want to look at a full profile and turn on some debugging, but let's save that for if we move to a ticket.

Re: Current attempt for hosts stays on 1

Posted: Tue Jun 16, 2015 2:25 am
by WillemDH
Trevor,

It seems status.dat also shows the host check is only at 1/3:

Code:

grep -A 45 "host_name=srv2012test" /var/nagiosramdisk/status.dat
        host_name=srv2012test
        modified_attributes=0
        check_command=check_xi_host_ping!3000.0!20%!5000.0!80%!!!!
        check_period=xi_timeperiod_24x7
        notification_period=xi_timeperiod_24x7
        check_interval=5.000000
        retry_interval=1.000000
        event_handler=
        has_been_checked=1
        should_be_scheduled=1
        check_execution_time=10.001
        check_latency=0.000
        check_type=0
        current_state=1
        last_hard_state=1
        last_event_id=544989
        current_event_id=544993
        current_problem_id=250134
        last_problem_id=216162
        plugin_output=CRITICAL - 10.54.26.13: rta nan, lost 100%
        long_plugin_output=
        performance_data=rta=0.000ms;3000.000;5000.000;0; pl=100%;20;80;; rtmax=0.000ms;;;; rtmin=0.000ms;;;;
        last_check=1434439278
        next_check=1434439588
        check_options=0
        current_attempt=1
        max_attempts=3
        state_type=1
        last_state_change=1434020587
        last_hard_state_change=1434020587
        last_time_up=1434020512
        last_time_down=1434439288
        last_time_unreachable=0
        last_notification=1434366744
        next_notification=1434453144
        no_more_notifications=0
        current_notification_number=5
        current_notification_id=323123
        notifications_enabled=1
        problem_has_been_acknowledged=0
        acknowledgement_type=0
        active_checks_enabled=1
        passive_checks_enabled=1
        event_handler_enabled=1
        flap_detection_enabled=1
        process_performance_data=1
Let me know if I need to create a support ticket for further analysis.
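
One more thing can be read off the status.dat output with a little arithmetic. This is only a rough sketch: the function name is made up for illustration, and the interpretation assumes the default interval_length of 60 seconds, which is not shown in this thread. The gap between last_check and next_check comes out at 310 seconds, which is close to the 5-minute check_interval rather than the 1-minute retry_interval. That would be consistent with the host being rescheduled as a hard-state problem (state_type=1), even though current_attempt still reads 1.

```shell
#!/bin/sh
# Hypothetical helper: seconds between two epoch timestamps from status.dat.
check_gap_seconds() {
    echo $(( $2 - $1 ))
}

# last_check and next_check, copied from the grep output in this post
check_gap_seconds 1434439278 1434439588   # prints 310

# last_time_up and last_hard_state_change, also from the same output
check_gap_seconds 1434020512 1434020587   # prints 75
```

The second figure only shows that the hard state change was recorded 75 seconds after the last time the host was seen up; it does not say how many checks ran in between.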

Grtz

Willem

Re: Current attempt for hosts stays on 1

Posted: Tue Jun 16, 2015 3:23 pm
by tmcdonald
Is it just these hosts that show the issue? I would imagine so but I am wondering if there is anything special about them that we can focus on.

And this is Core 4.0.8, right?

Re: Current attempt for hosts stays on 1

Posted: Tue Jun 16, 2015 3:42 pm
by WillemDH
Hey Trevor,

No, it's all the hosts in open host problems. The one I grepped was just an example.
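
For checking every host at once instead of grepping one by one, something along these lines could work. It is a minimal sketch, not a supported tool: the function name is made up, and it assumes host entries in status.dat are written as "hoststatus { ... }" blocks of key=value lines (the block header is not visible in the grep output above, so treat that as an assumption). It prints one line per host whose current_state is not 0.

```shell
#!/bin/sh
# Hypothetical helper: list every non-UP host with its state type and attempt counter.
# $1 is the path to status.dat (for example /var/nagiosramdisk/status.dat).
list_host_attempts() {
    awk '
        # Start of a host block: reset the field table.
        /^[ \t]*hoststatus[ \t]*\{/ { inhost = 1; split("", f); next }

        # End of a host block: report it if the host is not UP (state 0).
        inhost && /^[ \t]*\}/ {
            if (f["current_state"] + 0 != 0)
                printf "%s state=%s type=%s attempt=%s/%s\n",
                    f["host_name"], f["current_state"], f["state_type"],
                    f["current_attempt"], f["max_attempts"]
            inhost = 0
            next
        }

        # Inside a host block: store key=value, splitting on the first "=" only.
        inhost {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq > 0) f[substr(line, 1, eq - 1)] = substr(line, eq + 1)
        }
    ' "$1"
}
```

Called as `list_host_attempts /var/nagiosramdisk/status.dat`, a host stuck like the example would show up with type=1 and attempt=1/3. Service blocks are skipped because only lines inside a hoststatus block are read.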

I'm on R2.7 so I guess that's indeed Core 4.0.8

Grtz

Willem

Re: Current attempt for hosts stays on 1

Posted: Wed Jun 17, 2015 11:22 am
by tmcdonald
Miiiight want to move this to a ticket. Could you send in your profile to [email protected] and link back to this thread? I feel like a remote with some debugging is in order.

Re: Current attempt for hosts stays on 1

Posted: Tue Nov 03, 2015 8:11 am
by WillemDH
This issue has been solved in XI 5. Please close.