No Blocking Outage Alerts for Parents

Posted: Tue Feb 26, 2013 3:49 pm
by kurt2439
I am not getting alerts from my "blocking outage" hosts, i.e. the ones that
the other servers depend on and that are set up as parents in the Nagios hosts file. When all
the services and hosts are down, they register as down in the web front end,
but when I click on the hosts they are only in a SOFT state (1/3), and each
time the 'next scheduled active check' time comes and goes they remain in
SOFT state (1/3). So I never actually get any alerts for them.

As a test, I changed the IP addresses of those blocking outage parent hosts so
that they would appear offline, and I did get alerts from Nagios as expected.
But when many services and hosts are down at once, it seems to get stuck.
The situation is probably more complicated than this, but that is all I have
boiled it down to so far.

I can't make sense of this unless there is some bug. Reloading the nagios
service while it is stuck does not fix the issue (and wouldn't be a solution anyway).
If I force a host check, it does move the state to SOFT 2/3; however, it then just
sits at SOFT 2/3 through each of the next scheduled active checks.

Thoughts?
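
For readers following along, here is a rough sketch of the behavior that would normally be expected, not an explanation of the stuck state described above. It assumes max_check_attempts is 3 (as the "1/3" display suggests) and models only the attempt counter: each consecutive failed check raises the attempt number, and the state becomes HARD (the point at which notifications can go out) once the counter reaches max_check_attempts. The function name and the tuple layout are made up for illustration; this is not Nagios code.

```python
def simulate_checks(results, max_check_attempts=3):
    """Return a list of (state, attempt, state_type) for consecutive check results.

    results: iterable of booleans, True meaning the host check succeeded.
    This is a simplified model: it ignores flapping, retry timing, and
    parent/child reachability logic.
    """
    timeline = []
    attempt = 0
    for ok in results:
        if ok:
            # A successful check resets the failure counter.
            attempt = 0
            timeline.append(("UP", 1, "HARD"))
            continue
        # Each consecutive failure advances the attempt counter, capped at the maximum.
        attempt = min(attempt + 1, max_check_attempts)
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        timeline.append(("DOWN", attempt, state_type))
    return timeline


if __name__ == "__main__":
    for state, attempt, state_type in simulate_checks([False, False, False]):
        print(f"{state} {state_type} ({attempt}/3)")
```

In this model three failed checks in a row end in a HARD DOWN state, so a host that stays at SOFT 1/3 across several scheduled checks is not advancing the way the simple counter would.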

Re: No Blocking Outage Alerts for Parents

Posted: Tue Feb 26, 2013 4:31 pm
by slansing
Can you choose one of the hosts as an example and show us your configuration definition for it?

Re: No Blocking Outage Alerts for Parents

Posted: Wed Feb 27, 2013 2:07 pm
by kurt2439
Thanks, here is the host soekris1, one of the parents that is not generating alerts and is getting stuck at 1/3 SOFT state. The parent "apple" was online when I was not receiving alerts.

define host{
    use                             generic-host
    host_name                       soekris1
    alias                           meganet-firewall1
    hostgroups                      Meganet
    address                         x.x.x.x
    statusmap_image                 firewall.gd2
    parents                         apple
}

define host{
    name                            generic-host
    notifications_enabled           1
    check_interval                  5
    max_check_attempts              3
    retry_interval                  1
    notification_interval           2
    notification_period             24x7
    notification_options            d,r
    contact_groups                  admins_aim,admins_email
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    check_command                   check-host-alive
    register                        0
}

define hostgroup{
    hostgroup_name                  Meganet
    alias                           Meganet Colocation Servers
}

define hostescalation{
    hostgroup_name                  Meganet
    first_notification              2
    last_notification               2
    notification_interval           1440
    contact_groups                  admins_pager
}

define hostescalation{
    hostgroup_name                  Meganet
    first_notification              3
    last_notification               0
    notification_interval           1440
    contact_groups                  admins_email
}
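
As an aside, the two hostescalation definitions above decide who is contacted for each notification number once a host is in a HARD problem state. The sketch below is a simplified reading of that logic, assuming that a matching escalation replaces the host's default contact_groups and that last_notification 0 means "no upper bound". The Escalation class and groups_for function are hypothetical names for illustration; the values are copied from the posted config.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    first: int            # first_notification
    last: int             # last_notification, 0 meaning no upper bound
    contact_groups: tuple


# Defaults from the generic-host template in the post.
DEFAULT_GROUPS = ("admins_aim", "admins_email")

# The two hostescalation definitions for hostgroup Meganet.
ESCALATIONS = [
    Escalation(first=2, last=2, contact_groups=("admins_pager",)),
    Escalation(first=3, last=0, contact_groups=("admins_email",)),
]


def groups_for(notification_number):
    """Return the contact groups for the Nth notification of one problem."""
    matched = [
        e for e in ESCALATIONS
        if e.first <= notification_number
        and (e.last == 0 or notification_number <= e.last)
    ]
    if not matched:
        # No escalation applies, so the host's own contact_groups are used.
        return DEFAULT_GROUPS
    groups = []
    for e in matched:
        for g in e.contact_groups:
            if g not in groups:
                groups.append(g)
    return tuple(groups)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, groups_for(n))
```

Under these assumptions the first notification goes to admins_aim and admins_email, the second to admins_pager, and every later one to admins_email. None of this comes into play while the host is still in a SOFT state.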

Re: No Blocking Outage Alerts for Parents

Posted: Thu Feb 28, 2013 5:51 pm
by scottwilkerson
Was apple's parent in a UP state? All the way up the line?

Which state is it marked as, DOWN or UNREACHABLE?
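
The distinction matters because the posted template has notification_options d,r, which covers DOWN and recovery but not UNREACHABLE (u). A minimal sketch of the usual rule, assuming a failed host is DOWN when at least one parent is UP (or it has no parents) and UNREACHABLE when every parent is down or unreachable. The function names are hypothetical and the model ignores SOFT/HARD state handling.

```python
def classify(check_ok, parent_states):
    """Classify a host from its own check result and its parents' states.

    parent_states: list of "UP", "DOWN" or "UNREACHABLE", one per parent.
    """
    if check_ok:
        return "UP"
    if not parent_states:
        # No parents defined: nothing can be blocking the path to this host.
        return "DOWN"
    if any(state == "UP" for state in parent_states):
        # At least one route to the host works, so the host itself is the problem.
        return "DOWN"
    return "UNREACHABLE"


def notifies(state, notification_options=("d", "r")):
    """Tell whether a state is covered by the given notification_options.

    "r" stands for recovery, treated here as the UP state after a problem.
    """
    flag = {"DOWN": "d", "UNREACHABLE": "u", "UP": "r"}[state]
    return flag in notification_options


if __name__ == "__main__":
    state = classify(check_ok=False, parent_states=["DOWN"])
    print(state, notifies(state))
```

With d,r as posted, a host classified as UNREACHABLE would not generate a notification in this model, while a DOWN host would.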

Re: No Blocking Outage Alerts for Parents

Posted: Fri Mar 08, 2013 11:46 am
by kurt2439
Apple was UP, yes.

The hosts that I am worried about, soekris1 and shiva2, were in a DOWN state.

Re: No Blocking Outage Alerts for Parents

Posted: Fri Mar 08, 2013 12:56 pm
by abrist
Hmmm. Can we see the configuration for the host "Apple"?

Re: No Blocking Outage Alerts for Parents

Posted: Wed Mar 13, 2013 10:11 am
by kurt2439
Note: soekris1-sham and soekris2-sham are at a different network location, so in terms of dependencies the network checks go from the nagios server -> switch -> firewalls -> apple internet host -> firewalls at the remote location (which are not running their checks, as described above).

define host{
    use                             generic-host
    host_name                       apple
    alias                           Apple Server (Internet)
    check_command                   check-host-alive
    address                         http://www.apple.com
    parents                         soekris1-sham,soekris2-shamm
}

define host{
    use                             sham-generic-host
    host_name                       soekris1-sham
    alias                           Soekris Backup Firewall
    address                         192.168.1.26
    statusmap_image                 firewall.gd2
    parents                         sp_int_sw1
    2d_coords                       200,80
}

define host{
    use                             sham-generic-host
    host_name                       soekris2-sham
    alias                           Soekris Firewall
    address                         192.168.1.25
    statusmap_image                 firewall.gd2
    parents                         sp_int_sw1
    2d_coords                       120,80
}

define host{
    use                             sham-generic-host
    host_name                       sp_int_sw1
    alias                           SP Internal SW1 - 48 Port
    address                         192.168.1.4
    statusmap_image                 router.gd2
    # 2d_coords                     0,160
}

define host{
    name                            sham-generic-host
    notifications_enabled           1
    check_interval                  5
    max_check_attempts              3
    retry_interval                  1
    notification_interval           12
    notification_period             24x7
    notification_options            d,r
    contact_groups                  sham_admins_aim,sham_admins_email
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    check_command                   check-host-alive
    register                        0
}

Re: No Blocking Outage Alerts for Parents

Posted: Wed Mar 13, 2013 10:40 am
by abrist
So the network order is:

Nagios --> sp_int_sw1 --> soekris1-sham,soekris2-shamm --> apple --> soekris1

Is this correct?

Re: No Blocking Outage Alerts for Parents

Posted: Wed Mar 13, 2013 11:05 am
by kurt2439
soekris1/shiva2 at the end of the chain, but yes, correct.
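
For anyone wanting to double-check a chain like this, the sketch below rebuilds it from the parents directives exactly as posted in this thread and lists any parent name that has no matching host definition. The PARENTS dictionary is hand-copied from the posts (shiva2 is left out because its definition was not posted), the helper names are hypothetical, and the walk assumes the parent graph has no cycles.

```python
# host_name -> parents, copied from the definitions posted in this thread.
PARENTS = {
    "soekris1": ["apple"],
    "apple": ["soekris1-sham", "soekris2-shamm"],
    "soekris1-sham": ["sp_int_sw1"],
    "soekris2-sham": ["sp_int_sw1"],
    "sp_int_sw1": [],
}


def undefined_parents(parents):
    """Return parent names that are referenced but not defined as hosts."""
    referenced = {p for plist in parents.values() for p in plist}
    return sorted(p for p in referenced if p not in parents)


def paths_to_root(host, parents):
    """Return every path from a top-level host down to the given host."""
    plist = parents.get(host)
    if plist is None:
        # Referenced as a parent but never defined in the posted config.
        return [[host + " (undefined)"]]
    if not plist:
        # No parents: this host sits directly behind the Nagios server.
        return [[host]]
    paths = []
    for parent in plist:
        for path in paths_to_root(parent, parents):
            paths.append(path + [host])
    return paths


if __name__ == "__main__":
    print("undefined parents:", undefined_parents(PARENTS))
    for path in paths_to_root("soekris1", PARENTS):
        print(" --> ".join(path))
```

Run against the posted definitions, this reports "soekris2-shamm" as a parent of apple with no matching host (the defined host is soekris2-sham). That may simply be a typo introduced while pasting into the forum, but it is worth comparing against the live config.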

Re: No Blocking Outage Alerts for Parents

Posted: Wed Mar 13, 2013 11:24 am
by abrist
Alright, back to your initial question:
Is the problem that you are not receiving alerts from "soekris" when it is down while "apple" is up, but if "apple" is down, you receive alerts correctly?