No Blocking Outage Alerts for Parents

kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

No Blocking Outage Alerts for Parents

Post by kurt2439 »

I am not getting alerts from my "blocking outage" hosts, i.e. the ones that
the other servers depend on and that are set up as parents in the Nagios hosts file. When all
the services and hosts are down, they register as down in the web front end,
but when I click on the hosts they are only in a SOFT state (1/3), and each
time the 'next scheduled active check' time comes and goes the state remains
SOFT (1/3). So I never actually get any alerts for them.

I tried just changing the IP addresses of those blocking outage parent hosts so
that they would not be online, and I did get alerts from Nagios as expected.
But when many services and hosts are down, it seems to get stuck.
The situation is probably more complicated than this, but that is as far as I have
boiled it down so far.

I can't make sense of this unless there is some bug. Reloading the nagios
service while it is stuck does not fix the issue (and wouldn't be a solution anyway).
If I force a host check, it does move the state to SOFT 2/3; however, it then just
sits at SOFT 2/3 through each of the next scheduled active checks.

Thoughts?
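
For reference, here is a minimal sketch of how the attempt counter would normally be expected to progress, assuming the documented soft/hard state behaviour and max_check_attempts 3 (the "x/3" shown in the web front end). It is a simplified model, not Nagios's actual code: the class name HostState and the method process_check are hypothetical, and re-notification via notification_interval is not modelled.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Simplified soft/hard state model (hypothetical, for illustration only)."""
    max_check_attempts: int = 3   # assumed from the "x/3" counter in the post
    current_attempt: int = 0
    state_type: str = "HARD"      # an UP host is modelled as HARD UP
    status: str = "UP"

    def process_check(self, check_ok: bool) -> bool:
        """Apply one check result; return True if a notification would be due."""
        if check_ok:
            # A recovery is only announced if the problem had reached HARD.
            was_hard_down = self.status == "DOWN" and self.state_type == "HARD"
            self.status, self.state_type, self.current_attempt = "UP", "HARD", 0
            return was_hard_down
        if self.status == "DOWN" and self.state_type == "HARD":
            # Already HARD DOWN; repeat notifications are not modelled here.
            return False
        self.status = "DOWN"
        self.current_attempt += 1
        if self.current_attempt >= self.max_check_attempts:
            self.state_type = "HARD"   # retries exhausted: problem is confirmed
            return True
        self.state_type = "SOFT"       # still retrying, no notification yet
        return False


host = HostState()
history = []
for ok in [False, False, False]:
    notify = host.process_check(ok)
    history.append((host.current_attempt, host.state_type, notify))
```

In this model three consecutive failed checks take the host from SOFT 1/3 through SOFT 2/3 to HARD 3/3, and only the last step triggers a notification. A host that stays at 1/3 across scheduled checks is not advancing through these steps, which is the symptom described above.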
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: No Blocking Outage Alerts for Parents

Post by slansing »

Can you choose one of the hosts as an example and show us your configuration definition for it?
kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

Thanks, here is the host soekris1, one of the parents that is not generating alerts and is getting stuck at the 1/3 SOFT state. Its parent "apple" was online during the time I was not receiving alerts.

define host{
    use                             generic-host
    host_name                       soekris1
    alias                           meganet-firewall1
    hostgroups                      Meganet
    address                         x.x.x.x
    statusmap_image                 firewall.gd2
    parents                         apple
}

define host{
    name                            generic-host
    notifications_enabled           1
    check_interval                  5
    max_check_attempts              3
    retry_interval                  1
    notification_interval           2
    notification_period             24x7
    notification_options            d,r
    contact_groups                  admins_aim,admins_email
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    check_command                   check-host-alive
    register                        0
}

define hostgroup{
    hostgroup_name                  Meganet
    alias                           Meganet Colocation Servers
}

define hostescalation{
    hostgroup_name                  Meganet
    first_notification              2
    last_notification               2
    notification_interval           1440
    contact_groups                  admins_pager
}

define hostescalation{
    hostgroup_name                  Meganet
    first_notification              3
    last_notification               0
    notification_interval           1440
    contact_groups                  admins_email
}
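
To make the two hostescalation definitions above easier to reason about, here is a rough sketch of which contact groups would receive the Nth notification. It assumes the documented escalation rules as I understand them: an escalation applies when first_notification <= N and last_notification is either 0 (no upper limit) or >= N, and matching escalations replace the host's default contact_groups. The names Escalation and recipients are hypothetical helpers, not part of Nagios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    first_notification: int
    last_notification: int    # 0 means "no upper limit"
    contact_groups: tuple

    def applies_to(self, n: int) -> bool:
        upper_ok = self.last_notification == 0 or n <= self.last_notification
        return self.first_notification <= n and upper_ok


# Values copied from the generic-host template and the two escalations above.
DEFAULT_GROUPS = ("admins_aim", "admins_email")
ESCALATIONS = [
    Escalation(2, 2, ("admins_pager",)),
    Escalation(3, 0, ("admins_email",)),
]


def recipients(n: int) -> tuple:
    """Contact groups for notification number n (simplified model)."""
    matched = [e for e in ESCALATIONS if e.applies_to(n)]
    if not matched:
        return DEFAULT_GROUPS          # no escalation applies: use host defaults
    groups = []
    for escalation in matched:
        for group in escalation.contact_groups:
            if group not in groups:    # keep order, drop duplicates
                groups.append(group)
    return tuple(groups)


plan = {n: recipients(n) for n in (1, 2, 3, 4)}
```

Under these assumptions, the first notification goes to the default groups, the second only to admins_pager, and the third and later ones only to admins_email. None of this comes into play until the host reaches a HARD state, so the escalations by themselves would not explain a host that never leaves SOFT.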
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: No Blocking Outage Alerts for Parents

Post by scottwilkerson »

Was apple's parent in a UP state? All the way up the line?

Which state is it marked as: DOWN or UNREACHABLE?
Former Nagios employee
kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

Apple was UP, yes.

The hosts that I am worried about, soekris1 and shiva2, were in a DOWN state.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: No Blocking Outage Alerts for Parents

Post by abrist »

Hmmm. Can we see the configuration for the host "apple"?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

Note: soekris1-sham and soekris2-sham are at a different network location, so in terms of dependencies the network checks go from the Nagios server -> switch -> firewalls -> apple (Internet host) -> firewalls at the remote location (the ones that are not running their checks, as described above).

define host{
    use                             generic-host
    host_name                       apple
    alias                           Apple Server (Internet)
    check_command                   check-host-alive
    address                         http://www.apple.com
    parents                         soekris1-sham,soekris2-shamm
}

define host{
    use                             sham-generic-host
    host_name                       soekris1-sham
    alias                           Soekris Backup Firewall
    address                         192.168.1.26
    statusmap_image                 firewall.gd2
    parents                         sp_int_sw1
    2d_coords                       200,80
}

define host{
    use                             sham-generic-host
    host_name                       soekris2-sham
    alias                           Soekris Firewall
    address                         192.168.1.25
    statusmap_image                 firewall.gd2
    parents                         sp_int_sw1
    2d_coords                       120,80
}

define host{
    use                             sham-generic-host
    host_name                       sp_int_sw1
    alias                           SP Internal SW1 - 48 Port
    address                         192.168.1.4
    statusmap_image                 router.gd2
#   2d_coords                       0,160
}

define host{
    name                            sham-generic-host
    notifications_enabled           1
    check_interval                  5
    max_check_attempts              3
    retry_interval                  1
    notification_interval           12
    notification_period             24x7
    notification_options            d,r
    contact_groups                  sham_admins_aim,sham_admins_email
    event_handler_enabled           1
    flap_detection_enabled          1
    failure_prediction_enabled      1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    check_command                   check-host-alive
    register                        0
}
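
One thing that stands out in the definitions as posted: apple lists "soekris2-shamm" as a parent, while the host is defined as "soekris2-sham". That may simply be a typo in the forum paste, and Nagios's own pre-flight check (nagios -v with the main config file) is the authoritative test. As a quick independent sanity check, a small script along these lines could flag parents that do not match any defined host_name. It is only a sketch: parse_hosts and undefined_parents are hypothetical helpers, and the parser handles just the simple "define host{ ... }" layout used in this thread.

```python
import re


def parse_hosts(config_text):
    """Return {host_name: [parents]} for each 'define host{...}' block."""
    hosts = {}
    # 'host' must be followed directly by '{', so hostgroup and
    # hostescalation blocks are not matched.
    for block in re.findall(r"define\s+host\s*\{(.*?)\}", config_text, flags=re.S):
        fields = {}
        for line in block.splitlines():
            line = line.split(";")[0].strip()      # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        name = fields.get("host_name")
        if name is None:
            continue                               # templates only have 'name'
        raw_parents = fields.get("parents", "")
        hosts[name] = [p.strip() for p in raw_parents.split(",") if p.strip()]
    return hosts


def undefined_parents(hosts):
    """List (host, parent) pairs where the parent is not a defined host."""
    return sorted(
        (host, parent)
        for host, parents in hosts.items()
        for parent in parents
        if parent not in hosts
    )


# Abbreviated copy of the definitions posted above.
SAMPLE = """
define host{
host_name apple
parents soekris1-sham,soekris2-shamm
}
define host{
host_name soekris1-sham
parents sp_int_sw1
}
define host{
host_name soekris2-sham
parents sp_int_sw1
}
define host{
host_name sp_int_sw1
}
define host{
name sham-generic-host
register 0
}
"""

problems = undefined_parents(parse_hosts(SAMPLE))
```

Run against the abbreviated sample, problems contains the single pair ("apple", "soekris2-shamm"). If the live configuration really contains that spelling, it would be worth correcting before looking further.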
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: No Blocking Outage Alerts for Parents

Post by abrist »

So the network order is:

Nagios --> sp_int_sw1 --> soekris1-sham,soekris2-shamm --> apple --> soekris1

Is this correct?
kurt2439
Posts: 9
Joined: Wed Jun 22, 2011 11:36 am

Re: No Blocking Outage Alerts for Parents

Post by kurt2439 »

soekris1/shiva2 at the end of the chain, but yes, correct.
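
Given that chain, the DOWN versus UNREACHABLE question can be sketched as follows. This assumes the documented reachability rule: a host that fails its check is DOWN if it has no parents or at least one parent is UP, and UNREACHABLE if all of its parents are DOWN or UNREACHABLE. The function classify and the PARENTS map are illustrative only; the map uses the spelling soekris2-sham from the host definition, and shiva2 is left out because its definition was not posted.

```python
# Parent relationships as described in this thread (assumed spelling for
# soekris2-sham; shiva2 omitted because its config was not shown).
PARENTS = {
    "sp_int_sw1": [],
    "soekris1-sham": ["sp_int_sw1"],
    "soekris2-sham": ["sp_int_sw1"],
    "apple": ["soekris1-sham", "soekris2-sham"],
    "soekris1": ["apple"],
}


def classify(host, failing, parents=PARENTS):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a host.

    'failing' is the set of hosts whose own check currently fails.
    Assumes the parent graph has no cycles.
    """
    if host not in failing:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        return "DOWN"                     # top of the tree: nothing can block it
    if any(classify(p, failing, parents) == "UP" for p in host_parents):
        return "DOWN"                     # reachable through at least one parent
    return "UNREACHABLE"                  # every path to the host is blocked


only_soekris1 = classify("soekris1", {"soekris1"})
apple_too = classify("soekris1", {"soekris1", "apple"})
```

With only soekris1 failing, the sketch returns DOWN, which matches what was reported and should notify under notification_options d,r once the state becomes HARD. If apple were failing as well, soekris1 would be UNREACHABLE, and with d,r no problem notification would be sent for it.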
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: No Blocking Outage Alerts for Parents

Post by abrist »

Alright, back to your initial question:
Is the problem that you are not receiving alerts from "soekris1" when it is down but "apple" is up? And if "apple" is down, you receive alerts correctly?