I am not getting alerts from my "blocking outage" hosts, i.e. the ones that
the other servers depend on and that are set up as parents in the Nagios hosts file. When all
the services and hosts are down, they register as down in the web front end,
but when I click on the hosts they are only in a SOFT state (1/3), and each
time the 'next scheduled active check' time comes and goes they remain at
SOFT state (1/3). So I never actually get any alerts for them.
I tried changing the IP addresses of those blocking outage parent hosts so
that they would appear offline, and I did get alerts from Nagios as expected,
but when many services and hosts are down at once it seems to get stuck.
The situation is probably more complicated than this, but that is as far as I have
boiled it down so far.
I can't make sense of this unless there is a bug. Reloading the nagios
service while it is stuck does not fix the issue (and wouldn't be a solution anyway).
If I force a host check, the state does move to SOFT 2/3, but it then just
sits at SOFT 2/3 through each of the next scheduled active checks.
Thoughts?
No Blocking Outage Alerts for Parents
slansing
Re: No Blocking Outage Alerts for Parents
Can you choose one of the hosts as an example and show us your configuration definition for it?
Re: No Blocking Outage Alerts for Parents
Thanks, here is the host soekris1, one of the parents that is not generating alerts and is getting stuck at SOFT state 1/3. Its parent "apple" was online while I was not receiving alerts.
define host{
    use generic-host
    host_name soekris1
    alias meganet-firewall1
    hostgroups Meganet
    address x.x.x.x
    statusmap_image firewall.gd2
    parents apple
}
define host{
    name generic-host
    notifications_enabled 1
    check_interval 5
    max_check_attempts 3
    retry_interval 1
    notification_interval 2
    notification_period 24x7
    notification_options d,r
    contact_groups admins_aim,admins_email
    event_handler_enabled 1
    flap_detection_enabled 1
    failure_prediction_enabled 1
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    check_command check-host-alive
    register 0
}
define hostgroup{
    hostgroup_name Meganet
    alias Meganet Colocation Servers
}
define hostescalation{
    hostgroup_name Meganet
    first_notification 2
    last_notification 2
    notification_interval 1440
    contact_groups admins_pager
}
define hostescalation{
    hostgroup_name Meganet
    first_notification 3
    last_notification 0
    notification_interval 1440
    contact_groups admins_email
}
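For reference, here is a rough sketch of how the attempt counter would be expected to advance with the max_check_attempts 3 setting from the generic-host template. It is a simplified model written in Python for illustration, not Nagios code, and the function names are hypothetical. Nagios sends notifications only once a host reaches a HARD state, so a host that stays at SOFT 1/3 never alerts, which matches the symptom described.

```python
def simulate_host_checks(results, max_check_attempts=3):
    """Return a (state, state_type, attempt) tuple for each check result.

    Simplified model (an assumption, not Nagios source code): each
    consecutive DOWN result increments the attempt counter, and the state
    turns HARD once the counter reaches max_check_attempts. Any UP result
    resets the counter and is treated as a HARD UP.
    """
    history = []
    consecutive_down = 0
    for result in results:
        if result == "UP":
            consecutive_down = 0
            history.append(("UP", "HARD", 1))
            continue
        consecutive_down = min(consecutive_down + 1, max_check_attempts)
        state_type = "HARD" if consecutive_down == max_check_attempts else "SOFT"
        history.append(("DOWN", state_type, consecutive_down))
    return history


def first_hard_down(history):
    """Index of the first HARD DOWN entry, or None if it is never reached."""
    for index, (state, state_type, _attempt) in enumerate(history):
        if state == "DOWN" and state_type == "HARD":
            return index
    return None


# Three failed checks in a row: the initial check plus two retries.
for state, state_type, attempt in simulate_host_checks(["DOWN", "DOWN", "DOWN"]):
    print(f"{state} {state_type} {attempt}/3")
```

With retry_interval 1, and assuming the default interval_length of 60 seconds, the two retries would normally follow about a minute apart. A host that stays at 1/3 across several scheduled checks therefore suggests the retries are not being run, or their results are not being processed.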
scottwilkerson
Re: No Blocking Outage Alerts for Parents
Was apple's parent in a UP state? All the way up the line?
Which state is the host marked as, DOWN or UNREACHABLE?
Re: No Blocking Outage Alerts for Parents
Apple was UP, yes.
The hosts that I am worried about, soekris1 and shiva2, were in a DOWN state.
Re: No Blocking Outage Alerts for Parents
Hmmm. Can we see the configuration for the host "apple"?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: No Blocking Outage Alerts for Parents
Note: soekris1-sham and soekris2-sham are at a different network location, so with the dependencies in place the network checks go from the Nagios server -> switch -> firewalls -> apple (Internet host) -> firewalls at the remote location (which are the ones not running their checks as described above).
define host{
    use generic-host
    host_name apple
    alias Apple Server (Internet)
    check_command check-host-alive
    address http://www.apple.com
    parents soekris1-sham,soekris2-shamm
}
define host{
    use sham-generic-host
    host_name soekris1-sham
    alias Soekris Backup Firewall
    address 192.168.1.26
    statusmap_image firewall.gd2
    parents sp_int_sw1
    2d_coords 200,80
}
define host{
    use sham-generic-host
    host_name soekris2-sham
    alias Soekris Firewall
    address 192.168.1.25
    statusmap_image firewall.gd2
    parents sp_int_sw1
    2d_coords 120,80
}
define host{
    use sham-generic-host
    host_name sp_int_sw1
    alias SP Internal SW1 - 48 Port
    address 192.168.1.4
    statusmap_image router.gd2
    # 2d_coords 0,160
}
define host{
    name sham-generic-host
    notifications_enabled 1
    check_interval 5
    max_check_attempts 3
    retry_interval 1
    notification_interval 12
    notification_period 24x7
    notification_options d,r
    contact_groups sham_admins_aim,sham_admins_email
    event_handler_enabled 1
    flap_detection_enabled 1
    failure_prediction_enabled 1
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    check_command check-host-alive
    register 0
}
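The parent relationships in the definitions posted in this thread can be sanity-checked with a short script. The sketch below is illustrative Python with hypothetical function names; the host names are copied as posted, with soekris1 taken from the earlier post. One thing it surfaces: apple's parents line names soekris2-shamm, while the host is defined as soekris2-sham. That may simply be a typo introduced when posting.

```python
# Parent relationships copied from the host definitions as posted.
PARENTS = {
    "soekris1": ["apple"],
    "apple": ["soekris1-sham", "soekris2-shamm"],
    "soekris1-sham": ["sp_int_sw1"],
    "soekris2-sham": ["sp_int_sw1"],
    "sp_int_sw1": [],
}


def undefined_parents(parents):
    """Map each host to the parents it names that have no definition."""
    missing = {}
    for host, names in parents.items():
        unknown = [name for name in names if name not in parents]
        if unknown:
            missing[host] = unknown
    return missing


def chains_to_root(host, parents):
    """List every path from host up through its parents to a host with none.

    Undefined hosts are treated as having no parents. Assumes the parent
    graph has no cycles, as is the case for the data above.
    """
    names = parents.get(host, [])
    if not names:
        return [[host]]
    return [
        [host] + chain
        for name in names
        for chain in chains_to_root(name, parents)
    ]


print(undefined_parents(PARENTS))
for chain in chains_to_root("soekris1", PARENTS):
    print(" -> ".join(chain))
```

Each printed chain runs from soekris1 back towards the Nagios server, so it reads in the opposite direction to the network order.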
Re: No Blocking Outage Alerts for Parents
So the network order is:
Nagios --> sp_int_sw1 --> soekris1-sham,soekris2-shamm --> apple --> soekris1
Is this correct?
Re: No Blocking Outage Alerts for Parents
soekris1/shiva2 at the end of the chain, but yes, correct.
Re: No Blocking Outage Alerts for Parents
Alright, back to your initial question:
Is the problem that you are not receiving alerts from "soekris1" when it is down but "apple" is up, whereas if "apple" is down you receive alerts correctly?