I'm now doing a PoC with an all-Core Nagios platform (XI didn't handle contact/alerting organisation as well as we'd have liked). Almost all monitoring targets are network gear: routers, switches, firewalls.
I have slave Nagios Core servers monitoring locally, obsessing over hosts and services, and using the NRDP shell script (send_nrdp.sh) to send results passively to a master Core server. With XI as the master this is trivial: tell it to accept the passive checks it has just received and off we go. So far my experiments with Core as the master have been fruitless.
I'm trying to get a simple host check working before adding any services, but the host is stuck in PENDING. Even a manual NRDP update doesn't fix this. All my googling has yielded lots of discussion but practically no fully detailed how-tos for this particular problem (though there are millions for a Linux server monitoring itself with NRPE and sending checks back).
So, here's my setup. The target won't leave the PENDING state, and the consensus is that this mostly happens when a check isn't in place. I have something slightly wrong somewhere, but I've combed through the logs and cache, cleared the retention files, etc. to eliminate whatever errors popped up, and now I'm left with PENDING and no further hints. Also, from what I can tell, hostgroups don't have to match between slave and master, since that information isn't passed between hosts during passive updates.
SLAVE SIDE:
Set to obsess over services and hosts. The obsess (OCSP/OCHP) handler script is called and sends updates as shown after the host definitions below.
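For reference, obsessing over hosts is controlled by obsess_over_hosts and ochp_command in the slave's main nagios.cfg. A minimal sketch that confirms both are in place, assuming the source-install path /usr/local/nagios/etc/nagios.cfg; cfg_value and check_obsess_cfg are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last value assigned to a directive in a
# nagios.cfg-style file (plain key=value lines; commented-out lines don't match).
cfg_value() {
    local file=$1 key=$2
    grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Hypothetical check for the slave: obsess_over_hosts must be 1 and an
# ochp_command must be named. Prints each problem; returns non-zero if any.
check_obsess_cfg() {
    local file=${1:-/usr/local/nagios/etc/nagios.cfg}  # assumed default path
    local rc=0
    if [ "$(cfg_value "$file" obsess_over_hosts)" != "1" ]; then
        echo "obsess_over_hosts is not 1 in ${file}"
        rc=1
    fi
    if [ -z "$(cfg_value "$file" ochp_command)" ]; then
        echo "ochp_command is not set in ${file}"
        rc=1
    fi
    return "$rc"
}
```

Usage: `check_obsess_cfg /usr/local/nagios/etc/nagios.cfg && echo "obsess settings look OK"`. The same pattern applies to obsess_over_services and ocsp_command for service results.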
define host {
use passivebase
hostgroups passiverouters
host_name router1.example.com
address 1.2.3.4
alias router1 Juniper
}
define host {
use generic-host
name passivebase
register 0
alias Juniper passive template
address 127.0.0.1
hostgroups passiverouters
check_period 24x7 ; By default, switches are monitored round the clock
check_interval 5 ; Switches are checked every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each switch 10 times (max)
check_command check-host-alive ; Default command to check if routers are "alive"
notification_period 24x7 ; Send notifications at any time
notification_interval 30 ; Resend notifications every 30 minutes
notification_options d,r,f ; Only send notifications for specific host states
contact_groups admins
}
MASTER SIDE:
define host {
use passiverouter
hostgroups passivehosts
host_name router1.example.com
address 1.2.3.4
alias router1 Juniper
}
define host{
use generic-host-passive
name passiverouter
register 0
alias Templated Juniper Passive
hostgroups passivehosts
check_period 24x7 ; By default, switches are monitored round the clock
check_interval 5 ; Switches are checked every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each switch 10 times (max)
notification_period 24x7 ; Send notifications at any time
notification_interval 30 ; Resend notifications every 30 minutes
notification_options d,r,f ; Only send notifications for specific host states
contact_groups admins
active_checks_enabled 0
passive_checks_enabled 1
}
define host {
name generic-host-passive
check_command check-host-alive
use generic-host
max_check_attempts 1
active_checks_enabled 0
passive_checks_enabled 1
register 0
}
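On the master, a host-level passive_checks_enabled 1 only takes effect if Core accepts passive host results globally, and results submitted through the external command file are only read when check_external_commands is on. A minimal sketch that checks the master's nagios.cfg for these directives and prints its command_file path, assuming the source-install location /usr/local/nagios/etc/nagios.cfg; check_passive_cfg is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which passive-result settings are not enabled
# in a nagios.cfg-style file (key=value lines). Returns non-zero if any are off.
check_passive_cfg() {
    local file=${1:-/usr/local/nagios/etc/nagios.cfg}  # assumed default path
    local missing=0 key
    for key in check_external_commands accept_passive_host_checks accept_passive_service_checks; do
        # The last assignment of a directive wins, so inspect only that one.
        if [ "$(grep -E "^${key}=" "$file" | tail -n 1)" != "${key}=1" ]; then
            echo "${key} is not set to 1 in ${file}"
            missing=1
        fi
    done
    # Print the command file path for comparison with NRDP's server-side config.
    grep -E '^command_file=' "$file" | tail -n 1
    return "$missing"
}
```

The printed command_file path is worth comparing against whatever path the NRDP server install on the master is configured to write to; the two must agree for results to reach Core.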
/usr/bin/printf '%s\t%s\t%s\n' "$1" "$2" "$3" | /usr/local/nrdp/clients/send_nrdp.sh -u http://10.20.30.40/nrdp/ -t abcxyz123
(Yes, I've verified these variables send what the other side expects/needs to see; again, this works with XI without a problem.)
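send_nrdp.sh reads tab-delimited results on stdin: three fields (host, state, output) for a host result and four (host, service, state, output) for a service result. A minimal sketch of the handler as shell functions, assuming the OCHP command passes the $HOSTNAME$, $HOSTSTATEID$ and $HOSTOUTPUT$ macros in that order; the URL, token and client path are the ones from this post, while the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical OCHP handler body. Assumed invocation from the OCHP command:
#   ochp_submit "$HOSTNAME$" "$HOSTSTATEID$" "$HOSTOUTPUT$"
# send_nrdp.sh reads tab-delimited results on stdin:
#   host result:    host<TAB>state<TAB>output
#   service result: host<TAB>service<TAB>state<TAB>output

NRDP_URL="http://10.20.30.40/nrdp/"                  # from the post
NRDP_TOKEN="abcxyz123"                               # from the post
SEND_NRDP="/usr/local/nrdp/clients/send_nrdp.sh"     # from the post

# Build one host-result line; printf avoids echo -e portability quirks.
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Build one service-result line.
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Send a single host result to the master's NRDP endpoint.
ochp_submit() {
    format_host_result "$1" "$2" "$3" | "$SEND_NRDP" -u "$NRDP_URL" -t "$NRDP_TOKEN"
}
```

Keeping the line formatting in its own function makes it easy to eyeball exactly what goes down the pipe (e.g. `format_host_result router1.example.com 0 'PING OK' | cat -A` shows tabs as `^I`).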
This is what I use on the command line:
/usr/local/nrdp/clients/send_nrdp.sh -u http://10.20.30.40/nrdp/ -t abcxyz123 -H router1.example.com -S 2 -o "Everything is fine"
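To tell whether the problem sits in NRDP or in Core itself, a result can be written straight into Core's external command file on the master, bypassing NRDP. A minimal sketch using Core's PROCESS_HOST_CHECK_RESULT external command, assuming the source-install command file /usr/local/nagios/var/rw/nagios.cmd and check_external_commands=1; submit_host_result is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit a passive host result straight to Core's
# external command file, bypassing NRDP. Run on the master.
# Usage: submit_host_result HOST STATE OUTPUT [COMMAND_FILE]
# Host states: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
submit_host_result() {
    local host=$1 state=$2 output=$3
    local cmdfile=${4:-/usr/local/nagios/var/rw/nagios.cmd}  # assumed default path
    local now
    now=$(date +%s)
    # Core external command syntax:
    # [time] PROCESS_HOST_CHECK_RESULT;<host_name>;<status_code>;<plugin_output>
    printf '[%s] PROCESS_HOST_CHECK_RESULT;%s;%s;%s\n' \
        "$now" "$host" "$state" "$output" > "$cmdfile"
}
```

For example, `submit_host_result router1.example.com 0 'Everything is fine'`. If the host leaves PENDING after this but not after send_nrdp.sh, the NRDP server side on the master is the likelier suspect.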
Any clues or help would be greatly appreciated here.
Thanks,
-C