Nagios Core - Federated - passive checks to master server

chris_rr · Post by **chris_rr** » Tue Dec 20, 2016 1:44 pm

Hi all,

I'm now doing a PoC with an all-Core Nagios platform (XI didn't handle contact/alerting organisation as well as we'd have liked). Almost all monitoring targets are network gear: routers, switches, firewalls.

I have slave Nagios Core servers monitoring locally, obsessing over services and using the NRDP shell script to send results passively to a master Core server. With XI as master, this is trivial: tell it to accept the passive checks it just received and off we go. So far my Core experiments have been fruitless.

I'm trying to get a simple host check working before adding any services, yet it's stuck in PENDING. Even my manual NRDP update doesn't fix this. All my googling has turned up lots of discussion but practically no fully detailed how-tos for this particular problem (though there are millions for a Linux server monitoring itself with NRPE and sending checks back).

So, here's my setup. The target won't exit the PENDING state, and the consensus is that this mostly happens when a check is not in place. I have something slightly wrong somewhere, but I've combed through logs and cache, cleared retention files, etc. to eliminate whatever errors popped up, and am now left with PENDING and no further hints. Also, from what I can tell, hostgroups on slave and master don't have to match, as that information is not passed between hosts during the passive updates.

SLAVE SIDE:

Set to obsess over services and hosts. An event-handler script is called, which sends updates as detailed after the host definitions below.

define host {
        use                     passivebase
        host_groups             passiverouters
        host_name               router1.example.com
        address                 1.2.3.4
        alias                   router1 Juniper
}

define host {
        use                     generic-host
        name                    passivebase
        register                0
        alias                   Juniper passive template
        address                 127.0.0.1
        hostgroups              passiverouters
        check_period            24x7            ; By default, switches are monitored round the clock
        check_interval          5               ; Switches are checked every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      10              ; Check each switch 10 times (max)
        check_command           check-host-alive        ; Default command to check if routers are "alive"
        notification_period     24x7            ; Send notifications at any time
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r,f           ; Only send notifications for specific host states
        contact_groups          admins
}

MASTER SIDE:

define host {
        use                     passiverouter
        host_groups             passivehosts
        host_name               router1.example.com
        address                 1.2.3.4
        alias                   router1 Juniper
}

define host{
        use             generic-host-passive
        name            passiverouter
        register        0
        alias           Templated Juniper Passive
        hostgroups      passivehosts
        check_period            24x7            ; By default, switches are monitored round the clock
        check_interval          5               ; Switches are checked every 5 minutes
        retry_interval          1               ; Schedule host check retries at 1 minute intervals
        max_check_attempts      10              ; Check each switch 10 times (max)
        notification_period     24x7            ; Send notifications at any time
        notification_interval   30              ; Resend notifications every 30 minutes
        notification_options    d,r,f           ; Only send notifications for specific host states
        contact_groups          admins
        active_checks_enabled   0
        passive_checks_enabled  1
}

define host {
        name                    generic-host-passive
        check_command           check-host-alive
        use                     generic-host
        max_check_attempts      1
        active_checks_enabled   0
        passive_checks_enabled  1
        register                0
}

tcpdump shows me that checks are being sent and received. I grabbed an event-handler shell script that I'm sure anybody doing this stuff has come across. Here's the line that sends the data:
/bin/echo -e "$1\t$2\t$2\n" |/usr/local/nrdp/clients/send_nrdp.sh -u http://10.20.30.40/nrdp/ -t abcxyz123
(Yes, I've verified these variables send what the other side expects/needs to see. Again, this works for XI without a problem.)
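
For reference, a minimal sketch of how such a wrapper could build the line it pipes to send_nrdp.sh. It assumes send_nrdp.sh reads tab-delimited host results from stdin as three fields (host, state, output) and that the wrapper receives the host name, numeric state, and plugin output as `$1`, `$2`, and `$3`. That differs from the line above, which repeats `$2`, so treat the third field as an assumption about the intended layout. The function name `format_host_result` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emit one tab-delimited host result line.
# Assumed field order for host checks: host, state, output.
format_host_result() {
    # $1 = host name
    # $2 = numeric host state (0 = UP, 1 = DOWN, 2 = UNREACHABLE)
    # $3 = plugin output text
    # printf is used instead of "echo -e" because it behaves the same
    # way across shells and does not reinterpret escapes in the output.
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Intended use inside the wrapper (not executed here):
#   format_host_result "$1" "$2" "$3" | \
#       /usr/local/nrdp/clients/send_nrdp.sh -u http://10.20.30.40/nrdp/ -t abcxyz123
```

Keeping the formatting in its own function makes it easy to run the wrapper by hand and inspect exactly what would be sent before involving the network.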

This is what I use on the command line:
/usr/local/nrdp/clients/send_nrdp.sh -u http://10.20.30.40/nrdp/ -t abcxyz123 -H router1.example.com -S 2 -o "Everything is fine"

Any clues or help would be greatly appreciated here.

thanks
-C

tmcdonald · Post by **tmcdonald** » Tue Dec 20, 2016 5:40 pm

I promise not to push XI further after this one message, but if this is working in XI I can almost guarantee you we can help clear up any issues with contacts or alerting that you are having. That would possibly be a faster and more stable resolution. I did see your previous thread but there might have been some other options like the API or Bulk Modifications that would work for you.

If this is something you are open to, let me know and we can move this to the XI forum, otherwise we'll continue with Core as planned.

chris_rr · Post by **chris_rr** » Tue Dec 20, 2016 6:14 pm

Hi,

I'd love to see if those issues could get solved for XI, as we prefer the GUI. Can we arrange something offline?

I'd like to keep this particular thread in Core, as I now want to solve it either way.

At this point I've gone over my entire configuration many times, and it seems that even when submitting directly on the http://<server>/nrdp page, Nagios doesn't update. I therefore have to conclude that the document I used (see below) is missing something that ties the checks to Nagios. I've gone so far as to edit the send_nrdp.sh files to log exactly what XML is being output (it's fine), then cut and pasted that output into the XML input on the URL above (it accepts it), and yet I'm still looking at a pending host and set of services. I'm at a complete loss at this point.

https://assets.nagios.com/downloads/nrd ... erview.pdf
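
When NRDP answers with success but Core shows nothing, one thing worth ruling out is how the master's nagios.cfg handles external commands and passive results. Below is a minimal sketch that prints the relevant directives. The helper names `cfg_value` and `report_passive_settings` are hypothetical, and the nagios.cfg path in the usage note is an assumption (a default source install), not something stated in this thread.

```shell
#!/bin/sh
# Print the value of a "key=value" directive from a nagios.cfg-style file.
# Commented-out lines are ignored; if the key appears more than once,
# the last occurrence is printed. Prints nothing if the key is absent.
cfg_value() {
    # $1 = config file, $2 = directive name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# List the master-side directives that gate passive results.
report_passive_settings() {
    # $1 = path to nagios.cfg
    for key in check_external_commands command_file \
               accept_passive_host_checks accept_passive_service_checks \
               check_result_path log_passive_checks; do
        printf '%s=%s\n' "$key" "$(cfg_value "$1" "$key")"
    done
}
```

Usage would be something like `report_passive_settings /usr/local/nagios/etc/nagios.cfg`. NRDP's own server configuration (`config.inc.php`) also has settings for the command file and check results directory; presumably these need to point at the same locations Core reports here, and the web server user needs write access to them.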

It should be noted that for Linux/Ubuntu installs of apache2, the conf file needs to be placed in /etc/apache2/sites-enabled, and that it should be changed as follows:

<Directory "/usr/local/nrdp">
#   SSLRequireSSL
    Options None
    AllowOverride None
#   Order allow,deny
    Allow from all
    Require all granted
#   Order deny,allow
#   Deny from all
#   Allow from 127.0.0.1
#   AuthName "NRDP"
#   AuthType Basic
#   AuthUserFile /usr/local/nrdp/htpasswd.users
#   Require valid-user
</Directory>

cheers,
C

tmcdonald · Post by **tmcdonald** » Wed Dec 21, 2016 5:18 pm

If it works in XI but not in Core then the issue is not likely on the sending side, so I will ignore that for now. It seems to be between NRDP and Core.

Can you turn on debug logging in Core (debug_level=-1 in nagios.cfg) and restart Nagios? Then send a few test results, turn off debugging, and post or PM the logs. Also send over the Apache logs for that time, both access and error. This will give us an idea of what is happening once the checks reach NRDP and then Core.
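
A minimal sketch of toggling that setting from the shell, assuming GNU sed (for `-i`) and a nagios.cfg path that you supply yourself. The function name `set_debug_level` is hypothetical; Nagios still needs to be restarted afterwards.

```shell
#!/bin/sh
# Set debug_level in a nagios.cfg-style file.
# Replaces an existing uncommented debug_level line, or appends one
# if none is present. Requires GNU sed for in-place editing.
set_debug_level() {
    # $1 = path to nagios.cfg, $2 = level (-1 = log everything, 0 = off)
    if grep -q '^[[:space:]]*debug_level[[:space:]]*=' "$1"; then
        sed -i "s/^[[:space:]]*debug_level[[:space:]]*=.*/debug_level=$2/" "$1"
    else
        printf 'debug_level=%s\n' "$2" >> "$1"
    fi
}
```

For example, `set_debug_level /path/to/nagios.cfg -1` before the test and `set_debug_level /path/to/nagios.cfg 0` afterwards, with a restart after each change.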

chris_rr · Post by **chris_rr** » Thu Dec 22, 2016 10:48 am

Apache sees the connections (via access.log):

remote.server.IP - - [21/Dec/2016:17:55:46 -0500] "POST /nrdp// HTTP/1.1" 200 340 "-" "curl/7.47.0"
remote.server.IP - - [21/Dec/2016:17:55:49 -0500] "POST /nrdp// HTTP/1.1" 200 340 "-" "curl/7.47.0"
remote.server.IP - - [21/Dec/2016:17:55:53 -0500] "POST /nrdp// HTTP/1.1" 200 340 "-" "curl/7.47.0"

When submitted locally, I get this:

my.laptop.IP - - [21/Dec/2016:17:59:38 -0500] "POST /nrdp/ HTTP/1.1" 200 387 "http://master.server.IP/nrdp/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"

The Nagios debug log shows... zip. It's as if the NRDP server does not interact with Nagios Core at all.
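
One way to narrow this down is to bypass NRDP entirely and write a passive host result straight into Core's external command file. If the host leaves PENDING, Core itself is accepting passive results and the gap is between NRDP and Core. A minimal sketch follows; the function name `host_check_command` is hypothetical, and the command file path in the usage note is an assumption (use whatever `command_file` is set to in nagios.cfg).

```shell
#!/bin/sh
# Build a PROCESS_HOST_CHECK_RESULT external command line.
# Format: [timestamp] PROCESS_HOST_CHECK_RESULT;host;state;output
host_check_command() {
    # $1 = epoch timestamp, $2 = host name,
    # $3 = numeric state (0 = UP, 1 = DOWN, 2 = UNREACHABLE), $4 = output
    printf '[%s] PROCESS_HOST_CHECK_RESULT;%s;%s;%s\n' "$1" "$2" "$3" "$4"
}
```

Run on the master as a user allowed to write to the command file, for example `host_check_command "$(date +%s)" router1.example.com 0 "manual test" > /usr/local/nagios/var/rw/nagios.cmd`, then check nagios.log for a matching external command entry.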

rkennedy · Post by **rkennedy** » Thu Dec 22, 2016 4:51 pm

Assuming you turned debugging on, could you please also post the nagios.log entries for 12-21 for us to review? This will help us see what's going on on the Nagios side of things.
