For one server nrpe-2.14 works, nrpe-3.0 does not

jobst
Posts: 16
Joined: Mon Jul 22, 2013 6:33 pm

For one server nrpe-2.14 works, nrpe-3.0 does not

Post by jobst »

Hi.

I am using nagios-4.2.0 as the Nagios Core server; it runs CentOS 6.X with all the latest patches.
I can monitor every server but one. All monitored servers run the same CentOS 6.X (all patches) using nrpe 3.0.

If I downgrade that one server to nrpe 2.14 (with ssl), it works, happily.
If I upgrade that one server to nrpe 3.0 (with ssl), it does NOT work.

The Nagios server sits on a local network running NRPE 3.0 behind a firewall.
The Nagios Server happily talks to all other servers on various networks on port 5666 - I know the port is open on ALL machines for access from the Nagios Core Server.

I also know that port 5666 is open on the machine that has the problem with nrpe 3.0: I can telnet to it on port 5666, but it then bails out when using 3.0 and does not bail out when using 2.14.
It would not work at all if the port is NOT open.

If I downgrade that machine NRPE_CLIENT to nrpe 2.14 it happily works, but the server logs are full of
Aug 25 16:43:47 CORESERVER check_nrpe: Remote NRPE_CLIENT does not support Version 3 Packets
Aug 25 16:43:49 CORESERVER check_nrpe: Remote NRPE_CLIENT accepted a Version 2 Packet
and the logs of that NRPE_CLIENT report
Aug 25 16:45:05 NRPE_CLIENT nrpe[23013]: Error: Request packet type/version was invalid!
Aug 25 16:45:05 NRPE_CLIENT nrpe[23013]: Client request was invalid, bailing out...
When testing it I get:
[root@NAGIOSCORE /var/log] #>/usr/local/nagios/libexec/check_nrpe -H NRPE_CLIENT -c check_procs
PROCS OK : count 1706 |count=706;2000;2100 runqueue=1 blocked=0 running=1 new=0.00

Now if I upgrade the NRPE_CLIENT to nrpe 3.0, the logs are filled with
Aug 25 16:56:15 NAGIOSCORE check_nrpe: Error: Could not complete SSL handshake with NRPE_CLIENT: rc=0 SSL-error=5
and the logs of that NRPE_CLIENT report
Aug 25 16:55:17 NRPE_CLIENT nrpe[23815]: Host NAGIOSCORE is not allowed to talk to us!

What I see on the firewall is a "deny tcp (no connection) from NAGIOSCORE to NRPE_CLIENT flags on interface", which suggests that the SSL handshake traffic going through it is not following the correct TCP session flow (SYN, SYN ACK etc.): the firewall sees the second packet but not the first one.

But my issue is that this does NOT happen when using nrpe-2.14.

So what makes this SSL thingo so different in 3.0 with respect to 2.14?
Does anybody have an idea how I can fix this?

thanks
Jobst
jfrickson

Re: For one server nrpe-2.14 works, nrpe-3.0 does not

Post by jfrickson »

First, add -n to the check_nrpe command line for that host, and the nrpe command line on that host. With no encryption, that will tell us if it's an SSL problem or not.

If it works with -n, remove the -n from both ends. Add -s -1 to the check_nrpe command line, and set ssl_logging=0x2f in the nrpe.cfg file. This will log extra SSL information to syslog on both the Nagios host and the remote host. Post the resulting logfile lines from both ends so I can take a look.

EDIT: had -x -1, should be -s -1. Corrected above.
jobst
Posts: 16
Joined: Mon Jul 22, 2013 6:33 pm

Re: For one server nrpe-2.14 works, nrpe-3.0 does not

Post by jobst »

EDIT: I heavily edited this reply!

I figured out the problem - the error messages are oh so very misleading.
At the moment I believe it's a bug - I will need to do some more debugging before I can say this for sure.

EDIT: it IS a bug, I now know where it comes from.

The Nagios core server is a multi homed host: outside interface facing the internet and an inside interface connected to DMZ.
Obviously it talks to many NRPE clients through both interfaces. I know that the routing table on the Nagios Server Host is 100% correct, and I also know that it works for everything else BUT nrpe-3.0. I also know it works for nrpe < 3.0.

There is something VERY odd in the setup sequence of the SSL handshake where it picks up one of the IP ADDRESSES from the allowed_hosts list.

EDIT: section below heavily changed after first submission

If I have this:

server_address=NAGIOS_CORE_SERVER
allowed_hosts=127.0.0.1,NAGIOS_CORE_SERVER

it works, if I have this:

server_address=NAGIOS_CORE_SERVER
allowed_hosts=127.0.0.1,localhost, NAGIOS_CORE_SERVER

it does not.

It has to do with "parse_allowed_hosts(char *)", which uses the following three functions:
  • add_domain_to_acl()
  • add_ipv6_to_acl()
  • add_ipv4_to_acl()
If you use a HOSTNAME, it breaks.
If you use IP_ADDRESSES ONLY, it does NOT break.

I am launching a couple of bugs @ github.

Jobst
Last edited by jobst on Mon Aug 29, 2016 12:50 am, edited 4 times in total.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: For one server nrpe-2.14 works, nrpe-3.0 does not

Post by Box293 »

allowed_hosts=127.0.0.1,NAGIOS_CORE_SERVER, ANOTHER_HOST

There is a space after the comma, does this make a difference if you remove the space?

Are you using IP addresses or DNS entries? Are they IPv6 addresses?
jobst
Posts: 16
Joined: Mon Jul 22, 2013 6:33 pm

Re: For one server nrpe-2.14 works, nrpe-3.0 does not

Post by jobst »

Box293 wrote:
allowed_hosts=127.0.0.1,NAGIOS_CORE_SERVER, ANOTHER_HOST

There is a space after the comma, does this make a difference if you remove the space?

Are you using IP addresses or DNS entries? Are they IPv6 addresses?

Hi, thanks for the reply.
I edited my original reply as I found the issue.
The space is not the problem. If you read the code in nrpe.c, you can see that spaces are dealt with when the configuration file is read.
jobst
Posts: 16
Joined: Mon Jul 22, 2013 6:33 pm

Re: For one server nrpe-2.14 works, nrpe-3.0 does not

Post by jobst »

This is an explanation of the issue.
The problem only occurs with nrpe 3.x; it does not occur with, for example, nrpe 2.14.

If you have this on the NRPE client

  server_address=172.16.1.1
  allowed_hosts=127.0.0.1,192.168.1.1


and issue a command on the machine 192.168.1.1 like so

check_nrpe -H 172.16.1.1 -c check_some-disk
it will work. If you add a HOSTNAME (i.e. not an IP address) to the list of allowed hosts on the NRPE CLIENT like so:

  server_address=172.16.1.1
  allowed_hosts=127.0.0.1,localhost,192.168.1.1


it will not work, and the nrpe client will write this to its log file:

Aug 26 17:07:10 172.16.1.1 nrpe[15514]: Connection from 192.168.1.1 port 38593
Aug 26 17:07:10 172.16.1.1 nrpe[15514]: Host 192.168.1.1 is not allowed to talk to us!
The reason is an issue with strtok in "parse_allowed_hosts": processing stops AFTER "localhost", so "192.168.1.1" is NOT added to the allowed ACLs.

"strtok" is NOT re-entrant. When the function "add_ipv4_to_acl" is called and cannot add "localhost" (which is correct BTW), it hands the flow over to "add_domain_to_acl" (which is correct as well). But because "strtok" loses its pointer (due to the added function call), processing STOPS after "localhost" is added and "192.168.1.1" is NEVER added.

I have suggested on github's nrpe forum to add a function called "str_split", which prevents this and makes the whole thing more portable.
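As a rough illustration of what such a splitter could look like, here is a minimal sketch that keeps all of its state in a caller-owned cursor instead of a hidden static pointer. The name "str_split_next", its signature and its trimming behaviour are assumptions made for this example only; they are not taken from the nrpe source or from the github issue.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from nrpe): copies the next non-empty,
 * whitespace-trimmed field from *cursor into out and advances *cursor.
 * Returns 1 when a field was produced, 0 when the input is exhausted.
 * All state lives in *cursor, so nested calls cannot disturb each other. */
int str_split_next(const char **cursor, const char *delims,
                   char *out, size_t outsz)
{
    const char *p = *cursor;

    assert(outsz > 0);
    while (*p != '\0') {
        size_t len = strcspn(p, delims);        /* length of the raw field */
        const char *start = p;
        const char *end = p + len;

        p = (*end != '\0') ? end + 1 : end;     /* step past the delimiter */

        /* trim leading and trailing whitespace */
        while (start < end && isspace((unsigned char)*start))
            start++;
        while (end > start && isspace((unsigned char)end[-1]))
            end--;

        if (end > start) {
            size_t n = (size_t)(end - start);
            if (n >= outsz)
                n = outsz - 1;                  /* truncate to fit out */
            memcpy(out, start, n);
            out[n] = '\0';
            *cursor = p;
            return 1;
        }
    }
    *cursor = p;
    return 0;
}
```

A caller would point a "const char *" cursor at the allowed_hosts string and call the function in a loop until it returns 0, passing each field on to the ACL functions. Because the input string is never modified, no scratch copy is needed.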

My problem, too, is that although my system has "strtok_r" (shown when I run the "configure" command), it is not called in "parse_allowed_hosts".
Because "strtok" keeps an internal pointer, a call to "strtok" from another function overwrites that pointer and the outer loop loses its place.
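The general hazard can be reproduced outside of nrpe. The sketch below is a stand-alone demonstration, not nrpe code: "count_dotted_parts" is a hypothetical stand-in for any function that calls "strtok" itself while an outer "strtok" loop is still running. Whether and where the real ACL functions do this is not shown in this thread.

```c
#define _POSIX_C_SOURCE 200809L /* exposes strtok_r() under strict -std modes */
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for any callee that tokenizes with strtok()
 * while its caller's strtok() loop is still in progress. */
int count_dotted_parts(const char *entry)
{
    static char copy[64];   /* static, so strtok()'s saved pointer never dangles */
    int parts = 0;

    assert(strlen(entry) < sizeof copy);
    strcpy(copy, entry);
    for (char *p = strtok(copy, "."); p != NULL; p = strtok(NULL, "."))
        parts++;
    return parts;
}

/* Outer loop with plain strtok(): the nested call overwrites the hidden
 * state, so the loop ends early. */
int count_entries_strtok(const char *list)
{
    char buf[128];
    int n = 0;

    assert(strlen(list) < sizeof buf);
    strcpy(buf, list);
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        n++;
        count_dotted_parts(tok);
    }
    return n;
}

/* Outer loop with strtok_r(): its position lives in saveptr, which the
 * nested strtok() calls cannot touch. */
int count_entries_strtok_r(const char *list)
{
    char buf[128];
    char *saveptr = NULL;
    int n = 0;

    assert(strlen(list) < sizeof buf);
    strcpy(buf, list);
    for (char *tok = strtok_r(buf, ",", &saveptr); tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr)) {
        n++;
        count_dotted_parts(tok);
    }
    return n;
}
```

With the list "127.0.0.1,localhost,192.168.1.1", count_entries_strtok sees only the first entry, while count_entries_strtok_r sees all three. In this demo the loop stops after the first entry because the stand-in always calls "strtok"; where a real loop stops depends on which callee touches "strtok" and when.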

So in the code

#ifdef HAVE_STRTOK_R
        tok = strtok_r(hosts, delim, &saveptr);
#else
        tok = strtok(hosts, delim);
#endif
        while( tok) {
                trimmed_tok = malloc( sizeof( char) * ( strlen( tok) + 1));
                trim( tok, trimmed_tok);
                if( strlen( trimmed_tok) > 0) {
                        if (!add_ipv4_to_acl(trimmed_tok) && !add_ipv6_to_acl(trimmed_tok)
                                        && !add_domain_to_acl(trimmed_tok)) {
                                syslog(LOG_ERR,"Can't add to ACL this record (%s). Check allowed_hosts option!\n",trimmed_tok);
                        }
                }
                free( trimmed_tok);
#ifdef HAVE_STRTOK_R
                tok = strtok_r(NULL, delim, &saveptr);
#else
                tok = strtok(NULL, delim);
#endif
        }
the INTERNAL pointer that "strtok" keeps is LOST between these 3 function calls

!add_ipv4_to_acl(trimmed_tok)
!add_ipv6_to_acl(trimmed_tok)
!add_domain_to_acl(trimmed_tok)
it will never work.
jfrickson

Re: For one server nrpe-2.14 works, nrpe-3.0 does not

Post by jfrickson »

Nice work! Sorry you had to put so much time into it, though. And thanks for entering the issues on github.

There have been several problems with parse_allowed_hosts, so it doesn't surprise me that it's the culprit here.

I'll get to work on these and hopefully have them done in time for the 3.0.1 release coming up shortly.