NRPE 2 versus 3

Locked
SteveBeauchemin
Posts: 524
Joined: Mon Oct 14, 2013 7:19 pm

NRPE 2 versus 3

Post by SteveBeauchemin »

I think I'd like to use nrpe 3.0.1

The problem is the incompatibility between versions 2 and 3. Version 3 is not a simple drop-in replacement. If I have nrpe 3 on a remote host, do I also have to run check_nrpe 3 on my core Nagios host, or use different parameters? If I use check_nrpe 3 to query nrpe 2, do I need to use different parameters again? Does this mean that I have to know whether a client is running 2 or 3? Do I have to double up the service tests and host groups in order to move from one version to another, with one version 2 and one version 3 definition for the same test? I am very confused.

Has anyone developed a migration process to move from NRPE 2.1.2 to NRPE 3.0.1 yet? If so, please chime in.

Would it be possible to get the Version 2 and 3 to be compatible so we can easily migrate? Do I have to compile nrpe 1500 times?

I think I'd like to use nrpe 3.0.1 - but I fear that I'll be on nrpe 2 for a very long time because they are incompatible. And I am not even asking about NSClient and check_nrpe3.

Here's a rock - here I am - here's a hard place. Help, I'm stuck! Please advise...

Steve B
XI 5.7.3 / Core 4.4.6 / NagVis 1.9.8 / LiveStatus 1.5.0p11 / RRDCached 1.7.0 / Redis 3.2.8 /
SNMPTT / Gearman 0.33-7 / Mod_Gearman 3.0.7 / NLS 2.0.8 / NNA 2.3.1 /
NSClient 0.5.0 / NRPE Solaris 3.2.1 Linux 3.2.1 HPUX 3.2.1
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: NRPE 2 versus 3

Post by WillemDH »

Steve,

I built a check_nrpe_v3 and I'm working on migrating my Windows clients automatically with a Rundeck job.
SteveBeauchemin wrote: Does this mean that I have to know the difference between a client running 2 versus 3
Yes. As the command is different, you will need to know which version the client is using in order to use the correct command.
SteveBeauchemin wrote: Do I have to double up the Service tests, and host groups, in order to move from one version to another, one a version 2 and one a version 3 for the same test? I am very confused.
Yes, you will need different templates / commands for NRPE v3.
SteveBeauchemin wrote: Would it be possible to get the Version 2 and 3 to be compatible so we can easily migrate?
Would be nice, but as Nagios and NSClient are not playing along very well, support is limited. So I guess this might work one day for Linux and the NRPE agent, but not for NSClient.
SteveBeauchemin wrote: Do I have to compile nrpe 1500 times?
It should be possible to automate this.

I'll see if I can pass my function to detect the NSClient version for you.

Grtz
Nagios XI 5.8.1
https://outsideit.net
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NRPE 2 versus 3

Post by tmcdonald »

I spoke with our NRPE dev to confirm, and he said that:
v3 check_nrpe will talk to a v2 server (it will try v3 protocol first, so there's a little delay) and a v3 server will talk to a v2 check_nrpe.
and from the changelog:
- Added support for version 3 variable sized packets up to 64KB. nrpe will
accept either version from check_nrpe. check_nrpe will try to send a
version 3 packet first, and fall back to version 2. check_nrpe can be forced
to only send version 2 packets if the switch `-2` is used. (John Frickson)
Have you tested this? If it is not working then it might be a bug and we can probably fix that for you a lot easier than migrating all of your checks.
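
A minimal sketch of how such a test could be run from the core server, comparing the default mode (v3 packet first, then fallback) with the forced v2 mode from the changelog entry above. The install path in `CHECK_NRPE` and the function name `probe_nrpe` are assumptions for illustration, not part of NRPE.

```shell
#!/usr/bin/env bash
# Sketch only: compare check_nrpe v3 default mode with forced v2 mode.
# CHECK_NRPE is an assumed install path; override it for your setup.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# probe_nrpe HOST
# Prints one line per mode with the exit code and the plugin output.
probe_nrpe() {
    local host=$1 out rc

    # Default mode: a v3 packet is sent first, with fallback to v2.
    out=$("$CHECK_NRPE" -H "$host" 2>&1); rc=$?
    printf 'default rc=%s %s\n' "$rc" "$out"

    # Forced v2 mode, using the -2 switch described in the changelog.
    out=$("$CHECK_NRPE" -2 -H "$host" 2>&1); rc=$?
    printf 'forced-v2 rc=%s %s\n' "$rc" "$out"
}
```

Calling `probe_nrpe` with a host name prints both results side by side, so a host that only answers in forced v2 mode stands out.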
Former Nagios employee
SteveBeauchemin
Posts: 524
Joined: Mon Oct 14, 2013 7:19 pm

Re: NRPE 2 versus 3

Post by SteveBeauchemin »

Let me get some more specific and repeatable data for you. I did try to just drop in nrpe3 but the nsclient tests failed to run properly so I had to step back from that.

To be fair, I will get more specific data.

Also, I use mod_gearman, and have a slightly mixed setup right now. My Nagios core is totally updated and running Red Hat 7 with Nagios XI 5.4.2, but is still on nrpe 2.1.2. The Gearman workers are still Red Hat 6. I am trying to build a Red Hat 7 gearman worker and that is what triggered this question. I would like to build a new mod_gearman with nrpe 3, but initial research gave me a hard stop on that.

I will add to this post with real data soon.

Thanks

Steve B
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: NRPE 2 versus 3

Post by dwhitfield »

SteveBeauchemin wrote: I did try to just drop in nrpe3 but the nsclient tests failed to run properly so I had to step back from that.
As @WillemDH already alluded to, NSClient does have some incompatibilities with NRPE, but I thought they had broken everything for NRPE 2.x, not 3.

Since you are thinking of a "rewrite" anyway, have you considered NCPA?

We await more specific data. :)
SteveBeauchemin
Posts: 524
Joined: Mon Oct 14, 2013 7:19 pm

Re: NRPE 2 versus 3

Post by SteveBeauchemin »

Tested using check_nrpe from my core server to my NSClient systems. I have 3 versions of NSClient++ in use.

It seems that a system running the 0.3.9 NSClient++ agent is where I saw bad stuff happen, which caused me NOT to use check_nrpe 3 on my core system. At the time, I was dealing with migrating to a new OS and a new Nagios XI version. When I saw the problem, I just reverted the core server version back to 2.15. Now, looking again and taking some time to analyze, I see that only the very old NSClient agent has a problem.

Code: Select all

./check_nrpe-Version-2.15 -H [various hosts with Different NSClient versions]
I (0.3.9.328 2011-08-16) seem to be doing fine...
I (0.4.4.19 2015-12-08) seem to be doing fine...
I (0.5.0.62 2016-09-14) seem to be doing fine...

./check_nrpe-Version-3.0.1 -H [Same hosts same order]
Could not construct return packet in NRPE handler check client side (nsclient.log) logs...
I (0.4.4.19 2015-12-08) seem to be doing fine...
I (0.5.0.62 2016-09-14) seem to be doing fine...
Now that I know better, to solve this, I simply changed 4 command definitions to use the version 2.15 check_nrpe. For the small handful of hosts with old agents, they now look like this:

Code: Select all

$USER1$/check_nrpe-Version-2.15 -H $HOSTADDRESS$ ...
So they are using the old binary. The rest of my tests on my core server are now using the check_nrpe version 3. So far so good.
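
A minimal sketch of how the choice of binary could be derived from the NSClient++ banner shown in the test output above. The function name `pick_check_nrpe` is hypothetical, and the cutoff "0.3.x and older gets the 2.15 binary" is an assumption based only on the three agent versions tested here.

```shell
#!/usr/bin/env bash
# Sketch only: choose a check_nrpe binary from the NSClient++ banner.
# Binary names follow the ones used in this post; the "0.3.x and older"
# cutoff is an assumption drawn from the three versions tested above.

# pick_check_nrpe "BANNER"
# Prints the binary name to use, or returns 1 if no version is found.
pick_check_nrpe() {
    local banner=$1
    local re='I \(([0-9]+)\.([0-9]+)\.'
    if [[ $banner =~ $re ]]; then
        local major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}
        # 10# forces base 10 so a leading zero is not read as octal.
        if (( 10#$major == 0 && 10#$minor <= 3 )); then
            echo "check_nrpe-Version-2.15"
        else
            echo "check_nrpe-Version-3.0.1"
        fi
    else
        return 1
    fi
}
```

Since the 0.3.9 agent only returns its banner to the 2.15 binary, the banner would have to be collected with `check_nrpe-Version-2.15` first and then passed to this function.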

Damn, I wish I had known this a couple of weeks ago. So it is a drop-in replacement, as long as you are not using prehistoric code on remote systems.

Willem - do you have version 0.3.9 deployed? Moving from 0.3.9 to a newer version did require me to spend a lot of time changing test syntax. I can jump start you by sending some examples if you like. You're probably okay already, but let me know.

It would be so much easier if we did stuff like Management thinks. Don't you just push a button and it works? When it is not working they ask - Did you push the button?

Well, now, yes. I pushed the button. And the migration methodology is easy: post something stupid to a forum, then actually do some research and testing.

Thanks

Steve B
(close this if you like...)
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: NRPE 2 versus 3

Post by WillemDH »

Steve,

This is the function I use in my Rundeck scripts to determine the version of NRPE:

Code: Select all

WriteLog () {
    # Prefer the sourced script name; fall back to $0 only if that is empty
    ScriptName=$(basename "${BASH_SOURCE}")
    if [ -z "$ScriptName" ] ; then
        ScriptName="$(basename "$(readlink -f "$0")")"
    fi
    if [ -z "$1" ] ; then
        echo "WriteLog: Log parameter #1 is zero length. Please debug..."
        exit 1
    else
        if [ -z "$2" ] ; then
            echo "WriteLog: Severity parameter #2 is zero length. Please debug..."
            exit 1
        else
            if [ -z "$3" ] ; then
                echo "WriteLog: Message parameter #3 is zero length. Please debug..."
                exit 1
            fi
        fi
    fi
    Now=$(date '+%Y-%m-%d %H:%M:%S,%3N')
    if [ "$1" = "Debug" ] && [ "$Debug" = 1 ] ; then
        echo "$Now: $ScriptName: $2: Debug: $3 "
    elif [ "$1" = "Verbose" ] && [ "$Verbose" = 1 ] ; then
        echo "$Now: $ScriptName: $2: $3"
    elif [ "$1" = "Output" ] ; then
        echo "${Now}: $ScriptName: $2: $3"
    elif [ -f "$1" ] ; then
        echo "${Now}: $ScriptName: $2: $3" >> "$1"
    fi
    if [ ! -z "$LogLocal" ] ; then
        if [ "$1" = "Debug" ] && [ "$Debug" = 1 ] ; then
            echo "$Now: $ScriptName: $2: Debug: $3 " >> "$LogLocal"
        elif [ "$1" = "Verbose" ] && [ "$Verbose" = 1 ] ; then
            echo "$Now: $ScriptName: $2: $3" >> "$LogLocal"
        elif [ "$1" = "Output" ] ; then
            echo "${Now}: $ScriptName: $2: $3" >> "$LogLocal"
        fi
    fi
}

CheckNrpeWindows () {
    # Probe the host with check_nrpe_v3 in forced v2 mode and set NrpeCom
    # to the command that matches what the agent answers.
    WinHost=$1
    NrpeTest="$(/usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H "$WinHost")"
    WriteLog Debug Info "NrpeTest $WinHost: $NrpeTest"
    if [[ $NrpeTest = *"Receive underflow"* ]] ; then
        # Agent still answers as NRPE 2.15: keep the old binary
        NrpeCom="/usr/local/nagios/libexec/check_nrpe"
    elif [[ $NrpeTest = *"seem to be doing fine"* ]] ; then
        # Agent accepts the 10240 byte payload: use the v3 binary
        NrpeCom="/usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240"
    elif [[ $NrpeTest = *"Could not complete SSL handshake"* ]] ; then
        WriteLog Output Error "NRPE: Could not complete SSL handshake. Please debug. "
        exit 2
    else
        WriteLog Output Error "NRPE detection failed. Please debug. "
        exit 2
    fi
}
When `Receive underflow` is detected in the output of the `/usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H $WinHost` command, the host is using NRPE 2.15.

I'm migrating from 0.4.1.105 to 0.5.0.62 and at the same time upgrading NRPE by setting `payload length = 10240` in nsclient.ini.

I never used 0.3.x NSClients.

I have a test group of 12 Windows servers which have been migrated, with 580 more to do. I've implemented it so that, with the help of some Rundeck jobs, I can switch quickly and easily between NSClient 0.4.1.105 with NRPE 2.15 and NSClient 0.5.0.62 with NRPE v3.
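
For a batch run over many hosts, a variant that reports failure through its return status instead of calling `exit` may be easier to loop with. The following is a minimal sketch under that assumption; the function name `classify_nrpe_output` and the file `hosts.txt` in the commented loop are hypothetical, while the matched messages and commands are the ones from CheckNrpeWindows above.

```shell
#!/usr/bin/env bash
# Sketch only: same message matching as CheckNrpeWindows, but the result is
# reported through the return status so one bad host does not end a batch run.

# classify_nrpe_output "OUTPUT"
# Prints the command to use and returns 0; returns 2 on an SSL handshake
# failure and 3 when the output is not recognised.
classify_nrpe_output() {
    local out=$1
    case $out in
        *"Receive underflow"*)
            echo "/usr/local/nagios/libexec/check_nrpe" ;;
        *"seem to be doing fine"*)
            echo "/usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240" ;;
        *"Could not complete SSL handshake"*)
            return 2 ;;
        *)
            return 3 ;;
    esac
}

# Example loop (commented out; hosts.txt is a hypothetical host list):
# while read -r h; do
#     out=$(/usr/local/nagios/libexec/check_nrpe_v3 -2 -P 10240 -H "$h" 2>&1)
#     cmd=$(classify_nrpe_output "$out") || { echo "$h: detection failed" >&2; continue; }
#     echo "$h: $cmd"
# done < hosts.txt
```

Keeping the matching in a function that only looks at a string also makes it possible to try it against saved output without contacting any host.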

Grtz
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: NRPE 2 versus 3

Post by cdienger »

Hi Steve & Willem, I just want to get a consensus here before locking the thread. Were there any questions regarding the code Willem provided, or would either of you like to keep the thread open a bit longer?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
SteveBeauchemin
Posts: 524
Joined: Mon Oct 14, 2013 7:19 pm

Re: NRPE 2 versus 3

Post by SteveBeauchemin »

I am okay to close this. If and when I get Rundeck, I will probably look here for the data Willem provided.

Thanks.

Steve B