ssh vs nrpe

jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: ssh vs nrpe

Post by jdalrymple »

The clash of the titans... ssh vs. nrpe :)

My opinion will be obvious.

There is lots of support out there for both Puppet and Chef to automate the deployment, and NFS-mounting configuration directories for nrpe.cfg is just as trivial as NFS-mounting plugin directories.

I leave dont_blame_nrpe=0 in my configs. I prefer NOT to pass arguments, for security's sake; then I don't need to worry about whether my communications are in cleartext or whatever. I guess that assumes there is no iffy data in your Nagios check results, though I can't think of any reason to have any.
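To confirm that a given host really has argument passing switched off, something like the following could be run against its nrpe.cfg. This is only a sketch: the helper name args_allowed is made up for illustration, it assumes the setting is written without spaces around the equals sign, that the last occurrence in the file wins, and it does not follow any include or include_dir directives.

```shell
# args_allowed CFG
# Prints "yes" if the last dont_blame_nrpe line in CFG is set to 1, otherwise "no".
# A missing setting is treated as "no" (assumption: arguments are off by default).
args_allowed() {
    val=$(sed -n 's/^dont_blame_nrpe=\([01]\).*/\1/p' "$1" | tail -n 1)
    if [ "$val" = "1" ]; then
        echo yes
    else
        echo no
    fi
}

# Example (path is the usual source-install location, adjust as needed):
#   args_allowed /usr/local/nagios/etc/nrpe.cfg
```

With arguments off, every check has to be spelled out in full on the remote side, presumably as a line such as command[check_dummy]=/usr/local/nagios/libexec/check_dummy 0 in nrpe.cfg (the exact definition is an assumption; it is not shown in the post).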

Here is my strongest sales pitch though:

Code:

[jdalrymple@localhost ~]$ time /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_dummy
OK

real    0m0.009s
user    0m0.003s
sys     0m0.000s
[jdalrymple@localhost ~]$ time ssh localhost /usr/local/nagios/libexec/check_dummy 0
OK

real    0m0.089s
user    0m0.014s
sys     0m0.004s
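A single run of each command is a fairly noisy measurement, so it may be worth averaging over many runs before drawing conclusions. A minimal sketch, assuming GNU date (for the %N nanosecond format) and a hypothetical helper named mean_us:

```shell
# mean_us RUNS COMMAND [ARGS...]
# Runs COMMAND the given number of times, discarding its output, and prints
# the mean wall-clock time per run in microseconds.
mean_us() {
    runs=$1
    shift
    i=0
    start=$(date +%s%N)    # nanoseconds since the epoch (GNU date only)
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000 ))
}

# Example, using the two commands timed above:
#   mean_us 100 /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_dummy
#   mean_us 100 ssh localhost /usr/local/nagios/libexec/check_dummy 0
```

The figure includes the cost of spawning each process from the shell, so it is best used for comparing the two methods against each other rather than as an absolute number.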
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: ssh vs nrpe

Post by Box293 »

There are two reasons why I like check_by_ssh over check_nrpe.

#1 is that check_by_ssh does not have a limit on how much data can be returned. This is the primary reason why I chose this method for box293_check_vmware, as some of my checks return a lot of status text and performance data. This can be overcome with check_nrpe, but that requires re-compiling with a patch at both ends, and all participants must be on the same patch.

#2 is that check_nrpe requires compiling on the client side whereas check_by_ssh does not require this on the client side.
BanditBBS wrote: Thousands of ssh connections can be a big load hog though...that plus this darn ssh issue we are facing here is just killing me. We can not find the cause or resolution! But yeah, having keys in place already makes ssh so nice and not having to actually install anything on the remote servers is great.
Leland Lammert did a great presentation at the Nagios World Conference 2013 called "Nagios in a Multi-Platform Environment".

https://www.youtube.com/watch?v=hjyJi-PsHw4

It's been a long time since I was in that room listening to his presentation; however, if I remember correctly, he talked about establishing permanent SSH tunnels between Nagios and the hosts. This in turn should reduce the number of connections.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
mp4783
Posts: 116
Joined: Wed May 14, 2014 11:11 am

Re: ssh vs nrpe

Post by mp4783 »

Permanent tunnels are a very compelling idea. If you use a muxed (shared) connection, then it will stay up until you tear it down. How many of these connections your server will support most likely depends on the server's resources. I think I've spun up a couple hundred. Most of them will be quiescent.
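For reference, a shared connection of this kind can be set up with OpenSSH's ControlMaster, ControlPath and ControlPersist options. The sketch below is one way to wrap them; the function names and the socket location in MUX_PATH are made up for illustration and are not from any post in this thread.

```shell
# Hypothetical socket location; ssh expands %r, %h and %p to the remote user,
# host and port, so each target gets its own control socket.
MUX_PATH="${MUX_PATH:-$HOME/.ssh/mux-%r@%h:%p}"

# Open a master connection in the background and keep it until told to exit.
mux_open() {
    ssh -o ControlMaster=auto -o ControlPath="$MUX_PATH" -o ControlPersist=yes -fN "$1"
}

# Ask the master for this host whether it is still running.
mux_check() {
    ssh -o ControlPath="$MUX_PATH" -O check "$1"
}

# Tear the master connection down.
mux_close() {
    ssh -o ControlPath="$MUX_PATH" -O exit "$1"
}

# Example (host name is a placeholder):
#   mux_open web01 && mux_check web01
```

Later ssh invocations that use the same ControlPath should reuse the open connection instead of negotiating a new one. check_by_ssh has an -o option for passing ssh options through, which is presumably how a check would be pointed at the socket, though that has not been tested here.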

There is a technology I have played around with that is part of a very interesting product called GateOne (http://liftoffsoftware.com/Products/GateOne). This is a Python-based tool that does some really cool things. Inside it is something called Tornado (I think) that sounds like the type of thing that could support thousands of simultaneous sessions.

One thing I have not been able to get working is the use of an encrypted private key on the Nagios XI server, accessed via an ssh-agent process. I use this all the time and I know that Apache supports it, but any time I try it, it fails. The only thing I've ever gotten to work was an unencrypted key. By the way, the ssh-agent carries the unencrypted key in memory, and a utility called keychain (http://www.funtoo.org/Keychain) creates a small file with the two environment variables necessary for Apache to access the ssh-agent process. With proper configuration, these environment variables will be set whenever the user logs on, or they can be sourced like this: . ~/.keychain/$(uname -n)-sh. Here are examples of the environment variables:

SSH_AUTH_SOCK=/tmp/ssh-BeD438an5648/agent.5648; export SSH_AUTH_SOCK;
SSH_AGENT_PID=5649; export SSH_AGENT_PID;
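One way to narrow down a failure like this is to check, as the account that actually runs the checks, whether the agent is reachable at all. A rough sketch, assuming keychain writes its file to the path mentioned above; the function names are made up for illustration:

```shell
# Path of the sh-style environment file that keychain writes for this host.
keychain_env_file() {
    echo "$HOME/.keychain/$(uname -n)-sh"
}

# Source the keychain file if it is readable, then ask the agent to list keys.
# ssh-add -l exits 0 when keys are loaded, 1 when the agent answers but holds
# no keys, and 2 when the agent cannot be contacted.
agent_status() {
    env_file=$(keychain_env_file)
    if [ -r "$env_file" ]; then
        . "$env_file"
    fi
    ssh-add -l >/dev/null 2>&1
    case $? in
        0) echo "agent reachable, key loaded" ;;
        1) echo "agent reachable, no keys loaded" ;;
        *) echo "agent not reachable (SSH_AUTH_SOCK=${SSH_AUTH_SOCK:-unset})" ;;
    esac
}
```

If this reports the agent as unreachable when run as the monitoring user but works from an interactive login, the environment variables are probably not making it into the process that launches the checks.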
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: ssh vs nrpe

Post by BanditBBS »

My issue with the permanent ssh tunnels is what happens when I need to start using gearman to distribute the load. We're going to eventually need to use gearman, so when that happens, it'd be a nightmare to maintain, no? Plus it would be 1100 tunnels right now and eventually many more, that might be a bit much.
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: ssh vs nrpe

Post by ssax »

Seems like 1100+ tunnels is a lot. I wonder what type of resources that would consume?
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: ssh vs nrpe

Post by Box293 »

BanditBBS wrote: My issue with the permanent ssh tunnels is what happens when I need to start using gearman to distribute the load. We're going to eventually need to use gearman, so when that happens, it'd be a nightmare to maintain, no? Plus it would be 1100 tunnels right now and eventually many more, that might be a bit much.
Let's say you had the XI server running gearman as the master and a worker. You also have two other external workers. So that's 3 workers.

From each worker you would need a tunnel to all remote hosts. That's a lot of tunnels from the worker's perspective. From the remote host's perspective, there are only 3 tunnels established. That's less overhead than establishing an ssh or nrpe session each time a check needs to run.
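Putting numbers on that, using the 3 workers from this example and the 1100 hosts mentioned earlier in the thread, and assuming every worker keeps exactly one persistent tunnel to every remote host (the function name is made up for illustration):

```shell
# tunnel_counts WORKERS HOSTS
# Prints how many tunnels each worker holds, how many each remote host sees,
# and the total across the whole setup.
tunnel_counts() {
    workers=$1
    hosts=$2
    echo "per worker: $hosts"
    echo "per remote host: $workers"
    echo "total: $((workers * hosts))"
}

tunnel_counts 3 1100
# per worker: 1100
# per remote host: 3
# total: 3300
```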


BanditBBS wrote: We're going to eventually need to use gearman, so when that happens, it'd be a nightmare to maintain, no?
Yeah that's the part I've not had to actually do. But I'm sure it's just as much work as maintaining check_nrpe when upgraded versions come out. I'm sure there is a scripted way to manage it.
mp4783
Posts: 116
Joined: Wed May 14, 2014 11:11 am

Re: ssh vs nrpe

Post by mp4783 »

There isn't a clear answer here. You're going to have to judge system load and overhead with respect to how many tunnels you can support. I think I've had a couple hundred shared connections up and the server didn't seem to mind. Most of the time, they're not doing anything. When all is said and done, the shared connection just buys you some speed in terms of not having to build the connection and tear it down each time. This time saving may not be worth the hassle.

As for the management aspects of it, I don't know enough about Mod Gearman to comment, but like anything, you can probably script a solution. Sorry I don't have a better answer.
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: ssh vs nrpe

Post by BanditBBS »

I appreciate all the replies. I think I am going to go the nrpe route. I am just in charge of all the monitoring here, and I can't get enough traction behind investigating and fixing this ssh issue we have every so often. I know I can get nrpe to work and work well. I also have automation for deployment and updating all figured out in my head, just need to put pen to paper :geek:
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: ssh vs nrpe

Post by rajasegar »

In an enterprise environment, nrpe will be flagged in every audit.
There is no way around this. For these cases, we use check_by_ssh.

We rarely have to modify nrpe.cfg because all is parameterised.
This is another auditor's favourite topic which we counter by saying we have sufficient mitigation in place via firewall rules and nrpe whitelist.

Our nrpe is not running under inetd or xinetd. It is run using a normal nagios user account.
Never had any problem so far using this method. Easier to get access to nagios account than root.
Any permission issues can be handled via sudo or RBAC (Solaris).

Tons of ways to automate but since most of our nrpe.cfg is different by server we do it manually.
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: ssh vs nrpe

Post by BanditBBS »

Yeah, I've had to deal with the SOC audits myself in the past. I was always able to shut them up by showing the security and other stuff we had set up.

I was planning on writing a check that runs once a day on all my Linux servers and checks a folder for a file. If it exists, then a script will be kicked off that will grab new plugins and a new nrpe.cfg. It will then restart xinetd. We centrally manage sudo files, so we can easily add the line to allow the nagios user to restart that service globally. So basically, I'll throw an update file in a folder, and remove it 24 hours later. In that time all servers should run the once/1440 minutes check.
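The core of that check could look roughly like this. It is only a sketch of the idea as described: the function name, the flag path and the update script in the example are all placeholders, and the actual fetching of plugins and restarting of xinetd is left to whatever command is passed in.

```shell
# check_update_flag FLAG_FILE UPDATE_COMMAND [ARGS...]
# Runs UPDATE_COMMAND only when FLAG_FILE exists, then prints a Nagios-style
# status line and returns 0 (OK) or 2 (CRITICAL).
check_update_flag() {
    flag=$1
    shift
    if [ ! -e "$flag" ]; then
        echo "OK - no update pending"
        return 0
    fi
    if "$@" >/dev/null 2>&1; then
        echo "OK - update applied"
        return 0
    fi
    echo "CRITICAL - update failed"
    return 2
}

# Example (both paths are hypothetical):
#   check_update_flag /mnt/nagios-share/update.flag /usr/local/nagios/libexec/fetch_updates.sh
```

Because the flag stays in place for 24 hours, a server whose check happens to fire twice in that window would apply the update twice, so the update command should be safe to re-run.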

I need to automate it as we are constantly writing new plugins/expanding our monitoring. That, plus bug fixes or updates to existing plugins. Of course, this plan may all get thrown away and we may do it another way, but that's what is in my mind at the moment.