nrpe.cfg reconfig?
I would like to use a plugin for monitoring Cassandra. The only issue is that it requires support for command arguments in the NRPE daemon, and when I originally built NRPE I didn't configure it with "--enable-command-args", which apparently is required.
So, can I simply rebuild NRPE on each remote host with "--enable-command-args", or will this break something else? (I also have to set dont_blame_nrpe=1 in nrpe.cfg, but that's simple enough.)
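For what it's worth, the nrpe.cfg side of that change might look like this minimal sketch (the command name and plugin path here are assumptions, not taken from a real install):

```
# Requires an NRPE daemon built with --enable-command-args;
# understand the security trade-off before enabling this.
dont_blame_nrpe=1

# Hypothetical command definition using argument substitution
command[check_cassandra_cluster]=/usr/local/nagios/libexec/check_cassandra_cluster.sh $ARG1$
```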
Skynet Drone (Posts: 2620, Joined: Wed Feb 11, 2015 1:56 pm)
Re: nrpe.cfg reconfig?
If you use the same configure options that you used originally (usually just "--with-nagios-user" and "--with-nagios-group") and add "--enable-command-args", you should be good to go. It sounds like you have a good handle on the process. It's probably best to run through it on a test/non-prod system first, just for safety's sake.
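A sketch of that rebuild, assuming NRPE was installed from source (the source path, version, and restart method are assumptions; adjust to your host):

```
cd /usr/local/src/nrpe-2.15
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args
make all
make install-daemon
# restart however NRPE runs on the host, e.g. under xinetd:
service xinetd restart
```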
Let us know if anything goes wrong.
Re: nrpe.cfg reconfig?
Thanks, I got it to "work", at least in theory. In case anybody is interested, I used these instructions:
http://labs.nagios.com/2014/08/21/monit ... ase-nodes/
However, it really doesn't work. Unfortunately I can't find anybody else on these forums who has tried to set up Cassandra monitoring via Nagios Core. I wonder if the problem is that I'm using Nagios Core instead of XI (although as far as I can tell that shouldn't matter)?
The problem is that, while the command no longer gives an error when run remotely via NRPE, the output is completely useless: no matter what, it reports 0 live database nodes, which is completely incorrect (although the first time I saw that output it gave me quite a scare).
It's so frustrating, because the script runs fine locally on the Cassandra node, but when I try to run it remotely it just gives me the erroneous information.
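For context, the core of what a check script like this does is parse `nodetool status` output and count live nodes. A minimal sketch of that parsing step (the sample output below is fabricated for illustration only):

```shell
# Fabricated sample of `nodetool status` output, for illustration only
sample='Datacenter: datacenter1
=======================
--  Address    Load      Tokens  Owns    Host ID  Rack
UN  127.0.0.1  101.6 KB  256     100.0%  aaaa     rack1
DN  127.0.0.2  98.2 KB   256     100.0%  bbbb     rack1'

# Count rows whose status column is UN (Up/Normal)
live=$(printf '%s\n' "$sample" | awk '$1 == "UN" { n++ } END { print n + 0 }')
echo "live nodes: $live"    # prints: live nodes: 1
```

A result of "0 live nodes" from a script like this usually means the nodetool invocation inside it produced no usable output at all, rather than Cassandra actually being down.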
Skynet Drone
Re: nrpe.cfg reconfig?
This is almost always troubleshootable and fixable.
Can you give us a more verbose indication of the error? What happens if you run the check_cassandra script/bin as the nagios user?
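For example, from a root shell on the Cassandra node (using the plugin path from earlier in the thread):

```
su - nagios -s /bin/bash -c \
  '/usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0; echo "exit code: $?"'
```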
Re: nrpe.cfg reconfig?
OK, first I should mention that there was a permissions error that prevented the nagios user from running the "nodetool" command, which is a VERY important part of this script.
I fixed that, but it still gives me garbage info.
Here's the correct info, when I run the script remotely from the Nagios server via ssh (as nagios user):
ssh mytestcassandranode /usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0
WARNING - Live Node:1 - 127.0.0.1:Normal,101.6,KB100.00%,-1405306065600736406 | Load_127.0.0.1=KB100.00% Owns_127.0.0.1=-1405306065600736406
This is the correct output.
However, if I try to run the command remotely using NRPE I get this incorrect info:
./check_nrpe -H mytestcassandranode -c check_cassandra_cluster -a '-H mytestcassandranode -P 7199 -w 1 -c 0'
CRITICAL - Live Node:0 - |
The command should take 5-10 seconds to run remotely, as it does when I run it over ssh. When I run it via NRPE, however, it returns that incorrect info immediately.
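A common cause of this pattern (instant return, empty results) is that the environment under the NRPE daemon differs from an ssh login shell, e.g. nodetool failing because PATH or JAVA_HOME isn't set. A hypothetical wrapper script to capture what the plugin actually sees under NRPE (the debug filename is an assumption; point the NRPE command definition at this wrapper temporarily):

```
#!/bin/sh
# Hypothetical debug wrapper: record the environment NRPE runs under
{
  echo "user: $(id -un)  PATH=$PATH  JAVA_HOME=$JAVA_HOME"
} >> /tmp/check_cassandra.debug
# Run the real plugin, capturing stderr (NRPE normally discards it)
exec /usr/local/nagios/libexec/check_cassandra_cluster.sh "$@" 2>> /tmp/check_cassandra.debug
```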
Skynet Drone
Re: nrpe.cfg reconfig?
Are you sure you have the argument passing working properly? To troubleshoot that, I'd statically configure the args in nrpe.cfg and run it from the Nagios server with just
check_nrpe -H HOST -t 60 -c check_cassandra
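That is, with a statically configured command in nrpe.cfg on the remote host matching the check_cassandra name in that call (the path and thresholds below are taken from earlier in the thread):

```
command[check_cassandra]=/usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0
```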
Re: nrpe.cfg reconfig?
Unfortunately, statically configuring the args in nrpe.cfg just gives me the same output as running it from the Nagios server with argument passing.
Also, as another test, I was able to run the basic check_users command from the Nagios server, both with hardcoded arguments and with argument passing, and both worked fine.
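For comparison, the two nrpe.cfg forms that worked for check_users presumably looked something like this (the thresholds here are assumptions):

```
# Hard-coded arguments
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
# Argument passing (requires --enable-command-args and dont_blame_nrpe=1)
command[check_users_args]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
```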
The only other thing I haven't mentioned yet is that I'm testing this on some VirtualBox instances, but I don't see how that would cause any issue.
Out of ideas for the moment.
Skynet Drone
Re: nrpe.cfg reconfig?
xpac wrote:
Here's the correct info, when I run the script remotely from the Nagios server via ssh (as nagios user):
ssh mytestcassandranode /usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0
WARNING - Live Node:1 - 127.0.0.1:Normal,101.6,KB100.00%,-1405306065600736406 | Load_127.0.0.1=KB100.00% Owns_127.0.0.1=-1405306065600736406
This is the correct output.
However, if I try to run the command remotely using NRPE I get this incorrect info:
./check_nrpe -H mytestcassandranode -c check_cassandra_cluster -a '-H mytestcassandranode -P 7199 -w 1 -c 0'
This is a stretch, but I notice that you're not putting localhost into the check args when using NRPE, although you are when using ssh. Is it possible that the node isn't resolving its own hostname properly?
Re: nrpe.cfg reconfig?
Unfortunately, that isn't it.
So, to simplify things, what I've done now is locate an even simpler script that only checks whether my nodes are alive. I don't even have to provide arguments, as they're hard-coded into the script. I ran the script directly on the host, and also ran it locally via NRPE on the same host (since localhost is allowed in the xinetd.d/nrpe conf file).
Same problem: running the script directly was fine, but when I ran the command via NRPE (locally) I got the same error/output.
It seems as though NRPE and/or xinetd is causing the problem; I'm just not sure how or why.
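If xinetd is suspected, it's worth double-checking which user and config file the daemon is actually launched with. A typical /etc/xinetd.d/nrpe looks roughly like this (the values shown are common defaults, not taken from this system):

```
service nrpe
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        only_from       = 127.0.0.1
}
```

In particular, make sure the server line points at the newly rebuilt nrpe binary rather than an older copy, and that server_args references the nrpe.cfg you edited.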
Skynet Drone
Re: nrpe.cfg reconfig?
Just run check_nrpe -H hostname from the Nagios host. Let's see if that even works.
[jdalrymple@localhost libexec]$ ./check_nrpe -H 127.0.0.1
NRPE v2.15