nrpe.cfg reconfig?
I would like to use a plugin for monitoring Cassandra. The only issue is that it requires support for command arguments in the NRPE daemon, and when I originally built NRPE I didn't configure it with "--enable-command-args", which apparently is required.
So, can I simply rebuild NRPE on each remote host with "--enable-command-args", or will this break something else? (I also have to set dont_blame_nrpe=1 in nrpe.cfg, but that's simple enough.)
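For what it's worth, the nrpe.cfg side of that change might look like this minimal sketch (the command name and plugin path here are assumptions, not taken from a real install):

```
# Requires an NRPE daemon built with --enable-command-args;
# understand the security trade-off before enabling this.
dont_blame_nrpe=1

# Hypothetical command definition using argument substitution
command[check_cassandra_cluster]=/usr/local/nagios/libexec/check_cassandra_cluster.sh $ARG1$
```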
Skynet Drone (Posts: 2620, Joined: Wed Feb 11, 2015 1:56 pm)
Re: nrpe.cfg reconfig?
If you use the same configure options that you used originally (usually just "--with-nagios-user" and "--with-nagios-group") and add "--enable-command-args", you should be good to go. It sounds like you have a good handle on the process. It's probably best to run through it on a test/non-prod system first, just for safety's sake.
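A sketch of that rebuild, assuming NRPE was installed from source (the source path, version, and restart method are assumptions; adjust to your host):

```
cd /usr/local/src/nrpe-2.15
./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args
make all
make install-daemon
# restart however NRPE runs on the host, e.g. under xinetd:
service xinetd restart
```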
Let us know if anything goes wrong.
Re: nrpe.cfg reconfig?
Thanks, I got it to "work", at least in theory. In case anybody is interested, I used these instructions:
http://labs.nagios.com/2014/08/21/monit ... ase-nodes/
However, it really doesn't work. Unfortunately I can't find anybody else on these forums who has tried to set up Cassandra monitoring via Nagios Core. I wonder if the problem is that I'm using Nagios Core instead of XI (although as far as I can tell that shouldn't matter)?
The problem is that, while the command no longer gives an error when run remotely via NRPE, the output is completely useless: no matter what, it reports 0 live database nodes, which is completely incorrect (although the first time I saw that output it gave me quite a scare).
It's so frustrating, because the script runs fine locally on the Cassandra node, but when I try to run it remotely it just gives me the erroneous information.
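For context, the core of what a check script like this does is parse `nodetool status` output and count live nodes. A minimal sketch of that parsing step (the sample output below is fabricated for illustration only):

```shell
# Fabricated sample of `nodetool status` output, for illustration only
sample='Datacenter: datacenter1
=======================
--  Address    Load      Tokens  Owns    Host ID  Rack
UN  127.0.0.1  101.6 KB  256     100.0%  aaaa     rack1
DN  127.0.0.2  98.2 KB   256     100.0%  bbbb     rack1'

# Count rows whose status column is UN (Up/Normal)
live=$(printf '%s\n' "$sample" | awk '$1 == "UN" { n++ } END { print n + 0 }')
echo "live nodes: $live"    # prints: live nodes: 1
```

A result of "0 live nodes" from a script like this usually means the nodetool invocation inside it produced no usable output at all, rather than Cassandra actually being down.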
Skynet Drone
Re: nrpe.cfg reconfig?
This is almost always troubleshootable and fixable.
Can you give us a more verbose indication of the error? What happens if you run the check_cassandra script/bin as the nagios user?
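For example, from a root shell on the Cassandra node (using the plugin path from earlier in the thread):

```
su - nagios -s /bin/bash -c \
  '/usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0; echo "exit code: $?"'
```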
Re: nrpe.cfg reconfig?
OK, first I should mention that there was a permissions error that prevented the nagios user from running the "nodetool" command, which is a VERY important part of this script.
I fixed that, but it still gives me garbage info.
Here's the correct info, when I run the script remotely from the Nagios server via ssh (as nagios user):
ssh mytestcassandranode /usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0
WARNING - Live Node:1 - 127.0.0.1:Normal,101.6,KB100.00%,-1405306065600736406 | Load_127.0.0.1=KB100.00% Owns_127.0.0.1=-1405306065600736406
This is the correct output.
However, if I try to run the command remotely using NRPE I get this incorrect info:
./check_nrpe -H mytestcassandranode -c check_cassandra_cluster -a '-H mytestcassandranode -P 7199 -w 1 -c 0'
CRITICAL - Live Node:0 - |
The command should take 5-10 seconds to run remotely, as it does when I run it over ssh. When I run it via NRPE, however, it returns that incorrect info immediately.
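A common cause of this pattern (instant return, empty results) is that the environment under the NRPE daemon differs from an ssh login shell, e.g. nodetool failing because PATH or JAVA_HOME isn't set. A hypothetical wrapper script to capture what the plugin actually sees under NRPE (the debug filename is an assumption; point the NRPE command definition at this wrapper temporarily):

```
#!/bin/sh
# Hypothetical debug wrapper: record the environment NRPE runs under
{
  echo "user: $(id -un)  PATH=$PATH  JAVA_HOME=$JAVA_HOME"
} >> /tmp/check_cassandra.debug
# Run the real plugin, capturing stderr (NRPE normally discards it)
exec /usr/local/nagios/libexec/check_cassandra_cluster.sh "$@" 2>> /tmp/check_cassandra.debug
```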
Skynet Drone
Re: nrpe.cfg reconfig?
Are you sure you have the argument passing working properly? To troubleshoot that, I'd statically configure the args in nrpe.cfg and run it from the Nagios server with just
check_nrpe -H HOST -t 60 -c check_cassandra
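That is, with a statically configured command in nrpe.cfg on the remote host matching the check_cassandra name in that call (the path and thresholds below are taken from earlier in the thread):

```
command[check_cassandra]=/usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0
```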
Re: nrpe.cfg reconfig?
Unfortunately, statically configuring the args in nrpe.cfg just gives me the same output as running it from the Nagios server with argument passing.
Also, as another test, I was able to run the basic check_users command from the Nagios server, both with hardcoded arguments and with argument passing, and both worked fine.
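For comparison, the two nrpe.cfg forms that worked for check_users presumably looked something like this (the thresholds here are assumptions):

```
# Hard-coded arguments
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
# Argument passing (requires --enable-command-args and dont_blame_nrpe=1)
command[check_users_args]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
```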
The only other thing I haven't mentioned yet is that I'm testing this on some VirtualBox instances, but I don't see how that would cause any issue.
Out of ideas for the moment.
Skynet Drone
Re: nrpe.cfg reconfig?
xpac wrote:
Here's the correct info, when I run the script remotely from the Nagios server via ssh (as nagios user):
ssh mytestcassandranode /usr/local/nagios/libexec/check_cassandra_cluster.sh -H localhost -P 7199 -w 1 -c 0
WARNING - Live Node:1 - 127.0.0.1:Normal,101.6,KB100.00%,-1405306065600736406 | Load_127.0.0.1=KB100.00% Owns_127.0.0.1=-1405306065600736406
This is the correct output.
However, if I try to run the command remotely using NRPE I get this incorrect info:
./check_nrpe -H mytestcassandranode -c check_cassandra_cluster -a '-H mytestcassandranode -P 7199 -w 1 -c 0'
This is a stretch, but I notice that you're not putting localhost into the check args when using NRPE, although you are when using ssh. Is it possible that the node isn't resolving its own hostname properly?
Re: nrpe.cfg reconfig?
Unfortunately, that isn't it.
So, to simplify things, what I've done now is locate an even simpler script that only checks whether my nodes are alive. I don't even have to provide arguments, as they're hard-coded into the script. I ran the script directly on the host, and also ran it locally via NRPE on the same host (since localhost is allowed in the xinetd.d/nrpe conf file).
Same problem: running the script directly was fine, but when I ran the command via NRPE (locally) I got the same error/output.
It seems as though NRPE and/or xinetd is causing the problem; I'm just not sure how or why.
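If xinetd is suspected, it's worth double-checking which user and config file the daemon is actually launched with. A typical /etc/xinetd.d/nrpe looks roughly like this (the values shown are common defaults, not taken from this system):

```
service nrpe
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        only_from       = 127.0.0.1
}
```

In particular, make sure the server line points at the newly rebuilt nrpe binary rather than an older copy, and that server_args references the nrpe.cfg you edited.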
Skynet Drone
Re: nrpe.cfg reconfig?
Just run check_nrpe -H hostname from the Nagios host. Let's see if that even works.
[jdalrymple@localhost libexec]$ ./check_nrpe -H 127.0.0.1
NRPE v2.15