Re: negate no longer works after I update Centos 6
Posted: Tue Aug 11, 2015 9:56 am
by tecnalb
No, this is not resolved. I'm not sure how it could be, since it worked prior to my updating CentOS. So maybe, in your eyes, it was broken before and works now, but it doesn't give me the indication in XI that I had before, so I need to understand how to resolve that now.
Re: negate no longer works after I update Centos 6
Posted: Tue Aug 11, 2015 10:12 am
by jolson
it doesn't give me the indication in XI that I had before, so I need to understand how to resolve that now.
What indication did you have in XI previously that has been removed?
To clear a few things up, when you run the following:
./check_nrpe -H xxxx.xxx -c check_services -a "e2fsck"
And get the result:
e2fsck: 1
Does the 'e2fsck' process actually exist on the remote host you're monitoring? I think this is where the confusion is stemming from. If that process does not exist, yet check_nrpe reports that it does, you should verify that it doesn't exist by running the following on your *remote* server:
Re: negate no longer works after I update Centos 6
Posted: Tue Aug 11, 2015 10:15 am
by lmiltchev
...it doesn't give me the indication in XI that I had before...
Can you show us what you are seeing in the web UI at the moment and what you think you need to see instead? Are you getting the "expected" output if you don't use the "negate" plugin?
Re: negate no longer works after I update Centos 6
Posted: Tue Aug 11, 2015 1:42 pm
by tecnalb
OK, I had this set up so that IF I WAS running fsck then I would get a red alert. If I WAS NOT running fsck (most of the time) then I would get a green alert. And this command worked perfectly until I updated to CentOS 6.7 a few days ago. Now, as you can see, I get a constant red alert even though fsck is not running. I did not change my nagios server at all. It was just a result of the update. I was really just alerting your team to that fact, but I am curious as to what changed, and would like to find a way back to a valid check.
Re: negate no longer works after I update Centos 6
Posted: Tue Aug 11, 2015 2:07 pm
by jolson
Your output appears to indicate that fsck is running on the remote server in question - e2fsck: 1. Are you certain that it's not?
On the remote host:
Re: negate no longer works after I update Centos 6
Posted: Tue Aug 11, 2015 2:23 pm
by jdalrymple
Let's start from the beginning and work it through:
From your first post:
tecnalb wrote:
[root@nagios libexec]# ./check_nrpe -H xxxx.xxx -t 30 -c check_services -a "e2fsck"
e2fsck: 1
[root@nagios libexec]# ./negate ./check_nrpe -H xxxx.xxx-a "e2fsck"
e2fsck: 1
It works best if you copy and paste. Assuming you just fat-fingered the command and left out some bits, the output is still exactly what you'd expect: by default, negate does not munge the plugin's output, only the return code.
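The point about negate touching only the return code can be sketched in shell. This is a rough illustration, not negate's actual source: `negate_status` and `negate_sketch` are hypothetical names, and the mapping assumes negate's default behaviour of swapping OK (0) and CRITICAL (2) while leaving WARNING (1) and UNKNOWN (3) alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): map a plugin exit code the way
# negate is assumed to by default.
negate_status() {
    case "$1" in
        0) echo 2 ;;     # OK becomes CRITICAL
        2) echo 0 ;;     # CRITICAL becomes OK
        *) echo "$1" ;;  # WARNING (1) and UNKNOWN (3) pass through unchanged
    esac
}

# Hypothetical wrapper: run the given command, let its output through
# untouched, and return the swapped exit code.
negate_sketch() {
    "$@"
    rc=$?
    return "$(negate_status "$rc")"
}

# Example (commented out so the file can be sourced quietly):
# negate_sketch ./check_nrpe -H xxxx.xxx -c check_services -a "e2fsck"
# echo $?
```

Under this sketch, a plugin that prints `e2fsck: 1` and exits 0 still prints `e2fsck: 1`, but the wrapper exits 2, which matches the behaviour shown in the quoted output.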
tecnalb wrote:No, I updated the target host only. However it seems my negate and your negate are vastly different.
You're implying that negate on the Nagios server was changed by updating a remote nrpe host?
tecnalb wrote:[root@nagios libexec]# ./negate ./check_nrpe -H xxx.xxx -a "e2fsck"
NRPE v2.14
[root@nagios libexec]# echo $?
2
Without negate
[root@nagios libexec]# ./check_nrpe -H xxx.xxx -a "e2fsck"
NRPE v2.14
[root@nagios libexec]# echo $?
0
This negate is doing EXACTLY what it's supposed to - and all it's supposed to. The commands here are screwy, but negate is indeed working.
tecnalb wrote:[root@nagios libexec]# ./check_nrpe -H xxxx.xxx -c check_services -a "e2fsck"
e2fsck: 1
[root@nagios libexec]# echo $?
0
[root@nagios libexec]# ./negate ./check_nrpe -H xxxx.xxx -c check_services -a "e2fsck"
e2fsck: 1
[root@nagios libexec]# echo $?
2
On the host:
[nagios@backup libexec]$ /usr/local/nagios/libexec/check_services -p e2fsck
e2fsck: 1
[nagios@backup libexec]$ echo $?
0
Again, negate is working perfectly. In addition nrpe and check_nrpe are also doing their job well. If negate was working differently than this I would say negate was broken.
This is where I start to think I don't know what check_services is nor what it's supposed to do. I would recommend using check_procs which probably has a more predictable output and we know it adheres to threshold guidelines perfectly.
[jdalrymple@localhost libexec]$ ./check_procs -a fsck -c :1
PROCS OK: 0 processes with args 'fsck' | procs=0;;:1;0;
If I understand your use case, negate would become unnecessary, even though it is doing just fine in all of your examples.
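For anyone tuning the threshold, here is a minimal sketch of how a critical range such as `:1` can be read. It assumes check_procs follows the usual Nagios plugin range convention, where `:N` is treated as 0 to N and the check goes CRITICAL only when the process count falls outside that range. `in_max_range` is a hypothetical helper written for illustration, not part of any plugin.

```shell
#!/bin/sh
# Hypothetical helper: emulate a ":max" critical range.
# Returns 0 (OK) when the count lies inside 0..max, 2 (CRITICAL) otherwise.
in_max_range() {
    count=$1
    max=$2
    if [ "$count" -ge 0 ] && [ "$count" -le "$max" ]; then
        return 0
    fi
    return 2
}

# Example (commented out so the file can be sourced quietly):
# in_max_range 0 1; echo $?   # zero matching processes against ":1"
```

If that assumption holds, a single matching process would still be inside `:1` and report OK, so a maximum of 0 may be closer to "alert whenever fsck is running". It is worth confirming against the output of `check_procs --help` on your own system.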