
NRPE response not found

Posted: Thu Jul 31, 2014 5:36 am
by turboscrew
What could be wrong here? Why is the output null?
The plugin in use is the unaltered "official" one.

On the Nagios server side logs:

Code: Select all

[1406800743.128670] [2048.1] [pid=25278]   Done.  Final output: ''/usr/lib64/nagios/plugins/check_nrpe -H 10.27.128.81 -c check_all_disks 20% 10%''
[1406800743.128673] [2048.1] [pid=25278] **** END MACRO PROCESSING *************
[1406800743.128718] [016.1] [pid=25278] Check result output will be written to '/var/log/nagios/spool/checkresults/checkoET2Gt' (fd=7)
[1406800743.128937] [016.2] [pid=25278] Service check is executing in child process (pid=25496)
[1406800743.131941] [016.2] [pid=25496] Moving temp check result file '/var/log/nagios/spool/checkresults/checkoET2Gt' to queue file '/var/log/nagios/spool/checkresults/cgcNt6M'...
[1406800752.135130] [016.0] [pid=25278] Starting to reap check results.
[1406800752.135189] [016.1] [pid=25278] Starting to read check result queue '/var/log/nagios/spool/checkresults'...
[1406800752.135220] [016.1] [pid=25278] Processing check result file: '/var/log/nagios/spool/checkresults/cgcNt6M'
[1406800752.135361] [016.2] [pid=25278] Found a check result (#1) to handle...
[1406800752.135373] [016.1] [pid=25278] Handling check result for service 'Disk usage' on host '10.27.128.81'...
[1406800752.135378] [016.0] [pid=25278] ** Handling check result for service 'Disk usage' on host '10.27.128.81'...
[1406800752.135382] [016.1] [pid=25278] HOST: 10.27.128.81, SERVICE: Disk usage, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 127, OUTPUT: (null)
[1406800752.135454] [016.2] [pid=25278] ST: HARD  CA: 3  MA: 3  CS: 2  LS: 2  LHS: 2
[1406800752.135460] [016.1] [pid=25278] Service is in a non-OK state!
[1406800752.135464] [016.1] [pid=25278] Host is currently UP, so we'll recheck its state to make sure...
[1406800752.135468] [016.1] [pid=25278] * Using last known host state: 0
and on the client side logs:

Code: Select all

Jul 26 00:14:07 elukancompute nrpe[10027]: Connection from 10.27.128.80 port 7555
Jul 26 00:14:07 elukancompute nrpe[10027]: Host address is in allowed_hosts
Jul 26 00:14:07 elukancompute nrpe[10027]: Handling the connection...
Jul 26 00:14:07 elukancompute nrpe[10027]: Host is asking for command 'check_all_disks' to be run...
Jul 26 00:14:07 elukancompute nrpe[10027]: Running command: /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/vda
Jul 26 00:14:07 elukancompute nrpe[10027]: Command completed with return code 0 and output: DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
Jul 26 00:14:07 elukancompute nrpe[10027]: Return Code: 0, Output: DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
Jul 26 00:14:07 elukancompute nrpe[10027]: [30B blob data]
On the web-interface:

Code: Select all

Disk usage  CRITICAL 	07-31-2014 13:09:03 	1d 1h 52m 46s 	3/3 	(Return code of 127 is out of bounds - plugin may be missing) 
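For reference, 127 is the exit status a POSIX shell reports when it cannot find the command it was asked to run, which is why Nagios adds "plugin may be missing". The sketch below assumes the check command line is handed to a shell, and uses /bin/echo purely as a stand-in for a plugin. It shows one way a perfectly valid path can still produce 127: if the whole command line reaches the shell wrapped in quotes, it is treated as a single program name. This is only an illustration of the exit status, not a confirmed diagnosis of the setup above.

```shell
# Unquoted: the shell splits the line into a command and one argument.
sh -c "/bin/echo hello" >/dev/null 2>&1
unquoted_status=$?

# Wrapped in single quotes: the whole line becomes one word, so the shell
# looks for a program literally named "/bin/echo hello" and cannot find it.
sh -c "'/bin/echo hello'" >/dev/null 2>&1
quoted_status=$?

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```

The second status should be the same 127 that the web interface shows.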
On the server side (command line):

Code: Select all

# /usr/lib64/nagios/plugins/check_nrpe -H 10.27.128.81 -c check_all_disks\!20%\!10%
DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
And on the client side (command line; NRPE runs under the 'nrpe' account, hence the 'su'):

Code: Select all

# su -c "/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_all_disks\!20%\!10% " nrpe
DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
Another weird thing: I have a very simple bash-plugin:

Code: Select all

# cat /etc/nagios/check_omat
#!/bin/sh
#if [ ! -e /etc/nagios/outfile.txt ]
#then
#touch /etc/nagios/outfile.txt
#fi

echo $1 $2 $3 $4 $5 $6 $7 $8 > /etc/nagios/outfile.txt
echo SERVICE STATUS: OK
exit 0

Code: Select all

# tail /etc/nagios/nrpe.cfg
include_dir=/etc/nrpe.d/
command[check_all_disks]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p /dev/vda
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 60% -c 80%
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 10 -c 20
command[df_var]=df /var/ | sed -re 's/.* ([0-9]+)%.*/\1/' | grep -E '^[0-9]'
command[check_swap]=/usr/lib64/nagios/plugins/check_swap -w $ARG1$ -c $ARG2$
command[load5]=cut /proc/loadavg -f 1 -d " "
command[xinetd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a xinetd
command[httpd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a httpd
command[check_omat]=/etc/nagios/check_omat -w $ARG1$ -c $ARG2$
It doesn't work even on the client command line.

Code: Select all

# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat\!20%\!10%
NRPE: Unable to read output
Just to be sure:

Code: Select all

# su -c "/etc/nagios/check_omat 20% 10% 2>/dev/null" nrpe
SERVICE STATUS: OK

Re: NRPE response not found

Posted: Thu Jul 31, 2014 8:20 am
by eloyd
You have a lot for me to take a look at, but I wanted to start with this:
# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat\!20%\!10%
NRPE: Unable to read output
When executing from the command line, you don't want to pass the ! as part of the arguments. You just want to do:

Code: Select all

# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat -a "20% 10%"
So armed with that knowledge, can you try again and see if it works? Meanwhile, I'll go back and read everything again.
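A small sketch of how quoting changes what follows -a, assuming check_nrpe hands each shell word after -a to the next $ARGn$ macro on the NRPE side. The count_args helper is hypothetical and only counts the words it receives. Under that assumption, the quoted form "20% 10%" arrives as a single argument, so a definition that uses both $ARG1$ and $ARG2$ would see the second one empty.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

quoted=$(count_args "20% 10%")   # one word: both thresholds in a single argument
unquoted=$(count_args 20% 10%)   # two words: one argument per threshold

echo "quoted form: $quoted argument(s), unquoted form: $unquoted argument(s)"
```

When a command definition expects two macros, passing the thresholds as separate words (-a 20% 10%) is the form that fills both.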

Re: NRPE response not found

Posted: Thu Jul 31, 2014 9:06 am
by turboscrew
Thanks for "hearing me out" so soon. :-)

About the command:

Code: Select all

Jul 26 05:21:12 elukancompute nrpe[11395]: Connection from 127.0.0.1 port 47275
Jul 26 05:21:12 elukancompute nrpe[11395]: Host address is in allowed_hosts
Jul 26 05:21:12 elukancompute nrpe[11395]: Handling the connection...
Jul 26 05:21:12 elukancompute nrpe[11395]: Host is asking for command 'check_omat' to be run...
Jul 26 05:21:12 elukancompute nrpe[11395]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 05:21:12 elukancompute nrpe[11395]: Command completed with return code 3 and output:
Jul 26 05:21:12 elukancompute nrpe[11395]: Return Code: 3, Output: NRPE: Unable to read output
Jul 26 05:21:12 elukancompute nrpe[11395]: [30B blob data]
Tried with only one $ARG1$:

Code: Select all

Jul 26 05:11:09 elukancompute systemd[1]: Starting NRPE...
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_all_disks]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_load]=/usr/lib64/nagios/plugins/check_load -w 60% -c 80%
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_users]=/usr/lib64/nagios/plugins/check_users -w 10 -c 20
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[df_var]=df /var/ | sed -re 's/.* ([0-9]+)%.*/\1/' | grep -E '^[0-9]
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_swap]=/usr/lib64/nagios/plugins/check_swap -w $ARG1$ -c $ARG2
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[load5]=cut /proc/loadavg -f 1 -d " "
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[xinetd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a xinetd
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[httpd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a httpd
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_omat]=/etc/nagios/check_omat $ARG1$
Jul 26 05:11:09 elukancompute nrpe[11312]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Jul 26 05:11:09 elukancompute systemd[1]: Started NRPE.
Jul 26 05:11:09 elukancompute nrpe[11313]: Starting up daemon
Jul 26 05:11:09 elukancompute nrpe[11313]: Server listening on 0.0.0.0 port 5666.
Jul 26 05:11:09 elukancompute nrpe[11313]: Server listening on :: port 5666.
Jul 26 05:11:09 elukancompute nrpe[11313]: Warning: Daemon is configured to accept command arguments from clients!
Jul 26 05:11:09 elukancompute nrpe[11313]: Listening for connections on port 0
Jul 26 05:11:09 elukancompute nrpe[11313]: Allowing connections from: 10.27.128.80, 127.0.0.1
Jul 26 05:12:07 elukancompute nrpe[11317]: Connection from 127.0.0.1 port 46251
Jul 26 05:12:07 elukancompute nrpe[11317]: Host address is in allowed_hosts
Jul 26 05:12:07 elukancompute nrpe[11317]: Handling the connection...
Jul 26 05:12:07 elukancompute nrpe[11317]: Host is asking for command 'check_omat' to be run...
Jul 26 05:12:07 elukancompute nrpe[11317]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 05:12:07 elukancompute nrpe[11317]: Command completed with return code 3 and output:
Jul 26 05:12:07 elukancompute nrpe[11317]: Return Code: 3, Output: NRPE: Unable to read output
At some point I tried that '-a' from the server and got:

Code: Select all

Jul 25 23:03:27 elukancompute nrpe[9622]: Connection from 127.0.0.1 port 44203
Jul 25 23:03:27 elukancompute nrpe[9622]: Host address is in allowed_hosts
Jul 25 23:03:27 elukancompute nrpe[9622]: Handling the connection...
Jul 25 23:03:27 elukancompute nrpe[9622]: Host is asking for command 'check_omat' to be run...
Jul 25 23:03:27 elukancompute nrpe[9622]: Running command: /etc/nagios/check_omat -w -a20% 10% -c
Jul 25 23:03:27 elukancompute nrpe[9622]: Command completed with return code 3 and output:
Jul 25 23:03:27 elukancompute nrpe[9622]: Return Code: 3, Output: NRPE: Unable to read output
Jul 25 23:03:27 elukancompute nrpe[9622]: [30B blob data]
In case it matters, I'm using RHEL 7 on both machines.

client:

Code: Select all

nagios-plugins-nrpe.x86_64                                         2.15-2.el7                                          @epel
nrpe.x86_64                                                        2.15-2.el7                                          @epel
server:

Code: Select all

nagios-plugins-all.x86_64                                          2.0.1-1.el7                                           @epel
nagios-plugins-nrpe.x86_64                                          2.15-2.el7                                           @epel
nagios.x86_64                                                3.5.1-1.el7                                                 @epel

Re: NRPE response not found

Posted: Thu Jul 31, 2014 9:17 am
by turboscrew
I wonder whether there is a more detailed description of how Nagios/NRPE works.
Not about installation, and not a "how to", but rather a "how does it...".

Re: NRPE response not found

Posted: Thu Jul 31, 2014 9:25 am
by eloyd
NRPE works just like any other client/server thingy. But that's not important here. :-)

What is important is this:
Jul 25 23:03:27 elukancompute nrpe[9622]: Running command: /etc/nagios/check_omat -w -a20% 10% -c
So it looks like /etc/nagios/check_omat (is that the correct path?) is being run, but the output is not properly formatted for Nagios. I will assume that this is a custom plugin you wrote. It looks like it does not follow the Nagios requirements for writing your own plugin (http://nagios.sourceforge.net/docs/3_0/pluginapi.html).
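For comparison, here is a minimal sketch of what those requirements boil down to: print at least one line of text to stdout and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), optionally followed by performance data after a "|". The check_example function, its thresholds and the numbers passed to it are all made up for illustration; this is not the check_omat plugin from this thread.

```shell
#!/bin/sh
# Hypothetical plugin logic, for illustration only: compares a number against
# warning and critical thresholds and reports in the usual plugin format.
check_example() {
    value=$1
    warn=$2
    crit=$3

    if [ "$value" -ge "$crit" ]; then
        echo "EXAMPLE CRITICAL - value is $value | value=$value;$warn;$crit"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "EXAMPLE WARNING - value is $value | value=$value;$warn;$crit"
        return 1
    fi
    echo "EXAMPLE OK - value is $value | value=$value;$warn;$crit"
    return 0
}

# Example call with made-up numbers; a real plugin would finish with "exit $?".
output=$(check_example 5 10 20)
status=$?
echo "$output (exit status $status)"
```

If a plugin prints nothing on stdout, NRPE has no text to return, which is one situation where "NRPE: Unable to read output" shows up.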

What is the output if you run /etc/nagios/check_omat directly on the end host? And what is the output of:

Code: Select all

echo $?

Re: NRPE response not found

Posted: Thu Jul 31, 2014 9:32 am
by turboscrew
[root@elukancompute ~]# /etc/nagios/check_omat 20% 10%
SERVICE STATUS: OK
[root@elukancompute ~]# echo $?
0

And just in case:
[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10%" nrpe
SERVICE STATUS: OK

due to this:

Code: Select all

[root@elukancompute ~]# ps axu | grep nrpe
nrpe     11392  0.0  0.1  46288  1468 ?        Ss   05:20   0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
root     11688  0.0  0.0 112640   980 pts/0    S+   06:17   0:00 grep --color=auto nrpe
BTW, the commands without command parameters seem to work fine, both from the command line and from the Nagios server.

Also, to show that the string is written to stdout:

Code: Select all

[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10% 2>/dev/null" nrpe
SERVICE STATUS: OK
[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10% 1>/dev/null" nrpe
[root@elukancompute ~]#

Re: NRPE response not found

Posted: Thu Jul 31, 2014 9:39 am
by eloyd
And now, what if you run it as the Nagios user, assuming your NRPE runs as Nagios?

Re: NRPE response not found

Posted: Thu Jul 31, 2014 9:57 am
by turboscrew
As you probably guessed, I added a shell for user 'nrpe', and I also did that for 'nagios', to be able to run the commands locally under those accounts
(by default, the shell for both 'nrpe' and 'nagios' is /sbin/nologin):

Code: Select all

[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10% 2>/dev/null" nagios
SERVICE STATUS: OK
[root@elukancompute ~]# su -c "/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat -a '20% 10%'" nrpe
NRPE: Unable to read output
[root@elukancompute ~]# su -c "/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat -a '20% 10%'" nagios
NRPE: Unable to read output

Code: Select all

Jul 26 06:33:19 elukancompute su[11761]: pam_unix(su:session): session opened for user nrpe by ec2-user(uid=0)
Jul 26 06:33:19 elukancompute nrpe[11764]: Connection from 127.0.0.1 port 48043
Jul 26 06:33:19 elukancompute nrpe[11764]: Host address is in allowed_hosts
Jul 26 06:33:19 elukancompute nrpe[11764]: Handling the connection...
Jul 26 06:33:19 elukancompute nrpe[11764]: Host is asking for command 'check_omat' to be run...
Jul 26 06:33:19 elukancompute nrpe[11764]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 06:33:19 elukancompute nrpe[11764]: Command completed with return code 3 and output:
Jul 26 06:33:19 elukancompute nrpe[11764]: Return Code: 3, Output: NRPE: Unable to read output
Jul 26 06:33:19 elukancompute nrpe[11764]: [30B blob data]
Jul 26 06:33:19 elukancompute su[11761]: pam_unix(su:session): session closed for user nrpe
Jul 26 06:33:27 elukancompute su[11767]: (to nagios) ec2-user on pts/0
Jul 26 06:33:27 elukancompute su[11767]: pam_unix(su:session): session opened for user nagios by ec2-user(uid=0)
Jul 26 06:33:27 elukancompute nrpe[11770]: Connection from 127.0.0.1 port 48299
Jul 26 06:33:27 elukancompute nrpe[11770]: Host address is in allowed_hosts
Jul 26 06:33:27 elukancompute nrpe[11770]: Handling the connection...
Jul 26 06:33:27 elukancompute nrpe[11770]: Host is asking for command 'check_omat' to be run...
Jul 26 06:33:27 elukancompute nrpe[11770]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 06:33:27 elukancompute nrpe[11770]: Command completed with return code 3 and output:
Jul 26 06:33:27 elukancompute nrpe[11770]: Return Code: 3, Output: NRPE: Unable to read output
Jul 26 06:33:27 elukancompute nrpe[11770]: [30B blob data]
Jul 26 06:33:27 elukancompute su[11767]: pam_unix(su:session): session closed for user nagios

Re: NRPE response not found

Posted: Thu Jul 31, 2014 10:02 am
by turboscrew
Oops, forgot: as you saw from the 'ps' listing, NRPE is run as 'nrpe'.
(but still works when commanded as 'nagios')

Re: NRPE response not found

Posted: Thu Jul 31, 2014 10:19 am
by eloyd
Okay, I have to think about this one. Sorry for not having a quick fix.