NRPE response not found

turboscrew
Posts: 23
Joined: Wed Jul 30, 2014 6:15 am

NRPE response not found

Post by turboscrew »

What could be wrong here? Why is the output null?
The plugin in use is the unaltered "official" one.

On the Nagios server side logs:

Code: Select all

[1406800743.128670] [2048.1] [pid=25278]   Done.  Final output: ''/usr/lib64/nagios/plugins/check_nrpe -H 10.27.128.81 -c check_all_disks 20% 10%''
[1406800743.128673] [2048.1] [pid=25278] **** END MACRO PROCESSING *************
[1406800743.128718] [016.1] [pid=25278] Check result output will be written to '/var/log/nagios/spool/checkresults/checkoET2Gt' (fd=7)
[1406800743.128937] [016.2] [pid=25278] Service check is executing in child process (pid=25496)
[1406800743.131941] [016.2] [pid=25496] Moving temp check result file '/var/log/nagios/spool/checkresults/checkoET2Gt' to queue file '/var/log/nagios/spool/checkresults/cgcNt6M'...
[1406800752.135130] [016.0] [pid=25278] Starting to reap check results.
[1406800752.135189] [016.1] [pid=25278] Starting to read check result queue '/var/log/nagios/spool/checkresults'...
[1406800752.135220] [016.1] [pid=25278] Processing check result file: '/var/log/nagios/spool/checkresults/cgcNt6M'
[1406800752.135361] [016.2] [pid=25278] Found a check result (#1) to handle...
[1406800752.135373] [016.1] [pid=25278] Handling check result for service 'Disk usage' on host '10.27.128.81'...
[1406800752.135378] [016.0] [pid=25278] ** Handling check result for service 'Disk usage' on host '10.27.128.81'...
[1406800752.135382] [016.1] [pid=25278] HOST: 10.27.128.81, SERVICE: Disk usage, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 127, OUTPUT: (null)
[1406800752.135454] [016.2] [pid=25278] ST: HARD  CA: 3  MA: 3  CS: 2  LS: 2  LHS: 2
[1406800752.135460] [016.1] [pid=25278] Service is in a non-OK state!
[1406800752.135464] [016.1] [pid=25278] Host is currently UP, so we'll recheck its state to make sure...
[1406800752.135468] [016.1] [pid=25278] * Using last known host state: 0
and on the client side logs:

Code: Select all

Jul 26 00:14:07 elukancompute nrpe[10027]: Connection from 10.27.128.80 port 7555
Jul 26 00:14:07 elukancompute nrpe[10027]: Host address is in allowed_hosts
Jul 26 00:14:07 elukancompute nrpe[10027]: Handling the connection...
Jul 26 00:14:07 elukancompute nrpe[10027]: Host is asking for command 'check_all_disks' to be run...
Jul 26 00:14:07 elukancompute nrpe[10027]: Running command: /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/vda
Jul 26 00:14:07 elukancompute nrpe[10027]: Command completed with return code 0 and output: DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
Jul 26 00:14:07 elukancompute nrpe[10027]: Return Code: 0, Output: DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
Jul 26 00:14:07 elukancompute nrpe[10027]: [30B blob data]
On the web-interface:

Code: Select all

Disk usage  CRITICAL 	07-31-2014 13:09:03 	1d 1h 52m 46s 	3/3 	(Return code of 127 is out of bounds - plugin may be missing) 
On the server side (command line):

Code: Select all

# /usr/lib64/nagios/plugins/check_nrpe -H 10.27.128.81 -c check_all_disks\!20%\!10%
DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
And on the client side (command line - NRPE runs under 'nrpe'-account, that's why the 'su'):

Code: Select all

# su -c "/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_all_disks\!20%\!10% " nrpe
DISK OK - free space: / 8135 MB (85% inode=92%);| /=1377MB;7960;8955;0;9951
Another weird thing: I have a very simple shell plugin:

Code: Select all

# cat /etc/nagios/check_omat
#!/bin/sh
#if [ ! -e /etc/nagios/outfile.txt ]
#then
#touch /etc/nagios/outfile.txt
#fi

echo $1 $2 $3 $4 $5 $6 $7 $8 > /etc/nagios/outfile.txt
echo SERVICE STATUS: OK
exit 0

Code: Select all

# tail /etc/nagios/nrpe.cfg
include_dir=/etc/nrpe.d/
command[check_all_disks]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p /dev/vda
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 60% -c 80%
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 10 -c 20
command[df_var]=df /var/ | sed -re 's/.* ([0-9]+)%.*/\1/' | grep -E '^[0-9]'
command[check_swap]=/usr/lib64/nagios/plugins/check_swap -w $ARG1$ -c $ARG2$
command[load5]=cut /proc/loadavg -f 1 -d " "
command[xinetd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a xinetd
command[httpd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a httpd
command[check_omat]=/etc/nagios/check_omat -w $ARG1$ -c $ARG2$
It doesn't work even on the client command line.

Code: Select all

# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat\!20%\!10%
NRPE: Unable to read output
Just to be sure:

Code: Select all

# su -c "/etc/nagios/check_omat 20% 10% 2>/dev/null" nrpe
SERVICE STATUS: OK
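
A note on the "Return code of 127" shown in the web interface above: 127 is the status a POSIX shell returns when it cannot find the command it was asked to run. In the server debug log the final command line appears wrapped in doubled quotes, which may mean the whole command line was quoted in the command definition and so handed to the shell as a single word. That reading is an assumption, not something the thread confirms. The minimal sketch below only reproduces the effect, with a plain echo standing in for check_nrpe.

```shell
#!/bin/sh
# Quoting the whole command line makes the shell look for one program
# literally named "echo hello world", which does not exist.
sh -c "'echo hello world'" 2>/dev/null
quoted_rc=$?

# The same words left unquoted are parsed as a command plus two arguments.
sh -c "echo hello world" >/dev/null
plain_rc=$?

echo "quoted=$quoted_rc plain=$plain_rc"
```

If the quoted form is what Nagios ends up executing, the plugin is never started at all, which would also explain the "(null)" output.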
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: NRPE response not found

Post by eloyd »

You have a lot for me to take a look at, but I wanted to start with this:
# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat\!20%\!10%
NRPE: Unable to read output
When executing from the command line, you don't want to pass the ! as part of the arguments. You just want to do:

Code: Select all

# /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat -a "20% 10%"
So armed with that knowledge, can you try again and see if it works? Meanwhile, I'll go back and read everything again.
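
For context: in a Nagios service definition the "!" character separates the command name from the values that fill $ARG1$, $ARG2$, and so on, whereas check_nrpe itself takes the remote command with -c and its arguments with -a. The sketch below shows the translation between the two spellings. The helper name nrpe_cmdline is hypothetical and is not part of Nagios or NRPE; it only prints the command line, it does not run it.

```shell
#!/bin/sh
# nrpe_cmdline SPEC HOST
# Hypothetical helper: converts "name!arg1!arg2" into a check_nrpe command line.
nrpe_cmdline() {
    spec=$1
    host=$2
    name=${spec%%!*}                 # text before the first "!"
    case $spec in
        *'!'*) args=$(printf '%s' "${spec#*!}" | tr '!' ' ') ;;  # rest, "!" -> space
        *)     args= ;;
    esac
    if [ -n "$args" ]; then
        printf 'check_nrpe -H %s -c %s -a %s\n' "$host" "$name" "$args"
    else
        printf 'check_nrpe -H %s -c %s\n' "$host" "$name"
    fi
}

with_args=$(nrpe_cmdline 'check_omat!20%!10%' localhost)
no_args=$(nrpe_cmdline 'check_load' localhost)
echo "$with_args"
echo "$no_args"
```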
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
turboscrew
Posts: 23
Joined: Wed Jul 30, 2014 6:15 am

Re: NRPE response not found

Post by turboscrew »

Thanks for "hearing me out" so soon. :-)

About the command:

Code: Select all

Jul 26 05:21:12 elukancompute nrpe[11395]: Connection from 127.0.0.1 port 47275
Jul 26 05:21:12 elukancompute nrpe[11395]: Host address is in allowed_hosts
Jul 26 05:21:12 elukancompute nrpe[11395]: Handling the connection...
Jul 26 05:21:12 elukancompute nrpe[11395]: Host is asking for command 'check_omat' to be run...
Jul 26 05:21:12 elukancompute nrpe[11395]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 05:21:12 elukancompute nrpe[11395]: Command completed with return code 3 and output:
Jul 26 05:21:12 elukancompute nrpe[11395]: Return Code: 3, Output: NRPE: Unable to read output
Jul 26 05:21:12 elukancompute nrpe[11395]: [30B blob data]
Tried with only one $ARG1$:

Code: Select all

Jul 26 05:11:09 elukancompute systemd[1]: Starting NRPE...
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_all_disks]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_load]=/usr/lib64/nagios/plugins/check_load -w 60% -c 80%
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_users]=/usr/lib64/nagios/plugins/check_users -w 10 -c 20
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[df_var]=df /var/ | sed -re 's/.* ([0-9]+)%.*/\1/' | grep -E '^[0-9]
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_swap]=/usr/lib64/nagios/plugins/check_swap -w $ARG1$ -c $ARG2
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[load5]=cut /proc/loadavg -f 1 -d " "
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[xinetd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a xinetd
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[httpd]=/usr/lib64/nagios/plugins/check_procs -c 1: -a httpd
Jul 26 05:11:09 elukancompute nrpe[11312]: Added command[check_omat]=/etc/nagios/check_omat $ARG1$
Jul 26 05:11:09 elukancompute nrpe[11312]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Jul 26 05:11:09 elukancompute systemd[1]: Started NRPE.
Jul 26 05:11:09 elukancompute nrpe[11313]: Starting up daemon
Jul 26 05:11:09 elukancompute nrpe[11313]: Server listening on 0.0.0.0 port 5666.
Jul 26 05:11:09 elukancompute nrpe[11313]: Server listening on :: port 5666.
Jul 26 05:11:09 elukancompute nrpe[11313]: Warning: Daemon is configured to accept command arguments from clients!
Jul 26 05:11:09 elukancompute nrpe[11313]: Listening for connections on port 0
Jul 26 05:11:09 elukancompute nrpe[11313]: Allowing connections from: 10.27.128.80, 127.0.0.1
Jul 26 05:12:07 elukancompute nrpe[11317]: Connection from 127.0.0.1 port 46251
Jul 26 05:12:07 elukancompute nrpe[11317]: Host address is in allowed_hosts
Jul 26 05:12:07 elukancompute nrpe[11317]: Handling the connection...
Jul 26 05:12:07 elukancompute nrpe[11317]: Host is asking for command 'check_omat' to be run...
Jul 26 05:12:07 elukancompute nrpe[11317]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 05:12:07 elukancompute nrpe[11317]: Command completed with return code 3 and output:
Jul 26 05:12:07 elukancompute nrpe[11317]: Return Code: 3, Output: NRPE: Unable to read output
At some point I tried that '-a' from the server and got:

Code: Select all

Jul 25 23:03:27 elukancompute nrpe[9622]: Connection from 127.0.0.1 port 44203
Jul 25 23:03:27 elukancompute nrpe[9622]: Host address is in allowed_hosts
Jul 25 23:03:27 elukancompute nrpe[9622]: Handling the connection...
Jul 25 23:03:27 elukancompute nrpe[9622]: Host is asking for command 'check_omat' to be run...
Jul 25 23:03:27 elukancompute nrpe[9622]: Running command: /etc/nagios/check_omat -w -a20% 10% -c
Jul 25 23:03:27 elukancompute nrpe[9622]: Command completed with return code 3 and output:
Jul 25 23:03:27 elukancompute nrpe[9622]: Return Code: 3, Output: NRPE: Unable to read output
Jul 25 23:03:27 elukancompute nrpe[9622]: [30B blob data]
In case it matters, I'm using
RHEL 7 (on both machines)

client:

Code: Select all

nagios-plugins-nrpe.x86_64                                         2.15-2.el7                                          @epel
nrpe.x86_64                                                        2.15-2.el7                                          @epel
server:

Code: Select all

nagios-plugins-all.x86_64                                          2.0.1-1.el7                                           @epel
nagios-plugins-nrpe.x86_64                                          2.15-2.el7                                           @epel
nagios.x86_64                                                3.5.1-1.el7                                                 @epel
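
The log line "Running command: /etc/nagios/check_omat -w -a20% 10% -c" above is consistent with the daemon filling $ARG1$ with the single string "-a20% 10%" and leaving $ARG2$ empty. A minimal sketch of that kind of substitution follows. The helper expand_args is hypothetical, written only for illustration, and is not how NRPE is implemented internally; the argument values are an assumption about what was sent.

```shell
#!/bin/sh
# expand_args TEMPLATE ARG1 ARG2
# Hypothetical helper: replaces the literal tokens $ARG1$ and $ARG2$ in TEMPLATE.
expand_args() {
    awk -v tpl="$1" -v a1="$2" -v a2="$3" '
        # Literal (non-regex) replacement of every occurrence of tok in s.
        function repl(s, tok, val,    i, out) {
            out = ""
            while ((i = index(s, tok)) > 0) {
                out = out substr(s, 1, i - 1) val
                s = substr(s, i + length(tok))
            }
            return out s
        }
        BEGIN { print repl(repl(tpl, "$ARG1$", a1), "$ARG2$", a2) }
    '
}

template='/etc/nagios/check_omat -w $ARG1$ -c $ARG2$'
expanded=$(expand_args "$template" '-a20% 10%' '')
echo "[$expanded]"
```

The brackets in the last line make the trailing space visible, which is what an empty $ARG2$ leaves behind.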
turboscrew
Posts: 23
Joined: Wed Jul 30, 2014 6:15 am

Re: NRPE response not found

Post by turboscrew »

I wonder if there are any more detailed descriptions of how Nagios/NRPE works.
Not about installing, and not a "how to", but rather a "how does it...".
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: NRPE response not found

Post by eloyd »

NRPE works just like any other client/server thingy. But that's not important here. :-)

What is important is this:
Jul 25 23:03:27 elukancompute nrpe[9622]: Running command: /etc/nagios/check_omat -w -a20% 10% -c
So it looks like /etc/nagios/check_omat (is that the correct path?) is being run, but the output is not properly formatted for Nagios. I will assume that this is a custom plugin you wrote. It looks like it does not properly follow the Nagios requirements for writing your own plugin (http://nagios.sourceforge.net/docs/3_0/pluginapi.html).

What is the output if you run /etc/nagios/check_omat directly on the end host? And what is the output of:

Code: Select all

echo $?
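
For reference, the plugin API linked above expects at least one line of text on stdout and an exit status of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below maps a status to its state name. The helper state_name is hypothetical, and the "OUT OF BOUNDS" label is only a paraphrase of the web interface message quoted earlier in the thread.

```shell
#!/bin/sh
# state_name STATUS: hypothetical helper mapping an exit status to a state name.
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "OUT OF BOUNDS" ;;
    esac
}

# Run a stand-in plugin, keep its stdout, and report what its exit status maps to.
output=$(sh -c 'echo "SERVICE STATUS: OK"; exit 0')
status=$?
echo "$output -> $(state_name "$status")"
```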
turboscrew
Posts: 23
Joined: Wed Jul 30, 2014 6:15 am

Re: NRPE response not found

Post by turboscrew »

[root@elukancompute ~]# /etc/nagios/check_omat 20% 10%
SERVICE STATUS: OK
[root@elukancompute ~]# echo $?
0

And just in case:
[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10%" nrpe
SERVICE STATUS: OK

due to this:

Code: Select all

[root@elukancompute ~]# ps axu | grep nrpe
nrpe     11392  0.0  0.1  46288  1468 ?        Ss   05:20   0:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
root     11688  0.0  0.0 112640   980 pts/0    S+   06:17   0:00 grep --color=auto nrpe
BTW, the commands without command parameters seem to work fine, both from the command line and from the Nagios server.

Also, to show that the string is written to stdout:

Code: Select all

[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10% 2>/dev/null" nrpe
SERVICE STATUS: OK
[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10% 1>/dev/null" nrpe
[root@elukancompute ~]#
Last edited by turboscrew on Thu Jul 31, 2014 9:45 am, edited 1 time in total.
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: NRPE response not found

Post by eloyd »

And now, what if you run it as the Nagios user, assuming your NRPE runs as Nagios?
turboscrew
Posts: 23
Joined: Wed Jul 30, 2014 6:15 am

Re: NRPE response not found

Post by turboscrew »

As you probably guessed, I gave the 'nrpe' user a login shell, and did the same for 'nagios', so that I can run the commands locally under those accounts
(by default, the shell for both 'nrpe' and 'nagios' is /sbin/nologin):

Code: Select all

[root@elukancompute ~]# su -c "/etc/nagios/check_omat 20% 10% 2>/dev/null" nagios
SERVICE STATUS: OK
[root@elukancompute ~]# su -c "/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat -a '20% 10%'" nrpe
NRPE: Unable to read output
[root@elukancompute ~]# su -c "/usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_omat -a '20% 10%'" nagios
NRPE: Unable to read output

Code: Select all

Jul 26 06:33:19 elukancompute su[11761]: pam_unix(su:session): session opened for user nrpe by ec2-user(uid=0)
Jul 26 06:33:19 elukancompute nrpe[11764]: Connection from 127.0.0.1 port 48043
Jul 26 06:33:19 elukancompute nrpe[11764]: Host address is in allowed_hosts
Jul 26 06:33:19 elukancompute nrpe[11764]: Handling the connection...
Jul 26 06:33:19 elukancompute nrpe[11764]: Host is asking for command 'check_omat' to be run...
Jul 26 06:33:19 elukancompute nrpe[11764]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 06:33:19 elukancompute nrpe[11764]: Command completed with return code 3 and output:
Jul 26 06:33:19 elukancompute nrpe[11764]: Return Code: 3, Output: NRPE: Unable to read output
Jul 26 06:33:19 elukancompute nrpe[11764]: [30B blob data]
Jul 26 06:33:19 elukancompute su[11761]: pam_unix(su:session): session closed for user nrpe
Jul 26 06:33:27 elukancompute su[11767]: (to nagios) ec2-user on pts/0
Jul 26 06:33:27 elukancompute su[11767]: pam_unix(su:session): session opened for user nagios by ec2-user(uid=0)
Jul 26 06:33:27 elukancompute nrpe[11770]: Connection from 127.0.0.1 port 48299
Jul 26 06:33:27 elukancompute nrpe[11770]: Host address is in allowed_hosts
Jul 26 06:33:27 elukancompute nrpe[11770]: Handling the connection...
Jul 26 06:33:27 elukancompute nrpe[11770]: Host is asking for command 'check_omat' to be run...
Jul 26 06:33:27 elukancompute nrpe[11770]: Running command: /etc/nagios/check_omat 20% 10%
Jul 26 06:33:27 elukancompute nrpe[11770]: Command completed with return code 3 and output:
Jul 26 06:33:27 elukancompute nrpe[11770]: Return Code: 3, Output: NRPE: Unable to read output
Jul 26 06:33:27 elukancompute nrpe[11770]: [30B blob data]
Jul 26 06:33:27 elukancompute su[11767]: pam_unix(su:session): session closed for user nagios
Last edited by turboscrew on Thu Jul 31, 2014 10:03 am, edited 1 time in total.
turboscrew
Posts: 23
Joined: Wed Jul 30, 2014 6:15 am

Re: NRPE response not found

Post by turboscrew »

Oops, forgot: as you saw from the 'ps' listing, NRPE is run as 'nrpe'.
(but still works when commanded as 'nagios')
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: NRPE response not found

Post by eloyd »

Okay, I have to think about this one. Sorry for not having a quick fix.
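
One observation that may help narrow this down: the daemon reports what the command writes to stdout, and when that is empty the result is "NRPE: Unable to read output", as in the logs above. A command that fails before printing anything, with its error going only to stderr, looks exactly like that to the caller. The sketch below simulates such a failure with a stand-in command; the message text and the exit status 126 are made up for illustration. Why check_omat would fail under the daemon but not under su is not established in this thread, and differences in the daemon's environment or permissions are only one possibility.

```shell
#!/bin/sh
# Stand-in for a plugin that fails early: it writes only to stderr.
failing_plugin='echo "cannot execute" >&2; exit 126'

# Capture stdout only, the way a caller that ignores stderr would see it.
stdout_only=$(sh -c "$failing_plugin" 2>/dev/null)
rc=$?

# Capture stderr as well to reveal the hidden message.
both=$(sh -c "$failing_plugin" 2>&1)

echo "stdout only: [$stdout_only] rc=$rc"
echo "with stderr: [$both]"
```

Temporarily appending 2>&1 to the check_omat command in nrpe.cfg is one way to try the same thing against the real daemon, since the existing entries with pipes suggest the command line is passed through a shell.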
Locked