Sudo updated and now sudo scripts fail: NRPE: Unable to read

gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Sudo updated and now sudo scripts fail: NRPE: Unable to read

Post by gormank »

As the subject says, sudo was updated and now sudo scripts fail with NRPE: Unable to read output. The Nagios servers weren't updated and they don't have the problem. This is happening on ~40 servers in 2 locations using 2 Nagios servers.

I was told the sudo config wasn't changed, just that sudo was updated.

Below you can see the configs, and command output when run from the Nagios server and from one of the servers where the commands fail. There are 3 sudo commands defined in sudoers, and all fail when run via Nagios.

The command completes very quickly, as if it isn't getting very far. Is there a way to increase log/debug levels to see more of the process? Nothing interesting is logged in either syslog or nagios.log.

I've been through the NRPE troubleshooting doc with no improvement in the situation.
I also installed and configured NRPE on another machine and there was no change.

# rpm -qa | grep sudo
sudo-1.8.14-3.el5

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.10 (Tikanga)

# cat /etc/xinetd.d/nrpe
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_success =
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 10.133.134.84 10.133.134.85 10.136.243.84 10.136.243.85
}

# grep -v ^# /usr/local/nagios/etc/nrpe.cfg | sort -u

command[check_alldiskIO]=/usr/local/nagios/libexec/check_all_diskstat.sh
command[check_cpuload2]=/usr/local/nagios/libexec/check_cpu_perf.sh $ARG1$
command[check_init_service]=sudo /usr/local/nagios/libexec/check_init_service $ARG1$
command[check_load]=/usr/local/nagios/libexec/check_load $ARG1$
command[check_log]=/usr/local/nagios/libexec/check_log $ARG1$
command[check_netbackup]=/usr/local/nagios/libexec/check_netbackup.pl $ARG1$
command[check_net_int]=/usr/local/nagios/libexec/check_net_int.sh
command[check_process]=/usr/local/nagios/libexec/check_process $ARG1$
command[check_procs]=/usr/local/nagios/libexec/check_procs $ARG1$
command[check_unix_log]=sudo /usr/local/nagios/libexec/check_unix_log.pl $ARG1$
command[check_vxvm]=sudo /usr/local/nagios/libexec/check_vxvm
command_timeout=60
connection_timeout=300
debug=0
dont_blame_nrpe=1
include_dir=/usr/local/nagios/etc/nrpe
log_facility=syslog
nrpe_group=nagios
nrpe_user=nagios
pid_file=/var/run/nrpe.pid
server_port=5666

# grep -v ^# /etc/sudoers | grep -i -e nagios -e tty
Defaults requiretty
Defaults:nagios !requiretty
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_init_service
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_vxvm
nagios ALL = NOPASSWD:/usr/local/nagios/libexec/check_unix_log.pl

COMMAND: /usr/local/nagios/libexec/check_nrpe -H app002 -t 30 -c check_unix_log -a '-l /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p'
OUTPUT: NRPE: Unable to read output

[root@app002 ~]# su - nagios
[nagios@app002 ~]$ sudo /usr/local/nagios/libexec/check_unix_log.pl -l /var/log/messages -w alert
WARNING: /var/log/messages contains 94 new instances of: alert.

[root@nag001 log]# time /usr/local/nagios/libexec/check_nrpe -H txslm2mlapp002 -t 30 -c check_unix_log -a '-l /tmp/messages.test -w error,crit,alert,emerg'
NRPE: Unable to read output

real 0m0.028s
user 0m0.005s
sys 0m0.005s

[root@nag001 log]# /usr/local/nagios/libexec/check_nrpe -H txslm2mlapp002
NRPE v2.15
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by Box293 »

Very strange error, everything seems correct.

Maybe stop xinetd and kill any possible duplicate services:

Code: Select all

service xinetd stop
killall xinetd
service xinetd start
If that doesn't help, can you run these commands and post the output:

Code: Select all

grep nag /etc/passwd
grep nag /etc/group
ls -al /usr/local/nagios/libexec
We can turn on NRPE debugging to collect more information.

Edit the file:
/usr/local/nagios/etc/nrpe.cfg

Define
debug=1
(it will currently be debug=0)

Save the file and

Code: Select all

service xinetd restart
Now we need to add an option to the rsyslog server so it processes debug messages.
Edit the file:
/etc/rsyslog.conf
Find /var/log/messages
The line in the config file will look like:
*.info;mail.none;authpriv.none;cron.none /var/log/messages

We need to add ;daemon.debug to the selector list so the line becomes:
*.info;mail.none;authpriv.none;cron.none;daemon.debug /var/log/messages

Save the file and

Code: Select all

service rsyslog restart
Now there should be more information logged in /var/log/messages

Does this produce anything valuable?
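
The two manual edits above could also be scripted. The following is only a sketch: the function names `enable_nrpe_debug` and `enable_daemon_debug` are made up for illustration, the file paths are passed in as arguments rather than assumed, and the syslog rule is assumed to start with `*.info` as in the example line above.

```shell
#!/bin/bash
# Hypothetical helpers; file paths are passed in so nothing is hard-coded.

# Switch debug=0 to debug=1 in an nrpe.cfg-style file.
# A .bak copy is made the first time the file is touched.
enable_nrpe_debug() {
    local cfg="$1"
    [ -e "$cfg.bak" ] || cp -p "$cfg" "$cfg.bak" || return 1
    sed -i 's/^debug=0[[:space:]]*$/debug=1/' "$cfg"
}

# Add ";daemon.debug" to the selector list of the /var/log/messages rule
# in a syslog-style config file. Does nothing if it is already there.
enable_daemon_debug() {
    local conf="$1"
    [ -e "$conf.bak" ] || cp -p "$conf" "$conf.bak" || return 1
    if grep -q 'daemon\.debug.*/var/log/messages' "$conf"; then
        return 0
    fi
    sed -i 's#^\(\*\.info[^[:space:]]*\)\([[:space:]][[:space:]]*-\{0,1\}/var/log/messages\)#\1;daemon.debug\2#' "$conf"
}
```

Something like `enable_nrpe_debug /usr/local/nagios/etc/nrpe.cfg` followed by `enable_daemon_debug` on the syslog configuration file would stand in for the hand edits; the service restarts described above are still needed afterwards.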
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by gormank »

I'll change the debug options and post results...

Code: Select all

[root@txslm2mlapp002 ~]# uptime
 15:24:29 up 11:39,  1 user,  load average: 0.11, 0.22, 0.19

[root@txslm2mlapp002 ~]# ps -ef | grep xinetd
root     10491     1  0 03:46 ?        00:00:01 xinetd -stayalive -pidfile /var/run/xinetd.pid
root     12593 12544  0 15:23 pts/1    00:00:00 grep xinetd

[root@txslm2mlapp002 ~]# grep nag /etc/passwd
nagiosnull:x:507:507:Null placeholder:/home/nagios:/sbin/nologin
nagios:x:508:508:Nagios Application account:/home/nagios:/bin/bash
pmpolicy:x:524:524:PM4S Policy Manager Account:/var/opt/quest/qpm4u/pmpolicy:/opt/quest/libexec/pmconfpoluser

[root@txslm2mlapp002 ~]# grep nag /etc/group
nagcmd:x:507:nagios
nagios:x:508:nagios

[root@txslm2mlapp002 ~]# ls -al /usr/local/nagios/libexec
total 7148
drwxrwxr-x 2 nagios nagios   4096 Jun 19 15:10 .
drwxr-xr-x 8 nagios nagios   4096 Apr 21 20:19 ..
-rwxr-xr-x 1 root   root      374 Jun  4 22:20 check_all_diskstat.sh
-rwxr-xr-x 1 root   root   201213 Apr 21 20:19 check_apt
-rwxr-xr-x 1 root   root     6897 Apr 21 20:20 check_asterisk.pl
-rwxr-xr-x 1 root   root     1978 Apr 21 20:20 check_asterisk_sip_peers.sh
-rwxr-xr-x 1 root   root     2242 Apr 21 20:19 check_breeze
-rwxr-xr-x 1 root   root   197506 Apr 21 20:19 check_by_ssh
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_clamd -> check_tcp
-rwxr-xr-x 1 root   root   151149 Apr 21 20:19 check_cluster
-rwxr-xr-x 1 root   root     6557 Jun 19 15:10 check_cpu_perf.sh
-rwxr-xr-x 1 root   root     5355 Apr 21 20:20 check_cpu_stats.sh
-rwxr-xr-x 1 root   root   188670 Apr 21 20:19 check_dhcp
-rwxr-xr-x 1 root   root   192444 Apr 21 20:19 check_dig
-rwxr-xr-x 1 root   root   207796 Apr 21 20:19 check_disk
-rwxr-xr-x 1 root   root     9289 Apr 21 20:19 check_disk_smb
-rwxr-xr-x 1 root   root     4835 Jun  4 22:20 check_diskstat.sh
-rwxr-xr-x 1 root   root   207258 Apr 21 20:19 check_dns
-rwxr-xr-x 1 root   root    93388 Apr 21 20:19 check_dummy
-rwxr-xr-x 1 root   root     3349 Apr 21 20:19 check_file_age
-rwxr-xr-x 1 root   root     6315 Apr 21 20:19 check_flexlm
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_ftp -> check_tcp
-rwxr-xr-x 1 root   root   364815 Apr 21 20:19 check_http
-rwxr-xr-x 1 root   root   193238 Apr 21 20:19 check_icmp
-rwxr-xr-x 1 root   root   158979 Apr 21 20:19 check_ide_smart
-rwxr-xr-x 1 root   root    15123 Apr 21 20:19 check_ifoperstatus
-rwxr-xr-x 1 root   root    12600 Apr 21 20:19 check_ifstatus
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_imap -> check_tcp
-rwxr-xr-x 1 root   nagios    859 Jun 29 15:55 check_init_service
-rwxr-xr-x 1 root   root     6887 Apr 21 20:19 check_ircd
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_jabber -> check_tcp
-rwxr-xr-x 1 root   root   184919 Apr 21 20:19 check_load
-rwxr-xr-x 1 root   root     5989 Apr 21 20:19 check_log
-rwxr-xr-x 1 root   root    21480 Apr 21 20:19 check_mailq
-rwxr-xr-x 1 root   root   157437 Apr 21 20:19 check_mrtg
-rwxr-xr-x 1 root   root   158082 Apr 21 20:19 check_mrtgtraf
-rwxr-xr-x 1 root   root   175481 Apr 21 20:19 check_nagios
-rwxr-xr-x 1 root   root     4238 Jun 26 22:14 check_netbackup.pl
-rwxr-xr-x 1 root   root     1489 Jun 26 18:35 check_net_int.sh
-rwxr-xr-x 1 root   root    25602 Apr 21 20:20 check_netstat.pl
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_nntp -> check_tcp
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_nntps -> check_tcp
-rwxr-xr-x 1 nagios nagios  69790 Apr 21 20:19 check_nrpe
-rwxr-xr-x 1 root   root   188470 Apr 21 20:19 check_nt
-rwxr-xr-x 1 root   root   193409 Apr 21 20:19 check_ntp
-rwxr-xr-x 1 root   root   184994 Apr 21 20:19 check_ntp_peer
-rwxr-xr-x 1 root   root   184107 Apr 21 20:19 check_ntp_time
-rwxr-xr-x 1 root   root   211583 Apr 21 20:19 check_nwstat
-rwxr-xr-x 1 root   root     3259 Apr 21 20:20 check_open_files.pl
-rwxr-xr-x 1 root   root     8779 Apr 21 20:19 check_oracle
-rwxr-xr-x 1 root   root   172377 Apr 21 20:19 check_overcr
-rwxr-xr-x 1 root   root   213009 Apr 21 20:19 check_ping
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_pop -> check_tcp
-rwxr-xr-x 1 root   root    24013 Jun 12 18:44 check_process
-rwxr-xr-x 1 root   root   200800 Apr 21 20:19 check_procs
-rwxr-xr-x 1 root   root   170235 Apr 21 20:19 check_real
-rwxr-xr-x 1 root   root     9581 Apr 21 20:19 check_rpc
-rwxr-xr-x 1 root   root     1453 Apr 21 20:19 check_sensors
-rwxr-xr-x 1 root   root     2174 Apr 21 20:20 check_services
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_simap -> check_tcp
-rwxr-xr-x 1 root   root     7599 Apr 21 20:20 check_sip
-rwxr-xr-x 1 root   root   254037 Apr 21 20:19 check_smtp
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_spop -> check_tcp
-rwxr-xr-x 1 root   root   170231 Apr 21 20:19 check_ssh
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_ssmtp -> check_tcp
-rwxr-xr-x 1 root   root   156737 Apr 21 20:19 check_swap
-rwxr-xr-x 1 root   root   230343 Apr 21 20:19 check_tcp
-rwxr-xr-x 1 root   root   173261 Apr 21 20:19 check_time
lrwxrwxrwx 1 root   root        9 Apr 21 20:19 check_udp -> check_tcp
-rwxr-xr-x 1 root   root     5423 Jul 27 20:43 check_unix_log.pl
-rwxr-xr-x 1 root   root   179501 Apr 21 20:19 check_ups
-rwxr-xr-x 1 root   root   151651 Apr 21 20:19 check_uptime
-rwxr-xr-x 1 root   root   150273 Apr 21 20:19 check_users
-rwxr-xr-x 1 root   root     2936 Apr 21 20:19 check_wave
-rwxr-xr-x 1 root   root      710 Apr 21 20:20 check_yum
-rwxr-xr-x 1 root   root     3060 Apr 21 20:20 custom_check_mem
-rwxr-xr-x 1 root   root      915 Apr 21 20:20 custom_check_procs
-rwxr-xr-x 1 root   root     4176 Apr 21 20:20 nagisk.pl
-rwxr-xr-x 1 root   root   142078 Apr 21 20:19 negate
-rwxr-xr-x 1 root   root    58727 Apr 21 20:20 send_nsca
-rwxr-xr-x 1 root   root   148392 Apr 21 20:19 urlize
-rwxr-xr-x 1 root   root     1913 Apr 21 20:19 utils.pm
-rwxr-xr-x 1 root   root     2791 Apr 21 20:19 utils.sh
Last edited by gormank on Thu Sep 10, 2015 10:40 am, edited 1 time in total.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by jdalrymple »

I have a few ideas:

1) Take a look at the information posted above by Box293 regarding implementing nrpe debugging - that may help.
2) Take a look at /var/log/secure to see if there are any auth failures (I see you're using AD auth - that could be at play)
3) Try adding your args to nrpe.cfg temporarily and omitting the args from your nrpe check and make sure that's not where the problem lies
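
The second idea above, looking through /var/log/secure for failures, could be narrowed with a filter like the one below. It is only a sketch: the function name `sudo_failures` is made up, and the message fragments it searches for are ones sudo and PAM commonly log, an assumption rather than an exhaustive list.

```shell
#!/bin/bash
# Hypothetical helper: print sudo-related log lines for one user that
# contain a typical failure message. The log path defaults to /var/log/secure.
sudo_failures() {
    local user="$1" log="${2:-/var/log/secure}"
    grep -F "sudo" "$log" \
        | grep -F "$user" \
        | grep -iE 'authentication failure|not in sudoers|command not allowed|must have a tty'
}
```

Run as root on a failing host, for example `sudo_failures nagios`. No output means none of the assumed patterns matched, which by itself does not prove sudo succeeded.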
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by gormank »

I guessed the debugging was to be done on the nagios server since rsyslog isn't used on monitored servers...
The logged info is not interesting:
grep -i nrpe /var/log/messages
...
Sep 10 15:47:22 txslm2mlnag001 nrpe[31088]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:47:26 txslm2mlnag001 nrpe[31148]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:47:51 txslm2mlnag001 nrpe[31327]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:47:52 txslm2mlnag001 nrpe[31341]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:11 txslm2mlnag001 nrpe[31517]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:11 txslm2mlnag001 nrpe[31518]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:13 txslm2mlnag001 nrpe[31574]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:32 txslm2mlnag001 nrpe[31667]: INFO: SSL/TLS initialized. All network traffic will be encrypted.

I did the same change to syslog.conf on a monitored server and the logging is the same as above.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by gormank »

/var/log/secure has nothing of interest.
The lines below containing nagios are logged when I su - nagios and run the script manually. This makes me wonder (along with the quick execution shown in the original post) if the command is even being attempted on the monitored system.

Sep 10 16:01:01 txslm2mlapp002 crond[19571]: pam_tty_audit(crond:session): restored status to 0
Sep 10 16:06:05 txslm2mlapp002 su[20189]: pam_unix(su-l:session): session opened for user nagios by root(uid=0)
Sep 10 16:06:05 txslm2mlapp002 su[20189]: pam_tty_audit(su-l:session): changed status from 1 to 0
Sep 10 16:06:09 txslm2mlapp002 sudo: nagios : TTY=pts/1 ; PWD=/home/nagios ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_unix_log.pl -l /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p
Sep 10 16:06:47 txslm2mlapp002 su[20189]: pam_unix(su-l:session): session closed for user nagios
Sep 10 16:06:47 txslm2mlapp002 su[20189]: pam_tty_audit(su-l:session): restored status to 1
Sep 10 16:10:01 txslm2mlapp002 crond[20678]: pam_unix(crond:session): session opened for user root by (uid=0)

If I look at this file on the nagios server, which is just another monitored server, I have lots of entries:

Sep 10 16:22:46 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_unix_log.pl -l /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p
Sep 10 16:22:48 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service snmptt
Sep 10 16:23:11 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service postgresql
Sep 10 16:23:11 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service ndo2db
Sep 10 16:23:42 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service gearmand
Sep 10 16:23:43 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service mysqld
Sep 10 16:24:20 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service snmptt
Sep 10 16:24:42 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service postgresql
Sep 10 16:25:12 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service httpd
Sep 10 16:25:48 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service httpd
Sep 10 16:25:54 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service gearmand

There are many other checks running on the monitored servers that don't use sudo, all working fine.
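
The comparison above (interactive runs are logged with TTY=pts/1, NRPE-driven runs on the working server with TTY=unknown) can be summarised per host with a small helper. This is a sketch only; `sudo_tty_summary` is a made-up name and it assumes the log line layout shown in the excerpts above.

```shell
#!/bin/bash
# Hypothetical helper: count a user's sudo log entries grouped by the TTY field.
# The log path defaults to /var/log/secure.
sudo_tty_summary() {
    local user="$1" log="${2:-/var/log/secure}"
    grep -E "sudo: +${user} : " "$log" \
        | sed -n 's/.*TTY=\([^ ;]*\).*/\1/p' \
        | sort | uniq -c
}
```

On a host where NRPE is reaching sudo, `sudo_tty_summary nagios` should list a count for `unknown`; if only `pts/...` entries appear, sudo is not logging any command for the NRPE-driven checks.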

Moving the args to nrpe.cfg changes nothing...

[root@txslm2mlapp002 ~]# grep check_unix_log /usr/local/nagios/etc/nrpe.cfg
#command[check_unix_log]=sudo /usr/local/nagios/libexec/check_unix_log.pl $ARG1$
command[check_unix_log]=sudo /usr/local/nagios/libexec/check_unix_log.pl -l $ARG1$ /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p

COMMAND: /usr/local/nagios/libexec/check_nrpe -H txslm2mlapp002 -t 30 -c check_unix_log
OUTPUT: NRPE: Unable to read output
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by tgriep »

Are the checks that use the check_init_service plugin working, or are they failing the same way?
If they are working: the group owner for that file is set to nagios, so try changing the owner to nagios for the check_unix_log.pl plugin and see if that helps.
When sudo was upgraded, what version did it get upgraded to?
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by gormank »

All sudo checks fail on all the servers that had sudo updated. Sudo wasn't updated on the nagios servers where sudo checks still work.

# rpm -qa | grep sudo
sudo-1.8.14-3.el5
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by ssax »

What is the output of this command on a non-working one?

Code: Select all

visudo -c
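
Besides the syntax check, it may help to see everything a sudo call prints, including standard error. As far as I understand, NRPE only relays a plugin's standard output, so a complaint from sudo on standard error would surface as "Unable to read output". The helper below is a sketch under that assumption; `show_all_output` is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: run a command with stdin closed off and stderr merged
# into stdout, then print whatever came back along with the exit status.
show_all_output() {
    local out rc
    out=$("$@" </dev/null 2>&1)
    rc=$?
    if [ -z "$out" ]; then
        echo "(no output at all) exit=$rc"
    else
        printf '%s\n' "$out"
        echo "exit=$rc"
    fi
}
```

As the nagios user, for example: `show_all_output sudo -n /usr/local/nagios/libexec/check_vxvm`. The `-n` flag makes sudo fail instead of prompting for a password. As root, `sudo -l -U nagios` lists what sudo believes the nagios user may run.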
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Sudo updated and now sudo scripts fail: NRPE: Unable to

Post by gormank »

/etc/sudoers: parsed OK