Sudo updated and now sudo scripts fail: NRPE: Unable to read
Posted: Wed Sep 09, 2015 5:03 pm
by gormank
As the subject says, sudo was updated and now sudo scripts fail with NRPE: Unable to read output. The Nagios servers weren't updated and they don't have the problem. This is happening on ~40 servers in 2 locations using 2 Nagios servers.
I was told the sudo config wasn't changed, just that sudo was updated.
Below you can see the configs, and command output when run from the Nagios server and from one of the servers where the commands fail. There are 3 sudo commands defined in sudoers, and all fail when run via Nagios.
The command completes very quickly, as if it isn't getting very far. Is there a way to increase log/debug levels to see more of the process? Nothing interesting is logged in either syslog or nagios.log.
I've been through the NRPE troubleshooting doc with no improvement in the situation.
I also installed and configured NRPE on another machine and there was no change.
# rpm -qa | grep sudo
sudo-1.8.14-3.el5
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.10 (Tikanga)
# cat /etc/xinetd.d/nrpe
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_success =
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 10.133.134.84 10.133.134.85 10.136.243.84 10.136.243.85
}
# grep -v ^# /usr/local/nagios/etc/nrpe.cfg | sort -u
command[check_alldiskIO]=/usr/local/nagios/libexec/check_all_diskstat.sh
command[check_cpuload2]=/usr/local/nagios/libexec/check_cpu_perf.sh $ARG1$
command[check_init_service]=sudo /usr/local/nagios/libexec/check_init_service $ARG1$
command[check_load]=/usr/local/nagios/libexec/check_load $ARG1$
command[check_log]=/usr/local/nagios/libexec/check_log $ARG1$
command[check_netbackup]=/usr/local/nagios/libexec/check_netbackup.pl $ARG1$
command[check_net_int]=/usr/local/nagios/libexec/check_net_int.sh
command[check_process]=/usr/local/nagios/libexec/check_process $ARG1$
command[check_procs]=/usr/local/nagios/libexec/check_procs $ARG1$
command[check_unix_log]=sudo /usr/local/nagios/libexec/check_unix_log.pl $ARG1$
command[check_vxvm]=sudo /usr/local/nagios/libexec/check_vxvm
command_timeout=60
connection_timeout=300
debug=0
dont_blame_nrpe=1
include_dir=/usr/local/nagios/etc/nrpe
log_facility=syslog
nrpe_group=nagios
nrpe_user=nagios
pid_file=/var/run/nrpe.pid
server_port=5666
# grep -v ^# /etc/sudoers | grep -i -e nagios -e tty
Defaults requiretty
Defaults:nagios !requiretty
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_init_service
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_vxvm
nagios ALL = NOPASSWD:/usr/local/nagios/libexec/check_unix_log.pl
COMMAND: /usr/local/nagios/libexec/check_nrpe -H app002 -t 30 -c check_unix_log -a '-l /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p'
OUTPUT: NRPE: Unable to read output
[root@app002 ~]# su - nagios
[nagios@app002 ~]$ sudo /usr/local/nagios/libexec/check_unix_log.pl -l /var/log/messages -w alert
WARNING: /var/log/messages contains 94 new instances of: alert.
[root@nag001 log]# time /usr/local/nagios/libexec/check_nrpe -H txslm2mlapp002 -t 30 -c check_unix_log -a '-l /tmp/messages.test -w error,crit,alert,emerg'
NRPE: Unable to read output
real 0m0.028s
user 0m0.005s
sys 0m0.005s
[root@nag001 log]# /usr/local/nagios/libexec/check_nrpe -H txslm2mlapp002
NRPE v2.15
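One difference between the manual test and the NRPE run is worth isolating: the manual test ran from a login shell with a terminal, while NRPE starts the command without one and, as far as I know, only passes back what the command writes to standard output. The sketch below is not a confirmed diagnosis. `run_detached` is a made-up helper name that runs a command with stdin detached and reports stdout, stderr and the exit code separately, so anything sudo writes to stderr becomes visible. Note that redirecting stdin does not remove the controlling terminal.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdin from /dev/null and
# report exit code, stdout and stderr separately.
run_detached() {
    out=$(mktemp) err=$(mktemp)
    "$@" </dev/null >"$out" 2>"$err"
    rc=$?
    printf 'exit=%s\n' "$rc"
    printf 'stdout: %s\n' "$(cat "$out")"
    printf 'stderr: %s\n' "$(cat "$err")"
    rm -f "$out" "$err"
    return "$rc"
}

# Example (run as the nagios user on a failing host; -n makes sudo
# fail instead of prompting):
#   run_detached sudo -n /usr/local/nagios/libexec/check_vxvm
```

If stdout is empty and stderr is not, that would at least be consistent with the "Unable to read output" message.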
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Wed Sep 09, 2015 11:28 pm
by Box293
Very strange error, everything seems correct.
Maybe stop xinetd and kill any possible duplicate services:
service xinetd stop
killall xinetd
service xinetd start
If that doesn't help, can you run these commands and post the output:
grep nag /etc/passwd
grep nag /etc/group
ls -al /usr/local/nagios/libexec
We can turn on NRPE debugging to collect more information.
Edit the file:
/usr/local/nagios/etc/nrpe.cfg
Define
debug=1
(it will currently be debug=0)
Save the file and
Now we need to add an option to the rsyslog server so it processes debug messages
Edit the file:
/etc/rsyslog.conf
Find
/var/log/messages
The line in the config file will look like:
*.info;mail.none;authpriv.none;cron.none /var/log/messages
We need to add the following to the line:
*.info;mail.none;authpriv.none;cron.none;daemon.debug /var/log/messages
Save the file and
Now there should be more information logged in /var/log/messages
Does this produce anything valuable?
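The two edits above could be scripted roughly as follows. This is a minimal sketch, assuming GNU sed (for `-i`) and config lines that look exactly like the ones quoted above. The function names are made up for illustration, and the file paths are passed in as arguments so it can be tried on copies first.

```shell
#!/bin/sh
# Hypothetical helpers; each takes the path of the file to edit.

# Flip an existing "debug=0" line to "debug=1" in an nrpe.cfg.
enable_nrpe_debug() {
    sed -i 's/^debug=0$/debug=1/' "$1"
}

# Append ";daemon.debug" to the selector of the rule that writes to
# /var/log/messages, unless the line already mentions daemon.debug.
add_daemon_debug() {
    sed -i '/daemon\.debug/!s|^\(\*\.info[^[:space:]]*\)\([[:space:]]\{1,\}/var/log/messages\)$|\1;daemon.debug\2|' "$1"
}

# Example, on copies of the real files:
#   cp /usr/local/nagios/etc/nrpe.cfg /tmp/nrpe.cfg && enable_nrpe_debug /tmp/nrpe.cfg
```

The syslog daemon still has to re-read its configuration afterwards before debug messages show up.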
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 10:28 am
by gormank
I'll change the debug options and post results...
[root@txslm2mlapp002 ~]# uptime
15:24:29 up 11:39, 1 user, load average: 0.11, 0.22, 0.19
[root@txslm2mlapp002 ~]# ps -ef | grep xinetd
root 10491 1 0 03:46 ? 00:00:01 xinetd -stayalive -pidfile /var/run/xinetd.pid
root 12593 12544 0 15:23 pts/1 00:00:00 grep xinetd
[root@txslm2mlapp002 ~]# grep nag /etc/passwd
nagiosnull:x:507:507:Null placeholder:/home/nagios:/sbin/nologin
nagios:x:508:508:Nagios Application account:/home/nagios:/bin/bash
pmpolicy:x:524:524:PM4S Policy Manager Account:/var/opt/quest/qpm4u/pmpolicy:/opt/quest/libexec/pmconfpoluser
[root@txslm2mlapp002 ~]# grep nag /etc/group
nagcmd:x:507:nagios
nagios:x:508:nagios
[root@txslm2mlapp002 ~]# ls -al /usr/local/nagios/libexec
total 7148
drwxrwxr-x 2 nagios nagios 4096 Jun 19 15:10 .
drwxr-xr-x 8 nagios nagios 4096 Apr 21 20:19 ..
-rwxr-xr-x 1 root root 374 Jun 4 22:20 check_all_diskstat.sh
-rwxr-xr-x 1 root root 201213 Apr 21 20:19 check_apt
-rwxr-xr-x 1 root root 6897 Apr 21 20:20 check_asterisk.pl
-rwxr-xr-x 1 root root 1978 Apr 21 20:20 check_asterisk_sip_peers.sh
-rwxr-xr-x 1 root root 2242 Apr 21 20:19 check_breeze
-rwxr-xr-x 1 root root 197506 Apr 21 20:19 check_by_ssh
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_clamd -> check_tcp
-rwxr-xr-x 1 root root 151149 Apr 21 20:19 check_cluster
-rwxr-xr-x 1 root root 6557 Jun 19 15:10 check_cpu_perf.sh
-rwxr-xr-x 1 root root 5355 Apr 21 20:20 check_cpu_stats.sh
-rwxr-xr-x 1 root root 188670 Apr 21 20:19 check_dhcp
-rwxr-xr-x 1 root root 192444 Apr 21 20:19 check_dig
-rwxr-xr-x 1 root root 207796 Apr 21 20:19 check_disk
-rwxr-xr-x 1 root root 9289 Apr 21 20:19 check_disk_smb
-rwxr-xr-x 1 root root 4835 Jun 4 22:20 check_diskstat.sh
-rwxr-xr-x 1 root root 207258 Apr 21 20:19 check_dns
-rwxr-xr-x 1 root root 93388 Apr 21 20:19 check_dummy
-rwxr-xr-x 1 root root 3349 Apr 21 20:19 check_file_age
-rwxr-xr-x 1 root root 6315 Apr 21 20:19 check_flexlm
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_ftp -> check_tcp
-rwxr-xr-x 1 root root 364815 Apr 21 20:19 check_http
-rwxr-xr-x 1 root root 193238 Apr 21 20:19 check_icmp
-rwxr-xr-x 1 root root 158979 Apr 21 20:19 check_ide_smart
-rwxr-xr-x 1 root root 15123 Apr 21 20:19 check_ifoperstatus
-rwxr-xr-x 1 root root 12600 Apr 21 20:19 check_ifstatus
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_imap -> check_tcp
-rwxr-xr-x 1 root nagios 859 Jun 29 15:55 check_init_service
-rwxr-xr-x 1 root root 6887 Apr 21 20:19 check_ircd
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_jabber -> check_tcp
-rwxr-xr-x 1 root root 184919 Apr 21 20:19 check_load
-rwxr-xr-x 1 root root 5989 Apr 21 20:19 check_log
-rwxr-xr-x 1 root root 21480 Apr 21 20:19 check_mailq
-rwxr-xr-x 1 root root 157437 Apr 21 20:19 check_mrtg
-rwxr-xr-x 1 root root 158082 Apr 21 20:19 check_mrtgtraf
-rwxr-xr-x 1 root root 175481 Apr 21 20:19 check_nagios
-rwxr-xr-x 1 root root 4238 Jun 26 22:14 check_netbackup.pl
-rwxr-xr-x 1 root root 1489 Jun 26 18:35 check_net_int.sh
-rwxr-xr-x 1 root root 25602 Apr 21 20:20 check_netstat.pl
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_nntp -> check_tcp
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_nntps -> check_tcp
-rwxr-xr-x 1 nagios nagios 69790 Apr 21 20:19 check_nrpe
-rwxr-xr-x 1 root root 188470 Apr 21 20:19 check_nt
-rwxr-xr-x 1 root root 193409 Apr 21 20:19 check_ntp
-rwxr-xr-x 1 root root 184994 Apr 21 20:19 check_ntp_peer
-rwxr-xr-x 1 root root 184107 Apr 21 20:19 check_ntp_time
-rwxr-xr-x 1 root root 211583 Apr 21 20:19 check_nwstat
-rwxr-xr-x 1 root root 3259 Apr 21 20:20 check_open_files.pl
-rwxr-xr-x 1 root root 8779 Apr 21 20:19 check_oracle
-rwxr-xr-x 1 root root 172377 Apr 21 20:19 check_overcr
-rwxr-xr-x 1 root root 213009 Apr 21 20:19 check_ping
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_pop -> check_tcp
-rwxr-xr-x 1 root root 24013 Jun 12 18:44 check_process
-rwxr-xr-x 1 root root 200800 Apr 21 20:19 check_procs
-rwxr-xr-x 1 root root 170235 Apr 21 20:19 check_real
-rwxr-xr-x 1 root root 9581 Apr 21 20:19 check_rpc
-rwxr-xr-x 1 root root 1453 Apr 21 20:19 check_sensors
-rwxr-xr-x 1 root root 2174 Apr 21 20:20 check_services
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_simap -> check_tcp
-rwxr-xr-x 1 root root 7599 Apr 21 20:20 check_sip
-rwxr-xr-x 1 root root 254037 Apr 21 20:19 check_smtp
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_spop -> check_tcp
-rwxr-xr-x 1 root root 170231 Apr 21 20:19 check_ssh
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_ssmtp -> check_tcp
-rwxr-xr-x 1 root root 156737 Apr 21 20:19 check_swap
-rwxr-xr-x 1 root root 230343 Apr 21 20:19 check_tcp
-rwxr-xr-x 1 root root 173261 Apr 21 20:19 check_time
lrwxrwxrwx 1 root root 9 Apr 21 20:19 check_udp -> check_tcp
-rwxr-xr-x 1 root root 5423 Jul 27 20:43 check_unix_log.pl
-rwxr-xr-x 1 root root 179501 Apr 21 20:19 check_ups
-rwxr-xr-x 1 root root 151651 Apr 21 20:19 check_uptime
-rwxr-xr-x 1 root root 150273 Apr 21 20:19 check_users
-rwxr-xr-x 1 root root 2936 Apr 21 20:19 check_wave
-rwxr-xr-x 1 root root 710 Apr 21 20:20 check_yum
-rwxr-xr-x 1 root root 3060 Apr 21 20:20 custom_check_mem
-rwxr-xr-x 1 root root 915 Apr 21 20:20 custom_check_procs
-rwxr-xr-x 1 root root 4176 Apr 21 20:20 nagisk.pl
-rwxr-xr-x 1 root root 142078 Apr 21 20:19 negate
-rwxr-xr-x 1 root root 58727 Apr 21 20:20 send_nsca
-rwxr-xr-x 1 root root 148392 Apr 21 20:19 urlize
-rwxr-xr-x 1 root root 1913 Apr 21 20:19 utils.pm
-rwxr-xr-x 1 root root 2791 Apr 21 20:19 utils.sh
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 10:37 am
by jdalrymple
I have a few ideas:
1) Take a look at the information posted above by Box293 regarding implementing nrpe debugging - that may help.
2) Take a look at /var/log/secure to see if there are any auth failures (I see you're using AD auth - that could be at play)
3) Try adding your args to nrpe.cfg temporarily and omitting the args from your nrpe check and make sure that's not where the problem lies
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 10:53 am
by gormank
I guessed the debugging was to be done on the nagios server since rsyslog isn't used on monitored servers...
The logged info is not interesting:
grep -i nrpe /var/log/messages
...
Sep 10 15:47:22 txslm2mlnag001 nrpe[31088]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:47:26 txslm2mlnag001 nrpe[31148]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:47:51 txslm2mlnag001 nrpe[31327]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:47:52 txslm2mlnag001 nrpe[31341]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:11 txslm2mlnag001 nrpe[31517]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:11 txslm2mlnag001 nrpe[31518]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:13 txslm2mlnag001 nrpe[31574]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
Sep 10 15:48:32 txslm2mlnag001 nrpe[31667]: INFO: SSL/TLS initialized. All network traffic will be encrypted.
I made the same change to syslog.conf on a monitored server and the logging is the same as above.
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 11:32 am
by gormank
/var/log/secure has nothing of interest.
The lines below containing nagios are logged when I su - nagios and run the script manually. This makes me wonder (along with the quick execution shown in the original post) if the command is even being attempted on the monitored system.
Sep 10 16:01:01 txslm2mlapp002 crond[19571]: pam_tty_audit(crond:session): restored status to 0
Sep 10 16:06:05 txslm2mlapp002 su[20189]: pam_unix(su-l:session): session opened for user nagios by root(uid=0)
Sep 10 16:06:05 txslm2mlapp002 su[20189]: pam_tty_audit(su-l:session): changed status from 1 to 0
Sep 10 16:06:09 txslm2mlapp002 sudo: nagios : TTY=pts/1 ; PWD=/home/nagios ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_unix_log.pl -l /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p
Sep 10 16:06:47 txslm2mlapp002 su[20189]: pam_unix(su-l:session): session closed for user nagios
Sep 10 16:06:47 txslm2mlapp002 su[20189]: pam_tty_audit(su-l:session): restored status to 1
Sep 10 16:10:01 txslm2mlapp002 crond[20678]: pam_unix(crond:session): session opened for user root by (uid=0)
If I look at this file on the nagios server, which is just another monitored server, I have lots of entries:
Sep 10 16:22:46 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_unix_log.pl -l /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p
Sep 10 16:22:48 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service snmptt
Sep 10 16:23:11 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service postgresql
Sep 10 16:23:11 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service ndo2db
Sep 10 16:23:42 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service gearmand
Sep 10 16:23:43 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service mysqld
Sep 10 16:24:20 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service snmptt
Sep 10 16:24:42 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service postgresql
Sep 10 16:25:12 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service httpd
Sep 10 16:25:48 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service httpd
Sep 10 16:25:54 txslm2mlnag001 sudo: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_init_service gearmand
There are many other checks running on the monitored servers that don't use sudo, all working fine.
Moving the args to nrpe.cfg changes nothing...
[root@txslm2mlapp002 ~]# grep check_unix_log /usr/local/nagios/etc/nrpe.cfg
#command[check_unix_log]=sudo /usr/local/nagios/libexec/check_unix_log.pl $ARG1$
command[check_unix_log]=sudo /usr/local/nagios/libexec/check_unix_log.pl -l $ARG1$ /var/log/messages -i nrpe -w error,crit,alert,emerg -f /usr/local/nagios/var/nagios.tmp.messages.stat -p
COMMAND: /usr/local/nagios/libexec/check_nrpe -H txslm2mlapp002 -t 30 -c check_unix_log
OUTPUT: NRPE: Unable to read output
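To compare the two hosts more easily, the sudo entries in /var/log/secure can be reduced to just timestamp, host, TTY and command. This is a minimal sketch, assuming the log lines have the same layout as the ones pasted above. `sudo_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list sudo log entries for one user.
#   $1 = user name, $2 = log file (defaults to /var/log/secure)
# Prints: <month> <day> <time> <host> TTY=<tty> COMMAND=<command>
sudo_entries() {
    awk -v user="$1" '
        $5 == "sudo:" && $6 == user {
            tty = ""; cmd = ""
            if (match($0, /TTY=[^ ]+/))  tty = substr($0, RSTART, RLENGTH)
            if (match($0, /COMMAND=.*/)) cmd = substr($0, RSTART, RLENGTH)
            print $1, $2, $3, $4, tty, cmd
        }
    ' "${2:-/var/log/secure}"
}

# Example: sudo_entries nagios /var/log/secure | tail
```

On a host where NRPE-driven sudo calls are reaching sudo at all, entries with TTY=unknown would be expected; if there are none for the nagios user, the call may be failing before sudo logs it.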
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 2:08 pm
by tgriep
Are the checks that use the check_init_service plugin working or are they failing the same?
If so, note that the group owner for that file is set to nagios; try changing the owner to nagios for the check_unix_log.pl plugin and see if that helps.
When sudo was upgraded, what version did it get upgraded to?
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 3:00 pm
by gormank
All sudo checks fail on all the servers that had sudo updated. Sudo wasn't updated on the nagios servers where sudo checks still work.
# rpm -qa | grep sudo
sudo-1.8.14-3.el5
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 3:09 pm
by ssax
What is the output of this command on a non-working one?
Re: Sudo updated and now sudo scripts fail: NRPE: Unable to
Posted: Thu Sep 10, 2015 3:43 pm
by gormank
/etc/sudoers: parsed OK