NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Wed Dec 16, 2020 10:32 am
by drakedts
Hello. I am trying to monitor Tomcat's heap with NRPE. On our older systems (RHEL 7) it works, but on the newer ones (RHEL 8, Debian 10) it does not. We run the following mix of software versions:

Code: Select all

Quantity   OS          OpenJDK   Tomcat   Systemd   NRPE    Status
2          RHEL 7      7         7        219       4.0.3   Works
4          RHEL 7      8         7        219       4.0.3   Works
28         RHEL 7      8         8.5      219       4.0.3   Works
7          RHEL 8      8         8.5      239       4.0.3   Fail
2          Debian 10   11        9        241       3.2.1   Fail
This is what i see from the Nagios XI server when i run the check against a client:

Code: Select all

# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c tomcat_heap
UNKNOWN Can't connect to the JVM: 
On the client, "tomcat_heap" is defined like so, using sudo:

Code: Select all

command[tomcat_heap]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
For comparison, other NRPE checks (that do not use sudo check_jvm) work just fine:

Code: Select all

# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c mem
OK - Memory usage is 48%
NRPE runs as the nrpe user, and Tomcat runs as the tomcat8 user, and i've confirmed those by checking "ps aux". As the nrpe user on the client, the command works:

Code: Select all

[nrpe@lnx-b9ssb-devl ~]$ /usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
OK 23% | max=14616625152;;; commited=14616625152;;; used=3390956184;;;
My sudo configuration from /etc/sudoers looks like this:

Code: Select all

Defaults   !visiblepw
Defaults    always_set_home
Defaults    match_group_by_gid
Defaults    always_query_group_plugin
Defaults    env_reset
Defaults    env_keep =  "COLORS DISPLAY HOSTNAME HISTSIZE KDEDIR LS_COLORS"
Defaults    env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE"
Defaults    env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES"
Defaults    env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE"
Defaults    env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY"
Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin
root	ALL=(ALL) 	ALL
%wheel	ALL=(ALL)	ALL
#includedir /etc/sudoers.d
There's an extra file /etc/sudoers.d/custom that gets included:

Code: Select all

%wheel  ALL=(ALL)       NOPASSWD: ALL
audit ALL=(ALL) ALL
banner ALL=(root) NOPASSWD:/usr/bin/systemctl
Defaults:nrpe !requiretty
#nrpe ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/*
#Just for testing:
nrpe ALL=(ALL) NOPASSWD: ALL
tomcat8 ALL=(ALL) NOPASSWD: ALL
The check_jvm command is from https://fidanov.net/c0d3/nagios-plugins/jvminspector/. And i updated it to the latest version that was just released a few days ago. It has global execute permissions:

Code: Select all

# ls -l /usr/lib64/nagios/plugins/check_jvm
-r-xr-xr-x. 1 root root 6002 2020-12-15 15:04:50 /usr/lib64/nagios/plugins/check_jvm*
While testing, SELinux is set to permissive mode on the RHEL 8 machines that are having trouble. And on Debian, SELinux is not even installed. So i know SELinux is not the problem.

I know systemd has some security that it can impose and i've tried looking for things at that level but no luck yet.
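
One way to look at that level is to dump the sandboxing properties systemd applies to the NRPE unit. The following is a minimal sketch, assuming the unit is called nrpe.service (Debian packages it under a different name) and using an illustrative, non-exhaustive list of properties. flag_sandboxing is a hypothetical helper, not part of NRPE or systemd.

```shell
#!/bin/sh
# Hypothetical helper: read "Name=value" lines (as printed by `systemctl show`)
# on stdin and print only the listed sandboxing properties that are switched on.
flag_sandboxing() {
    awk -F= '
        $1 ~ /^(PrivateTmp|PrivateDevices|ProtectSystem|ProtectHome|NoNewPrivileges)$/ &&
        $2 != "no" && $2 != "" { print $1 "=" $2 }
    '
}

# Intended use on the client (the unit name is an assumption):
#   systemctl show nrpe.service -p PrivateTmp -p PrivateDevices \
#       -p ProtectSystem -p ProtectHome -p NoNewPrivileges | flag_sandboxing

# Demo on sample input so the helper can be tried without systemd:
printf '%s\n' 'PrivateTmp=yes' 'ProtectHome=no' 'ProtectSystem=full' | flag_sandboxing
```

Any property reported this way applies only when NRPE is started by systemd, not in an interactive login shell, so it would be a candidate for explaining a difference between the two cases.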

I don't think the problem is with sudo itself, as other commands work. For example, if i temporarily change the definition of tomcat_heap to run "id" instead of "check_jvm":

Code: Select all

command[tomcat_heap]=/usr/bin/sudo -u tomcat8 /usr/bin/id
Then i get expected id output when calling it from the XI server:

Code: Select all

# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c tomcat_heap
uid=982(tomcat8) gid=978(tomcat8) groups=978(tomcat8) context=system_u:system_r:nrpe_t:s0
So there seems to be something specific about how check_jvm runs when called by nrpe running as a daemon. It is odd that it works fine from the command line when logged in as the nrpe user though. Any ideas?

A while back i had a similar problem (see https://support.nagios.com/forum/viewto ... 16&t=59209) that was solved by installing Java 11 on the affected servers and editing the check_jvm script to run Java 11, even though Tomcat was running under Java 8. That solution only worked for a few weeks though; then it stopped working and i don't know why. I never understood why it worked in the first place, which made it hard to fix when it broke. That partial solution only worked on RHEL 8 anyway, not on Debian. I think i just need to get to the root of the problem and figure out what is going on on both RHEL 8 and Debian 10.

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Thu Dec 17, 2020 4:07 pm
by dchurch
This strikes me as either an SELinux/AppArmor problem or an environment variable problem.

What is the output from this command:

Code: Select all

getenforce
Are you able to log in as root to the box running Tomcat and run:

Code: Select all

sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
Do you have dont_blame_nrpe set to 1 in your nrpe.cfg file? Are there differences between the nrpe.cfg files on the Debian 10 system vs. the RHEL 7 system?

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Thu Dec 17, 2020 4:44 pm
by drakedts
I have tried hard to make sure SELinux is not an issue. I normally run our RHEL servers with SELinux set to enforcing, but while working on this problem i have set it to permissive to rule out SELinux as a cause.

On the RHEL 8 client that i was using for the examples above, this shows that SELinux is set to permissive and that the check command works perfectly from root:

Code: Select all

[root@lnx-b9ssb-devl ~]# getenforce
Permissive
[root@lnx-b9ssb-devl ~]# sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
CRITICAL 4.0G |max=14583595008;;; commited=14583595008;;; used=4233922576;;;
(As an aside, the fact that it shows "critical" right now is unimportant; check_jvm has a quirk where it does not accept percentages for the thresholds and i'm using the unmodified version for testing right now, without my patch applied which fixes the threshold computation. The important bit is that it does run and gives back actual data.)
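
The percentage can still be derived by hand from the perfdata the plugin prints. The following is a minimal sketch and not the patch mentioned above: heap_pct is a hypothetical helper that pulls "used" and "max" out of one output line and prints the integer percentage.

```shell
#!/bin/sh
# Hypothetical helper: compute used/max as an integer percentage from a
# check_jvm output line of the form "... | max=N;;; commited=N;;; used=N;;;".
heap_pct() {
    max=$(printf '%s\n' "$1" | sed -n 's/.*max=\([0-9][0-9]*\).*/\1/p')
    used=$(printf '%s\n' "$1" | sed -n 's/.*used=\([0-9][0-9]*\).*/\1/p')
    # Refuse to divide if either value is missing or max is zero.
    [ -n "$max" ] && [ -n "$used" ] && [ "$max" -gt 0 ] || return 1
    echo $(( used * 100 / max ))
}

# Applied to the root-shell output shown above:
heap_pct 'CRITICAL 4.0G |max=14583595008;;; commited=14583595008;;; used=4233922576;;;'
```

For the line above this works out to 29, well below the intended 90% warning level, which is consistent with the CRITICAL state coming from how the thresholds are interpreted rather than from actual heap pressure.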

I have dont_blame_nrpe=0, as i believe it should be since i don't send command arguments from the XI server to NRPE daemons.

I've compared nrpe.cfg configuration on an RHEL 7 Tomcat server (which works) and an RHEL 8 one (which does not work). The non-comment lines on both are identical, and are equal to:

Code: Select all

log_facility=daemon
debug=0
pid_file=/var/run/nrpe/nrpe.pid
server_port=5666
nrpe_user=nrpe
nrpe_group=nrpe
allowed_hosts=127.0.0.1,::1
dont_blame_nrpe=0
allow_bash_command_substitution=0
command_timeout=60
connection_timeout=300
disable_syslog=0
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200
include_dir=/etc/nrpe.d/
The configuration that is included from /etc/nrpe.d is also the same on the 2 systems, and looks like this:

Code: Select all

allowed_hosts=lnx-dns3-prod.drake.edu,lnx-dns4-prod.drake.edu,lnx-nagios-prod.drake.edu
command[cups_jobs]=/usr/lib64/nagios/plugins/check_cups_jobs -w 10 -c 20 -W 5m -C 60m --perfdata
command[date_ns]=/bin/date +%s.%N
command[disk]=/usr/lib64/nagios/plugins/check_disk -e -w 15% -c 5% -N btrfs -N ext4 -N tmpfs -N vfat -N xfs
command[iostat]=/usr/lib64/nagios/plugins/check_iostat -dbu -dbuw 50 -dbuc 90 -p
command[load]=/usr/lib64/nagios/plugins/check_load -w 20,14.0,8 -c 40,30.0,20
command[mem]=/usr/lib64/nagios/plugins/check_mem 90 95
command[mountpoints]=/usr/lib64/nagios/plugins/check_mountpoints -a
command[ro_mounts]=/usr/lib64/nagios/plugins/check_ro_mounts -X tmpfs -X nfs -X cifs -X squashfs
command[users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[dhcpd_pools]=/usr/local/bin/dhcpd-pools --config=/etc/dhcp/dhcpd.conf --leases=/var/lib/dhcpd/dhcpd.leases --warning 80 --critical 90
command[namedconf]=/usr/lib64/nagios/plugins/check_namedconf
command[dig-www-dns1]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns1-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns2]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns2-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns3]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns3-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns4]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns4-prod.drake.edu -w 2.0 -c 5.0
command[mailq]=/usr/lib64/nagios/plugins/check_mailq -M postfix -w 20 -c 100
command[smtp-smtpout]=/usr/lib64/nagios/plugins/check_smtp -H smtpout.drake.edu -w 2 -c 5
command[smtp-smtpout2]=/usr/lib64/nagios/plugins/check_smtp -H smtpout2.drake.edu -w 2 -c 5
command[smtp-drakemx]=/usr/lib64/nagios/plugins/check_smtp -H drake-edu.mail.protection.outlook.com -w 2 -c 5
command[photoreader]=/usr/lib64/nagios/plugins/check_mount /net/photoreader cifs
command[procs-engine]=/usr/lib64/nagios/plugins/check_procs -c 1: -a engine.jar
command[procs-jobsub]=/usr/lib64/nagios/plugins/check_jobsub
command[procs-middleware]=/usr/lib64/nagios/plugins/check_procs -c 8: -w 12: -a Middleware
command[procs-slave]=/usr/lib64/nagios/plugins/check_procs -w 1: -a slave
command[ethosctl]=/usr/lib64/nagios/plugins/check_systemd_service ethosctl.service
command[tomcat_service]=/usr/lib64/nagios/plugins/check_systemd_service tomcat8.service
command[tomcat_heap]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
command[tomcat_nonheap]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p non-heap -w 90 -c 95
command[tomcat_classes]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p classes -w 25000 -c 30000
command[tomcat_threads]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p threads -w 200 -c 500
command[tomcat_sessions]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p sessions -w 5000 -c 10000
On a Debian system it is a bit harder to directly compare the configuration as Debian uses slightly different paths. But from a quick look i believe they are effectively equal. Here's nrpe.cfg from Debian (which does not work):

Code: Select all

log_facility=daemon
debug=0
pid_file=/var/run/nagios/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,::1
dont_blame_nrpe=0
allow_bash_command_substitution=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
include=/etc/nagios/nrpe_local.cfg
include_dir=/etc/nagios/nrpe.d/
The configuration included from /etc/nagios/nrpe.d is again the same, once path differences on Debian are accounted for. The Debian machine does have one extra check defined ("mariadb"), as it has a database that our RHEL machines lack:

Code: Select all

allowed_hosts=lnx-dns3-prod.drake.edu,lnx-dns4-prod.drake.edu,lnx-nagios-prod.drake.edu
command[cups_jobs]=/usr/lib/nagios/plugins/check_cups_jobs -w 10 -c 20 -W 5m -C 60m --perfdata
command[date_ns]=/bin/date +%s.%N
command[disk]=/usr/lib/nagios/plugins/check_disk -e -w 15% -c 5% -N btrfs -N ext4 -N tmpfs -N vfat -N xfs
command[iostat]=/usr/lib/nagios/plugins/check_iostat -dbu -dbuw 50 -dbuc 90 -p
command[load]=/usr/lib/nagios/plugins/check_load -w 20,14.0,8 -c 40,30.0,20
command[mem]=/usr/lib/nagios/plugins/check_mem 90 95
command[mountpoints]=/usr/lib/nagios/plugins/check_mountpoints -a
command[ro_mounts]=/usr/lib/nagios/plugins/check_ro_mounts -X tmpfs -X nfs -X cifs -X squashfs
command[users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[dhcpd_pools]=/usr/local/bin/dhcpd-pools --config=/etc/dhcp/dhcpd.conf --leases=/var/lib/dhcpd/dhcpd.leases --warning 80 --critical 90
command[namedconf]=/usr/lib/nagios/plugins/check_namedconf
command[dig-www-dns1]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns1-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns2]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns2-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns3]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns3-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns4]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns4-prod.drake.edu -w 2.0 -c 5.0
command[mailq]=/usr/lib/nagios/plugins/check_mailq -M postfix -w 20 -c 100
command[smtp-smtpout]=/usr/lib/nagios/plugins/check_smtp -H smtpout.drake.edu -w 2 -c 5
command[smtp-smtpout2]=/usr/lib/nagios/plugins/check_smtp -H smtpout2.drake.edu -w 2 -c 5
command[smtp-drakemx]=/usr/lib/nagios/plugins/check_smtp -H drake-edu.mail.protection.outlook.com -w 2 -c 5
command[photoreader]=/usr/lib/nagios/plugins/check_mount /net/photoreader cifs
command[procs-engine]=/usr/lib/nagios/plugins/check_procs -c 1: -a engine.jar
command[procs-jobsub]=/usr/lib/nagios/plugins/check_jobsub
command[procs-middleware]=/usr/lib/nagios/plugins/check_procs -c 8: -w 12: -a Middleware
command[procs-slave]=/usr/lib/nagios/plugins/check_procs -w 1: -a slave
command[ethosctl]=/usr/lib/nagios/plugins/check_systemd_service ethosctl.service
command[mariadb]=/usr/lib/nagios/plugins/check_mysql -n
command[tomcat_service]=/usr/lib/nagios/plugins/check_systemd_service tomcat9.service
command[tomcat_heap]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
command[tomcat_nonheap]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p non-heap -w 90 -c 95
command[tomcat_classes]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p classes -w 25000 -c 30000
command[tomcat_threads]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p threads -w 200 -c 500
command[tomcat_sessions]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p sessions -w 5000 -c 10000
As for an environment variable problem, i've tried adding a line "env | sort" to the check_jvm script just before it runs "java -jar ..." in order to see what the environment looks like. With that modification in place, here's what i get when running the check from the command line as the nrpe user:

Code: Select all

[nrpe@lnx-b9ssb-devl ~]$ /usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
CDPATH=
ENV=
HISTSIZE=1000
HOME=/usr/share/tomcat8
HOSTNAME=lnx-b9ssb-devl.test.drake.edu
LANG=en_US.UTF-8
LC_COLLATE=C
LC_TIME=en_XX.UTF-8
LOGNAME=tomcat8
LS_COLORS=rs=0:di=38;5;33:ln=38;5;51:mh=00:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=01;05;37;41:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;40:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.zst=38;5;9:*.tzst=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.wim=38;5;9:*.swm=38;5;9:*.dwm=38;5;9:*.esd=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.mjpg=38;5;13:*.mjpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.m4a=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.oga=38;5;45:*.opus=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:
MAIL=/var/spool/mail/nrpe
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/run/nrpe
SHELL=/bin/bash
SHLVL=1
SUDO_COMMAND=/usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
SUDO_GID=982
SUDO_UID=986
SUDO_USER=nrpe
TERM=xterm-256color
USER=tomcat8
_=/usr/bin/env
CRITICAL 3.4G |max=14599847936;;; commited=14599847936;;; used=3555842200;;;
When running the check from the Nagios XI server, there's a smaller environment:

Code: Select all

[root@lnx-nagios-prod ~]# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c tomcat_heap
CDPATH=
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/982/bus
ENV=
HOME=/usr/share/tomcat8
LOGNAME=tomcat8
MAIL=/var/mail/tomcat8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
SHELL=/bin/bash
SHLVL=1
SUDO_COMMAND=/usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
SUDO_GID=982
SUDO_UID=986
SUDO_USER=nrpe
TERM=unknown
USER=tomcat8
XDG_RUNTIME_DIR=/run/user/982
XDG_SESSION_ID=c572
_=/usr/bin/env
UNKNOWN Can't connect to the JVM: 
I don't know if any of the differences actually matter. For testing purposes, i have tried adding lines to check_jvm like "export LANG=en_US.UTF-8", "export LC_COLLATE=C", and so on, for the variables that are missing when it is called by the nrpe daemon. But so far i've not found any of those variables to make a difference.
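
To compare the two dumps mechanically rather than by eye, the variable names can be diffed. The following is a minimal sketch, assuming each dump has been saved to a file first; the file names cli.env and nrpe.env and the helper env_only_in_first are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given two files containing `env | sort` output,
# print the variable names that appear in the first file but not the second.
env_only_in_first() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # Keep only the name before the first "=", then sort for comm(1).
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | LC_ALL=C sort -u > "$a"
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$2" | LC_ALL=C sort -u > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Intended use, with the two dumps saved beforehand:
#   env_only_in_first cli.env nrpe.env    # set only in the interactive run
#   env_only_in_first nrpe.env cli.env    # set only under the NRPE daemon
```

For the two dumps above, the interactive run has HISTSIZE, HOSTNAME, LANG, LC_COLLATE, LC_TIME and LS_COLORS that the daemon run lacks, while the daemon run adds DBUS_SESSION_BUS_ADDRESS, XDG_RUNTIME_DIR and XDG_SESSION_ID. MAIL, PWD and TERM are present in both but with different values, which a name-only comparison like this one does not show.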

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Fri Dec 18, 2020 2:40 pm
by dchurch
I noticed something about your configuration:
drakedts wrote: I've compared nrpe.cfg configuration on an RHEL 7 Tomcat server (which works) and an RHEL 8 one (which does not work). The non-comment lines on both are identical, and are equal to:

Code: Select all

nrpe_user=nrpe
nrpe_group=nrpe
drakedts wrote: [...] Here's nrpe.cfg from Debian (which does not work):

Code: Select all

nrpe_user=nagios
nrpe_group=nagios
The daemons are running as the nrpe user under RHEL7/8, and nagios under Debian? Are you able to sudo -u tomcat8 as the different user in both environments? Can you post the contents of /var/log/auth.log? It would indicate if the sudo invocation failed.
drakedts wrote: There's an extra file /etc/sudoers.d/custom that gets included:

Code: Select all

%wheel  ALL=(ALL)       NOPASSWD: ALL
Defaults:nrpe !requiretty
#Just for testing:
nrpe ALL=(ALL) NOPASSWD: ALL
That would have to read nagios ALL=(ALL) NOPASSWD: ALL to work under Debian.

By default when you install using the fullinstall on Debian, the daemon runs thru xinetd and nrpe_user is set to nagios. If you don't change this after installation, it'll stay that way.

Also note that "service xinetd restart" doesn't always close off the nrpe port and kill the listener daemon. I ended up having to murder it by pid to get the updated config to be loaded under Debian 10.
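
To see which process is still holding the NRPE port before killing anything, the listener's PID can be read from ss output. The following is a minimal sketch, assuming the default port 5666 from nrpe.cfg and the usual users:(("name",pid=N,fd=M)) process column printed by "ss -ltnp"; listener_pids is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltnp` output on stdin and print the PIDs of
# the processes listening on the given TCP port.
listener_pids() {
    awk -v port="$1" '{
        # The local address is the only column that ends in ":<port>".
        for (i = 1; i <= NF; i++)
            if ($i ~ (":" port "$")) { print; next }
    }' | grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -u
}

# Intended use on the client (run as root so ss can show the owning process):
#   ss -ltnp | listener_pids 5666
```

If the PID printed still belongs to the old listener after a restart, the new configuration has not been loaded yet.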
drakedts wrote:

Code: Select all

[nrpe@lnx-b9ssb-devl ~]$ /usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
SUDO_USER=nrpe
CRITICAL 3.4G |max=14599847936;;; commited=14599847936;;; used=3555842200;;;
This wouldn't demonstrate that the check can run if the daemon is configured to run as "nagios".

So far I've been concentrating on diagnosing your Debian configuration. It would be great if the issues had the same cause, but maybe they don't.

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Sun Dec 20, 2020 9:19 am
by drakedts
Yes, Debian has a number of normal differences from RHEL: besides path differences, it uses different user names for services. Most of the configuration is created by Ansible, which glosses over those, such as putting "nagios" as the user in the sudoers file instead of "nrpe". (Tomcat on Debian runs as a different username too, "tomcat" vs "tomcat8" on RHEL.) I apologize for being sloppy and not pointing out the differences in usernames the service runs as; i've been focused on RHEL 8. One thing that is consistent across distros though: xinetd is not installed on Debian and NRPE is managed by systemd.

I have not done nearly as much diagnosing of the issue on Debian as i have on RHEL (really, just noticing that our Nagios Operations Screen shows the same error for our Debian machines as it does for our RHEL 8 machines). There's something weird going on on Debian though. I can't run the check as the nagios user:

Code: Select all

[nagios@lnx-cms-test ~]$ /usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
UNKNOWN Can't connect to the JVM: 
And it doesn't work as root either!

Code: Select all

[root@lnx-cms-test ~]# /usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
UNKNOWN Can't connect to the JVM: 
For what it's worth, here are the /var/log/auth.log entries from Debian when i tried running the command as the nagios user:

Code: Select all

Dec 20 07:52:31 lnx-cms-test sudo:   nagios : TTY=unknown ; PWD=/ ; USER=tomcat ; COMMAND=/usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p sessions -w 5000 -c 10000
Dec 20 07:52:31 lnx-cms-test sudo: pam_unix(sudo:session): session opened for user tomcat by (uid=0)
Dec 20 07:52:31 lnx-cms-test sudo: pam_unix(sudo:session): session closed for user tomcat
I'll have to do some more investigation on the Debian side; i've been so focused on RHEL 8 that i had not noticed how broken the check is, and that it won't even run from the command line. I don't know if AppArmor restricts access to the running Tomcat in any way, but that's probably the first thing i'll look into. It would probably have been easier if i had not mentioned Debian at all and we just focused on the RHEL 8 machines.

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Mon Dec 21, 2020 11:55 am
by dchurch
There's a bug in the check_jvm script that's preventing it from working.

It seems to do a naive detection: it matches any process with "java" and the class name anywhere in its command line, and takes that process's PID as the one it should send to JvmInspector.jar (e.g. check_jvm -n MyClassName will pick up on /usr/bin/cat "java some text moretext moretext MyClassName"). I suspect it may be detecting some other non-JVM process running on the server with a lower pid.

For instance, if I have a "vim Foo.java" running and then run check_jvm -n Foo, there's a chance it could pick up on the vim process and use its pid as the JVM process.

In my lab, I was able to get it to fail by having "vim TestClass.java" and running the check with "-n TestClass".
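
The mismatch can be reproduced without a live system by feeding made-up ps output through the same pipeline. The following is a minimal sketch: the sample process list is invented for illustration, naive_pick mirrors the grep/grep/head selection from the unpatched script, and strict_pick is an illustrative alternative (an assumption, not the patch from this thread) that requires the executable itself to be named java.

```shell
#!/bin/sh
# Made-up `ps axo pid,uid,command` output: an editor with a lower PID,
# followed by the real JVM.
sample_ps() {
    printf '%s\n' \
        '  1200  1000 vim TestClass.java' \
        '  4321   982 /usr/bin/java -classpath /opt/app TestClass start'
}

# Selection logic of the unpatched script, reading ps output on stdin.
naive_pick() {
    grep '[j]ava' | grep "$1" | head -1 | awk '{print $1}'
}

# Illustrative stricter variant (not the patch from this thread): the first
# word of the command must be "java" or a path ending in "/java".
strict_pick() {
    awk -v name="$1" '$3 ~ /(^|\/)java$/ && index($0, name) { print $1; exit }'
}

sample_ps | naive_pick TestClass    # prints 1200, the vim process
sample_ps | strict_pick TestClass   # prints 4321, the JVM
```

Both lines contain "java" and "TestClass", so the naive pipeline returns whichever comes first in the ps listing.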

If you replace line 100 in the script, it should work. Here's a patch:

Code: Select all

--- check_jvm.orig      2020-12-18 19:32:09.126473975 -0600
+++ check_jvm   2020-12-18 19:42:05.533237641 -0600
@@ -97,7 +97,7 @@
 expr "${CRITICAL}"  : '[0-9]\+$' >/dev/null || p_unknown "Invalid critical threshold"
 [ -f "$JVMINSPECTOR" ] || p_unknown "Can't find JvmInspector.jar, please install it and set JVMINSPECTOR var in this script"

-PSLINE="$(ps axo pid,uid,command | grep [j]ava | grep "$NAME" | head -1)"
+PSLINE=$(ps axo pid,uid,command -q $(echo $(pgrep java) |tr ' ' , | head -1) |grep "$NAME" |head -1)
 PID="$(echo "$PSLINE" | awk '{print $1}')"
 PUID="$(echo "$PSLINE" | awk '{print $2}')"
I'll send this patch along to the author as per the GPL it's licensed under.

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Tue Jan 05, 2021 10:00 am
by drakedts
Hello again! Sorry for the slow reply; we were closed down for holiday break. That PSLINE patch is a good catch. It doesn't fix things on my system though. And i can just manually run the old PSLINE and the new PSLINE at the command line and see that the output is the same:

Code: Select all

# NAME=org.apache.catalina.startup.Bootstrap

# ps axo pid,uid,command | grep [j]ava | grep "$NAME" | head -1
 289830   982 /usr/lib/jvm/jre/bin/java -Xms14336M -Xmx14336M -Dbanner.logging.dir=/var/log/tomcat8 -XX:MaxMetaspaceSize=2048m -classpath /usr/share/tomcat8/bin/bootstrap.jar:/usr/share/tomcat8/bin/tomcat-juli.jar: -Dcatalina.base=/usr/share/tomcat8 -Dcatalina.home=/usr/share/tomcat8 -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat8/temp -Djava.util.logging.config.file=/usr/share/tomcat8/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start

# ps axo pid,uid,command -q $(echo $(pgrep java) |tr ' ' , | head -1) |grep "$NAME" |head -1
 289830   982 /usr/lib/jvm/jre/bin/java -Xms14336M -Xmx14336M -Dbanner.logging.dir=/var/log/tomcat8 -XX:MaxMetaspaceSize=2048m -classpath /usr/share/tomcat8/bin/bootstrap.jar:/usr/share/tomcat8/bin/tomcat-juli.jar: -Dcatalina.base=/usr/share/tomcat8 -Dcatalina.home=/usr/share/tomcat8 -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat8/temp -Djava.util.logging.config.file=/usr/share/tomcat8/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Tue Jan 05, 2021 2:11 pm
by dchurch
I was able to reproduce the issue in Debian10 where it would output UNKNOWN Can't connect to the JVM: and I was able to fix it using the "PSLINE" patch from earlier in this forum post.

What does the following command output when run on the NRPE receiver (the Tomcat8 server on either RHEL8 or DEB10)?

Code: Select all

ps axo pid,uid,command | grep java
Turns out that without the "PSLINE" patch, if the java process to be monitored is invoked thru sudo, it'll also trip up the plugin; "sudo java MyClass" will match what it's searching for.

We don't normally provide support for third-party plugins such as this one. I'd get in contact with the plugin developer.

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Thu Jan 07, 2021 9:27 am
by drakedts
This is from a RHEL 8 test machine:

Code: Select all

# ps axo pid,uid,command | grep java
 289830   982 /usr/lib/jvm/jre/bin/java -Xms14336M -Xmx14336M -Dbanner.logging.dir=/var/log/tomcat8 -XX:MaxMetaspaceSize=2048m -classpath /usr/share/tomcat8/bin/bootstrap.jar:/usr/share/tomcat8/bin/tomcat-juli.jar: -Dcatalina.base=/usr/share/tomcat8 -Dcatalina.home=/usr/share/tomcat8 -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat8/temp -Djava.util.logging.config.file=/usr/share/tomcat8/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
 786776     0 grep --color=auto java
And here's the same command from Debian 10:

Code: Select all

# ps axo pid,uid,command | grep java
  659   995 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Djava.util.logging.config.file=/var/lib/tomcat9/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.awt.headless=true -Xms4096M -Xmx4096M -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -Dignore.endorsed.dirs= -classpath /usr/share/tomcat9/bin/bootstrap.jar:/usr/share/tomcat9/bin/tomcat-juli.jar -Dcatalina.base=/var/lib/tomcat9 -Dcatalina.home=/usr/share/tomcat9 -Djava.io.tmpdir=/tmp org.apache.catalina.startup.Bootstrap start
15974     0 grep --color=auto java
The PSLINE patch doesn't fix the plugin on either OS for me.

I did some testing on Debian and i think the issue on the 2 OSes might be different.

On RHEL 8, i can log in as the user NRPE runs as and run the check from the command line (check_jvm wrapped with sudo) and it works fine. Or i can log in as the user Tomcat runs as and run the check_jvm without sudo and it works fine. So on RHEL it seems the problem is with NRPE running the check. The check itself works great, just as long as it is run from the command line and not from the NRPE daemon. I don't know why the NRPE daemon cannot run the plugin though. This is the only plugin we have trouble with. But it is also the only plugin where NRPE has to use sudo to run a plugin.

But on Debian, i cannot run the check_jvm in any scenario. Now, the RHEL machines have Java 8 and Debian has Java 11, so maybe that's the difference. I do have the correct (Java 11) version of JvmInspector installed on the Debian machine, but the check still just doesn't work (it always gives "UNKNOWN Can't connect to the JVM:"). So on Debian it sounds like a problem with the check_jvm plugin, and i agree that i should probably work with the check_jvm developer on the Debian issue.

Re: NRPE sudo check_jvm not working on RHEL 8 or Debian 10

Posted: Thu Jan 07, 2021 1:25 pm
by dchurch
Can you please run this command and post the entire output:

Code: Select all

/usr/bin/sudo -u tomcat bash -x /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101