I have tried hard to make sure SELinux is not an issue. I normally run our RHEL servers with SELinux set to enforcing, but while working on this problem I have set it to permissive to rule out SELinux as a cause.
On the RHEL 8 client that I was using for the examples above, the following shows that SELinux is set to permissive and that the check command works perfectly when run from a root shell:
[root@lnx-b9ssb-devl ~]# getenforce
Permissive
[root@lnx-b9ssb-devl ~]# sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
CRITICAL 4.0G |max=14583595008;;; commited=14583595008;;; used=4233922576;;;
(As an aside, the "CRITICAL" status shown here is unimportant. check_jvm has a quirk where it does not accept percentages for the thresholds, and for testing I'm using the unmodified version, without my patch that fixes the threshold computation. The important bit is that it does run and gives back actual data.)
I have dont_blame_nrpe=0, as I believe it should be, since I don't send command arguments from the XI server to the NRPE daemons.
I've compared the nrpe.cfg configuration on a RHEL 7 Tomcat server (which works) and a RHEL 8 one (which does not work). The non-comment lines on both are identical, and are equal to:
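For reference, here is a minimal sketch of the percentage arithmetic implied by the perfdata above. It is not the patch mentioned in the aside, and `heap_pct` is a hypothetical helper name. It only assumes the `max=` and `used=` fields appear in the output line as shown.

```shell
# Hypothetical helper: read one check_jvm output line on stdin and
# print used/max as a whole-number percentage (truncated, not rounded).
heap_pct() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if (split($i, kv, "=") == 2) {
        sub(/^[|]/, "", kv[1])   # drop the leading "|" before the first label
        sub(/;.*/, "", kv[2])    # drop the ";;;" threshold fields
        val[kv[1]] = kv[2]
      }
    }
    if (val["max"] + 0 > 0)
      printf "%d\n", val["used"] * 100 / val["max"]
  }'
}

echo 'CRITICAL 4.0G |max=14583595008;;; commited=14583595008;;; used=4233922576;;;' | heap_pct
```

For the sample line this works out to 29, so the heap is at roughly 29% of its maximum, well under a 90% warning level.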
log_facility=daemon
debug=0
pid_file=/var/run/nrpe/nrpe.pid
server_port=5666
nrpe_user=nrpe
nrpe_group=nrpe
allowed_hosts=127.0.0.1,::1
dont_blame_nrpe=0
allow_bash_command_substitution=0
command_timeout=60
connection_timeout=300
disable_syslog=0
command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200
include_dir=/etc/nrpe.d/
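A listing like the one above can be produced by stripping comments and blank lines before comparing. This is only a sketch: `active_lines` is a hypothetical helper, and the two file names in the usage comment are placeholders for copies of nrpe.cfg taken from each server.

```shell
# Hypothetical helper: print only the non-comment, non-blank lines of a config file.
active_lines() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage (bash; placeholder file names), no output means the active lines match:
#   diff <(active_lines rhel7-nrpe.cfg) <(active_lines rhel8-nrpe.cfg)
```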
The configuration that is included from /etc/nrpe.d is also the same on the two systems, and looks like this:
allowed_hosts=lnx-dns3-prod.drake.edu,lnx-dns4-prod.drake.edu,lnx-nagios-prod.drake.edu
command[cups_jobs]=/usr/lib64/nagios/plugins/check_cups_jobs -w 10 -c 20 -W 5m -C 60m --perfdata
command[date_ns]=/bin/date +%s.%N
command[disk]=/usr/lib64/nagios/plugins/check_disk -e -w 15% -c 5% -N btrfs -N ext4 -N tmpfs -N vfat -N xfs
command[iostat]=/usr/lib64/nagios/plugins/check_iostat -dbu -dbuw 50 -dbuc 90 -p
command[load]=/usr/lib64/nagios/plugins/check_load -w 20,14.0,8 -c 40,30.0,20
command[mem]=/usr/lib64/nagios/plugins/check_mem 90 95
command[mountpoints]=/usr/lib64/nagios/plugins/check_mountpoints -a
command[ro_mounts]=/usr/lib64/nagios/plugins/check_ro_mounts -X tmpfs -X nfs -X cifs -X squashfs
command[users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[dhcpd_pools]=/usr/local/bin/dhcpd-pools --config=/etc/dhcp/dhcpd.conf --leases=/var/lib/dhcpd/dhcpd.leases --warning 80 --critical 90
command[namedconf]=/usr/lib64/nagios/plugins/check_namedconf
command[dig-www-dns1]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns1-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns2]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns2-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns3]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns3-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns4]=/usr/lib64/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns4-prod.drake.edu -w 2.0 -c 5.0
command[mailq]=/usr/lib64/nagios/plugins/check_mailq -M postfix -w 20 -c 100
command[smtp-smtpout]=/usr/lib64/nagios/plugins/check_smtp -H smtpout.drake.edu -w 2 -c 5
command[smtp-smtpout2]=/usr/lib64/nagios/plugins/check_smtp -H smtpout2.drake.edu -w 2 -c 5
command[smtp-drakemx]=/usr/lib64/nagios/plugins/check_smtp -H drake-edu.mail.protection.outlook.com -w 2 -c 5
command[photoreader]=/usr/lib64/nagios/plugins/check_mount /net/photoreader cifs
command[procs-engine]=/usr/lib64/nagios/plugins/check_procs -c 1: -a engine.jar
command[procs-jobsub]=/usr/lib64/nagios/plugins/check_jobsub
command[procs-middleware]=/usr/lib64/nagios/plugins/check_procs -c 8: -w 12: -a Middleware
command[procs-slave]=/usr/lib64/nagios/plugins/check_procs -w 1: -a slave
command[ethosctl]=/usr/lib64/nagios/plugins/check_systemd_service ethosctl.service
command[tomcat_service]=/usr/lib64/nagios/plugins/check_systemd_service tomcat8.service
command[tomcat_heap]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
command[tomcat_nonheap]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p non-heap -w 90 -c 95
command[tomcat_classes]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p classes -w 25000 -c 30000
command[tomcat_threads]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p threads -w 200 -c 500
command[tomcat_sessions]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p sessions -w 5000 -c 10000
On a Debian system it is a bit harder to compare the configuration directly, as Debian uses slightly different paths, but from a quick look I believe they are effectively equal. Here's nrpe.cfg from Debian (which does not work):
log_facility=daemon
debug=0
pid_file=/var/run/nagios/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,::1
dont_blame_nrpe=0
allow_bash_command_substitution=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
include=/etc/nagios/nrpe_local.cfg
include_dir=/etc/nagios/nrpe.d/
The configuration included from /etc/nagios/nrpe.d is again the same, once path differences on Debian are accounted for. The Debian machine does have one extra check defined ("mariadb"), as it has a database that our RHEL machines lack:
allowed_hosts=lnx-dns3-prod.drake.edu,lnx-dns4-prod.drake.edu,lnx-nagios-prod.drake.edu
command[cups_jobs]=/usr/lib/nagios/plugins/check_cups_jobs -w 10 -c 20 -W 5m -C 60m --perfdata
command[date_ns]=/bin/date +%s.%N
command[disk]=/usr/lib/nagios/plugins/check_disk -e -w 15% -c 5% -N btrfs -N ext4 -N tmpfs -N vfat -N xfs
command[iostat]=/usr/lib/nagios/plugins/check_iostat -dbu -dbuw 50 -dbuc 90 -p
command[load]=/usr/lib/nagios/plugins/check_load -w 20,14.0,8 -c 40,30.0,20
command[mem]=/usr/lib/nagios/plugins/check_mem 90 95
command[mountpoints]=/usr/lib/nagios/plugins/check_mountpoints -a
command[ro_mounts]=/usr/lib/nagios/plugins/check_ro_mounts -X tmpfs -X nfs -X cifs -X squashfs
command[users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[dhcpd_pools]=/usr/local/bin/dhcpd-pools --config=/etc/dhcp/dhcpd.conf --leases=/var/lib/dhcpd/dhcpd.leases --warning 80 --critical 90
command[namedconf]=/usr/lib/nagios/plugins/check_namedconf
command[dig-www-dns1]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns1-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns2]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns2-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns3]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns3-prod.drake.edu -w 2.0 -c 5.0
command[dig-www-dns4]=/usr/lib/nagios/plugins/check_dig -l www.drake.edu -H lnx-dns4-prod.drake.edu -w 2.0 -c 5.0
command[mailq]=/usr/lib/nagios/plugins/check_mailq -M postfix -w 20 -c 100
command[smtp-smtpout]=/usr/lib/nagios/plugins/check_smtp -H smtpout.drake.edu -w 2 -c 5
command[smtp-smtpout2]=/usr/lib/nagios/plugins/check_smtp -H smtpout2.drake.edu -w 2 -c 5
command[smtp-drakemx]=/usr/lib/nagios/plugins/check_smtp -H drake-edu.mail.protection.outlook.com -w 2 -c 5
command[photoreader]=/usr/lib/nagios/plugins/check_mount /net/photoreader cifs
command[procs-engine]=/usr/lib/nagios/plugins/check_procs -c 1: -a engine.jar
command[procs-jobsub]=/usr/lib/nagios/plugins/check_jobsub
command[procs-middleware]=/usr/lib/nagios/plugins/check_procs -c 8: -w 12: -a Middleware
command[procs-slave]=/usr/lib/nagios/plugins/check_procs -w 1: -a slave
command[ethosctl]=/usr/lib/nagios/plugins/check_systemd_service ethosctl.service
command[mariadb]=/usr/lib/nagios/plugins/check_mysql -n
command[tomcat_service]=/usr/lib/nagios/plugins/check_systemd_service tomcat9.service
command[tomcat_heap]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
command[tomcat_nonheap]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p non-heap -w 90 -c 95
command[tomcat_classes]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p classes -w 25000 -c 30000
command[tomcat_threads]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p threads -w 200 -c 500
command[tomcat_sessions]=/usr/bin/sudo -u tomcat /usr/lib/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p sessions -w 5000 -c 10000
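To make the RHEL and Debian nrpe.d listings directly comparable, the path and name differences visible above can be rewritten before diffing. A rough sketch follows. `to_debian_layout` is a hypothetical helper, and the three substitutions cover only the differences that appear in these two listings (plugin directory, sudo target user, Tomcat unit name).

```shell
# Hypothetical helper: rewrite RHEL-style NRPE command lines to the Debian layout
# so the two listings can be compared line by line.
to_debian_layout() {
  sed -e 's#/usr/lib64/nagios/plugins/#/usr/lib/nagios/plugins/#g' \
      -e 's#sudo -u tomcat8 #sudo -u tomcat #' \
      -e 's#tomcat8\.service#tomcat9.service#' "$@"
}

# Usage (bash; placeholder file names):
#   diff <(to_debian_layout rhel-nrpe.d.cfg) debian-nrpe.d.cfg
```

With the listings above, such a diff should leave only the extra "mariadb" command on the Debian side.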
As for an environment variable problem, I've tried adding an "env | sort" line to the check_jvm script just before it runs "java -jar ..." in order to see what the environment looks like. With that modification in place, here's what I get when running the check from the command line as the nrpe user:
[nrpe@lnx-b9ssb-devl ~]$ /usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
CDPATH=
ENV=
HISTSIZE=1000
HOME=/usr/share/tomcat8
HOSTNAME=lnx-b9ssb-devl.test.drake.edu
LANG=en_US.UTF-8
LC_COLLATE=C
LC_TIME=en_XX.UTF-8
LOGNAME=tomcat8
LS_COLORS=rs=0:di=38;5;33:ln=38;5;51:mh=00:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=01;05;37;41:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;40:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.zst=38;5;9:*.tzst=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.wim=38;5;9:*.swm=38;5;9:*.dwm=38;5;9:*.esd=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.mjpg=38;5;13:*.mjpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.m4a=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.oga=38;5;45:*.opus=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:
MAIL=/var/spool/mail/nrpe
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/run/nrpe
SHELL=/bin/bash
SHLVL=1
SUDO_COMMAND=/usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
SUDO_GID=982
SUDO_UID=986
SUDO_USER=nrpe
TERM=xterm-256color
USER=tomcat8
_=/usr/bin/env
CRITICAL 3.4G |max=14599847936;;; commited=14599847936;;; used=3555842200;;;
When running the check from the Nagios XI server, the environment is smaller and the check fails:
[root@lnx-nagios-prod ~]# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c tomcat_heap
CDPATH=
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/982/bus
ENV=
HOME=/usr/share/tomcat8
LOGNAME=tomcat8
MAIL=/var/mail/tomcat8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
SHELL=/bin/bash
SHLVL=1
SUDO_COMMAND=/usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
SUDO_GID=982
SUDO_UID=986
SUDO_USER=nrpe
TERM=unknown
USER=tomcat8
XDG_RUNTIME_DIR=/run/user/982
XDG_SESSION_ID=c572
_=/usr/bin/env
UNKNOWN Can't connect to the JVM:
I don't know if any of the differences actually matter. For testing purposes, I have tried adding lines to check_jvm like "export LANG=en_US.UTF-8", "export LC_COLLATE=C", and so on, for the variables that are missing when it is called by the nrpe daemon. But so far I've not found any of those variables to make a difference.
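To list the differences between the two dumps mechanically rather than by eye, something like the following sketch could be used. `env_names` and `env_diff` are hypothetical helpers, and the input files are assumed to be the two outputs above saved as plain text. Lines that are not of the form NAME=value (the shell prompt, the plugin's status line) are ignored.

```shell
# Hypothetical helper: extract the variable names from a saved "env | sort" dump.
env_names() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: report variable names present in only one of two dumps.
env_diff() {
  a=$(mktemp)
  b=$(mktemp)
  env_names "$1" > "$a"
  env_names "$2" > "$b"
  LC_ALL=C comm -23 "$a" "$b" | sed 's/^/only in first: /'
  LC_ALL=C comm -13 "$a" "$b" | sed 's/^/only in second: /'
  rm -f "$a" "$b"
}

# Usage (placeholder file names):
#   env_diff env-from-shell.txt env-from-nrpe.txt
```

Applied to the two dumps above, a name-only comparison like this would flag HISTSIZE, HOSTNAME, LANG, LC_COLLATE, LC_TIME and LS_COLORS as present only in the shell run, and DBUS_SESSION_BUS_ADDRESS, XDG_RUNTIME_DIR and XDG_SESSION_ID as present only under NRPE. It does not compare values, so the differing MAIL, PWD and TERM settings would still need a separate look.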