NRPE sudo check_jvm not working on RHEL 8 or Debian 10
Posted: Wed Dec 16, 2020 10:32 am
Hello. I am trying to monitor Tomcat's heap with NRPE. On our older systems (RHEL 7) it works, but on the newer ones (RHEL 8, Debian 10) it does not. We run the following mix of software versions:
This is what I see from the Nagios XI server when I run the check against a client:
On the client, "tomcat_heap" is defined like so, using sudo:
For comparison, other NRPE checks (that do not use sudo check_jvm) work just fine:
NRPE runs as the nrpe user and Tomcat runs as the tomcat8 user; I've confirmed both by checking "ps aux". As the nrpe user on the client, the command works:
My sudo configuration from /etc/sudoers looks like this:
There's an extra file /etc/sudoers.d/custom that gets included:
The check_jvm command is from https://fidanov.net/c0d3/nagios-plugins/jvminspector/, and I updated it to the latest version, which was released just a few days ago. It has global execute permissions:
While testing, SELinux is set to permissive mode on the RHEL 8 machines that are having trouble, and on Debian SELinux is not even installed, so I know SELinux is not the problem.
I know systemd can impose some security restrictions of its own, and I've tried looking for things at that level, but no luck yet.
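One way to look at that level is to ask systemd which sandboxing settings apply to the NRPE service, and then check whether the NRPE daemon and Tomcat actually share a view of the file system. The sketch below is a minimal, hypothetical pair of helpers (the function names are made up, and the unit name "nrpe" is an assumption for RHEL 8 with EPEL packages; on Debian 10 it may be "nagios-nrpe-server", and if NRPE is started by xinetd none of this applies). A setting such as PrivateTmp=yes runs a service in its own mount namespace with a private /tmp, so two processes that report different mount namespaces may not see the same files under /tmp.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; not part of NRPE, Nagios XI, or check_jvm.

# Print selected systemd sandboxing settings for a unit.
# Example: show_unit_sandbox nrpe   (unit name is an assumption)
show_unit_sandbox() {
  local unit=$1
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found" >&2
    return 1
  fi
  systemctl show "$unit" \
    -p User -p PrivateTmp -p ProtectSystem -p ProtectHome \
    -p PrivateDevices -p NoNewPrivileges
}

# Exit 0 if two PIDs share a mount namespace, 1 if they do not,
# 2 if either namespace link cannot be read. Reading another user's
# /proc/<pid>/ns entries normally requires root.
# Example (as root; assumes the Tomcat process is named "java"):
#   same_mount_ns "$(pgrep -o -x nrpe)" "$(pgrep -o -u tomcat8 java)"; echo $?
same_mount_ns() {
  local a b
  a=$(readlink "/proc/$1/ns/mnt" 2>/dev/null) || return 2
  b=$(readlink "/proc/$2/ns/mnt" 2>/dev/null) || return 2
  [ "$a" = "$b" ]
}
```

"systemctl cat nrpe" (again assuming that unit name) also shows the unit file together with any drop-in overrides, which is where such settings would normally be defined.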
I don't think the problem is with sudo itself, since other commands work. For example, if I temporarily change the definition of tomcat_heap to run "id" instead of "check_jvm":
Then I get the expected id output when calling it from the XI server:
So there seems to be something specific about how check_jvm runs when it is called by NRPE running as a daemon. It is odd, though, that it works fine from the command line when I'm logged in as the nrpe user. Any ideas?
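Because the same command behaves differently interactively and under the daemon, one way to narrow it down is to capture what each context actually sees and compare the two. Below is a minimal sketch; the script, its function name, and wherever it is saved are hypothetical. It also assumes check_jvm locates the JVM the way the JDK's jps tool does, through HotSpot's /tmp/hsperfdata_<user> directories, which is not confirmed by the check_jvm documentation quoted here.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic script (not part of NRPE or check_jvm).
# Prints what the calling context sees, so that a run through NRPE can be
# compared with an interactive run as the nrpe user.
jvm_env_snapshot() {
  echo "id: $(id)"
  # Variables that often differ between a login shell and a daemon.
  local var
  for var in HOME PATH TMPDIR JAVA_HOME; do
    echo "$var=${!var:-(unset)}"
  done
  echo "java on PATH: $(command -v java || echo none)"
  # Assumption: the JVM is discovered via /tmp/hsperfdata_<user>, as jps does.
  local dir found=0
  for dir in /tmp/hsperfdata_*; do
    [ -e "$dir" ] || continue
    found=1
    ls -ld "$dir"
  done
  [ "$found" -eq 1 ] || echo "no /tmp/hsperfdata_* directories visible"
}

# Run the snapshot when executed directly (not when sourced).
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  jvm_env_snapshot
fi
```

To use it, a temporary NRPE command could call the script through sudo in the same way as the "id" test (for example command[env_snapshot]=/usr/bin/sudo -u tomcat8 /path/to/jvm_env_snapshot.sh, where the path is a placeholder), and the same sudo command could be run interactively as nrpe; differences between the two outputs, such as a missing hsperfdata directory, would point at what the daemon does not see. Long output may be truncated by NRPE.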
A while back I had a similar problem (see https://support.nagios.com/forum/viewto ... 16&t=59209) that was solved by installing Java 11 on the affected servers and editing the check_jvm script to run Java 11, even though Tomcat was running under Java 8. That solution only worked for a few weeks, then it stopped working, and I don't know why. I never understood why it worked in the first place, which makes it hard to fix now that it has broken. That partial solution only worked on RHEL 8 anyway, not on Debian. I think I need to get to the root of the problem and figure out what is going on on both RHEL 8 and Debian 10.
Software versions:

```
Quantity  OS         OpenJDK  Tomcat  systemd  NRPE   Status
2         RHEL 7     7        7       219      4.0.3  Works
4         RHEL 7     8        7       219      4.0.3  Works
28        RHEL 7     8        8.5     219      4.0.3  Works
7         RHEL 8     8        8.5     239      4.0.3  Fail
2         Debian 10  11       9       241      3.2.1  Fail
```
Running the check from the Nagios XI server:

```
# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c tomcat_heap
UNKNOWN Can't connect to the JVM:
```
Definition of tomcat_heap on the client:

```
command[tomcat_heap]=/usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
```
Another NRPE check, which works:

```
# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c mem
OK - Memory usage is 48%
```
Running the command as the nrpe user on the client:

```
[nrpe@lnx-b9ssb-devl ~]$ /usr/bin/sudo -u tomcat8 /usr/lib64/nagios/plugins/check_jvm -n org.apache.catalina.startup.Bootstrap -p heap -w 90 -c 101
OK 23% | max=14616625152;;; commited=14616625152;;; used=3390956184;;;
```
/etc/sudoers:

```
Defaults !visiblepw
Defaults always_set_home
Defaults match_group_by_gid
Defaults always_query_group_plugin
Defaults env_reset
Defaults env_keep = "COLORS DISPLAY HOSTNAME HISTSIZE KDEDIR LS_COLORS"
Defaults env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE"
Defaults env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES"
Defaults env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE"
Defaults env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY"
Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin
root ALL=(ALL) ALL
%wheel ALL=(ALL) ALL
#includedir /etc/sudoers.d
```
/etc/sudoers.d/custom:

```
%wheel ALL=(ALL) NOPASSWD: ALL
audit ALL=(ALL) ALL
banner ALL=(root) NOPASSWD:/usr/bin/systemctl
Defaults:nrpe !requiretty
#nrpe ALL=(ALL) NOPASSWD:/usr/lib64/nagios/plugins/*
#Just for testing:
nrpe ALL=(ALL) NOPASSWD: ALL
tomcat8 ALL=(ALL) NOPASSWD: ALL
```
Permissions on check_jvm:

```
# ls -l /usr/lib64/nagios/plugins/check_jvm
-r-xr-xr-x. 1 root root 6002 2020-12-15 15:04:50 /usr/lib64/nagios/plugins/check_jvm*
```
Temporary definition of tomcat_heap using id:

```
command[tomcat_heap]=/usr/bin/sudo -u tomcat8 /usr/bin/id
```
Result from the XI server:

```
# /usr/local/nagios/libexec/check_nrpe -H lnx-b9ssb-devl -u -t 30 -c tomcat_heap
uid=982(tomcat8) gid=978(tomcat8) groups=978(tomcat8) context=system_u:system_r:nrpe_t:s0
```