check_nrpe doesn't execute (pcs commands) on remote server

Post by **fsodah** » Mon Feb 15, 2021 11:46 am

Hello,

I have a script that runs several commands to check cluster behaviour, such as pcs status. The script is installed on the remote servers and works fine locally, but when I run it via check_nrpe from the Nagios server, it runs without executing the "pcs status" command.

I think it's related to permissions.

Can you please help?

Post by **tgriep** » Mon Feb 15, 2021 3:15 pm

Try this, edit the nrpe.cfg file on the remote host and change the following from

debug=0

to

debug=1

Save the change and restart the NRPE agent.

Check the /var/log/messages file for any errors logged when the plugin is run on the remote system and post them here.
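
The debug change and log check above can be sketched as two small shell helpers. This is only an illustration: the function names are made up, the config path is taken from the ps output posted in this thread, and the service name "nrpe" in the usage comment is an assumption that may differ on your system.

```shell
# Switch debug=0 to debug=1 in an NRPE config file, keeping a backup copy.
# $1 = path to nrpe.cfg (for example /etc/nagios/nrpe.cfg).
enable_nrpe_debug() {
    cfg="$1"
    cp -p "$cfg" "$cfg.bak" || return 1
    # Rewrite from the backup so the edit does not rely on a non-portable "sed -i".
    sed 's/^debug=0[[:space:]]*$/debug=1/' "$cfg.bak" > "$cfg"
}

# Show only the NRPE and SELinux-related lines from a syslog file.
# $1 = log file (for example /var/log/messages).
nrpe_log_lines() {
    grep -E 'nrpe\[|setroubleshoot|SELinux' "$1"
}

# Example, as root on the remote host (service name is an assumption):
#   enable_nrpe_debug /etc/nagios/nrpe.cfg
#   systemctl restart nrpe
#   nrpe_log_lines /var/log/messages | tail -n 50
```

Remember to set debug back to 0 once the problem is found, since debug logging is verbose.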

Also, post the command that you defined in the nrpe.cfg file on the remote server and the script you are trying to run so we can see what it is doing.

One more thing: the NRPE agent runs the commands as the nagios user, so make sure that user can run the applications needed to gather the data and that it can find the commands on its path.
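
A quick way to test the path part is a helper like the one below, run from a shell owned by the NRPE service user. It is a sketch only: the function name is hypothetical, and the PATH the daemon actually sees may be narrower than the one in an interactive shell, so using absolute paths inside the plugin is the safer option.

```shell
# Report which of the named commands cannot be found on the current PATH.
# Prints one "missing: NAME" line per unresolved command and
# returns 1 if anything is missing, 0 otherwise.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# Example, run as the NRPE service user:
#   missing_commands pcs crm_verify sudo
```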

Post by **fsodah** » Tue Feb 16, 2021 1:19 am

The client's /var/log/messages output when executing the NRPE script from the Nagios server:

Feb 16 08:09:21 dascsdbo00001b nrpe[14016]: CONN_CHECK_PEER: checking if host is allowed: 10.1.23.222 port 20100
Feb 16 08:09:21 dascsdbo00001b nrpe[14016]: is_an_allowed_host (AF_INET): is host >10.1.23.222< an allowed host >10.1.23.222<
Feb 16 08:09:21 dascsdbo00001b nrpe[14016]: is_an_allowed_host (AF_INET): is host >10.1.23.222< an allowed host >10.1.23.222<
Feb 16 08:09:21 dascsdbo00001b nrpe[14016]: is_an_allowed_host (AF_INET): host is in allowed host list!
Feb 16 08:09:21 dascsdbo00001b nrpe[14017]: WARNING: my_system() seteuid(0): Operation not permitted
Feb 16 08:09:21 dascsdbo00001b dbus[1097]: [system] Activating service name='org.fedoraproject.Setroubleshootd' (using servicehelper)
Feb 16 08:09:21 dascsdbo00001b dbus[1097]: [system] Successfully activated service 'org.fedoraproject.Setroubleshootd'
Feb 16 08:09:21 dascsdbo00001b setroubleshoot: SELinux is preventing /usr/bin/python2.7 from execute access on the file /usr/sbin/corosync. For complete SELinux messages run: sealert -l 147ebee6-e792-44a6-b763-7ec6c5992f0a
Feb 16 08:09:21 dascsdbo00001b python: SELinux is preventing /usr/bin/python2.7 from execute access on the file /usr/sbin/corosync.#012#012***** Plugin catchall (100. confidence) suggests **************************#012#012If you believe that python2.7 should be allowed execute access on the corosync file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'pcs' --raw | audit2allow -M my-pcs#012# semodule -i my-pcs.pp#012
Feb 16 08:09:22 dascsdbo00001b setroubleshoot: SELinux is preventing check_cluster.s from getattr access on the file /usr/bin/sudo. For complete SELinux messages run: sealert -l eec466b1-71b7-4d18-b8a6-545868e07d17
Feb 16 08:09:22 dascsdbo00001b python: SELinux is preventing check_cluster.s from getattr access on the file /usr/bin/sudo.#012#012***** Plugin catchall_boolean (89.3 confidence) suggests ******************#012#012If you want to allow nagios to run sudo#012Then you must tell SELinux about this by enabling the 'nagios_run_sudo' boolean.#012#012Do#012setsebool -P nagios_run_sudo 1#012#012***** Plugin catchall (11.6 confidence) suggests **************************#012#012If you believe that check_cluster.s should be allowed getattr access on the sudo file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'check_cluster.s' --raw | audit2allow -M my-checkclusters#012# semodule -i my-checkclusters.pp#012
Feb 16 08:09:22 dascsdbo00001b setroubleshoot: SELinux is preventing check_cluster.s from getattr access on the file /usr/bin/sudo. For complete SELinux messages run: sealert -l eec466b1-71b7-4d18-b8a6-545868e07d17
Feb 16 08:09:22 dascsdbo00001b python: SELinux is preventing check_cluster.s from getattr access on the file /usr/bin/sudo.#012#012***** Plugin catchall_boolean (89.3 confidence) suggests ******************#012#012If you want to allow nagios to run sudo#012Then you must tell SELinux about this by enabling the 'nagios_run_sudo' boolean.#012#012Do#012setsebool -P nagios_run_sudo 1#012#012***** Plugin catchall (11.6 confidence) suggests **************************#012#012If you believe that check_cluster.s should be allowed getattr access on the sudo file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'check_cluster.s' --raw | audit2allow -M my-checkclusters#012# semodule -i my-checkclusters.pp#012

The command defined in nrpe.cfg:
command[check_cluster_env]=/usr/lib64/nagios/plugins/check_cluster.sh $ARG1$

Post by **tgriep** » Tue Feb 16, 2021 10:50 am

SELinux is enabled on the remote server and is blocking the plugin from running.
The output from the /var/log/messages file explains that it is blocked and includes the commands that allow it through SELinux.
Here are the commands to bypass SELinux.

Run them as root on the remote server.

ausearch -c 'pcs' --raw | audit2allow -M my-pcs
semodule -i my-pcs.pp
ausearch -c 'check_cluster.s' --raw | audit2allow -M my-checkclusters
semodule -i my-checkclusters.pp

Then see if you can run the NRPE command from the Nagios server.

If it does not run, check the /var/log/messages file for any new log entries showing SELinux blocking the plugin.
Repeat the suggested commands for each new denial until the plugin runs.
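
Because syslog escapes the newlines in the setroubleshoot messages as #012, the suggested commands are hard to read in the raw log. The helper below is a rough sketch (the function name is made up) that decodes those escapes and prints each suggested command once. Note that for the sudo denial the log also suggests the nagios_run_sudo boolean, with a higher confidence score than the audit2allow module, so that line is extracted too.

```shell
# Print the remediation commands that setroubleshoot embeds in syslog lines.
# Reads the named files, or standard input when no file is given.
suggested_selinux_fixes() {
    # Turn the "#012" escapes back into real newlines.
    awk '{ gsub(/#012/, "\n"); print }' "$@" |
        # Keep the suggested commands and strip the leading "# " prompt.
        sed -n -e 's/^# \(ausearch .*\)$/\1/p' \
               -e 's/^# \(semodule .*\)$/\1/p' \
               -e 's/^\(setsebool .*\)$/\1/p' |
        # Drop repeats while keeping first-seen order.
        awk '!seen[$0]++'
}

# Example, as root on the remote host:
#   suggested_selinux_fixes /var/log/messages
```

Review the printed commands before running any of them, since each one loosens the SELinux policy.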

Post by **fsodah** » Tue Feb 16, 2021 12:28 pm

Thanks for your reply.
After applying all the SELinux commands, the NRPE script still doesn't work.

The output of /var/log/messages on the client side is:
Feb 16 19:20:56 dascsdbo00001b nrpe[14227]: CONN_CHECK_PEER: checking if host is allowed: 10.1.23.222 port 43245
Feb 16 19:20:56 dascsdbo00001b nrpe[14227]: is_an_allowed_host (AF_INET): is host >10.1.23.222< an allowed host >10.1.23.222<
Feb 16 19:20:56 dascsdbo00001b nrpe[14227]: is_an_allowed_host (AF_INET): is host >10.1.23.222< an allowed host >10.1.23.222<
Feb 16 19:20:56 dascsdbo00001b nrpe[14227]: is_an_allowed_host (AF_INET): host is in allowed host list!
Feb 16 19:20:56 dascsdbo00001b nrpe[14228]: WARNING: my_system() seteuid(0): Operation not permitted
Feb 16 19:20:56 dascsdbo00001b systemd: Started Session c164372 of user root.
Feb 16 19:20:56 dascsdbo00001b dbus[1097]: [system] Activating service name='org.fedoraproject.Setroubleshootd' (using servicehelper)
Feb 16 19:20:57 dascsdbo00001b dbus[1097]: [system] Successfully activated service 'org.fedoraproject.Setroubleshootd'
Feb 16 19:20:57 dascsdbo00001b setroubleshoot: Exception during AVC analysis: must be encoded string without NULL bytes, not str
Feb 16 19:21:00 dascsdbo00001b setroubleshoot: Exception during AVC analysis: must be encoded string without NULL bytes, not st

Post by **tgriep** » Tue Feb 16, 2021 1:14 pm

Post the full nrpe.cfg file and the /usr/lib64/nagios/plugins/check_cluster.sh script so we can look at it.

Run this as root on the remote system and post the output.

ps -ef --cols=300 |grep nrpe

If the plugin requires root permissions to run, try doing this.

Edit the /etc/sudoers file and add the following entry

nrpe ALL=NOPASSWD: /usr/lib64/nagios/plugins/check_cluster.sh

You may need to add a line for the pcs command in the sudoers file as well.
Here is an example. Make sure you update the path to the pcs command.

nrpe ALL=NOPASSWD: /usr/bin/pcs

Next, edit the nrpe.cfg file and add sudo to the command definition.

command[check_cluster_env]=sudo /usr/lib64/nagios/plugins/check_cluster.sh $ARG1$

Save the change and restart the nrpe service and see if that helps.
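
If you would rather not hand-edit /etc/sudoers, the same two rules can be written to a separate file and syntax-checked before they take effect. The sketch below makes some assumptions: the function name is hypothetical, the drop-in location /etc/sudoers.d/nrpe in the usage comment assumes your sudoers file includes that directory, and the pcs path must be replaced with the real one on your host.

```shell
# Write the two sudoers rules from this thread to a file.
# $1 = output file, $2 = absolute path to pcs on the remote host.
write_nrpe_sudoers() {
    out="$1"
    pcs_path="$2"
    case "$pcs_path" in
        /*) ;;                                   # sudoers rules need an absolute path
        *)  echo "pcs path must be absolute" >&2; return 1 ;;
    esac
    (
        umask 077                                # keep the file private while it is written
        printf '%s\n' \
            'nrpe ALL=NOPASSWD: /usr/lib64/nagios/plugins/check_cluster.sh' \
            "nrpe ALL=NOPASSWD: $pcs_path" > "$out"
    )
}

# Example, as root (target file name is an assumption):
#   write_nrpe_sudoers /tmp/nrpe.sudoers "$(command -v pcs)"
#   visudo -c -f /tmp/nrpe.sudoers && install -m 0440 /tmp/nrpe.sudoers /etc/sudoers.d/nrpe
```

Running visudo -c first matters because a syntax error in a sudoers file can lock sudo out entirely.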

Post by **fsodah** » Tue Feb 16, 2021 1:59 pm

The output of (ps -ef --cols=300 |grep nrpe) on the client side :
nrpe 3799 1 0 20:09 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
root 57798 57036 0 20:51 pts/1 00:00:00 grep --color=auto nrpe
I added nrpe to sudoers as you suggested, but now when I call the script from the Nagios server using NRPE:

[root@nagios libexec]# ./check_nrpe -H XX.XX.XX.XX -c check_cluster_env -a 'Stonith'
NRPE: Unable to read output

The nrpe.cfg and check_cluster.sh files are attached.

Post by **tgriep** » Tue Feb 16, 2021 4:14 pm

Log in to the remote server and change to the nrpe user by running the following.

su - nrpe

Then run this to see what the output of the plugin is when it runs.

bash -x /usr/lib64/nagios/plugins/check_cluster.sh Stonith
echo $?
/usr/sbin/pcs stonith show

Post all of the output.
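
As far as I know, "NRPE: Unable to read output" generally means the plugin printed nothing on standard output, so it helps to look at stdout and the exit code separately from anything written to stderr. The wrapper below is a minimal sketch of that check (the function name is made up). If su - nrpe is refused because the account has no login shell, starting a shell with sudo -u nrpe or su -s /bin/bash nrpe should work for the test.

```shell
# Run a command and report what a caller that reads only stdout would see.
# stderr is discarded on purpose, so prompts and error text do not count.
plugin_probe() {
    if out=$("$@" 2>/dev/null); then
        rc=0
    else
        rc=$?
    fi
    if [ -z "$out" ]; then
        echo "stdout: <empty>"
    else
        echo "stdout: $out"
    fi
    # Nagios plugins use 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
    echo "exit: $rc"
}

# Example, from a shell running as the nrpe user:
#   plugin_probe /usr/lib64/nagios/plugins/check_cluster.sh Stonith
```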

Post by **fsodah** » Wed Feb 17, 2021 1:43 am

Please note that nrpe is a non-login user:

nrpe:x:996:992:NRPE user for the NRPE service:/var/run/nrpe:/sbin/nologin

so we cannot switch to that user.

- So I did the following:
sudo -s -u nrpe
- Now I am the nrpe user.

- The output of (bash -x /usr/lib64/nagios/plugins/check_cluster.sh Stonith) is:
bash-4.2$ bash -x /usr/lib64/nagios/plugins/check_cluster.sh Stonith
+ CRM=/usr/sbin/pcs
+ CRMV=/usr/sbin/crm_verify
+ STATE_OK=0
+ STATE_WARNING=1
+ STATE_CRITICAL=2
+ STATE_UNKNOWN=3
+ '[' 1 -lt 1 ']'
++ /usr/sbin/pcs config show
++ grep -A1 'Corosync Nodes:'
++ tail -n1
Error: error running crm_mon, is pacemaker running?
+ nodelist=
+ case $1 in
+ checkstonith
++ sudo /usr/sbin/pcs stonith show
++ grep -i Started
++ wc -l

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.

- The output of (echo $?) is:
1

- The output of (/usr/sbin/pcs stonith show) is:

bash-4.2$ /usr/sbin/pcs stonith show
Error: unable to get cluster status from crm_mon

Error: cluster is not available on this node

Post by **tgriep** » Wed Feb 17, 2021 10:09 am

Since the plugin is using a shell, the nrpe user has to be able to log in.

Change this from

nrpe:x:996:992:NRPE user for the NRPE service:/var/run/nrpe:/sbin/nologin

to

nrpe:x:996:992:NRPE user for the NRPE service:/var/run/nrpe:/bin/bash
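
To confirm the shell field before and after the change, a small helper like this can read it from the passwd file. It is only a sketch, with a hypothetical function name. The usermod command in the usage comment is an alternative to editing /etc/passwd by hand.

```shell
# Print the login shell (7th colon-separated field) for a user.
# $1 = user name, $2 = passwd-format file (defaults to /etc/passwd).
login_shell_of() {
    awk -F: -v user="$1" '$1 == user { print $7 }' "${2:-/etc/passwd}"
}

# Example, as root:
#   login_shell_of nrpe
#   usermod -s /bin/bash nrpe     # change the shell without hand-editing /etc/passwd
#   login_shell_of nrpe
```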

Next, you need to figure out why you cannot get the Cluster status when running this.

/usr/sbin/pcs stonith show
Error: unable to get cluster status from crm_mon
Error: cluster is not available on this node

Nagios Support Forum
