
Communication error between Oracle and Nagios

Posted: Thu Nov 11, 2021 2:55 pm
by pdelgado1989
I'm trying to adapt some Perl scripts to Bash to run them in Nagios XI.

The complete script, so far, is this one:

Code: Select all

. /home/oracle/.profile_RAC
ORACLE_HOME=/oracle/app/grid/19300
ORACLE_BASE=/oracle/app/base
declare -A nagios_exit_codes=([UNKNOWN]=3 [OK]=0 [WARNING]=1 [CRITICAL]=2)  # status name -> Nagios exit code
status='OK'
ok=1
action=$1

case $action in
        "votedisk")
                #command=`/oracle/app/grid/19300/bin/crsctl query css votedisk | grep asm`
                #command=$(/oracle/app/grid/19300/bin/crsctl query css votedisk)
                command=`/oracle/app/grid/19300/bin/crsctl query css votedisk`

                case $command in
                        *"failed"*|*"OFFLINE"*|*"PROC"*)
                                status='CRITICAL'
                                output_msg="Voting disk status check failed!"
                        ;;

                        * )
                                output_msg="Voting disks status check succeeded"
                        ;;
                esac

                output="[$status] $output_msg - $command"

        ;;

        "clusterstatus")
                comando=`/oracle/app/grid/19300/bin/crsctl query crs releaseversion`
                output_msg="All clusterware services are up (clusterware version: $comando)"
                output="$output_msg"

        ;;
esac


echo -e $output
exit 0
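A side note on the final `exit 0`: Nagios reads the plugin's exit status, not the text, so a script that always exits 0 is shown as OK even when the message says CRITICAL. The following is only a minimal sketch of how the status could be turned into an exit code. The function names `nagios_exit_code` and `classify_votedisk` are hypothetical, and treating "Unable to communicate" as CRITICAL is an assumption, not something the original script does.

```shell
# Sketch only: map a Nagios status name to the plugin exit code.
# nagios_exit_code is a hypothetical helper, not part of the original script.
nagios_exit_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN and anything unrecognised
    esac
}

# Classify crsctl output with the same patterns the script uses.
# Assumption: an "Unable to communicate" message should also be CRITICAL.
classify_votedisk() {
    case "$1" in
        *"failed"*|*"OFFLINE"*|*"PROC"*|*"Unable to communicate"*) echo CRITICAL ;;
        *) echo OK ;;
    esac
}

# Possible use at the end of the plugin (commented out so the sketch
# has no side effects when pasted into a shell):
# status=$(classify_votedisk "$command")
# exit "$(nagios_exit_code "$status")"
```

With something like this in place, the check would turn red in Nagios when the message is a failure instead of reporting "[OK]" next to an error text.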
Running this script locally, the result is this:

Code: Select all

[root@bbddmachine plugins]# sh ./script_prueba.sh votedisk
[OK] Voting disks status check succeeded - ## STATE File Universal Id File Name Disk group -- ----- ----------------- --------- --------- 1. ONLINE 8dfc2a9528244f95bf87bb394e793995 (/dev/mapper/asm_ocr1) [OCR] Located 1 voting disk(s).
But on the Nagios machine, the result is wrong:

Code: Select all

[nagios@ng1esp libexec]$ ./check_nrpe -2 -H 172.47.62.12 -t 60 -c check_crs_votedisk
[OK] Voting disks status check succeeded - Unable to communicate with the Cluster Synchronization Services daemon.
However, if I launch the other option in the script called clusterstatus everything works fine:

Code: Select all

[nagios@ng1esp libexec]$ ./check_nrpe -2 -H 172.47.62.12 -t 60 -c check_crs_clusterstatus
All clusterware services are up (clusterware version: Oracle High Availability Services release version on the local node is [19.0.0.0.0])

Re: Communication error between Oracle and Nagios

Posted: Fri Nov 12, 2021 11:18 am
by ssax
The proper way to test on bbddmachine is to do this:

Code: Select all

su - nagios
# cd into the plugins directory
sh ./script_prueba.sh votedisk
If that doesn't work, add a -x to it and send the output:

Code: Select all

sh -x ./script_prueba.sh votedisk
My assumption is that the nagios user doesn't have permissions for something. You may need to run the script through sudo if you're unable to adjust the permissions.

Re: Communication error between Oracle and Nagios

Posted: Wed Nov 17, 2021 5:08 am
by pdelgado1989
Hi @ssax.

Following your instructions, I ran the script on bbddmachine as the nrpe user (our equivalent of the nagios user). The output is below, and the execution is successful:

Code: Select all

[root@bbddmachine plugins]# sudo su - nrpe
[nrpe@bbddmachine plugins]$ sh ./script_prueba.sh votedisk
[OK] Voting disks status check succeeded - ## STATE File Universal Id File Name Disk group -- ----- ----------------- --------- --------- 1. ONLINE 8dfc2a9528244f95bf87bb394e793995 (/dev/mapper/asm_ocr1) [OCR] Located 1 voting disk(s).
But the execution on the Nagios machine still does not work:

Code: Select all

[nagios@ng1esp libexec]$ ./check_nrpe -2 -H 172.27.68.132 -t 60 -c check_crs_votedisk
[OK] Voting disks status check succeeded - Unable to communicate with the Cluster Synchronization Services daemon.
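One thing that could differ between the two runs is the environment: `su - nrpe` starts a login shell, while the NRPE daemon runs the plugin with whatever environment the service itself was started with. This is only a guess at the cause, not a confirmed diagnosis. A minimal sketch for testing it is to run the script with an empty environment; `run_clean` is a hypothetical helper name.

```shell
# Sketch: run a command with an empty environment, roughly imitating a
# daemon that was not started from a login shell.
# run_clean is a hypothetical helper name.
run_clean() {
    # env -i drops all inherited variables; PATH is set explicitly so
    # that basic utilities can still be found.
    env -i PATH=/usr/bin:/bin "$@"
}

# Example (commented out), to be run from the plugins directory:
# run_clean sh ./script_prueba.sh votedisk
```

If the script fails this way with the same "Unable to communicate" message, a missing variable or profile setting would be a likely suspect.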

Re: Communication error between Oracle and Nagios

Posted: Wed Nov 17, 2021 6:04 pm
by benjaminsmith
Hi,

Try testing this once again as the nagios user account (instead of nrpe) and let us know if you get different results. Thanks.

Code: Select all

su - nagios
# cd into the plugins directory
sh ./script_prueba.sh votedisk

Re: Communication error between Oracle and Nagios

Posted: Thu Nov 18, 2021 4:43 am
by pdelgado1989
Hi.

The user that we have defined in /etc/nagios/nrpe.cfg to execute the scripts is nrpe:

Code: Select all

# NRPE USER
# This determines the effective user that the NRPE daemon should run as.
# You can either supply a username or a UID.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

nrpe_user=nrpe

# NRPE GROUP
# This determines the effective group that the NRPE daemon should run as.
# You can either supply a group name or a GID.
#
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

nrpe_group=nrpe
That is why we ran the script locally as the nrpe user.
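For reference, the configured user can also be read from the file on the command line. This is just a sketch, assuming the `nrpe_user=` line format shown above; `nrpe_config_user` is a hypothetical helper name.

```shell
# Sketch: read the user configured in an NRPE config file.
# nrpe_config_user is a hypothetical helper name.
nrpe_config_user() {
    # Print the value of the last uncommented nrpe_user= line in file $1.
    sed -n 's/^[[:space:]]*nrpe_user=//p' "$1" | tail -n 1
}

# Example (commented out): run the plugin as that user without a login shell.
# sudo -u "$(nrpe_config_user /etc/nagios/nrpe.cfg)" sh ./script_prueba.sh votedisk
```

Unlike `su - nrpe`, `sudo -u` does not start a login shell by default, so the result may be closer to what the daemon sees.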

Re: Communication error between Oracle and Nagios

Posted: Thu Nov 18, 2021 2:13 pm
by benjaminsmith
Hi,

Do you recall if you installed this from source or using the Linux Agent installer?

Also, please post the output of the following command:

Code: Select all

cat /etc/xinetd.d/nrpe
Thanks,
Benjamin

Re: Communication error between Oracle and Nagios

Posted: Fri Nov 19, 2021 4:26 am
by pdelgado1989
Hi.

It was installed via rpm package. The packages installed are:

nagios-common-4.4.5-1.el8.x86_64.rpm
nrpe-4.0.3-1.el8.x86_64.rpm
nagios-plugins-2.3.3-4.el8.x86_64.rpm
nagios-plugins-disk-2.3.3-4.el8.x86_64.rpm
nagios-plugins-load-2.3.3-4.el8.x86_64.rpm
nagios-plugins-procs-2.3.3-4.el8.x86_64.rpm
nagios-plugins-swap-2.3.3-4.el8.x86_64.rpm
nagios-plugins-users-2.3.3-4.el8.x86_64.rpm

OS version is Red Hat Enterprise Linux release 8.0 (Ootpa)

As for the command

Code: Select all

cat /etc/xinetd.d/nrpe
we do not have the xinetd package installed.

Re: Communication error between Oracle and Nagios

Posted: Fri Nov 19, 2021 11:22 am
by benjaminsmith
Hi,

The yum install is not maintained by Nagios, so it sets the agent up a little differently from our installer script (see: https://assets.nagios.com/downloads/nag ... _Agent.pdf).

Let's check the permissions on the plugins. I believe they are in the following directory, but you may have to adapt the command below to your system.

Code: Select all

 ls -l /usr/lib64/nagios/plugins
My system looks like this:

Code: Select all

[root@localhost plugins]# ls -l /usr/lib64/nagios/plugins
total 256
-rwxrwxr-x. 1 root root 110320 Apr  2  2021 check_http
-rwxrwxr-x. 1 root root  55328 Apr  2  2021 check_load
drwxr-xr-x. 2 root root      6 Mar  7  2021 eventhandlers
-rwxr-xr-x. 1 root root  42760 Apr  2  2021 negate
-rwxr-xr-x. 1 root root  42528 Apr  2  2021 urlize
-rwxr-xr-x. 1 root root   2791 Apr  2  2021 utils.sh
Also, can you PM me the nrpe.cfg file from that system? I'd like to check the command definitions as well.

Thanks,
Benjamin

Re: Communication error between Oracle and Nagios

Posted: Wed Nov 24, 2021 7:10 am
by pdelgado1989
Hi.

My system looks like this:

Code: Select all

[root@bdx1edara plugins]# ls -la
total 444
drwxr-xr-x. 3 root root   267 nov 24 12:57 .
drwxr-xr-x. 3 root root    21 sep  2  2020 ..
-rw-r--r--  1 root root  8532 nov  9 17:04 1
-rwxrwxr-x  1 root root  9842 nov 12 13:20 check_crs
-rwxrwxr-x  1 root root  8730 nov 12 12:04 check_crs_bkp
-rwxrwxr-x. 1 root root 94104 jun 30  2020 check_disk
-rwxrwxr-x. 1 root root 55312 jun 30  2020 check_load
-rwxrwxr-x. 1 root root  3418 sep  2  2020 check_mem
-rwxrwxr-x. 1 root root 64112 jun 30  2020 check_procs
-rwxrwxr-x. 1 root root 47056 jun 30  2020 check_swap
-rwxrwxr-x. 1 root root 42840 jun 30  2020 check_users
drwxr-xr-x. 2 root root     6 ago 29  2019 eventhandlers
-rw-r--r--  1 root root   184 nov 16 13:09 fich.tmp
-rwxr-xr-x. 1 root root 42736 jun 30  2020 negate
-rwxrwxrwx  1 root root  2741 nov 17 14:39 script_prueba.sh
-rwxr-xr-x. 1 root root 42520 jun 30  2020 urlize
-rwxr-xr-x. 1 root root  2791 jun 30  2020 utils.sh
This is how I have the commands defined in the nrpe.cfg file:

Code: Select all

command[check_crs_votedisk]=/usr/lib64/nagios/plugins/script_prueba.sh votedisk
command[check_crs_clusterstatus]=/usr/lib64/nagios/plugins/script_prueba.sh clusterstatus
command[check_crs_dbservicelocation]=/usr/lib64/nagios/plugins/script_prueba.sh dbservicelocation
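In case it helps anyone checking the same thing, here is a small sketch that walks the `command[...]` lines of an NRPE config and reports whether each plugin path is executable for the user running it. It assumes the definition format shown above; `check_nrpe_commands` is a hypothetical helper name.

```shell
# Sketch: list the plugin path of every command[...] line in an NRPE
# config and report whether that file is executable by the current user.
# check_nrpe_commands is a hypothetical helper name.
check_nrpe_commands() {
    # Capture the command name and the first word of its definition.
    sed -n 's/^command\[\([^]]*\)]=\([^[:space:]]*\).*/\1 \2/p' "$1" |
    while read -r name path; do
        if [ -x "$path" ]; then
            echo "$name: $path is executable"
        else
            echo "$name: $path is NOT executable"
        fi
    done
}

# Example (commented out): check as the user the daemon runs as.
# sudo -u nrpe sh -c '. ./this_sketch.sh; check_nrpe_commands /etc/nagios/nrpe.cfg'
```

This only checks the plugin file itself, not the binaries the plugin calls, such as crsctl.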

Re: Communication error between Oracle and Nagios

Posted: Wed Nov 24, 2021 3:08 pm
by benjaminsmith
Hi,

Thanks for checking that; it looks good. Can you share this script? I'd like to understand how it checks the Cluster Synchronization Services daemon and whether it might be taking too long when run from the Nagios server.
[OK] Voting disks status check succeeded - Unable to communicate with the Cluster Synchronization Services daemon
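To get a rough idea of whether run time is a factor, the script could be timed locally against the same 60 second limit that was passed to check_nrpe with `-t 60`. A minimal sketch, assuming GNU coreutils `timeout` is available; `run_timed` is a hypothetical helper name.

```shell
# Sketch: run a command with a time limit and report how long it took.
# run_timed is a hypothetical helper; timeout(1) is from GNU coreutils
# and exits with status 124 when the limit is reached.
run_timed() {
    limit=$1
    shift
    start=$(date +%s)
    timeout "$limit" "$@"
    rc=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s, exit status: $rc"
    return "$rc"
}

# Example (commented out), to be run from the plugins directory:
# run_timed 60 sh ./script_prueba.sh votedisk
```

An exit status of 124 would mean the script did not finish within the limit.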