Service not working

Posted: Mon Nov 02, 2020 12:45 pm
by kalyanpabolu
Hello Team,

We are monitoring the status of some services/components from Nagios Core. If a component/service is not Alive (running), Nagios should send an alert.

On a few servers, the service status shows critical/down in Nagios Core although the service is actually UP on the remote machine. Also, the command we use to check the service gives the correct output when run on the remote machine but returns critical when run from the Nagios server's command line.

We have checked all permissions and set the script that checks the service status to 777. Nagios has access to that path, so permissions do not appear to be the issue, but we are still getting the alert. Below is the script that is running:

[root@retprdapp01a plugins]# cat check_opmnctl
#!/bin/bash

LOG_FILE=/var/log/check_opmnctl.log
export USER_NAME=$1
export COMPONENT=$2

if [ -z "$1" -o -z "$2" ]
then
echo "usage: `basename $0` <username> <componentname>"
exit 1
fi

/usr/bin/sudo -u $USER_NAME -i /usr/lib64/nagios/plugins/check_opmnctl.sh $COMPONENT -- | grep -q Alive
if [ $? = 0 ]
then
echo "OK - component $COMPONENT is alive"
exit 0
else
echo "CRITICAL - component $COMPONENT is NOT alive"
exit 2
fi


Output from nagios server:

[root@monprdmgtss03 servers]# /usr/lib64/nagios/plugins/check_nrpe -H 10.50.10.25 -c check_opmnctl -a appwls ohs1
CRITICAL - component ohs1 is NOT alive

Output from remote server:

[root@retprdapp01a plugins]# /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
OK - component ohs1 is alive

We tried executing the above command as the nagios user as well and still get the same error.

Below is the command:

command[check_opmnctl]=/usr/lib64/nagios/plugins/check_opmnctl $ARG1$ $ARG2$


Below is the service definition:

define service{
use generic-service-basic
host_name retprdapp01a.mac-erp.net
service_description opmnctl ohs1
check_command check_opmnctl!appwls!ohs1
}


Let me know if you need any other information.

Thanks in advance!!

Re: Service not working

Posted: Mon Nov 02, 2020 2:21 pm
by scottwilkerson
When you run it on the remote computer you are running as root, but nrpe is gonna run it as the nagios user.

Test on remote server by running:

su nagios
/usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
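
To compare what the check sees interactively with what it sees under the agent, a snapshot of the identity and environment can help. This is only a sketch: the function name `env_snapshot` and the log path in the usage note are hypothetical, not part of any Nagios or NRPE tooling.

```shell
#!/bin/bash
# Hypothetical helper: print the identity and environment the current
# process runs with, one key=value pair per line.
env_snapshot() {
    printf 'user=%s\n' "$(id -un)"
    printf 'groups=%s\n' "$(id -Gn)"
    printf 'home=%s\n' "${HOME:-unset}"
    printf 'shell=%s\n' "${SHELL:-unset}"
    printf 'path=%s\n' "${PATH:-unset}"
    printf 'cwd=%s\n' "$(pwd)"
    # Report whether stdin is attached to a terminal.
    if [ -t 0 ]; then
        printf 'tty=yes\n'
    else
        printf 'tty=no\n'
    fi
}
```

One way to use it would be to call `env_snapshot >> /tmp/check_env.log` (an assumed path) temporarily at the top of check_opmnctl, run the check once from the nagios shell and once through check_nrpe, and compare the two entries. A missing terminal is worth noting if the sudoers configuration on the remote host sets `requiretty`.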

Re: Service not working

Posted: Tue Nov 03, 2020 1:50 am
by kalyanpabolu
Hello,

It's running fine as the nagios user as well.

[root@retprdapp01a ~]# sudo su - nagios
Last login: Tue Nov 3 10:49:17 +04 2020 on pts/0
-sh-4.1$
-sh-4.1$ /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
OK - component ohs1 is alive
-sh-4.1$ logout
[root@retprdapp01a ~]#

Re: Service not working

Posted: Tue Nov 03, 2020 4:21 am
by TethiS
Hi,

This rings a bell for me: I have also seen a script behave perfectly in a shell but fail when executed through the agent.

(I assume /usr/lib64/nagios/plugins/check_opmnctl.sh and /usr/lib64/nagios/plugins/check_opmnctl are different files)

1. I would first check the shebangs of the scripts involved in the check (the first line, #!/bin/bash, etc.), making sure bash exists at that path. Executing from an already running bash would not yield an error, but from the agent or cron it would probably fail.
2. If the above checks out, I would capture the output from the line:

/usr/bin/sudo -u $USER_NAME -i /usr/lib64/nagios/plugins/check_opmnctl.sh $COMPONENT -- | grep -q Alive

by temporarily modifying it to write to a file, to see exactly what in the command changes the result code to non-zero:

/usr/bin/sudo -u $USER_NAME -i /usr/lib64/nagios/plugins/check_opmnctl.sh $COMPONENT -- >> /tmp/check1.log

Let it run normally for a few minutes (triggered by Nagios, NOT from the command line).

The output in /tmp/check1.log might give you some hints on where to go from here.
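
Error messages from sudo or the shell are normally written to stderr, which a plain `>>` redirect does not capture. A minimal sketch of step 2 that records stdout, stderr and the exit status together is shown here; the function name `run_and_log` is hypothetical and the log path is the one used above.

```shell
#!/bin/bash
# Hypothetical helper: run a command, append its combined stdout/stderr and
# its exit status to a log file, then pass both through to the caller.
run_and_log() {
    local log_file=$1
    shift
    local output status
    # Capture stdout and stderr together; keep the command's exit status.
    output=$("$@" 2>&1)
    status=$?
    {
        printf '%s user=%s cmd=%s\n' "$(date '+%F %T')" "$(id -un)" "$*"
        printf '%s\n' "$output"
        printf 'exit status: %s\n' "$status"
    } >> "$log_file"
    # Re-emit the output so a following grep still sees it.
    printf '%s\n' "$output"
    return "$status"
}
```

Inside check_opmnctl the original line could then temporarily become `run_and_log /tmp/check1.log /usr/bin/sudo -u "$USER_NAME" -i /usr/lib64/nagios/plugins/check_opmnctl.sh "$COMPONENT" | grep -q Alive`, assuming the nagios user can write to the log file.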

Regards,
Sebastian

Re: Service not working

Posted: Tue Nov 03, 2020 8:52 am
by kalyanpabolu
Hello,

Thanks for your inputs!!

Both scripts use bash, and the check run from Nagios uses bash as well.

I ran the command with the nagios user in place of the appwls user and got the error below.

Remote machine output:

[root@retprdapp01a plugins]# /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
OK - component ohs1 is alive
[root@retprdapp01a plugins]#
[root@retprdapp01a plugins]# /usr/lib64/nagios/plugins/check_opmnctl nagios ohs1
/usr/lib64/nagios/plugins/check_opmnctl.sh: ./opmnctl: /sw/weblogic/as_1/perl/bin/perl: bad interpreter: Permission denied
CRITICAL - component ohs1 is NOT alive
[root@retprdapp01a plugins]#


Nagios machine output:

[root@monprdmgtss03 tmp]# /usr/lib64/nagios/plugins/check_nrpe -H 10.50.10.25 -c check_opmnctl -a appwls ohs1
CRITICAL - component ohs1 is NOT alive
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]# /usr/lib64/nagios/plugins/check_nrpe -H 10.50.10.25 -c check_opmnctl -a nagios ohs1
CRITICAL - component ohs1 is NOT alive
[root@monprdmgtss03 tmp]#


When I run the command from the script directly, this is the output:

Remote machine:
[root@retprdapp01a plugins]# /usr/bin/sudo -u appwls -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
Alive

[root@retprdapp01a plugins]# /usr/bin/sudo -u nagios -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
/usr/lib64/nagios/plugins/check_opmnctl.sh: ./opmnctl: /sw/weblogic/as_1/perl/bin/perl: bad interpreter: Permission denied
[root@retprdapp01a plugins]#


Nagios machine:
[root@monprdmgtss03 tmp]# /usr/bin/sudo -u appwls -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
sudo: unknown user: appwls
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]# /usr/bin/sudo -u nagios -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 2: cd: /home/appaia: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 3: ./.bash_profile: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 4: ./SOA.env: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 5: ./OHS1.env: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 6: opmnctl: command not found
[root@monprdmgtss03 tmp]#


I could not capture anything in the log file, as the command itself is not running.

Please suggest!!
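
A "bad interpreter: Permission denied" message like the one above generally means the calling user cannot execute the interpreter file or cannot traverse one of its parent directories. The sketch below walks a path and reports the first component that is missing or not accessible to the current user; the function name `first_blocked_component` is hypothetical, and it only helps when run as the user in question (for example from a shell opened with `sudo su - nagios`).

```shell
#!/bin/bash
# Hypothetical helper: walk an absolute path component by component and
# report the first one the current user cannot traverse (directories) or
# execute (the final file).
first_blocked_component() {
    local path=$1 current="" part
    local IFS=/
    local -a parts
    # Split the path on "/" after dropping the leading slash.
    read -r -a parts <<< "${path#/}"
    for part in "${parts[@]}"; do
        current="$current/$part"
        if [ ! -e "$current" ]; then
            echo "missing: $current"
            return 2
        fi
        # -x means "searchable" for directories and "executable" for files.
        if [ ! -x "$current" ]; then
            echo "blocked: $current"
            return 1
        fi
    done
    echo "ok: $path"
    return 0
}
```

Called as `first_blocked_component /sw/weblogic/as_1/perl/bin/perl` from the nagios shell, it would print the first directory or file that user cannot get past, if any. On systems with util-linux, `namei -l` on the same path lists the owner and mode of each component for a similar purpose.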

Re: Service not working

Posted: Tue Nov 03, 2020 12:21 pm
by scottwilkerson
Can you share the content of the plugin /usr/lib64/nagios/plugins/check_opmnctl.sh?