Hello Team,
We monitor the status of several services/components from Nagios Core. If a component/service is not Alive (running), Nagios should send an alert.
On a few servers, Nagios Core shows the service as critical/down although it is actually up on the remote machine. In addition, the command we use to check the service returns the correct output when run on the remote machine but returns CRITICAL when run from the Nagios command line.
We have checked all permissions and set the permissions of the check script to 777. Nagios has access to that path, so permissions do not appear to be the issue, yet we still get the alert. Below is the script that is running:
[root@retprdapp01a plugins]# cat check_opmnctl
#!/bin/bash
LOG_FILE=/var/log/check_opmnctl.log
export USER_NAME=$1
export COMPONENT=$2
if [ -z "$1" -o -z "$2" ]
then
echo "usage: `basename $0` <username> <componentname>"
exit 1
fi
/usr/bin/sudo -u $USER_NAME -i /usr/lib64/nagios/plugins/check_opmnctl.sh $COMPONENT -- | grep -q Alive
if [ $? = 0 ]
then
echo "OK - component $COMPONENT is alive"
exit 0
else
echo "CRITICAL - component $COMPONENT is NOT alive"
exit 2
fi
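One thing that may help with debugging: the script pipes the standard output of check_opmnctl.sh straight into `grep -q`, so the Nagios result never shows why the match failed. Below is a minimal sketch of a variant that keeps the captured text and puts its first line into the CRITICAL message. It assumes the inner script prints `Alive` on success, as in the outputs shown in this thread. `classify_status` is a hypothetical helper name, not part of the original plugin.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original plugin): turn captured
# status text into a Nagios-style message and return code.
classify_status() {
    local component=$1 output=$2
    if printf '%s\n' "$output" | grep -q 'Alive'; then
        echo "OK - component $component is alive"
        return 0
    fi
    # Show the first line of whatever was captured to explain the failure.
    local first_line
    first_line=$(printf '%s\n' "$output" | head -n 1)
    echo "CRITICAL - component $component is NOT alive (${first_line:-no output})"
    return 2
}

# Intended use inside check_opmnctl (assumption: 2>&1 so errors are captured too):
#   output=$(/usr/bin/sudo -u "$USER_NAME" -i /usr/lib64/nagios/plugins/check_opmnctl.sh "$COMPONENT" -- 2>&1)
#   classify_status "$COMPONENT" "$output"; exit $?
```

With this, a failure through check_nrpe would carry the underlying error text instead of only "NOT alive".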
Output from nagios server:
[root@monprdmgtss03 servers]# /usr/lib64/nagios/plugins/check_nrpe -H 10.50.10.25 -c check_opmnctl -a appwls ohs1
CRITICAL - component ohs1 is NOT alive
Output from remote server:
[root@retprdapp01a plugins]# /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
OK - component ohs1 is alive
We tried executing the above command as the nagios user as well and still get the same error.
Below is the command:
command[check_opmnctl]=/usr/lib64/nagios/plugins/check_opmnctl $ARG1$ $ARG2$
Below is the service definition:
define service{
use generic-service-basic
host_name retprdapp01a.mac-erp.net
service_description opmnctl ohs1
check_command check_opmnctl!appwls!ohs1
}
Let me know if you need any other information.
Thanks in advance!!
Service not working
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Service not working
When you run it on the remote computer you are running it as root, but NRPE is going to run it as the nagios user.
Test on remote server by running:
su nagios
/usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
-
kalyanpabolu
- Posts: 246
- Joined: Fri Jul 03, 2020 4:18 am
Re: Service not working
Hello,
It's running fine with the nagios user as well.
[root@retprdapp01a ~]# sudo su - nagios
Last login: Tue Nov 3 10:49:17 +04 2020 on pts/0
-sh-4.1$
-sh-4.1$ /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
OK - component ohs1 is alive
-sh-4.1$ logout
[root@retprdapp01a ~]#
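Note that `sudo su - nagios` gives an interactive login shell, with a terminal and a full profile. NRPE starts the plugin from a daemon, typically with a much smaller environment and no terminal, so this test may pass while the agent run still fails. Below is a rough sketch for getting closer to that situation, assuming a minimal `PATH` is enough to start the plugin. `run_clean_env` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: run a command with an emptied environment and no
# standard input, roughly like a command started by a daemon.
run_clean_env() {
    # env -i drops every inherited variable; only PATH is set again.
    env -i PATH=/usr/bin:/bin "$@" < /dev/null
}

# Example, run as the nagios user on the remote server:
#   run_clean_env /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1; echo "exit status: $?"
```

This does not detach the controlling terminal, so anything that depends on having a terminal could still behave differently under NRPE.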
Re: Service not working
Hi,
this rings a bell for me too: a script that behaved perfectly in a shell but failed when executed through the agent.
(I assume /usr/lib64/nagios/plugins/check_opmnctl.sh and /usr/lib64/nagios/plugins/check_opmnctl are different files)
1. I would first check the shebangs of the scripts involved in the check (the first line, #!/bin/bash, etc.), making sure bash exists at that path. Executing from an already running bash would not yield an error; from the agent or cron it would probably fail.
2. If the above checks out, I would capture the output of the line:
/usr/bin/sudo -u $USER_NAME -i /usr/lib64/nagios/plugins/check_opmnctl.sh $COMPONENT -- | grep -q Alive
by temporarily modifying it to write to a file, to see exactly what in the output turns the result code non-zero:
/usr/bin/sudo -u $USER_NAME -i /usr/lib64/nagios/plugins/check_opmnctl.sh $COMPONENT -- >> /tmp/check1.log
Let it run normally for a few minutes (triggered by Nagios, NOT from the command line).
The output in /tmp/check1.log might give you some hints on where to go from here.
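Since the `>>` redirect in step 2 only captures standard output, error messages would not reach the log. Below is a small sketch that also records standard error, the exit status, and the user running the check. `log_run` is a hypothetical helper name and the log path is just the one used in step 2.

```shell
#!/bin/bash
# Hypothetical debugging helper: append a command's stdout, stderr and
# exit status to a log file, together with who ran it and when.
log_run() {
    local log_file=$1 status
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') user=$(id -un) cwd=$PWD"
        echo "command: $*"
        "$@" 2>&1          # merge stderr into the logged output
        status=$?
        echo "exit status: $status"
    } >> "$log_file"
    return "$status"
}

# Possible temporary use inside check_opmnctl:
#   log_run /tmp/check1.log /usr/bin/sudo -u "$USER_NAME" -i /usr/lib64/nagios/plugins/check_opmnctl.sh "$COMPONENT" --
```

The `user=` field shows which account the agent really used, which is what the comparison between the shell run and the NRPE run depends on.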
Regards,
Sebastian
-
kalyanpabolu
- Posts: 246
- Joined: Fri Jul 03, 2020 4:18 am
Re: Service not working
Hello,
Thanks for your inputs!!
Both scripts use bash, and Nagios runs them with bash as well.
I ran the command with the nagios user in place of the appwls user and got the error below.
Remote machine output:
[root@retprdapp01a plugins]# /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
OK - component ohs1 is alive
[root@retprdapp01a plugins]#
[root@retprdapp01a plugins]# /usr/lib64/nagios/plugins/check_opmnctl nagios ohs1
/usr/lib64/nagios/plugins/check_opmnctl.sh: ./opmnctl: /sw/weblogic/as_1/perl/bin/perl: bad interpreter: Permission denied
CRITICAL - component ohs1 is NOT alive
[root@retprdapp01a plugins]#
Nagios machine output:
[root@monprdmgtss03 tmp]# /usr/lib64/nagios/plugins/check_nrpe -H 10.50.10.25 -c check_opmnctl -a appwls ohs1
CRITICAL - component ohs1 is NOT alive
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]# /usr/lib64/nagios/plugins/check_nrpe -H 10.50.10.25 -c check_opmnctl -a nagios ohs1
CRITICAL - component ohs1 is NOT alive
[root@monprdmgtss03 tmp]#
When I run the command from the script directly, this is the output:
Remote machine:
[root@retprdapp01a plugins]# /usr/bin/sudo -u appwls -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
Alive
[root@retprdapp01a plugins]# /usr/bin/sudo -u nagios -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
/usr/lib64/nagios/plugins/check_opmnctl.sh: ./opmnctl: /sw/weblogic/as_1/perl/bin/perl: bad interpreter: Permission denied
[root@retprdapp01a plugins]#
Nagios machine:
[root@monprdmgtss03 tmp]# /usr/bin/sudo -u appwls -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
sudo: unknown user: appwls
[root@monprdmgtss03 tmp]#
[root@monprdmgtss03 tmp]# /usr/bin/sudo -u nagios -i /usr/lib64/nagios/plugins/check_opmnctl.sh ohs1
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 2: cd: /home/appaia: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 3: ./.bash_profile: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 4: ./SOA.env: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 5: ./OHS1.env: No such file or directory
/usr/lib64/nagios/plugins/check_opmnctl.sh: line 6: opmnctl: command not found
[root@monprdmgtss03 tmp]#
I could not capture anything in the log file because the command itself is not running.
Please suggest!!
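For what it's worth, `bad interpreter: Permission denied` on /sw/weblogic/as_1/perl/bin/perl usually points at the interpreter file or one of its parent directories not being executable or searchable by the calling user, rather than at the plugin script itself. Below is a sketch that walks an absolute path and reports the first component the current user cannot use. `check_path_access` is a hypothetical helper, and it has to be run as the user being tested, because root passes these checks on directories regardless of mode.

```shell
#!/bin/bash
# Hypothetical helper: walk an absolute path component by component and
# report the first one the current user cannot see or execute/search.
check_path_access() {
    local target=$1 current='' part
    local -a parts
    IFS=/ read -r -a parts <<< "$target"
    for part in "${parts[@]}"; do
        [ -n "$part" ] || continue      # skip the empty field before the leading /
        current="$current/$part"
        if [ ! -e "$current" ]; then
            echo "missing or not visible: $current"
            return 1
        fi
        if [ ! -x "$current" ]; then
            echo "no execute/search permission: $current"
            return 1
        fi
    done
    echo "accessible: $target"
}

# Possible use as the nagios user (assumption: sudo allows this for root):
#   sudo -u nagios bash -c "$(declare -f check_path_access); check_path_access /sw/weblogic/as_1/perl/bin/perl"
```

Setting the plugin itself to 777 would not change the result here, since the denied path is under /sw/weblogic.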
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Service not working
Can you share the content of the plugin /usr/lib64/nagios/plugins/check_opmnctl.sh?