Service not working
Posted: Mon Nov 02, 2020 12:45 pm
Hello Team,
We monitor the status of several services/components with Nagios Core. If a component/service is not Alive (running), Nagios should send an alert.
On a few servers, Nagios Core shows the service status as critical/down although the service is actually up on the remote machine. In addition, the command we use to check the service gives the correct output when run on the remote machine but returns CRITICAL when run from the Nagios server command line.
We have checked all permissions and set the permissions of the script that checks the service status to 777. Nagios has access to that path, so permissions do not appear to be the issue, yet we still get the alert. Below is the script that is running:
[root@retprdapp01a plugins]# cat check_opmnctl
#!/bin/bash
LOG_FILE=/var/log/check_opmnctl.log
export USER_NAME=$1
export COMPONENT=$2
if [ -z "$1" -o -z "$2" ]
then
    echo "usage: `basename $0` <username> <componentname>"
    exit 1
fi
/usr/bin/sudo -u $USER_NAME -i /usr/lib64/nagios/plugins/check_opmnctl.sh $COMPONENT -- | grep -q Alive
if [ $? = 0 ]
then
    echo "OK - component $COMPONENT is alive"
    exit 0
else
    echo "CRITICAL - component $COMPONENT is NOT alive"
    exit 2
fi
Output from the Nagios server:
[root@monprdmgtss03 servers]# /usr/lib64/nagios/plugins/check_nrpe -H 10.50.10.25 -c check_opmnctl -a appwls ohs1
CRITICAL - component ohs1 is NOT alive
Output from the remote server:
[root@retprdapp01a plugins]# /usr/lib64/nagios/plugins/check_opmnctl appwls ohs1
OK - component ohs1 is alive
We also tried executing the above command as the nagios user and still get the same error.
Below is the NRPE command definition:
command[check_opmnctl]=/usr/lib64/nagios/plugins/check_opmnctl $ARG1$ $ARG2$
Below is the service definition:
define service{
    use                   generic-service-basic
    host_name             retprdapp01a.mac-erp.net
    service_description   opmnctl ohs1
    check_command         check_opmnctl!appwls!ohs1
}
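The script above pipes the sudo call straight into grep -q, so anything sudo or check_opmnctl.sh prints on failure is discarded and every failure looks like "NOT alive". A diagnostic variant along the following lines might show what the call actually returns when it runs through NRPE. This is only a minimal sketch, not the deployed plugin: the function names classify_output and run_check are hypothetical, and it assumes the status output contains the word Alive, as the grep in the original script does.

```shell
#!/bin/bash
# Diagnostic sketch (hypothetical helper names, not the deployed plugin).

# classify_output: given the captured output of the status call, print a
# Nagios status line and return the matching plugin exit code.
classify_output() {
    local component=$1 output=$2
    if printf '%s\n' "$output" | grep -q 'Alive'; then
        echo "OK - component $component is alive"
        return 0
    fi
    # Show the first line of whatever was captured, so that an error
    # message from the underlying command is visible in the check result.
    local first_line
    first_line=$(printf '%s\n' "$output" | head -n 1)
    echo "CRITICAL - component $component is NOT alive (output: ${first_line:-<empty>})"
    return 2
}

# run_check: run the given command, capture stdout and stderr together,
# and classify the result for the named component.
run_check() {
    local component=$1
    shift
    local output
    output=$("$@" 2>&1)
    classify_output "$component" "$output"
}
```

In the wrapper, the pipeline and the if block could then be replaced by something like: run_check "$COMPONENT" /usr/bin/sudo -u "$USER_NAME" -i /usr/lib64/nagios/plugins/check_opmnctl.sh "$COMPONENT" -- followed by exit $?. Running check_nrpe from the Nagios server again would then include the first line of the captured output in the CRITICAL message.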
Let me know if you need any other information.
Thanks in advance!!