Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
define service{
    use                     local-service   ; Name of service template to use
    host_name               XXXXX
    service_description     JVM status
    check_command           check_nrpe!check_jvm
    check_interval          2               ; Actively check the service every 2 minutes (with the default interval_length)
    retry_interval          1               ; Schedule service check retries at 1-minute intervals
    max_check_attempts      2               ; Check the service up to 2 times before the state turns HARD
    event_handler           restart-jvm
    event_handler_enabled   1
}
# Script for restarting the JVM (Tomcat) on the local machine
#
# Note: This script will only restart the JVM if the service is
# retried 3 times (in a "soft" state) or if the service somehow
# manages to fall into a "hard" error state.
#
# What state is the JVM service in?
case "$1" in
OK)
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        case "$3" in
        3)
            echo -n "Restarting JVM (3rd soft critical state)..."
            /etc/init.d/tomcatd recycle
            ;;
        esac
        ;;
    HARD)
        echo -n "Restarting JVM..."
        /etc/init.d/tomcatd recycle
        ;;
    esac
    ;;
esac
exit 0
However, the script above is not working.
Some posts I found say that the nagios user needs to be added to the sudoers file. Is there an alternative way to do this?
Last edited by dwhitfield on Wed Jan 03, 2018 1:59 pm, edited 1 time in total.
Reason:code blocks FTW
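One thing worth checking against the service definition above: Nagios Core treats a non-OK service state as HARD once the attempt number reaches max_check_attempts (assuming the host itself is up). With max_check_attempts 2, the handler would therefore see SOFT only on attempt 1 and HARD on attempt 2, so the `3)` branch under SOFT would not be reached. The sketch below only models that rule; `state_type_for_attempt` is a made-up helper name, not part of Nagios, and the values 2 and 5 are the two max_check_attempts settings that appear in this thread.

```shell
#!/bin/sh
# Sketch: model which state type the event handler sees for each check attempt.
# state_type_for_attempt is a hypothetical helper, not a Nagios command.
# Usage: state_type_for_attempt ATTEMPT MAX_CHECK_ATTEMPTS
state_type_for_attempt() {
    if [ "$1" -ge "$2" ]; then
        echo HARD
    else
        echo SOFT
    fi
}

# Print the state type and attempt number a handler would receive as $2 and $3
# while the service stays CRITICAL.
for max in 2 5; do
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        echo "max_check_attempts=$max attempt=$attempt -> $(state_type_for_attempt "$attempt" "$max")"
        attempt=$((attempt + 1))
    done
done
```

With a limit of 2 the only restart trigger left in the script is the HARD branch; with a limit of 5 the third attempt is still SOFT, so the `SOFT`/`3)` branch can fire.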
Can you clarify what you mean by "not working"? Is the script even running? Have you tried having the script do something else (like writing to a file, as shown in this XI doc: https://assets.nagios.com/downloads/nag ... ios-XI.pdf)? You don't have to edit sudoers, but you must make sure the nagios user can execute /etc/init.d/tomcatd when the handler calls it. You can achieve this by changing the permissions on the file or by making sure the nagios user belongs to the appropriate group.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
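To make the permission advice above concrete, here is a small sketch that reports whether the current user may execute a given file. It is meant to be run as the nagios user, so the result reflects what the event handler will see. `check_executable` is a hypothetical helper name; the tomcatd path is the one from the first post.

```shell
#!/bin/sh
# Sketch: report whether the invoking user can execute a file.
# check_executable is a hypothetical helper, not a Nagios tool.
# Returns 0 if executable, 1 if present but not executable, 2 if missing.
check_executable() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
        return 2
    elif [ -x "$1" ]; then
        echo "executable by $(id -un): $1"
        return 0
    else
        echo "NOT executable by $(id -un): $1"
        return 1
    fi
}

# Path taken from the original post; show owner, group and mode bits, then test.
target=/etc/init.d/tomcatd
ls -l "$target" 2>/dev/null || true
check_executable "$target" || true
```

If the result is "NOT executable", the `ls -l` line shows which owner, group, and mode bits would need to change.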
define service{
    use                     local-service   ; Name of service template to use
    host_name               XXXX
    service_description     JVM status
    check_command           check_nrpe!check_jvm
    check_interval          3               ; Actively check the service every 3 minutes (with the default interval_length)
    retry_interval          1               ; Schedule service check retries at 1-minute intervals
    max_check_attempts      5               ; Check the service up to 5 times before the state turns HARD
    event_handler           restart-jvm
}
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    # Perhaps we should restart the server...
    # Is this a "soft" or a "hard" state?
    case "$2" in
    # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
    # check before it turns into a "hard" state and contacts get notified...
    SOFT)
        case "$3" in
        3)
            mail -s "Errors in the logs" sample@mail.com
            echo -n "Restarting JVM (3rd soft critical state)..."
            sudo -su wasadmin
            /etc/init.d/tomcatd start
            mail -s "Sampls" sample@mail.com
            ;;
        esac
        ;;
    HARD)
        echo -n "Restarting JVM..."
        sudo -su wasadmin
        /etc/init.d/tomcatd start
        ;;
    esac
    ;;
esac
exit 0
Whenever the server is down, I receive the mails sent by my script, but Tomcat is not restarted. Can you please help me with this?
Last edited by dwhitfield on Wed Jan 17, 2018 5:57 pm, edited 1 time in total.
Reason:code blocks FTW
What permissions are set on the script? Run ls -l /usr/local/nagios/etc/event_handlers/script.sh to see them, and make sure the execute permissions are set.
I modified the script slightly to write an entry to a log whenever the SOFT, HARD, or OK state is seen. See below:
# Script for restarting the JVM (Tomcat) on the local machine
#
# Note: This script will only restart the JVM if the service is
# retried 3 times (in a "soft" state) or if the service somehow
# manages to fall into a "hard" error state.
#
# What state is the JVM service in?
case "$1" in
OK)
    echo OK >> /tmp/test.txt
    date >> /tmp/test.txt
    ;;
WARNING)
    ;;
UNKNOWN)
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        case "$3" in
        3)
            echo -n "Restarting JVM (3rd soft critical state)..."
            echo SOFT >> /tmp/test.txt
            date >> /tmp/test.txt
            /etc/init.d/tomcatd recycle
            ;;
        esac
        ;;
    HARD)
        echo -n "Restarting JVM..."
        echo HARD >> /tmp/test.txt
        date >> /tmp/test.txt
        /etc/init.d/tomcatd recycle
        ;;
    esac
    ;;
esac
exit 0
This seems to work. Note that it requires /tmp/test.txt to exist and be writable by the nagios user:
touch /tmp/test.txt
chmod a+rw /tmp/test.txt
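A slightly more general version of the same debugging idea is to record every invocation of the handler together with its arguments, so you can see exactly which state, state type, and attempt number Nagios passed in, and which user the handler ran as. This is only a sketch: `log_handler_call` and the `HANDLER_LOG` variable are names invented here, and the default path reuses the /tmp/test.txt file from above.

```shell
#!/bin/sh
# Sketch: append one timestamped line per event handler invocation.
# log_handler_call and HANDLER_LOG are hypothetical names for illustration.
HANDLER_LOG=${HANDLER_LOG:-/tmp/test.txt}

log_handler_call() {
    # $1 = service state, $2 = state type, $3 = attempt number
    printf '%s user=%s state=%s type=%s attempt=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(id -un)" "$1" "$2" "$3" >> "$HANDLER_LOG"
}

# Inside the handler, call it before the case statement:
#   log_handler_call "$1" "$2" "$3"
```

If no line appears in the log at all after a failed check, the handler is probably not being executed, which points at the command definition or the script permissions rather than at the restart command.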
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        case "$3" in
        3)
            echo -n "Restarting JVM (3rd soft critical state)..."
            sudo -su wasadmin /etc/init.d/tomcatd start
            ;;
        esac
        ;;
    HARD)
        echo -n "Restarting JVM..."
        sudo -su wasadmin /etc/init.d/tomcatd start
        ;;
    esac
    ;;
esac
exit 0
Last edited by dwhitfield on Mon Jan 29, 2018 2:08 pm, edited 1 time in total.
Reason:code blocks FTW
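The difference between this version and the earlier one that sent mail but never restarted Tomcat is where the command sits relative to `sudo`. In a shell script each line is a separate command, so `sudo -su wasadmin` on a line by itself does not change the user for the lines that follow; the command has to be passed to `sudo` on the same line, as in the script above. The sketch below builds such a command line and, by default, only prints it. `run_as` and `DRY_RUN` are illustrative names, and using `-n` (fail instead of prompting for a password) assumes a sudoers rule already lets the nagios user run the command without a password.

```shell
#!/bin/sh
# Sketch: run one command as another user via sudo, on a single line.
# run_as and DRY_RUN are hypothetical names; DRY_RUN=1 only prints the command.
DRY_RUN=${DRY_RUN:-1}

run_as() {
    target_user=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "sudo -n -u $target_user $*"
    else
        # -n: never prompt for a password; fail with an error instead
        sudo -n -u "$target_user" "$@"
    fi
}

# User name and init script path are the ones used in the thread.
run_as wasadmin /etc/init.d/tomcatd start
```

Running the printed command by hand as the nagios user is a quick way to see whether sudo would prompt for a password or refuse outright.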
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    case "$2" in
    SOFT)
        case "$3" in
        3)
            sudo -su wasadmin /etc/init.d/tomcatd start
            ;;
        esac
        ;;
    HARD)
        sudo -su wasadmin /etc/init.d/tomcatd start
        ;;
    esac
    ;;
esac
exit 0
ERROR:
SERVICE EVENT HANDLER:XXX;JVM status;CRITICAL;SOFT;1;restart-jvm!restart-jvm
[1518505203] wproc: SERVICE EVENTHANDLER job 17 from worker Core Worker 19975 is a non-check helper but exited with return code 3
[1518505203] wproc: early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
[1518505203] wproc: stdout line 01: NRPE: Unable to read output
Last edited by Anonymous on Tue Feb 13, 2018 3:21 pm, edited 1 time in total.
Reason:code blocks
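A note on reading the log above: `wait_status=768` is the raw status word from `wait()`. On Linux the exit code sits in the high byte, so 768 corresponds to exit code 3, which matches "exited with return code 3" on the preceding line. In the Nagios plugin convention 3 means UNKNOWN, which is consistent with the `NRPE: Unable to read output` message. A short sketch of the arithmetic (the function name is made up here):

```shell
#!/bin/sh
# Sketch: extract the exit code from a raw wait() status word (Linux layout).
# exit_code_from_wait_status is a hypothetical helper name.
exit_code_from_wait_status() {
    echo $(( ($1 >> 8) & 255 ))
}

exit_code_from_wait_status 768   # prints 3
```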
@anusha, I don't quite understand. You're checking the JVM using NRPE on the remote server, right? Then why is your event handler restarting tomcat locally? Here's a little example of how it should be:
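As a rough illustration of that idea, and not a definitive setup, the handler on the Nagios server can ask the remote host to perform the restart through NRPE instead of calling the init script locally. Everything specific in this sketch is an assumption: that the handler receives the host address as its fourth argument, that the remote nrpe.cfg defines a command named `restart_tomcat`, and that `check_nrpe` is installed at the path shown. The `-H` and `-c` options are the standard check_nrpe host and command options.

```shell
#!/bin/sh
# Sketch: event handler that asks the REMOTE host to restart its JVM via NRPE.
# Assumptions (not from the thread): the host address arrives as $4, the remote
# nrpe.cfg defines a command called restart_tomcat, and check_nrpe lives at the
# path below. Adjust all three to your installation.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}
REMOTE_COMMAND=${REMOTE_COMMAND:-restart_tomcat}

# Succeed only for the state/type/attempt combinations that should restart:
# the 3rd SOFT CRITICAL attempt, or any HARD CRITICAL state.
should_restart() {
    case "$1" in
    CRITICAL)
        case "$2" in
        SOFT) [ "$3" = 3 ] ;;
        HARD) true ;;
        *) false ;;
        esac
        ;;
    *) false ;;
    esac
}

handle_event() {
    # $1 = state, $2 = state type, $3 = attempt, $4 = remote host address
    if should_restart "$1" "$2" "$3"; then
        "$CHECK_NRPE" -H "$4" -c "$REMOTE_COMMAND"
    fi
}

# In the real handler script the last line would be:
#   handle_event "$1" "$2" "$3" "$4"
```

A matching command definition would pass the $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$ and $HOSTADDRESS$ macros as the four arguments, and the permission or sudo question then moves to the remote host, where the NRPE daemon's user has to be allowed to run the restart.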