I have a remote host with a service (named smg) running on it. It is critical to receive alerts about whether this service is sending (OUT) messages to another host.
The service has its own log file, which has to be checked for "OUT:" records every minute.
I've installed check_timed_logs.pl from the Nagios site and it works: it runs every minute and reports the number of hits (OUT records) for the previous minute.
Now I have an idea: check whether the log file has any OUT records for the last 2 minutes. If not, restart the smg service on the remote host (no notifications yet) and check for 2 more minutes. If there are still no OUT records in the log file, send notifications for manual intervention.
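To make the escalation concrete, here is a minimal sketch of the decision step in shell. The function name `next_action` and its two arguments are hypothetical; it assumes something else (for example the check_timed_logs.pl result) supplies the number of OUT hits for the last 2 minutes and remembers whether a restart was already tried.

```shell
#!/bin/sh
# next_action HITS RESTARTED   (hypothetical helper, for illustration only)
#   HITS      = number of "OUT:" records seen in the last 2 minutes
#   RESTARTED = 1 if smg was already restarted during this outage, else 0
# Prints one of: ok, restart, notify
next_action() {
    hits="$1"
    restarted="$2"
    if [ "$hits" -gt 0 ]; then
        echo ok          # messages are flowing, nothing to do
    elif [ "$restarted" -eq 0 ]; then
        echo restart     # first empty window: restart quietly
    else
        echo notify      # still empty after a restart: ask for manual help
    fi
}
```

For example, `next_action 0 0` prints `restart`, and `next_action 0 1` prints `notify`.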
Now I can see 2 possible ways to do that:
1. Write a script that runs on the remote host via cron (to check the status and restart the service if needed) and use NSCA to send the service status to the Nagios server when manual intervention is needed. Nagios will then send the notifications.
2. On the Nagios server, create a service definition like this:
define service{
    host_name               smg1
    service_description     check_smg_service_running
    check_command           check_smg
    contact_groups          admins
    max_check_attempts      4
    event_handler           restart-smg-service
}
define command{
    command_name    restart-smg-service
    command_line    /path/to/event-handler-script.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $HOSTDOWNTIME$ $SERVICEDOWNTIME$
}
with the event handler script running:
sudo /sbin/service smg restart
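For reference, a rough sketch of what event-handler-script.sh could look like. It rests on several assumptions that are not confirmed above: the check and retry intervals are both 1 minute (so the second SOFT attempt means roughly 2 minutes without OUT records), the handler is executed on the Nagios server, and the restart reaches the remote host through check_nrpe with a hypothetical NRPE command called `restart_smg` that wraps the sudo line. The `CHECK_NRPE` path is also only a guess at a typical install location.

```shell
#!/bin/sh
# Sketch of /path/to/event-handler-script.sh (hypothetical contents).
# Arguments, in the order used in command_line:
#   $1 = $SERVICESTATE$      (OK, WARNING, UNKNOWN, CRITICAL)
#   $2 = $SERVICESTATETYPE$  (SOFT or HARD)
#   $3 = $SERVICEATTEMPT$    (1 .. max_check_attempts)
#   $4 = $HOSTADDRESS$

# Assumption: check_nrpe is installed at this path on the Nagios server.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Print "restart" when a restart should be attempted, "none" otherwise.
# Only the 2nd SOFT CRITICAL attempt restarts; the HARD state (attempt 4)
# is left alone so that Nagios sends notifications for manual intervention.
decide_action() {
    state="$1"
    state_type="$2"
    attempt="$3"
    if [ "$state" = "CRITICAL" ] && [ "$state_type" = "SOFT" ] && [ "$attempt" = "2" ]; then
        echo restart
    else
        echo none
    fi
}

# Assumption: the remote nrpe.cfg defines a command named "restart_smg"
# that runs: sudo /sbin/service smg restart
restart_remote() {
    "$CHECK_NRPE" -H "$1" -c restart_smg
}

# Act only when Nagios passes the macros as arguments.
if [ "$#" -ge 4 ] && [ "$(decide_action "$1" "$2" "$3")" = "restart" ]; then
    restart_remote "$4"
fi
```

With max_check_attempts set to 4, the fourth failed check turns the state HARD, which is the point where Nagios would notify the admins contact group.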
Actually I have 2 questions:
1. Which way would you recommend?
2. Will the second way work as described? Will Nagios run the script on the remote host, or do I have to use NRPE?
Thank you