
Event Handlers strange behaviour

Posted: Mon May 13, 2013 8:07 am
by TSCAdmin
Hi,

We are using Nagios XI 2009R1.3 on CentOS release 5.4 (Final). On a bunch of hosts we want to notify the server owners by e-mail if the /tmp partition goes over a threshold. To achieve this I am implementing event handlers. This is what my service definition looks like:

Code: Select all

define service {
        host_name                       LinuxServer
        service_description             Disk Monitor /tmp
        use                             xiwizard_linuxserver_disk_service
        check_command                   check_snmp_storage_custom!community!20!50!/tmp
        max_check_attempts              5
        check_interval                  10
        retry_interval                  1
        event_handler                   event_handler_tmp_directory_listing
        event_handler_enabled           1
        flap_detection_enabled          0
        notification_options            w,u,r,c
        contacts                        linux-admin
        register                        1
        }
I have event_handler_tmp_directory_listing defined in commands.cfg. These are the first few lines of the event handler script:

Code: Select all

#!/bin/bash

# event handler script
# to get and e-mail the listing of /tmp directory

echo "$1 $2 $3 $4 $5" >> /usr/local/nagios/libexec/eventhandlers/inputs
For testing purposes I filled the /tmp directory on the monitored host (LinuxServer). The problem is that the event handler script is only called when the service returns to the OK state. Here are the contents of the inputs file:

Code: Select all

OK HARD 5 LinuxServer OK
OK SOFT 3 LinuxServer OK
OK SOFT 3 LinuxServer OK
OK HARD 5 LinuxServer OK
OK HARD 5 LinuxServer OK
I'm not sure why it is not being executed when the service goes into a WARNING or CRITICAL state. Is there something missing?

Thanks

Re: Event Handlers strange behaviour

Posted: Mon May 13, 2013 11:08 am
by abrist
Does your event handler script include logic for the PROBLEM-STATE?
See the bottom of the following document:
http://nagios.sourceforge.net/docs/3_0/ ... dlers.html
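
For reference, here is a minimal sketch of what "logic for the problem state" could look like in a handler script. The function name decide_action and the messages it prints are made up for illustration, and the argument order ($SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$) is an assumption about how the command is defined:

```shell
#!/bin/bash
# Sketch of an event handler that branches on state and state type.
# Assumed argument order: $1=$SERVICESTATE$ $2=$SERVICESTATETYPE$ $3=$SERVICEATTEMPT$
# decide_action is a hypothetical name; replace the echo lines with real actions.

decide_action() {
    local state="$1" state_type="$2" attempt="$3"
    case "$state" in
        OK)
            echo "recovered: nothing to do"
            ;;
        WARNING|CRITICAL)
            case "$state_type" in
                SOFT) echo "soft $state (attempt $attempt): waiting" ;;
                HARD) echo "hard $state: act now" ;;
                *)    echo "unknown state type: $state_type" ;;
            esac
            ;;
        UNKNOWN)
            echo "unknown: nothing to do"
            ;;
        *)
            echo "unexpected state: $state"
            ;;
    esac
}

# Run only when the script is given arguments (as Nagios would do).
if [ "$#" -gt 0 ]; then
    decide_action "$@"
fi
```

A handler without a WARNING or CRITICAL branch would run on every state change but appear to do nothing for problem states, which is why it is worth ruling out first.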

Re: Event Handlers strange behaviour

Posted: Mon May 13, 2013 12:47 pm
by TSCAdmin
Hi,

Yes, it does include logic for the problem state. Here is how the command has been defined:

Code: Select all

define command{
	command_name	event_handler_tmp_directory_listing
	command_line	/usr/local/nagios/libexec/eventhandlers/event_handler_tmp_directory_listing.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTNAME$ $SERVICEOUTPUT$
}
For some reason, the event handler is called only when the service returns to the OK state; it does not execute when the service enters a WARNING or CRITICAL state. For troubleshooting I added this line at the top of my script:

Code: Select all

#!/bin/bash

# event handler script
# to get and e-mail the listing of /tmp directory

echo "$1 $2 $3 $4 $5" >> /usr/local/nagios/libexec/eventhandlers/inputs
Now, if my understanding is correct, this will execute every time the event handler is called. I was wondering whether event_handler also requires something like the "[w,u,c,r,f,s]" options in order to be executed?

In other words, what conditions need to be met for event handlers to execute? How can I confirm that an event handler was executed, through logs or something similar?

Thanks
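
As far as the Nagios Core documentation goes, event handlers have no per-state option list like notification_options. They run on soft problem states, on the first hard problem state, and on recovery, provided they are enabled both on the service and program-wide. Below is a rough sketch for checking the program-wide switches. The helper name cfg_value is hypothetical, while enable_event_handlers and log_event_handlers are standard nagios.cfg directives:

```shell
#!/bin/bash
# Sketch: print the value assigned to a directive in a Nagios main config file.
# cfg_value is a hypothetical helper name, not part of Nagios.

cfg_value() {
    local key="$1" file="$2"
    # Strip "key = " from matching lines; if the key is repeated, show the last one.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Example usage (path as used in this thread):
#   cfg_value enable_event_handlers /usr/local/nagios/etc/nagios.cfg
#   cfg_value log_event_handlers    /usr/local/nagios/etc/nagios.cfg
```

A value of 1 for enable_event_handlers means handlers are allowed to run. Note that retained program state can override the file value at runtime, so this check is only a first step.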

Re: Event Handlers strange behaviour

Posted: Mon May 13, 2013 4:55 pm
by abrist
Let's get some more information into the logs. Change the following line in /usr/local/nagios/etc/nagios.cfg:

Code: Select all

log_event_handlers=0
To:

Code: Select all

log_event_handlers=1
Restart Nagios:

Code: Select all

service nagios restart
Tail the /usr/local/nagios/var/nagios.log file, watching for any lines pertaining to event handlers, and then force something into a failed state:

Code: Select all

tail -f /usr/local/nagios/var/nagios.log
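
If the log is busy, it may help to filter it down to the handler entries for one service. This is a small sketch, assuming the log lines have the form "SERVICE EVENT HANDLER: host;service;state;type;attempt;command". The function name event_handler_lines is made up for illustration:

```shell
#!/bin/bash
# Sketch: keep only event handler log lines for one service description.
# event_handler_lines is a hypothetical helper; it reads log lines on stdin.

event_handler_lines() {
    local service="$1"
    # First keep handler entries (global and per-service), then match the
    # service description as a literal ";service;" field.
    grep 'EVENT HANDLER:' | grep -F ";${service};"
}

# Example usage:
#   event_handler_lines 'Disk Monitor /tmp' < /usr/local/nagios/var/nagios.log
#   tail -f /usr/local/nagios/var/nagios.log | event_handler_lines 'Disk Monitor /tmp'
```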

Re: Event Handlers strange behaviour

Posted: Tue May 14, 2013 6:06 am
by TSCAdmin
Hi,

I enabled log_event_handlers in nagios.cfg. Here are the complete details:

event handler definition:

Code: Select all

define command {
       command_name                             xi_tmp_dir_event_handler
       command_line                             /usr/local/nagios/libexec/eventhandlers/tmp_dir_event_handler.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $SERVICEOUTPUT$
}

event handler script:

Code: Select all

#!/bin/bash
now=$(date +%s)
echo "[$now] $1 $2 $3 $4 $5" >> /usr/local/nagios/libexec/eventhandlers/new
service definition:

Code: Select all

define service {
        host_name                       gb-doc-svb-0302
        service_description             Disk Monitor /home
        use                             xiwizard_linuxserver_disk_service
        check_command                   check_snmp_storage_custom!dhMonitor!40!70!/home
        max_check_attempts              3
        check_interval                  10
        retry_interval                  1
        check_period                    24x7
        event_handler                   xi_tmp_dir_event_handler
        event_handler_enabled           1
        flap_detection_enabled          0
        notification_interval           60
        notification_period             24x7
        notification_options            w,u,r,c
        contacts                        ashishkumar
        _xiwizard                       dh_linux_server
        register                        1
        }
Here are the results:

Problem detected, WARNING - SOFT state 1

Code: Select all

[1368528937] SERVICE ALERT: gb-doc-svb-0302;Disk Monitor /home;WARNING;SOFT;1;WARNING : /home: 48%used(4806MB/9919MB)  : > 40 %
[1368528937] GLOBAL SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;WARNING;SOFT;1;xi_service_event_handler
[1368528937] SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;WARNING;SOFT;1;xi_tmp_dir_event_handler
WARNING - SOFT state 2

Code: Select all

[1368528997] SERVICE ALERT: gb-doc-svb-0302;Disk Monitor /home;WARNING;SOFT;2;WARNING : /home: 48%used(4806MB/9919MB)  : > 40 %
[1368528997] GLOBAL SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;WARNING;SOFT;2;xi_service_event_handler
[1368528997] SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;WARNING;SOFT;2;xi_tmp_dir_event_handler
WARNING - HARD state

Code: Select all

[1368529057] SERVICE ALERT: gb-doc-svb-0302;Disk Monitor /home;WARNING;HARD;3;WARNING : /home: 48%used(4806MB/9919MB)  : > 40 %
[1368529057] SERVICE NOTIFICATION: ashishkumar;gb-doc-svb-0302;Disk Monitor /home;WARNING;notify-service-by-email;WARNING : /home: 48%used(4806MB/9919MB)  :  40 %
[1368529057] GLOBAL SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;WARNING;HARD;3;xi_service_event_handler
[1368529057] SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;WARNING;HARD;3;xi_tmp_dir_event_handler
OK state

Code: Select all

[1368529099] SERVICE ALERT: gb-doc-svb-0302;Disk Monitor /home;OK;HARD;3;OK :  : < 40 %
[1368529099] SERVICE NOTIFICATION: ashishkumar;gb-doc-svb-0302;Disk Monitor /home;OK;notify-service-by-email;OK :  :  40 %
[1368529099] GLOBAL SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;OK;HARD;3;xi_service_event_handler
[1368529099] SERVICE EVENT HANDLER: gb-doc-svb-0302;Disk Monitor /home;OK;HARD;3;xi_tmp_dir_event_handler
event handler file contents

Code: Select all

$ cat /usr/local/nagios/libexec/eventhandlers/new
[1368529099] OK HARD 3 gb-doc-svb-0302 OK
It seems the event handler xi_tmp_dir_event_handler was called at every step, but the script actually ran and wrote its output only when the service returned to the OK state.

Please let me know if more information is required to investigate this further.

Thanks

Re: Event Handlers strange behaviour

Posted: Tue May 14, 2013 4:49 pm
by abrist
I am still digging on this one. What are you using the global event handler for? Could there be a conflict (writing to the same file, etc)?

Re: Event Handlers strange behaviour

Posted: Mon May 20, 2013 3:11 pm
by TSCAdmin
Hi,

I think I have cracked it! It was a mistake at my end, apologies for the trouble.

Everything was working except for the CRITICAL/WARNING messages. The only catch was to quote the final argument, "$SERVICEOUTPUT$", that was being passed to the event handler script, and boooooooom!

Code: Select all

define command {
       command_name                             xi_tmp_dir_event_handler
       command_line                             /usr/local/nagios/libexec/eventhandlers/tmp_dir_event_handler.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ "$SERVICEOUTPUT$"
}
Thanks.

Re: Event Handlers strange behaviour

Posted: Mon May 20, 2013 3:19 pm
by slansing
Ahh! So it must have been lopping off that output when it was unconstrained by quotes. Interesting, thanks for the heads up and the find!
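
One plausible reading of why quoting mattered, offered as an interpretation rather than something confirmed in this thread: the WARNING output contains parentheses, which the shell treats as syntax when they are unquoted, so the whole command line fails to parse and the script never starts. The OK output has no parentheses, so it runs. The sketch below reproduces this by hand, with echo standing in for the handler script and "host" as a placeholder host address. The sample outputs are taken from the notification lines above:

```shell
#!/bin/bash
# Sketch: reproduce the effect of leaving $SERVICEOUTPUT$ unquoted.
# "echo" stands in for the handler script; "host" is a placeholder argument.

warn_output='WARNING : /home: 48%used(4806MB/9919MB)  :  40 %'
ok_output='OK :  :  40 %'

# Unquoted WARNING output: "(" is shell syntax here, so parsing fails
# and the command is never run.
bash -c "echo WARNING HARD 3 host $warn_output" >/dev/null 2>&1 \
    && unquoted_ok=yes || unquoted_ok=no

# Unquoted OK output: no special characters, so the command runs.
bash -c "echo OK HARD 3 host $ok_output" >/dev/null 2>&1 \
    && ok_unquoted=yes || ok_unquoted=no

# Quoted WARNING output: arrives intact as a single fifth argument.
quoted_result="$(bash -c "echo WARNING HARD 3 host \"$warn_output\"")"

echo "unquoted WARNING ran: $unquoted_ok"
echo "unquoted OK ran:      $ok_unquoted"
echo "quoted WARNING:       $quoted_result"
```

Running the exact command line from the SERVICE EVENT HANDLER log entry by hand as the nagios user is a quick way to catch this kind of parse error, since the shell prints the syntax error directly.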