Event Handler stopped working!

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Pitone_Maledetto
Posts: 69
Joined: Fri Jul 01, 2016 4:11 am
Location: Liverpool, United Kingdom

Post by Pitone_Maledetto »

Hi all,
I had an event handler that worked fine until a couple of days ago, and I can't understand why it stopped.
If anyone can help me I would be grateful.

This is the error:

Code: Select all

[1528896858] SERVICE ALERT: stag-hr-gal-gw-01;LService;CRITICAL;HARD;1;LService is stopped and can be resumed, trying to restart it
[1528896858] SERVICE EVENT HANDLER: stag-hr-gal-gw-01;LService;CRITICAL;HARD;1;resume-hr-LService
[1528896858] wproc: SERVICE EVENTHANDLER job 274 from worker Core Worker 727 is a non-check helper but exited with return code 2
[1528896858] wproc:   early_timeout=0; exited_ok=1; wait_status=512; error_code=0;
[1528896858] wproc:   stderr line 01: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 14: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=CRITICAL: not found
[1528896858] wproc:   stderr line 02: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 15: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=HARD: not found
[1528896858] wproc:   stderr line 03: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 16: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=1: not found
[1528896858] wproc:   stderr line 04: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 17: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=10.101.22.76: not found
[1528896858] wproc:   stderr line 05: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 18: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=$: not found
[1528896858] wproc:   stderr line 06: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 53: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: Syntax error: end of file unexpected (expecting ";;")
It is as if the script can't get the values of the macros passed via event-handlers-commands.cfg:

Code: Select all

define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDETIME$
   }
This is my event handler script:

Code: Select all

#!/bin/sh

    $SERVICESTATE$=$1
    $SERVICESTATETYPE$=$2
    $SERVICEATTEMPT$=$3
    $HOSTADDRESS$=$4
    $ISVALIDETIME$=$5

# What state is the service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;

WARNING)
    # Warning don't do anything...
    ;;

CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$2" in

    HARD)

        case "$3" in
        1)
            if [ $5:hr-s3bt-nm$ = 1 ] ; then
            # Trying to resume LService.
            /usr/local/nagios/libexec/check_by_ssh -t 45 -H $4 -l nagios -C "sudo /usr/local/bin/start_ls.sh"
            fi
            ;;
            esac
        ;;

UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

esac
exit 0
Could anyone please point me in the right direction?
Thank you
"It is impossible to work in information technology without also engaging in social engineering"
Jaron Lanier
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Event Handler stopped working!

Post by scottwilkerson »

I think the lines at the top are supposed to be commented out.
Try this:

Code: Select all

#!/bin/sh

#    $SERVICESTATE$=$1
#    $SERVICESTATETYPE$=$2
#   $SERVICEATTEMPT$=$3
#    $HOSTADDRESS$=$4
#    $ISVALIDETIME$=$5

# What state is the service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;

WARNING)
    # Warning don't do anything...
    ;;

CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$2" in

    HARD)

        case "$3" in
        1)
            if [ $5:hr-s3bt-nm$ = 1 ] ; then
            # Trying to resume LService.
            /usr/local/nagios/libexec/check_by_ssh -t 45 -H $4 -l nagios -C "sudo /usr/local/bin/start_ls.sh"
            fi
            ;;
            esac
        ;;

UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

esac
exit 0
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com

Re: Event Handler stopped working!

Post by Pitone_Maledetto »

Hi Scott,

So, referencing $SERVICESTATE$ directly in the script, like so?

Code: Select all

# What state is the service in?
case "$SERVICESTATE$" in
OK)
    # The service just came back up, so don't do anything...
    ;;
would not that achieve the same?

Re: Event Handler stopped working!

Post by scottwilkerson »

No. When you put it in the command definition like this,

Code: Select all

define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDETIME$
   }
it is the first argument and becomes $1 in the script.

It should run just as I proposed above.
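
To illustrate the mapping, here is a minimal sketch, assuming the argument order from the command_line above. The shell cannot assign to a name like $SERVICESTATE$: it expands $SERVICESTATE to an empty string and then tries to run what is left of the word, such as $=CRITICAL, as a command, which matches the "not found" lines in the log. If you want readable names, copy the positional parameters into ordinary variables instead. The helper name describe_args and the variable names are made up for this example.

```shell
#!/bin/sh
# Sketch only: Nagios replaces the macros in command_line before it runs the
# script, so the script sees nothing but plain positional arguments.

# describe_args is a hypothetical helper, not part of Nagios.
describe_args() {
    service_state=$1        # value of $SERVICESTATE$
    service_state_type=$2   # value of $SERVICESTATETYPE$
    service_attempt=$3      # value of $SERVICEATTEMPT$
    host_address=$4         # value of $HOSTADDRESS$
    is_valid_time=$5        # value of the fifth macro on command_line
    printf '%s %s attempt=%s host=%s valid=%s\n' \
        "$service_state" "$service_state_type" "$service_attempt" \
        "$host_address" "$is_valid_time"
}

# Same values as in the log, with the fifth argument assumed to be 1.
describe_args CRITICAL HARD 1 10.101.22.76 1
```

Plain names without "$" on the left-hand side are what makes these lines valid assignments.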

Re: Event Handler stopped working!

Post by Pitone_Maledetto »

Hi Scott,
I see what you mean! Just to clarify, I am using $ISVALIDTIME$ so that the event handler runs only during the night.

Now I get the following:

Code: Select all

[1528903309] SERVICE EVENT HANDLER: stag-hr-gal-gw-01;LService;CRITICAL;HARD;1;resume-hr-LService
[1528903309] wproc: SERVICE EVENTHANDLER job 50 from worker Core Worker 18693 is a non-check helper but exited with return code 2
[1528903309] wproc:   early_timeout=0; exited_ok=1; wait_status=512; error_code=0;
[1528903309] wproc:   stderr line 01: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 53: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: Syntax error: end of file unexpected (expecting ";;")

and I changed event-handlers-commands.cfg to:

Code: Select all

define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDTIME:hr-s3bt-nm$
   }
It seems that I can't pass the $ISVALIDTIME$ macro correctly.

Thanks

Re: Event Handler stopped working!

Post by scottwilkerson »

Is hr-s3bt-nm the name of a Nagios timeperiod?

Also, I didn't notice this in the script before:

Code: Select all

if [ $5:hr-s3bt-nm$ = 1 ] ; then
You would need to change this to just $5; you can't append the timeperiod name to it like this.
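
As a rough sketch of the difference: Nagios has already replaced the macro with 1 or 0 before the script starts, so inside the script $5:hr-s3bt-nm$ is just the value of $5 with the literal text :hr-s3bt-nm$ stuck on the end, and that never equals 1. The function names broken_test and fixed_test are made up for this example, and the fifth argument is assumed to be 1.

```shell
#!/bin/sh
# broken_test and fixed_test are hypothetical names for this sketch.
# Both receive the same five arguments the event handler would get.

# Mirrors the original line: the text ":hr-s3bt-nm$" is glued onto $5.
broken_test() { [ $5:hr-s3bt-nm$ = 1 ]; }

# Compares the fifth argument on its own.
fixed_test() { [ "$5" = 1 ]; }

# Fifth argument assumed to be 1 (current time is inside the timeperiod).
if broken_test CRITICAL HARD 1 10.101.22.76 1; then
    echo "broken: match"
else
    echo "broken: no match"
fi

if fixed_test CRITICAL HARD 1 10.101.22.76 1; then
    echo "fixed: match"
else
    echo "fixed: no match"
fi
```

This prints "broken: no match" followed by "fixed: match".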

Either way, it seems the script needs some debugging. Make sure it runs correctly when you pass plain values in place of the macros before having Nagios execute it.
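
One way to do that kind of check, as a sketch: sh -n parses a script without executing it, which is enough to catch an unbalanced case/esac like the "end of file unexpected" error in the log. The two sample scripts below are simplified stand-ins written to a temporary directory, not the real handler, and check_syntax and write_samples are hypothetical helper names.

```shell
#!/bin/sh
# check_syntax and write_samples are hypothetical helpers for this sketch.

# Parse a script without executing it and report the result.
check_syntax() {
    if sh -n "$1" 2>/dev/null; then
        echo "syntax ok"
    else
        echo "syntax error"
    fi
}

# Write two tiny sample scripts into the directory given as $1.
write_samples() {
    # The inner "case" is closed, but the outer branch and outer "case" are not.
    cat > "$1/broken.sh" <<'EOF'
case "$1" in
CRITICAL)
    case "$2" in
    HARD)
        echo restart
        ;;
esac
exit 0
EOF
    # Every "case" has its own "esac", and each branch ends with ";;".
    cat > "$1/fixed.sh" <<'EOF'
case "$1" in
CRITICAL)
    case "$2" in
    HARD)
        echo restart
        ;;
    esac
    ;;
esac
exit 0
EOF
}

sample_dir=$(mktemp -d)
write_samples "$sample_dir"
check_syntax "$sample_dir/broken.sh"
check_syntax "$sample_dir/fixed.sh"
# Run the fixed sample with plain values in place of the macros.
sh "$sample_dir/fixed.sh" CRITICAL HARD
rm -rf "$sample_dir"
```

The same two steps (sh -n, then a manual run with literal arguments) can be applied to the real handler script.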

Re: Event Handler stopped working!

Post by Pitone_Maledetto »

Hi Scott,
I have debugged the script and now use the $ISVALIDTIME$ macro correctly, like so:

Code: Select all

#!/bin/sh

#   $SERVICESTATE$=$1
#   $SERVICESTATETYPE$=$2
#   $SERVICEATTEMPT$=$3
#   $HOSTADDRESS$=$4
#   $ISVALIDTIME:hr-s3bt-nm$=$5

# What state is the service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;

WARNING)
    # Warning don't do anything...
    ;;

CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$2" in

    HARD)

        case "$3" in
        1)
            if [ "$5" = 1 ] ; then
            # Trying to resume LService.
            /usr/local/nagios/libexec/check_by_ssh -t 120 -H "$4" -l nagios -C "sudo /usr/local/bin/start_LService.sh"
            fi
            ;;
        esac
        ;;
    esac
    ;;

UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

esac
exit 0
and the command like so:

Code: Select all

define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDTIME:hr-s3bt-nm$
   }
It seems much healthier and I don't have the previous errors in the log any more.
I am waiting to see if all works fine when the service goes down during the night.
Thank you very much for your help.
Please feel free to add anything that could be useful for the use of the event handler using a time-period.
Best Regards

Re: Event Handler stopped working!

Post by Pitone_Maledetto »

Well, now it ran, but I get:

Code: Select all

Warning: Service event handler command '/usr/local/nagios/libexec/eventhandlers/resume-hr-LService  CRITICAL HARD 1 10.101.22.76 0' timed out after 0.00 seconds
Not sure what that means, but it did not work as expected.

From nagios.cfg:

Code: Select all

event_handler_timeout=65
Thanks

Re: Event Handler stopped working!

Post by scottwilkerson »

It means the event handler (your script) didn't return.

You should be able to test it from the command line with the arguments it gave you:

Code: Select all

sudo su nagios -c '/usr/local/nagios/libexec/eventhandlers/resume-hr-LService  CRITICAL HARD 1 10.101.22.76 0'
One thing to note: the last argument is 0, so Nagios is reporting the current time as invalid for that timeperiod.
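
As a small sketch of what that 0 means for the handler: the restart branch is only reached when all four conditions hold, so with 0 as the fifth argument the script should fall through without doing anything. The function should_restart is a hypothetical condensed version of the nested case/if logic, not the script itself.

```shell
#!/bin/sh
# should_restart is a hypothetical helper that condenses the handler's nested
# case/if logic: succeed only for CRITICAL, HARD, attempt 1 and a valid time.
should_restart() {
    [ "$1" = "CRITICAL" ] && [ "$2" = "HARD" ] && [ "$3" = "1" ] && [ "$5" = "1" ]
}

# Same arguments as in the warning, once with a valid time and once without.
for valid in 1 0; do
    if should_restart CRITICAL HARD 1 10.101.22.76 "$valid"; then
        echo "ISVALIDTIME=$valid: would restart"
    else
        echo "ISVALIDTIME=$valid: would skip"
    fi
done
```

This prints "would restart" for 1 and "would skip" for 0.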

Re: Event Handler stopped working!

Post by Pitone_Maledetto »

Hi Scott,
Thank you. Yes, I changed the script to accept 0 since I wanted to test during working hours.
Anyway, it seems fine and it ran successfully, although I still need to figure out how to make it write to the log when the bash wrapper runs as an event handler; when I run it manually as you suggested, the log gets written, but not otherwise.
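
For the logging part, one pattern that may help is sketched below. It assumes the difference comes from the environment the handler runs in (working directory, permissions), which is only a guess and has not been confirmed here. The idea is to append to a log file through an explicit path; log_event is a hypothetical helper name, and the temporary file merely stands in for a fixed absolute path that the nagios user can write to.

```shell
#!/bin/sh
# log_event is a hypothetical helper: append a timestamped line to a log file.
# Usage: log_event /absolute/path/to/file message words...
log_event() {
    log_file=$1
    shift
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log_file"
}

# Demo against a temporary file; a real handler would use a fixed absolute
# path writable by the nagios user (an assumption about the setup).
demo_log=$(mktemp)
log_event "$demo_log" "handler called with: CRITICAL HARD 1 10.101.22.76 0"
cat "$demo_log"
rm -f "$demo_log"
```

Calling log_event as the first line of the handler would at least show whether Nagios is invoking the script at all.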
Thank you again for your time and help.
I think we can close this request.
Have a good day!
Regards