Event Handler stopped working!



Post by Pitone_Maledetto » Wed Jun 13, 2018 8:47 am

Hi all,
I had an event handler which worked fine until a couple of days ago; it has since stopped working and I can't understand why.
If anyone can help me I would be grateful.

This is the error:
Code:
[1528896858] SERVICE ALERT: stag-hr-gal-gw-01;LService;CRITICAL;HARD;1;LService is stopped and can be resumed, trying to restart it
[1528896858] SERVICE EVENT HANDLER: stag-hr-gal-gw-01;LService;CRITICAL;HARD;1;resume-hr-LService
[1528896858] wproc: SERVICE EVENTHANDLER job 274 from worker Core Worker 727 is a non-check helper but exited with return code 2
[1528896858] wproc:   early_timeout=0; exited_ok=1; wait_status=512; error_code=0;
[1528896858] wproc:   stderr line 01: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 14: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=CRITICAL: not found
[1528896858] wproc:   stderr line 02: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 15: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=HARD: not found
[1528896858] wproc:   stderr line 03: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 16: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=1: not found
[1528896858] wproc:   stderr line 04: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 17: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=10.101.22.76: not found
[1528896858] wproc:   stderr line 05: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 18: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: $=$: not found
[1528896858] wproc:   stderr line 06: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 53: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: Syntax error: end of file unexpected (expecting ";;")


It is as if the script can't get the values of the macros passed via event-handlers-commands.cfg:
Code:
define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDETIME$
   }


This is my event handler script:
Code:
#!/bin/sh

    $SERVICESTATE$=$1
    $SERVICESTATETYPE$=$2
    $SERVICEATTEMPT$=$3
    $HOSTADDRESS$=$4
    $ISVALIDETIME$=$5

# What state is the service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;

WARNING)
    # Warning don't do anything...
    ;;

CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$2" in

    HARD)

        case "$3" in
        1)
            if [ $5:hr-s3bt-nm$ = 1 ] ; then
            # Trying to resume LService.
            /usr/local/nagios/libexec/check_by_ssh -t 45 -H $4 -l nagios -C "sudo /usr/local/bin/start_ls.sh"
            fi
            ;;
            esac
        ;;

UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

esac
exit 0


Could anyone please point me in the right direction?
Thank you
"It is impossible to work in information technology without also engaging in social engineering"
Jaron Lanier
Pitone_Maledetto
 
Posts: 62
Joined: Fri Jul 01, 2016 4:11 am
Location: Liverpool, United Kingdom

Re: Event Handler stopped working!

Post by scottwilkerson » Wed Jun 13, 2018 9:53 am

I think the lines at the top of the script are supposed to be commented out; as written, the shell tries to run them as commands. Try this:
Code:
#!/bin/sh

#    $SERVICESTATE$=$1
#    $SERVICESTATETYPE$=$2
#   $SERVICEATTEMPT$=$3
#    $HOSTADDRESS$=$4
#    $ISVALIDETIME$=$5

# What state is the service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;

WARNING)
    # Warning don't do anything...
    ;;

CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$2" in

    HARD)

        case "$3" in
        1)
            if [ $5:hr-s3bt-nm$ = 1 ] ; then
            # Trying to resume LService.
            /usr/local/nagios/libexec/check_by_ssh -t 45 -H $4 -l nagios -C "sudo /usr/local/bin/start_ls.sh"
            fi
            ;;
            esac
        ;;

UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

esac
exit 0
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
scottwilkerson
DevOps Engineer
 
Posts: 12056
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Event Handler stopped working!

Post by Pitone_Maledetto » Wed Jun 13, 2018 10:15 am

Hi Scott,

So, using $SERVICESTATE$ directly inside the script, like so?

Code:
# What state is the service in?
case "$SERVICESTATE$" in
OK)
    # The service just came back up, so don't do anything...
    ;;


Would that not achieve the same?

Re: Event Handler stopped working!

Post by scottwilkerson » Wed Jun 13, 2018 10:28 am

No. When you put it in the command like this:
Code:
define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDETIME$
   }

$SERVICESTATE$ is passed as the first argument and becomes $1 inside the script.

It should run just as I proposed above.
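
To illustrate the point about positional arguments, here is a minimal sketch. The function name `show_args` and the variable names are made up for the example; they are not part of the original handler. It shows the values arriving as $1 to $5 and being copied into ordinary shell variables, which is what lines such as `$SERVICESTATE$=$1` cannot do in sh.

```shell
#!/bin/sh
# Hypothetical stand-in for the event handler: it only reports what it received.
show_args() {
    # Plain shell variable names; "$SERVICESTATE$=$1" is not an assignment in sh,
    # the shell expands it and tries to run the result as a command.
    state=$1
    state_type=$2
    attempt=$3
    host_address=$4
    valid_time=$5
    echo "state=$state type=$state_type attempt=$attempt host=$host_address valid=$valid_time"
}

# Example values standing in for what Nagios substitutes on the command_line.
show_args CRITICAL HARD 1 10.101.22.76 1
```

The macros only exist inside the Nagios command definition; by the time the script starts, it sees nothing but the substituted values.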

Re: Event Handler stopped working!

Post by Pitone_Maledetto » Wed Jun 13, 2018 10:29 am

Hi Scott,
I see what you mean! Just to say that I am using $ISVALIDTIME$ in order to run the event handler just during the night.

Now I have the following:

Code:
[1528903309] SERVICE EVENT HANDLER: stag-hr-gal-gw-01;LService;CRITICAL;HARD;1;resume-hr-LService
[1528903309] wproc: SERVICE EVENTHANDLER job 50 from worker Core Worker 18693 is a non-check helper but exited with return code 2
[1528903309] wproc:   early_timeout=0; exited_ok=1; wait_status=512; error_code=0;
[1528903309] wproc:   stderr line 01: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: 53: /usr/local/nagios/libexec/eventhandlers/resume-hr-LService: Syntax error: end of file unexpected (expecting ";;")



and I changed event-handlers-commands.cfg:

Code:
define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDTIME:hr-s3bt-nm$
   }

It seems that I can't pass the $ISVALIDTIME$ macro correctly.

Thanks

Re: Event Handler stopped working!

Post by scottwilkerson » Wed Jun 13, 2018 11:44 am

Is hr-s3bt-nm the name of a Nagios timeperiod?

Also, I didn't notice this in the script before:
Code:
if [ $5:hr-s3bt-nm$ = 1 ] ; then

You would need to change this to just $5; you can't combine the argument and the timeperiod name like this.

Either way, it seems the script needs some debugging. Try to make sure it runs correctly when you pass values by hand where the macros go, before having Nagios execute it.
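
As a rough sketch of that kind of debugging: `sh -n` parses a script without executing it, so it can catch an unterminated `case` block before Nagios ever runs the handler. The scratch file below is a made-up example with a missing `esac`, not the real handler.

```shell
#!/bin/sh
# Hypothetical scratch copy of a handler with an unterminated outer case block.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
case "$1" in
CRITICAL)
    case "$2" in
    HARD)
        echo "would restart"
        ;;
esac
EOF

# -n parses without executing; a non-zero status means a syntax error.
if sh -n "$tmp" 2>/dev/null; then
    syntax_ok=yes
else
    syntax_ok=no
fi
echo "syntax_ok=$syntax_ok"
rm -f "$tmp"
```

Once the syntax check passes, the script can be run by hand with literal values in place of the macros, for example `CRITICAL HARD 1 10.101.22.76 1`.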

Re: Event Handler stopped working!

Post by Pitone_Maledetto » Thu Jun 14, 2018 2:42 am

Hi Scott,
I have debugged the code and now use the $ISVALIDTIME$ macro in the correct manner, like so:

Code:
#!/bin/sh

#   $SERVICESTATE$=$1
#   $SERVICESTATETYPE$=$2
#   $SERVICEATTEMPT$=$3
#   $HOSTADDRESS$=$4
#   $ISVALIDTIME:hr-s3bt-nm$=$5

# What state is the service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;

WARNING)
    # Warning don't do anything...
    ;;

CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$2" in

    HARD)

        case "$3" in
        1)
            if [ "$5" = 1 ] ; then
            # Trying to resume LService.
            /usr/local/nagios/libexec/check_by_ssh -t 120 -H "$4" -l nagios -C "sudo /usr/local/bin/start_LService.sh"
            fi
            ;;
        esac
        ;;
    esac
    ;;

UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

esac
exit 0


and the command like so:

Code:
define command{
   command_name    resume-hr-LService
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-LService  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDTIME:hr-s3bt-nm$
   }


It seems much healthier and I don't have the previous errors in the log any more.
I am waiting to see if all works fine when the service goes down during the night.
Thank you very much for your help.
Please feel free to add anything that could be useful about running an event handler only within a time period.
Best Regards

Re: Event Handler stopped working!

Post by Pitone_Maledetto » Thu Jun 14, 2018 4:03 am

Well, now it ran, but I get:

Code:
Warning: Service event handler command '/usr/local/nagios/libexec/eventhandlers/resume-hr-LService  CRITICAL HARD 1 10.101.22.76 0' timed out after 0.00 seconds


I'm not sure what that means, but it did not work as expected.

From nagios.cfg:

Code:
event_handler_timeout=65


Thanks

Re: Event Handler stopped working!

Post by scottwilkerson » Thu Jun 14, 2018 8:48 am

It means the event handler (your script) didn't return.

You should be able to test it from the command line with the arguments it gave you:
Code:
sudo su nagios -c '/usr/local/nagios/libexec/eventhandlers/resume-hr-LService  CRITICAL HARD 1 10.101.22.76 0'


One thing to note: the last argument is 0, so Nagios is marking this as an invalid time for that timeperiod.
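
A small sketch of why that final 0 matters. The helper `should_restart` is hypothetical and only mirrors the nested `case` logic of the handler posted earlier; it is not taken from the real script. With the `[ $5 = 1 ]` test in place, the restart branch is skipped whenever the fifth argument is 0.

```shell
#!/bin/sh
# Hypothetical helper mirroring the handler's decision logic.
should_restart() {
    # $1 state, $2 state type, $3 attempt, $4 $ISVALIDTIME$ result (1 or 0)
    case "$1:$2:$3" in
    CRITICAL:HARD:1)
        # Succeeds only when the check time falls inside the timeperiod.
        [ "$4" = 1 ]
        ;;
    *)
        return 1
        ;;
    esac
}

for valid in 1 0; do
    if should_restart CRITICAL HARD 1 "$valid"; then
        echo "valid=$valid -> restart"
    else
        echo "valid=$valid -> skip"
    fi
done
```

Keeping the decision in one function makes it easy to try each combination of arguments by hand without triggering the real restart command.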

Re: Event Handler stopped working!

Post by Pitone_Maledetto » Thu Jun 14, 2018 11:07 am

Hi Scott,
Thank you. Yes, I changed the script to accept 0 since I wanted to test during working hours.
Anyway, it seems fine and it ran successfully, although I still need to figure out how to make it write to the log when it runs as an event handler bash wrapper; when run manually as you suggested the log gets written, but not otherwise.
Thank you again for your time and help.
I think we can close this request.
Have a good day!
Regards
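
On the logging question, one possible approach, offered as a sketch only: have the handler append to a log file through an absolute path that the nagios user can write to, so the result does not depend on the working directory or environment the handler is started with. The path and the `log_event` helper below are assumptions for illustration, and the thread does not establish why the log was not written.

```shell
#!/bin/sh
# Hypothetical log location; it must be writable by the nagios user.
LOG_FILE=${LOG_FILE:-/tmp/resume-hr-LService.log}

log_event() {
    # Append one timestamped line; the absolute path avoids depending on cwd.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Record the arguments the handler was started with.
log_event "handler called with: $*"
```

Calling `log_event` at the top of the handler and again just before the restart command would show in the file whether the handler was invoked and which branch it took.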