Event Handler stopped working! Part 3

Posted: Thu Jul 26, 2018 3:09 am
by Pitone_Maledetto
This episode is called "at wit's end" :)

Thanks to @scottwilkerson I now have a better understanding of how timeperiods work. Unfortunately, last night, and once before on the 23rd, I was back to square one.

This is the timeperiod proposed:

Code: Select all

define timeperiod{
        timeperiod_name     hr-s3bt-nm
        alias               LService
        sunday              22:00-24:00
        monday              00:00-06:00
        monday              22:00-24:00
        tuesday             00:00-06:00
        tuesday             22:00-24:00
        wednesday           00:00-06:00
        wednesday           22:00-24:00
        thursday            00:00-06:00
        thursday            22:00-24:00
        friday              00:00-06:00
        friday              22:00-24:00
        saturday            00:00-06:00
        saturday            22:00-24:00
        sunday              00:00-06:00
        }
But last night, at around ten to one in the morning, I was called, and when I looked at my custom logs I found the following entries:

Code: Select all

[07-25-2018-21:06:29] - LService can't be resumed during working hours. 0
IO manger pid: 49773
IO manger pid: 56727
[07-26-2018-00:46:29] - LService can't be resumed during working hours. 0
The 0 at the end is the value of $ISVALIDTIME:hr-s3bt-nm$ that I captured at that moment. As you can see, it is correct at 07-25-2018-21:06:29, which is outside the timeperiod, but not at 07-26-2018-00:46:29, which falls inside the 00:00-06:00 range and should have given 1.
So I am not sure what is going on or why the macro gets the wrong value.
If anyone could help me with this it would be greatly appreciated.
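
To make the expected values concrete, here is a minimal sketch of what the macro should return for the two logged times, assuming the intended window is 22:00 to 06:00 and that each range is treated as half-open. The function name `expected_isvalidtime` is hypothetical and is not part of Nagios; it only mimics the intended result for a zero-padded HH:MM time.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): prints 1 if a zero-padded HH:MM
# time falls inside the intended 22:00-06:00 window, otherwise 0.
expected_isvalidtime() {
    hh=${1%%:*}    # hours part, e.g. "00"
    mm=${1##*:}    # minutes part, e.g. "46"
    # Strip one leading zero so "08" and "09" are not parsed as octal.
    minutes=$(( ${hh#0} * 60 + ${mm#0} ))
    # 1320 = 22:00 and 360 = 06:00, both as minutes since midnight.
    if [ "$minutes" -ge 1320 ] || [ "$minutes" -lt 360 ]; then
        echo 1
    else
        echo 0
    fi
}

expected_isvalidtime 21:06
expected_isvalidtime 00:46
```

Under these assumptions the first call prints 0 and the second prints 1, so only the 00:46:29 log entry disagrees with the intended timeperiod.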

P.S. The issue may be with the 00:00-06:00 entries, since I noticed that the service was resumed a couple of times when it went down within the 22:00-24:00 range.

Thank you.

Re: Event Handler stopped working! Part 3

Posted: Thu Jul 26, 2018 10:00 am
by scottwilkerson
Can you share your event handler command as well as the script the event handler is running?

Re: Event Handler stopped working! Part 3

Posted: Thu Jul 26, 2018 10:46 am
by Pitone_Maledetto
Hi Scott,

This is the script in resume-hr-lservice:

Code: Select all

#   $SERVICESTATE$=$1
#   $SERVICESTATETYPE$=$2
#   $SERVICEATTEMPT$=$3
#   $HOSTADDRESS$=$4
#   $ISVALIDTIME:hr-s3bt-nm$=$5

NOW=$(date +"%m-%d-%Y-%T")
TIMESTAMP=$(date +%s)

# What state is the service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;

WARNING)
    # Warning statuses are triggered if LService can't be resumed because for example it has been manually stopped.
    ;;

CRITICAL)
    # Is this a "soft" or a "hard" state?
    case "$2" in

    HARD)
        case "$3" in
        1)
            if [ "$5" = 1 ] ; then
                # Inside the timeperiod: try to resume LService.
                /usr/local/nagios/libexec/check_by_ssh -t 120 -H "$4" -l nagios -C "sudo /usr/local/bin/start_LService.sh" >> /tmp/resume.txt 2>&1
            elif [ "$5" = 0 ] ; then
                # Outside the timeperiod: log it and do nothing.
                echo "[$NOW] - LService can't be resumed during working hours. $5" >> /tmp/resume.txt
            fi
            ;;
        esac
        ;;
    esac
    ;;

UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;

esac
exit 0
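
To check the branching without involving Nagios or SSH, the decision logic of the script above can be isolated in a dry-run function. This is only a sketch: `decide_action` and the action names it prints are hypothetical, and the host address used in the calls is a documentation placeholder. The arguments are in the same positions as in the handler.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the action the handler would take
# instead of performing it. Argument positions mirror the handler:
#   $1 state, $2 state type, $3 attempt, $4 host address, $5 $ISVALIDTIME value
decide_action() {
    # Only the first HARD CRITICAL attempt triggers anything.
    if [ "$1" != "CRITICAL" ] || [ "$2" != "HARD" ] || [ "$3" != "1" ]; then
        echo "ignore"
        return 0
    fi
    case "$5" in
        1) echo "resume" ;;      # macro says we are inside the timeperiod
        0) echo "log-only" ;;    # macro says we are outside the timeperiod
        *) echo "ignore" ;;      # empty or unexpected macro value
    esac
}

# 192.0.2.10 is a placeholder address, not the real host.
decide_action CRITICAL HARD 1 192.0.2.10 1
decide_action CRITICAL HARD 1 192.0.2.10 0
decide_action CRITICAL SOFT 1 192.0.2.10 1
```

With a macro value of 0 the sketch chooses the log-only branch, which is what the log entries show. That suggests the script acted consistently on the value it was given, and that the value itself is the thing to investigate.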
This is the command:

Code: Select all

define command{
   command_name    resume-hr-lservice
   command_line    /usr/local/nagios/libexec/eventhandlers/resume-hr-lservice  $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ $HOSTADDRESS$ $ISVALIDTIME:hr-s3bt-nm$
   }
Thank you

Re: Event Handler stopped working! Part 3

Posted: Thu Jul 26, 2018 11:03 am
by Pitone_Maledetto
In the meantime I have changed the timeperiod to:

Code: Select all

define timeperiod{
    timeperiod_name    hr-s3bt-nm
    alias              LService
    sunday             06:00-22:00
    monday             06:00-22:00
    tuesday            06:00-22:00
    wednesday          06:00-22:00
    thursday           06:00-22:00
    friday             06:00-22:00
    saturday           06:00-22:00
    }
And changed the test in the script to:

Code: Select all

if [ $5 = 0 ]
so that the service is resumed when the value is 0 and left alone when it is 1.
So it is the inverse logic with a different timeperiod, just to test the theory.
Regards
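
The inverted approach should make the same decision at every minute of the day as the original one was meant to. A small sketch that checks this, assuming each range is half-open (start included, end excluded); how Nagios treats the exact boundary minute is not confirmed here, and the function names are hypothetical.

```shell
#!/bin/sh
# Times are minutes since midnight: 360 = 06:00, 1320 = 22:00.
# Both functions succeed (exit status 0) when the minute is inside the period.
in_night() { [ "$1" -ge 1320 ] || [ "$1" -lt 360 ]; }   # 22:00-24:00 and 00:00-06:00
in_day()   { [ "$1" -ge 360 ] && [ "$1" -lt 1320 ]; }   # 06:00-22:00

# Counts the minutes of the day on which the two rules would disagree.
count_mismatches() {
    m=0
    bad=0
    while [ "$m" -lt 1440 ]; do
        # Original rule: resume when the night timeperiod is valid.
        if in_night "$m"; then old=resume; else old=skip; fi
        # Inverted rule: resume when the working-hours timeperiod is NOT valid.
        if in_day "$m"; then new=skip; else new=resume; fi
        [ "$old" = "$new" ] || bad=$(( bad + 1 ))
        m=$(( m + 1 ))
    done
    echo "$bad"
}

count_mismatches
```

Under these assumptions it prints 0, meaning the two formulations are equivalent, while the inverted one needs only a single range per weekday.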

Re: Event Handler stopped working! Part 3

Posted: Thu Jul 26, 2018 1:42 pm
by scottwilkerson
Actually I think this solution looks much more elegant than even what I proposed.

Re: Event Handler stopped working! Part 3

Posted: Mon Aug 06, 2018 9:29 am
by Pitone_Maledetto
Hi @scottwilkerson,
It all seems to be fine now.
It would be interesting to see why the original solution with the long timeperiod did not work as expected.
Anyhow I am happy to close this thread.
Thank you very much for your help.
Regards

Re: Event Handler stopped working! Part 3

Posted: Mon Aug 06, 2018 11:00 am
by scottwilkerson
Great. Locking the thread.

I am actually not sure why the other timeperiod I suggested didn't work.
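
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the original definition lists each weekday twice, and Nagios may keep only the last range it reads for a given weekday. That would leave Monday through Saturday with 22:00-24:00 only, which matches the observation that resumes worked in that range but not after midnight. The sample timeperiods in the Nagios documentation put multiple ranges for a day on one line, separated by commas, so the original window could be sketched as follows (untested against this setup):

```
define timeperiod{
        timeperiod_name     hr-s3bt-nm
        alias               LService
        sunday              00:00-06:00,22:00-24:00
        monday              00:00-06:00,22:00-24:00
        tuesday             00:00-06:00,22:00-24:00
        wednesday           00:00-06:00,22:00-24:00
        thursday            00:00-06:00,22:00-24:00
        friday              00:00-06:00,22:00-24:00
        saturday            00:00-06:00,22:00-24:00
        }
```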