[SOLVED] Can I limit the execution time of my event handler

Postby dennisg » Fri Feb 09, 2018 6:50 am

Hi swarm intelligence, I need your help:

Setup:
I have an important web service that is supposed to run 24x7 and is therefore monitored 24x7, with these service settings:
check_interval                  2
retry_interval                  1
max_check_attempts              3
notification_interval           30
notification_period             24x7

An event handler has been installed and restarts the app server (Tomcat) as expected. Everything is running fine so far.

Situation:
From time to time, long-running tasks are performed, especially at night (backups, scheduled import jobs, cleanup jobs, etc.).
These tasks sometimes make the app server respond "slowly" (i.e. not within the timeout configured in Nagios), which leads to a CRITICAL state. This is then (correctly) dealt with by the event handler, which kicks the app server and thus breaks any running jobs...

My first idea was to create a new timeperiod (called "scheduled-tasks", Mon.-Sun. from 00:00 to 05:00) and to extend the event handler to respect it by means of the on-demand macro $ISVALIDTIME:$.
My script, which is based on the default script from Nagios (and has already been successfully extended to take care of scheduled downtimes, etc.), correctly honors the timeperiod: instead of restarting Tomcat, it just logs that it has detected an issue with the service. All fine again, BUT:
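
A minimal sketch of that decision logic, written in Python for illustration rather than as the actual handler script. It assumes the handler is passed $SERVICESTATE$, $SERVICESTATETYPE$ and the result of $ISVALIDTIME:scheduled-tasks$ (1 or 0) as arguments. The function name and the choice to restart only on HARD states are assumptions:

```python
def should_restart(state: str, state_type: str, in_quiet_window: bool) -> bool:
    """Return True if the event handler should restart Tomcat.

    state           -- value of $SERVICESTATE$ (e.g. "OK", "CRITICAL")
    state_type      -- value of $SERVICESTATETYPE$ ("SOFT" or "HARD")
    in_quiet_window -- True if $ISVALIDTIME:scheduled-tasks$ evaluated to 1
    """
    if state != "CRITICAL":
        return False          # nothing to repair
    if in_quiet_window:
        return False          # nightly jobs running: log only, no restart
    return state_type == "HARD"  # assumption: act only on confirmed problems
```

During the "scheduled-tasks" window the function returns False even for a HARD CRITICAL state, which is the "log instead of restart" branch described above.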

Issue:
Since the service is in a CRITICAL HARD state and sometimes never leaves it, no further state change occurs and the event handler is not triggered again. As a result, the service is not restarted when the "scheduled-tasks" timeperiod ends (i.e. at 05:00), and it remains faulty until a manual restart.

I'm looking for a smart way to work around this issue, and this is where you can join in :)
How can I keep checking the service 24x7, ignore a faulty state during the specified off-hours, and still get an automatic restart (through the event handler) after this timeperiod without manual interaction?

Approach:
My current thought is to inject an external command, such as PROCESS_SERVICE_CHECK_RESULT, at the part of the script that just logs an error instead of restarting the service, and thereby reset the state (back to "0" = OK). I'm not sure whether there is a better / smarter way to handle the situation, though. My attempt looks a bit "hackish" to me... :-?
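
As a rough illustration of that idea, the sketch below builds the external command line that would reset the service to OK. The host name, service description and plugin output are hypothetical placeholders. The resulting line would have to be written to the Nagios external command file, whose path depends on the installation, and the service presumably needs to accept passive check results for this to take effect:

```python
import time
from typing import Optional


def passive_ok_command(host: str, service: str, output: str,
                       now: Optional[int] = None) -> str:
    """Build a PROCESS_SERVICE_CHECK_RESULT line with return code 0 (OK).

    Format: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    """
    ts = int(time.time()) if now is None else now
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};0;{output}\n"


# Hypothetical names, for illustration only.
line = passive_ok_command("web01", "Web-Service", "reset by event handler",
                          now=1518134400)
```

The next active check would overwrite this result again, so the reset only lasts until the service is checked the next time.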

I hope I made myself (somewhat) clear. If you need any more information, pls. don't hesitate to let me know.
Many thanks in advance for brainstorming with me on this issue :)

cheers,
Dennis
Last edited by dennisg on Mon Feb 12, 2018 9:14 am, edited 2 times in total.

Re: Can I limit the execution time of my event handler

Postby npolovenko » Fri Feb 09, 2018 3:10 pm

Hello, @dennisg.
Can you share the event handler script with us? What if you replace the service tomcat restart command with:
service tomcat stop
service tomcat start

It's essentially the same thing, but it will be able to start Tomcat even when it is completely stopped. Also, what command are you using to check the Tomcat service? Can you upload it? You might be able to just increase the timeout value.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Re: Can I limit the execution time of my event handler

Postby dennisg » Sat Feb 10, 2018 3:06 am

Hi @npolovenko,

thanks for your reply.

"Can you share the event handler script with us?"

Not that easily, as it contains "internal information" which I would first have to remove. Again: it's basically the script from Nagios that I had linked previously, extended to send mails and to take care of scheduled downtimes.

The thing is: during this period of scheduled tasks I don't want Tomcat to be restarted at all, so splitting service tomcat restart into separate stop and start commands wouldn't be the solution I'm looking for.

"Also, what command are you using to check the tomcat service, can you upload it? You might be able to just increase the timeout value."

It's basically a check_http. Increasing the timeout is also not an option, as this would affect the behaviour around the clock (and make the automatic restart trigger too rarely at other times, e.g. office hours).

Another solution came to mind, which I will try out on Monday: injecting the external command SCHEDULE_SVC_DOWNTIME via cron, which defines a scheduled downtime for the service each night between 00:00 and 05:30.
Advantages:
* No need for a specific timeperiod
* No need to change the event handler
* No impact on other services (being checked with the same command)
* Somewhat documented, as I can include a service comment as well.
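
A possible sketch of the command line such a cron job could write to the Nagios external command file, here built in Python for illustration. The host name, service description, author and comment are hypothetical, and the timestamps are computed in the local timezone of the machine running the script:

```python
from datetime import date, datetime, timedelta


def nightly_downtime_command(host: str, service: str, day: date,
                             author: str, comment: str, now_ts: int) -> str:
    """Build a SCHEDULE_SVC_DOWNTIME line for 00:00-05:30 on the given day.

    Format: [ts] SCHEDULE_SVC_DOWNTIME;host;service;start;end;
            fixed;trigger_id;duration;author;comment
    """
    start = datetime(day.year, day.month, day.day, 0, 0)
    end = start + timedelta(hours=5, minutes=30)
    start_ts = int(start.timestamp())
    end_ts = int(end.timestamp())
    duration = end_ts - start_ts
    # fixed=1: downtime runs exactly from start to end; trigger_id=0: not triggered
    return (f"[{now_ts}] SCHEDULE_SVC_DOWNTIME;{host};{service};"
            f"{start_ts};{end_ts};1;0;{duration};{author};{comment}\n")


# Hypothetical names, for illustration only.
cmd = nightly_downtime_command("web01", "Web-Service", date(2018, 2, 12),
                               "cron", "nightly scheduled tasks", 1518390000)
```

Because the existing event handler already skips restarts during scheduled downtimes, this approach would need no change to the handler itself.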

What do you think?

Best regards,
Dennis

Re: Can I limit the execution time of my event handler

Postby dennisg » Mon Feb 12, 2018 9:14 am

The "automatic scheduled downtime" seems to provide the expected result, so I'm gonna mark this thread as "solved".

Re: [SOLVED] Can I limit the execution time of my event hand

Postby tmcdonald » Mon Feb 12, 2018 12:35 pm

Did you have any further (related) follow-up questions or are we good to lock this up?

Re: [SOLVED] Can I limit the execution time of my event hand

Postby dennisg » Tue Feb 13, 2018 1:27 am

No, thanks. Pls. go ahead and lock it up.

