Hi,
I have developed a set of PHP scripts to log critical hard alerts into our ITSM system and update the ticket when the service recovers or is acknowledged.
99% of the time everything works as it should. However, I'm having a sporadic issue with the global event handler executing multiple times, which causes multiple tickets to be logged at the same time.
Looking through my scripts' logs from the most recent occurrence, it seems that when the first hard event handler was triggered to log the call, the script did not get a response from our ITSM system in a reasonable time, so it did not complete and hung for a while. Over the next 5 minutes, Nagios then attempted to re-trigger the global event handler for the same issue roughly once a minute, going by the timestamps below:
21:09:29: HARD alert from Nagios XI for service A critical
Tries to log a call
21:10:30: HARD alert from Nagios XI for service A critical
Tries to log a call
21:11:31: HARD alert from Nagios XI for service A critical
Tries to log a call
21:12:32: HARD alert from Nagios XI for service A critical
Tries to log a call
21:13:34: HARD alert from Nagios XI for service A critical
Tries to log a call
21:14:35: HARD alert from Nagios XI for service A critical
Tries to log a call
21:16:34: Over the next 20 seconds, all the above alerts log a call
Just to help narrow this down and for me to try and put a fix in, can you confirm that if a global event handler executes a script which does not complete, Nagios will attempt to re-execute the event handler until it does?
Thanks.
Edit:
It looks like Nagios has an option (event_handler_timeout) that kills scripts which have been running for longer than 30 seconds (the default). However, I don't believe this is working, as the script is still carrying out various actions 5 minutes after it was initially invoked.
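For anyone checking the same setting, here is a minimal sketch for reading the configured value out of the main config file. The function name is made up for illustration, and the default path assumes the usual /usr/local/nagios/etc/nagios.cfg location. It only reports what is configured, not whether Nagios actually enforces it.

```shell
# Print the value of event_handler_timeout from a Nagios main config file.
# get_event_handler_timeout is a hypothetical helper name; the default path
# is an assumption, pass a different file as the first argument to override it.
get_event_handler_timeout() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match only uncommented "event_handler_timeout=<seconds>" lines
    # and print just the number.
    sed -n 's/^[[:space:]]*event_handler_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg"
}
```

If it prints nothing, the directive is not set (or is commented out) in that file.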
Global event handler - Executing multiple times
Re: Global event handler - Executing multiple times
What version of XI are you running?
I don't think that it will retry it over and over, but the retry_interval and max_check_attempts settings would likely cause it to occur again sooner. I'll have to take a look at the configs.
Are you talking about the global event handlers in your /usr/local/nagios/etc/nagios.cfg, or the ones in Admin > Manage Components > Global Event Handlers?
Please PM me a copy of your profile; you can download it from Admin > System Profile > Download Profile.
- Also, include a screenshot of your settings in Admin > Manage Components > Global Event Handlers on each tab.
Thank you
Re: Global event handler - Executing multiple times
ssax wrote:
What version of XI are you running?
I don't think that it will retry it over and over, but the retry_interval and max_check_attempts settings would likely cause it to occur again sooner. I'll have to take a look at the configs.
Are you talking about the global event handlers in your /usr/local/nagios/etc/nagios.cfg, or the ones in Admin > Manage Components > Global Event Handlers?
Please PM me a copy of your profile; you can download it from Admin > System Profile > Download Profile.
- Also, include a screenshot of your settings in Admin > Manage Components > Global Event Handlers on each tab.
Thank you
We are using 5.5.2 and this is in reference to Admin > Manage Components > Global Event Handlers.
I'll PM over the system profile and required screenshots.
I don't believe it is the retry_interval and max_check_attempts settings, as the scripts are programmed to handle only critical hard alerts and disregard anything else. Looking through Nagios, the hard critical only occurred once, but the event handler was triggered multiple times.
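To illustrate the kind of filter described above, here is a minimal shell sketch (the real scripts are PHP, so this is not their actual code). The function name and argument order are assumptions for illustration; in Nagios Core the two values would correspond to the $SERVICESTATE$ and $SERVICESTATETYPE$ macros.

```shell
# Succeed only for alerts that should open a ticket: state CRITICAL, type HARD.
# should_log_ticket is a hypothetical name; the argument order
# (state first, then state type) is an assumption of this sketch.
should_log_ticket() {
    local state="$1" state_type="$2"
    [ "$state" = "CRITICAL" ] && [ "$state_type" = "HARD" ]
}

# Example use inside a handler script:
#   if should_log_ticket "$1" "$2"; then
#       echo "would log a ticket"
#   fi
```

With a check like this, SOFT states and non-critical HARD states never reach the ticket-logging step, so repeated tickets have to come from the handler being invoked more than once for the same HARD CRITICAL event.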
I've put a workaround in for the moment: PHP curl now times out after 20 seconds (as opposed to the 5-minute default) when connecting to our ITSM system and throws an exception, which I catch before simply quitting the script.
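A related guard, sketched here in shell rather than PHP, is to cap the run time of the whole handler from the outside. This is a different technique from the curl timeout described above, not a copy of it. It assumes GNU coreutils `timeout` is available, and `run_with_deadline` is a made-up name.

```shell
# Run a command but give up once a deadline (in seconds) has passed.
# Relies on GNU coreutils `timeout`, which exits with status 124 when
# the deadline is hit. run_with_deadline is a hypothetical helper name.
run_with_deadline() {
    local seconds="$1"
    shift
    local status=0
    timeout "$seconds" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # Report the timeout on stderr so it shows up in the handler's log.
        echo "gave up after ${seconds}s: $*" >&2
    fi
    return "$status"
}

# Example use (the script path is a placeholder, not a real file):
#   run_with_deadline 20 php /path/to/handler.php "$@"
```

The point is the same as with the curl timeout: a slow ITSM response ends the handler quickly instead of leaving it hanging for minutes.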
Re: Global event handler - Executing multiple times
You don't have multiple nagios processes running, correct? The command below should never output more than 2; if it outputs 3 or more, you have a problem:
Code: Select all
ps aux | grep nagios.cfg | grep -v grep | wc -l
Re: Global event handler - Executing multiple times
I'm seeing this in your /usr/local/nagiosxi/var/eventman.log:
PHP Notice: Undefined index: contact in /usr/local/nagiosxi/html/includes/utils-notifications.inc.php on line 0
PHP Notice: Undefined variable: _SESSION in /usr/local/nagiosxi/html/includes/utils-users.inc.php on line 1859
PHP Fatal error: Call to a member function analystLogoff() on a non-object in /usr/local/nagios/eventhandlers/NagiosLogger/src/NagiosLogger/Nagios/Events/EventHandler.php on line 47
If you run this command and leave it running, do you see it calling it twice?
Code: Select all
tail -F /usr/local/nagiosxi/var/eventman.log