Still getting alerts for deleted services

Posted: Thu May 27, 2021 12:50 pm
by uocken
We had a large number of similar services go into a Critical state (on purpose, there's maintenance occurring). We went to put them in downtime, but the downtime state wouldn't stick, so we deactivated them temporarily. Alerts continued to come. We then mass acknowledged the critical services, and still got alerts. We then disabled notifications for these services, and still got alerts. We rebooted the entire server, still got alerts. Placed them in downtime again, this time it stuck, but still got alerts. The configuration of the services shows the acknowledgements, downtime, and disabled notifications, yet we were still getting alerts. In desperation, we gave up and deleted the services entirely, yet we're still getting alerts. We've killed and restarted all nagios and ndo2db services multiple times... no change. I checked the services cfg files for the affected servers, and the alerting services aren't in the cfg files.

At this point, I can only assume there's a backlog of stale notifications (though they all somehow report current times), and we don't know how to clear it. That, or whatever process handles alerting is using some orphaned info in the database, not the cfg files, to run these checks and continues to alert, even though none of the GUI logs show the alerts or notifications.

Any guidance would be appreciated.

Note: We are running Nagios XI 5.6.3 and will soon be upgrading to the latest version.

Re: Still getting alerts for deleted services

Posted: Thu May 27, 2021 2:04 pm
by dchurch
Try clearing the event queue by logging into the Nagios XI machine as root and running these commands:

service crond stop
service nagios stop
service ndo2db stop
mysql -u root -pnagiosxi nagiosxi <<< 'truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
service mysqld restart
service httpd restart
service ndo2db start
service nagios start
service crond start

Re: Still getting alerts for deleted services

Posted: Thu May 27, 2021 6:01 pm
by uocken
I'll give that a shot and report back the result, thanks!

Re: Still getting alerts for deleted services

Posted: Thu May 27, 2021 7:05 pm
by uocken
Sadly, this didn't help. Still getting alerts for services that no longer exist.

Re: Still getting alerts for deleted services

Posted: Fri May 28, 2021 11:38 am
by uocken
Okay, we found out more information. Apparently, our notifications were up to 20 hours behind. What's odd is that the notification uses a date/time value of when the notification was SENT, not when the alert was generated.

This seems like a major flaw.

This is the line in the notification template that shows the send time: Date/Time: %datetime%
Shouldn't %datetime% be the time the alert was generated, not when the notification was sent?

The system seems to be catching up as of this morning, since we are now about 1 hour behind.

How are the notifications being generated? Is there a repository where the old alerts can be purged so we can start getting alerts immediately again?

Thanks!

Re: Still getting alerts for deleted services

Posted: Fri May 28, 2021 12:52 pm
by dchurch
The short answer is that notifications are sent out using a Command, defined under Core Config Manager (CCM) => Commands, and each contact is set up to use that command to send out the notification email.

Each host and service has a notification setting, and each Contact getting an alert has a Notification Handler assigned from the list of commands. This command receives the host or service values (including free variables) and the contact information, and ultimately queues up the email.

Here's an example of how to add a Free Variable to a notification template:

1. Set a Free Variable on the host called "mythingy"
2. Modify xi_host_notification_handler under CCM -> Commands, adding --mygroup="$_HOSTMYTHINGY$"
3. Open Admin -> Notification Management, and add a line Group Responsible: %mygroup%

Once you do this, you'll want to "deploy" the notification preferences to the contacts/users affected.

If your Host's Free Variable is called "location", you'll want to use --location="$_HOSTLOCATION$". When executed as a service notification handler, this value comes from the variables of the host that the service belongs to.

Note also that the Free Variable name and the "--[option]" DON'T need to be the same. E.g. You could use --whereami="$_HOSTLOCATION$" so long as you reference it in your notification template as %whereami%.
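
To make the naming rule concrete, here is a minimal sketch of the substitution idea. It is not the actual XI handler code; render_template is a hypothetical function that only illustrates how a "--[option]" value could end up in a %option% placeholder, and the values ("Rack 12") are made up.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real XI notification handler:
# fill %name% placeholders in a template from --name=value arguments.
render_template() {
  local text=$1
  shift
  local arg name value
  for arg in "$@"; do
    case $arg in
      --*=*)
        name=${arg%%=*}    # "--whereami=Rack 12" -> "--whereami"
        name=${name#--}    # "--whereami"         -> "whereami"
        value=${arg#*=}    # "--whereami=Rack 12" -> "Rack 12"
        # Replace every literal %name% with the value.
        text=${text//"%${name}%"/"$value"}
        ;;
    esac
  done
  printf '%s\n' "$text"
}

# The placeholder is named after the option, not after the Free Variable.
render_template 'Location: %whereami%' --whereami='Rack 12'
# prints "Location: Rack 12"
```

The point of the sketch is that the template never sees the macro name ($_HOSTLOCATION$); it only sees whatever option name the value was passed under.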

- More about free variables
- Using Free Variables in Notifications

Re: Still getting alerts for deleted services

Posted: Fri May 28, 2021 1:59 pm
by uocken
That wasn't really my question. I want to know these things:

1. if %datetime% is only when a notification is sent, what variable is when an alert is generated? Had this been used, we would have noticed the lag FAR sooner.

2. Since we're still an hour behind sending out notifications (i.e. an alert generated @ 10am isn't sent until around 10:50am, currently), how can we clear the stale alerts/notifications so we start getting immediate alert --> notifications?

Re: Still getting alerts for deleted services

Posted: Tue Jun 01, 2021 9:19 am
by dchurch
Checking the documentation, it doesn't look like there's a macro for the date and/or time when the alert went into the queue, only when the command was run to send it (i.e. the %datetime% macro like you pointed out).
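
In the meantime, one possible workaround is to reuse the custom-option pattern from earlier in this thread and pass a Nagios Core timestamp macro such as $LASTSERVICESTATECHANGE$ into the handler (e.g. --statechange="$LASTSERVICESTATECHANGE$"), then reference %statechange% in the template. This is an untested assumption, not something confirmed for XI 5.6.3. That macro is a Unix timestamp, so the lag could be worked out along these lines; notification_lag_minutes is a hypothetical helper and the timestamps are made-up examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole minutes between the time an alert was raised
# and the time its notification was sent. Both arguments are Unix epoch
# seconds; the second one defaults to "now" if omitted.
notification_lag_minutes() {
  local alert_epoch=$1
  local sent_epoch=${2:-$(date +%s)}
  echo $(( (sent_epoch - alert_epoch) / 60 ))
}

# Example: the send time is 3000 seconds after the alert time.
notification_lag_minutes 1622210400 1622213400
# prints "50"
```

A check like this, run against a recent notification, would show a backlog long before it grows to hours.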

I can submit a feature request on your behalf if you'd like. Please keep in mind that the decision to implement the enhancement is at the discretion of our development team.

Re: Still getting alerts for deleted services

Posted: Thu Jun 03, 2021 10:45 am
by uocken
That would be wonderful, thanks!

You can close this thread, our alerts did catch up after the backlog finally went through the system.

Re: Still getting alerts for deleted services

Posted: Thu Jun 03, 2021 2:53 pm
by dchurch
Glad to hear you resolved it! Locking thread.

If you have any additional issues, feel free to make a new thread.