Nagios XI 5.11.1 sending false host down email alerts days after power restored to data center
A week ago our data center had a power failure, and most of the servers we manage went down for some time, including the Nagios XI server. When the Nagios server (a physical server with 128 GB RAM and 20 CPU cores @ 3.0 GHz) first came back up, it immediately sent out 300 or more email notifications. Over the past week it has continued to send email notifications about hosts or services being down when in reality nothing is down. What makes it worse is that the date and time in each notification is the current time, not the time of the outage last week. This is causing a huge problem because about 99% of the alert messages now being delivered are false alarms.

To pause the insanity, we stopped the sendmail service on the server and removed all contacts from the contact groups assigned to hosts and services. We were on version 5.11.1 when the power failure happened. Yesterday we upgraded to version 5.11.2 because of how many bug fixes it includes. We have been unable to determine the source of the alert notifications that are still being sent even though everything is back up.

Has anyone seen anything like this before? We are a multi-tenant data center and need the false alerts to stop going out.
Re: Nagios XI 5.11.1 sending false host down email alerts days after power restored to data center
What does your sendmail queue look like? We experienced something similar this morning after a restore of one of our Nagios XI servers. The difference is that the timestamp in our email notifications was yesterday's date (9/25) rather than 9/26.
There are two ways to check this: mailq or sendmail -bp
Our queue was only backed up by around 15 alerts and they were only sent to me, so I just forced them out. There is a way to clear out the queue, but I am not familiar with that.
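For anyone who wants a quick number instead of reading the whole listing, here is a rough sketch of counting queued messages from the mailq output. The count_queued function is a hypothetical helper, not something that ships with sendmail or Nagios XI. It assumes the classic sendmail listing format, where each queued message starts a line with its queue ID followed by the message size. Other MTAs print the queue differently, so treat it as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper: count queued messages in sendmail-style `mailq`
# output read from stdin.
# Assumption: each queued message begins a line with an alphanumeric
# queue ID, an optional status flag (*, ? or -), whitespace, then the
# numeric message size. Header, recipient and summary lines do not match.
count_queued() {
    # grep -c prints the match count; `|| true` keeps the exit status 0
    # when the count is 0 (grep itself exits 1 on no matches).
    grep -cE '^[A-Za-z0-9]+[*?-]?[[:space:]]+[0-9]+[[:space:]]' || true
}

# Usage on the Nagios server (commented out because it needs sendmail):
# mailq | count_queued
# sendmail -bp | count_queued
```

If the count is small and the queued alerts are harmless, sendmail -q -v should force a queue run and show what gets delivered, which is roughly what "forcing them out" means above. With hundreds of stale alerts you probably do not want to flush them to your tenants, so look at what is queued first.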