Nagios XI 5.11.1 sending false host-down email alerts days after power restored to data center
Posted: Thu Sep 14, 2023 3:33 pm
A week ago our data center had a power failure, and most of the servers we manage went down for some time, including the Nagios XI server. When the Nagios server (physical server, 128 GB RAM, 20 CPU cores @ 3.0 GHz) first came back up, it immediately sent out 300 or more email notifications. Over the past week it has continued to send email notifications about hosts or services being down, when in reality nothing is down. What makes it worse is that the date and time included in each notification is the current time, not the time of the outage last week. This is causing a huge problem, because about 99% of the alert messages being delivered now are false alarms.

To pause the insanity, we stopped the sendmail service on the server and removed all contacts from the contact groups assigned to hosts and services. We were on version 5.11.1 when the power failure happened. Yesterday we upgraded to version 5.11.2 because of how many bug fixes it included.

We have been unable to determine the source of the alert notifications that are still being sent now that everything is back up. Has anyone seen anything like this before? We are a multi-tenant data center and need the false alerts to stop going out.