
Nagios XI clean up old alerts

Posted: Tue Feb 05, 2019 4:09 pm
by tongchenkuo
Running Nagios XI 5.2.9 on CentOS Linux release 7.3.1611.

If the sendmail service is stopped for weeks, all of the earlier alerts are still kept somewhere.

When we restarted the sendmail service (systemctl start sendmail), people received thousands of old alerts over the 2 hours before we stopped the service again.

How can we clean up the old alerts before restarting the sendmail service?

We have tried the following steps, but we still see thousands of files being generated in /var/spool/mqueue.

- Set the mail server to a dummy server or localhost in sendmail.cf. We cannot point it at the real mail server because thousands of emails would be sent.
FallbackSmartHost=fakedserver.company.com

- Stop Nagios Monitoring Engine, Performance Grapher, and Database Backend.

- Run the database repair script many times.

- Run a shell script that deletes all files in /var/spool/mqueue over and over.

- Restart all processes from Monitoring Engine Status.

- Recreate the host that the notifications mention but that had been deleted.

- Delete any suspect records in the nagios tables, such as nagios.nagios_contact_notificationcommands, nagios.nagios_contactnotifications, nagios.nagios_contactnotificationmethods, etc.
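A quick way to tell whether the queue is refilling after each of the steps above is to count the queued messages between attempts. This is a minimal sketch, assuming sendmail's usual layout in which every queued message has exactly one qf (control) file directly in the queue directory; the function name count_queued_messages is hypothetical.

```shell
# Hypothetical helper: count queued messages in a sendmail queue directory.
# Assumes one qf* control file per queued message, stored directly in the
# queue directory (no qf/df subdirectories configured).
count_queued_messages() {
    dir=${1:-/var/spool/mqueue}
    # -maxdepth 1: look only at the queue directory itself
    find "$dir" -maxdepth 1 -type f -name 'qf*' | wc -l | tr -d ' '
}
```

For example, running count_queued_messages as root before and after one of the hourly bursts shows how many new messages were injected.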

I don't see any suspect records in either the nagiosql or the nagiosxi database. nagiosxi.xi_events:event_time is up to date,
nagiosxi.xi_eventqueue is empty, and the messages in nagiosxi.xi_meta are up to date.

We want to monitor the hosts from now on and don't want to receive old alerts. We can't let notifications actually go out before those old alerts are cleaned up.

Any suggestions will be appreciated. Thanks,

Re: Nagios XI clean up old alerts

Posted: Wed Feb 06, 2019 2:31 pm
by cdienger
Also try stopping the notification service before clearing /var/spool/mqueue.

Re: Nagios XI clean up old alerts

Posted: Thu Feb 07, 2019 3:49 pm
by tongchenkuo
No luck! Thousands of files are generated in /var/spool/mqueue within a minute, every hour.
I'm posting more information about this issue; hopefully it will help.

The files in /var/spool/mqueue look like this. I notice they come in pairs for each alarm: dfx... and qfx...

-rw------- 1 root smmsp 1151 Feb 7 14:01 qfx17J1nlj004369
-rw------- 1 root smmsp 3213 Feb 7 14:01 dfx17J1nlj004369
-rw------- 1 root smmsp 1151 Feb 7 14:01 qfx17J1nlh004369
-rw------- 1 root smmsp 3213 Feb 7 14:01 dfx17J1nlh004369
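The pairing noticed above is how sendmail stores a queued message: the qf file holds the envelope and headers, the df file holds the body, and both share the same queue ID (here x17J1nlj004369 and x17J1nlh004369). A minimal sketch, using the hypothetical name check_queue_pairs, that lists each queue ID and flags any ID that has only one of the two files, for example after files were deleted while sendmail was still running:

```shell
# Hypothetical helper: print every queue ID in a sendmail queue directory
# and whether its qf (control) and df (data) files are both present.
check_queue_pairs() {
    dir=${1:-/var/spool/mqueue}
    # Collect IDs from qf* and df* names by stripping the 2-letter prefix.
    for f in "$dir"/qf* "$dir"/df*; do
        [ -e "$f" ] || continue          # glob matched nothing
        basename "$f" | cut -c3-
    done | sort -u | while read -r id; do
        if [ -e "$dir/qf$id" ] && [ -e "$dir/df$id" ]; then
            echo "$id ok"
        elif [ -e "$dir/qf$id" ]; then
            echo "$id qf-only"
        else
            echo "$id df-only"
        fi
    done
}
```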

These are examples of the files' contents.

It says "The original message was received at Sat, 29 Dec 2018 08:45:43 -0500 from nagios@localhost".
Where did it come from?

dfx17J1nlh004369 contains:

This is a MIME-encapsulated message

--x14JCv3J068259.1549307700/nagios04.MYCOMPANY.COM

The original message was received at Sat, 29 Dec 2018 08:45:43 -0500
from nagios@localhost


----- The following addresses had permanent fatal errors -----
[email protected]
(reason: 451 4.4.1 reply: read error from [127.0.0.1])
(expanded from: [email protected])

----- Transcript of session follows -----
[email protected]... Deferred: Connection reset by [127.0.0.1]
Message could not be delivered for 5 days
Message will be deleted from queue

--x14JCv3J068259.1549307700/nagios04.mycompany.com
Content-Type: message/delivery-status

Reporting-MTA: dns; nagios04.mycompany.com
Arrival-Date: Sat, 29 Dec 2018 08:45:43 -0500

Final-Recipient: RFC822; [email protected]
Action: failed
Status: 4.4.7
Diagnostic-Code: SMTP; 451 4.4.1 reply: read error from [127.0.0.1]
Last-Attempt-Date: Mon, 4 Feb 2019 14:15:00 -0500

--x14JCv3J068259.1549307700/nagios04.mycompany.com
Content-Type: message/rfc822

Return-Path: <[email protected]>
Received: (from nagios@localhost)
by nagios04.mycompany.com (8.14.7/8.14.7/Submit) id wBTDjhd1083554;
Sat, 29 Dec 2018 08:45:43 -0500
X-Authentication-Warning: nagios04.mycompany.com: nagios set sender to root@localhost using -f
To: [email protected]
Subject: PROBLEM Host Alert - DELETEDSERVER is DOWN
X-PHP-Originating-Script: 985:class.phpmailer.php
Date: Sat, 29 Dec 2018 08:45:43 -0500
From: [email protected]
Reply-to: [email protected]
Message-ID: <[email protected]>
X-Priority: 3
X-Mailer: PHPMailer 5.1 (phpmailer.sourceforge.net)
MIME-Version: 1.0

qfx17J1nlh004369 contains:

V8
T1549566119
K0
N0
P34075
I8/2/2895632
Fb
$_localhost [127.0.0.1]
$rESMTP
$sappprd04nagios.mycompany.com
${daemon_flags}
${if_addr}127.0.0.1
S<>
rRFC822; [email protected]
RPFD:<[email protected] >
H?P?Return-Path: <<81>g>
H??Received: from nagios04.mycompany.com (localhost [127.0.0.1])
by nagios04.mycompany.com (8.14.7/8.14.7) with ESMTP id x17J1nlh004369
for <[email protected] >; Thu, 7 Feb 2019 14:01:59 -0500
H??Received: from localhost (localhost)
by nagios04.mycompany.com (8.14.7/8.14.7/Submit) id x14JCv3J068259;
Mon, 4 Feb 2019 14:15:00 -0500
H??Date: Mon, 4 Feb 2019 14:15:00 -0500
H??From: Mail Delivery Subsystem <[email protected] >
H??Message-Id: <[email protected]>
H??To: [email protected]
H??MIME-Version: 1.0
H??Content-Type: multipart/report; report-type=delivery-status;
boundary="x14JCv3J068259.1549307700/appprd04nagios.corp.unifirst.com"
H??Subject: Returned mail: see transcript for details
H??Auto-Submitted: auto-generated (failure)
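The qf file above shows that this queued message is not an original Nagios alert but a bounce: its envelope sender line is S<> (the null sender used for delivery status notifications) and its subject is "Returned mail: see transcript for details", while the original PROBLEM alert from 29 Dec 2018 appears only inside the df body. A minimal sketch, using the hypothetical name summarize_queue, that counts null-sender bounces and tallies queued messages by subject, so you can see what the queue is actually made of before deleting anything. It assumes the header lines look like the "H??Subject:" line shown above.

```shell
# Hypothetical helper: summarize what is sitting in a sendmail queue directory.
# Reads qf (control) files, which hold envelope lines such as "S<sender>"
# and header lines prefixed with "H" (e.g. "H??Subject: ...").
summarize_queue() {
    dir=${1:-/var/spool/mqueue}
    set -- "$dir"/qf*
    if [ ! -e "$1" ]; then
        echo "no qf files in $dir"
        return 0
    fi
    # A null envelope sender "<>" marks a bounce (delivery status notification).
    bounces=$(grep -lx 'S<>' "$@" | wc -l | tr -d ' ')
    echo "messages: $# bounces: $bounces"
    # Tally subjects, most frequent first.
    sed -n 's/^H[^:]*Subject: *//p' "$@" | sort | uniq -c | sort -rn
}
```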

Re: Nagios XI clean up old alerts

Posted: Fri Feb 08, 2019 12:35 pm
by cdienger
From everything I'm reading about sendmail, deleting the contents of /var/spool/mqueue/ should do the trick. Are you ever able to clear it completely? Does it clear out if you run:

service nagios stop
service sendmail stop
rm /var/spool/mqueue/*
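If the directory does not stay empty, it helps to delete and then immediately re-check, so you can tell whether files are being re-created rather than left behind. A minimal sketch, assuming nagios and sendmail are already stopped as in the commands above; the function name clear_queue_dirs is hypothetical. It also assumes, based on the stock CentOS 7 sendmail setup and the "/Submit" tag in the Received headers quoted earlier, that messages submitted through the sendmail binary are first held in /var/spool/clientmqueue and handed to the daemon on a periodic client queue run, so that directory is worth passing in as well.

```shell
# Hypothetical helper: delete every regular file directly inside the given
# sendmail queue directories, then report how many files each still holds.
# Run only while sendmail (and its client queue runner) are stopped.
clear_queue_dirs() {
    status=0
    for dir in "$@"; do
        [ -d "$dir" ] || { echo "$dir: not a directory"; status=1; continue; }
        # find instead of rm "$dir"/*, so a very large queue cannot hit
        # the shell's argument-length limit.
        find "$dir" -maxdepth 1 -type f -delete
        left=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
        echo "$dir: $left files left"
        [ "$left" -eq 0 ] || status=1
    done
    return "$status"
}
```

For example: clear_queue_dirs /var/spool/mqueue /var/spool/clientmqueue. Afterwards, mailq lists the MTA queue and mailq -Ac lists the submission (client) queue; both should report an empty queue. On CentOS 7 the client queue runner is assumed to be the separate sm-client service, so it may need to be stopped too.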


The 451 codes mean there was a temporary failure and that sendmail should try to send the message again later. How that retrying behaves depends on sendmail settings, not on anything in the Nagios database or software.
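The bounce quoted earlier says "Message could not be delivered for 5 days", which matches sendmail's default Timeout.queuereturn of five days: a message that keeps getting 4xx (temporary) replies is retried on queue runs until that timeout expires, and is then turned into a bounce, which is itself queued (the S<> qf file above). A minimal sketch, with the hypothetical name show_queue_timeouts, that prints the queue timeout options from a sendmail.cf so you can see what is configured; lines starting with # are commented out and have no effect.

```shell
# Hypothetical helper: show the queue timeout options in a sendmail.cf.
#   Timeout.queuewarn:   delay before a "still trying" warning is sent
#   Timeout.queuereturn: delay before the message is bounced (default 5d)
show_queue_timeouts() {
    cf=${1:-/etc/mail/sendmail.cf}
    # Match both active ("O ...") and commented-out ("#O ...") option lines.
    grep -E '^#?O Timeout\.queue(warn|return)' "$cf" \
        || echo "no Timeout.queuewarn/queuereturn lines in $cf"
}
```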