
Excessive notification emails

Posted: Wed Jul 06, 2016 8:46 am
by lee.krause
After a network/server outage this weekend, Nagios has been sending out thousands of emails.
All of the emails have the same message:
CHECK_NRPE: Error - Could not complete SSL handshake.

In the GUI all appears to be OK. I'm not seeing that error there.

Is there a way to flush out/see the pending notifications?

Thanks.
Lee

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 10:08 am
by rkennedy
Are your contacts using notify-host-by-email or xi_host_notification_handler? On top of that, can you attach a screenshot of your 'Manage Email Settings' page?

I ask because notify-host-by-email uses your local machine to send mail, and depending on what your email settings page shows, that will matter as well.

One thing that might help in the future: if you set notification_interval to 0, Nagios will only notify you once about a down host or service.
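To see which object definitions would still re-notify, one option is to grep the object configuration for notification_interval (per the Nagios object definition docs, 0 means only one problem notification is sent). This is a minimal sketch, assuming the generated config files live under /usr/local/nagios/etc (the path visible in the ps output later in this thread). Both function names are hypothetical. The sketch only reads files, since in Nagios XI changes should go through the Core Config Manager.

```shell
#!/bin/sh
# Hypothetical helper: print every notification_interval directive found in
# *.cfg files under a directory. The default path is an assumption taken from
# the ps output in this thread. Read-only; edit values through the CCM in XI.
list_notification_intervals() {
    dir=${1:-/usr/local/nagios/etc}
    # -r: recurse, -n: line numbers, -E: extended regex.
    # --include limits the search to .cfg files (GNU/BSD grep option).
    grep -rnE --include='*.cfg' \
        '^[[:space:]]*notification_interval[[:space:]]' "$dir"
}

# Hypothetical helper: keep only directives whose value is not 0, i.e. the
# definitions that will keep re-notifying while a problem persists.
list_repeating_intervals() {
    # The value is the last whitespace-separated field on each grep line.
    list_notification_intervals "$@" | awk '$NF != "0"'
}
```

Usage: `list_repeating_intervals` (or pass another directory as the first argument).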

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 10:33 am
by lee.krause
I think I'm using xi_host_notification_handler. Is there a way to tell?

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 11:19 am
by rkennedy
lee.krause wrote: "I think I'm using xi_host_notification_handler. Is there a way to tell?"
Take a look at your contacts under Configure -> Core Config Manager -> Contacts, select them and click 'Manage Host Notification Commands'. What do you see?
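The notification commands can also be read from the command line in the generated object files. This is a minimal sketch, assuming those files sit under /usr/local/nagios/etc (the path in the ps output later in this thread). Both function names are hypothetical. If the contacts inherit from a template, the command shows up on the template definition rather than on the contact itself.

```shell
#!/bin/sh
# Hypothetical helper: show every host/service notification command directive
# in *.cfg files under a directory (default path is an assumption).
show_notification_commands() {
    dir=${1:-/usr/local/nagios/etc}
    grep -rnE --include='*.cfg' \
        '^[[:space:]]*(host|service)_notification_commands[[:space:]]' "$dir"
}

# Hypothetical helper: list each distinct command name once, e.g.
# notify-host-by-email or xi_host_notification_handler.
distinct_notification_commands() {
    show_notification_commands "$@" |
        sed -E 's/.*_notification_commands[[:space:]]+//' |  # keep the value
        tr ',' '\n' |                                         # one per line
        tr -d ' \t\r' |                                       # strip padding
        grep -v '^$' |
        sort -u
}
```

Usage: `distinct_notification_commands`, then compare the names against notify-host-by-email and xi_host_notification_handler.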

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 11:28 am
by lee.krause
Here is a screenshot.

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 11:31 am
by rkennedy
Are you using a template by any chance to control all of your contacts? It doesn't look like any notification commands are assigned there.

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 11:49 am
by gormank
Might want to check the notifications list under Home. Are they being sent now, or were they sent in the past?
Also might want to check the mail queue with mailq. Maybe there's a backlog of messages being sent, and it may be possible to delete queued mails.
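To check for a backlog from a script, the request count can be pulled out of the mailq summary line. This is a minimal sketch that assumes either the Sendmail summary ("Total requests: N", as in the output posted further down) or the Postfix one ("-- N Kbytes in M Requests."). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read mailq output on stdin and print the number of
# queued requests. Prints 0 when no summary line is found
# (e.g. Postfix's "Mail queue is empty").
parse_queue_total() {
    awk '
        # Sendmail: "Total requests: N"
        /Total requests:/ { total = $NF; found = 1 }
        # Postfix: "-- N Kbytes in M Request(s)."
        /^-- .* in [0-9]+ Requests?\./ { total = $(NF - 1); found = 1 }
        END { print (found ? total : 0) }
    '
}
```

Usage: `mailq | parse_queue_total`. A non-zero count that keeps growing would suggest a mail backlog rather than Nagios generating new alerts.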

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 12:18 pm
by lee.krause
They are using the Default Template for notifications.

Here's the sendmail queue:

# mailq
/var/spool/mqueue is empty
Total requests: 0

Here's the text of the email I received at 11:34 AM Central:
***** Nagios XI Alert *****

Nagios has detected a problem with this service.

Notification Type: PROBLEM

Service: Users
Host: xxxxxxxxxx
State: CRITICAL
Info:
CHECK_NRPE: Error - Could not complete SSL handshake.
Date/Time: 2016-07-06 16:33:48
(Time is GMT; we are at GMT-5.)
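The GMT timestamp lines up with the delivery time. 16:33:48 GMT minus 5 hours is 11:33:48 local, about a minute before the 11:34 AM receipt. That suggests this alert was freshly generated rather than held in a queue. Below is a small sketch of the conversion. It assumes GNU date (for -d), and the function name is hypothetical. The POSIX TZ string CDT5 stands in for the poster's GMT-5 offset, so no time zone database is needed.

```shell
#!/bin/sh
# Hypothetical helper: convert a "YYYY-MM-DD HH:MM:SS" GMT timestamp to local
# time at GMT-5. Requires GNU date for -d.
# TZ='CDT5' is a POSIX TZ string: zone name CDT, 5 hours behind UTC.
to_local_time() {
    TZ='CDT5' date -d "$1 UTC" '+%Y-%m-%d %H:%M:%S %Z'
}
```

Usage: `to_local_time '2016-07-06 16:33:48'`, which for this alert gives 2016-07-06 11:33:48 CDT.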

Here's a screenshot of the same server in the GUI:
nagiosmessage3.PNG

As you can see, the Users service has been fine for 21 hours now.

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 1:33 pm
by lmiltchev
I have seen similar issues when there were multiple nagios processes running. Run the following command and show the output:


ps -ef | grep bin/nagios | grep -v grep
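The check lmiltchev describes can be scripted. This is a minimal sketch, and the function names are hypothetical. It assumes each main daemon appears in `ps -ef` as `.../bin/nagios -d ...` with parent PID 1. That matches the output posted in this thread, but it may differ under other setups, for example a daemon running in a container. Worker processes (`--worker`) and children of a daemon are skipped.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID and
# start time of every top-level Nagios daemon (".../bin/nagios -d" with PPID 1).
# ps -ef columns: UID PID PPID C STIME TTY TIME CMD [ARGS...]
nagios_daemons() {
    awk '$8 ~ /\/bin\/nagios$/ && $9 == "-d" && $3 == 1 { print $2, $5 }'
}

# Hypothetical helper: report whether exactly one daemon is running.
# Returns non-zero when none or more than one is found.
check_single_nagios() {
    count=$(nagios_daemons | awk 'END { print NR }')
    case $count in
        0) echo "No Nagios daemon found"; return 1 ;;
        1) echo "OK: one Nagios daemon running" ;;
        *) echo "WARNING: $count Nagios daemons running"; return 1 ;;
    esac
}
```

Usage: `ps -ef | check_single_nagios`. On the output posted in the next reply, it would report two daemons: PID 12134 (started Jul05) and PID 21313 (started Jun01).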

Re: Excessive notification emails

Posted: Wed Jul 06, 2016 1:43 pm
by lee.krause
# ps -ef | grep bin/nagios | grep -v grep
nagios 12134 1 0 Jul05 ? 00:12:33 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 12136 12134 0 Jul05 ? 00:00:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12137 12134 0 Jul05 ? 00:00:49 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12138 12134 0 Jul05 ? 00:00:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12139 12134 0 Jul05 ? 00:00:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 12144 12134 0 Jul05 ? 00:00:04 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 21313 1 0 Jun01 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 21373 21313 0 Jun01 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 21374 21313 0 Jun01 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 21375 21313 0 Jun01 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 21376 21313 0 Jun01 ? 00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh