Nagios XI sending very delayed emails after network outage.

Posted: Wed Mar 13, 2019 7:11 am
by yo_marc
Hi all - I really need help with this one.

I have an XI 5.5.8 server here that is still sending very delayed emails long after a network outage was resolved (over 5 hours ago).

The state of the monitoring looks good - None of the emails that are coming through now match what is seen in Nagios XI.

I am sending emails via an SMTP config in the GUI. It seems the only way I can stop these old/stale emails from going through is to put some bogus info in that config. I do not see any mail sitting on the server locally.

Can anyone please tell me how I can stop these old/stale notifications from being sent? I've got hundreds of users, and the emails are confusing.

Rebooting did not help...

Re: Nagios XI sending very delayed emails after network outa

Posted: Wed Mar 13, 2019 7:51 am
by yo_marc
I am looking under the covers of 'eventman'. The eventman.log is churning away, apparently sending email from events that happened 9 hours ago.

Running the following query (borrowed and adapted from eventman.php), I am seeing over 50,000 rows returned...

SELECT * FROM nagiosxi.xi_events WHERE (status_code='0' AND event_time<=NOW()) OR (status_code='".escape_sql_param(EVENTSTATUS_PROCESSING,DB_NAGIOSXI)."'
 AND processing_time + INTERVAL 1 MINUTE <= NOW()) ORDER BY event_id ASC;
CentOS Linux release 7.6.1810 (Core)
XI 5.5.8
mariadb-5.5.60-1.el7_5.x86_64
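
For anyone following along: pulling 50,000 full rows is hard to read, so a grouped count may show the backlog more quickly. This is only a minimal sketch that builds the SQL text. `count_events_sql` is a hypothetical helper name, the table and column names are the ones from the query above, and the credentials in the usage comment are site-specific assumptions.

```shell
# Hypothetical helper: print a per-status row count query for the event table.
# Default table name (nagiosxi.xi_events) and the status_code column come from
# the query quoted above; pass another table name as $1 if needed.
count_events_sql() {
    printf 'SELECT status_code, COUNT(*) AS events FROM %s GROUP BY status_code;\n' \
        "${1:-nagiosxi.xi_events}"
}

# Usage (not run here; -p prompts for the password):
#   count_events_sql | mysql -uroot -p nagiosxi
```

Rows with `status_code` 0 are the ones the first half of the original WHERE clause selects, so that count is likely the one to watch.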

Re: Nagios XI sending very delayed emails after network outa

Posted: Wed Mar 13, 2019 8:09 am
by yo_marc
Seeing lots of SNMP entries like this in the eventman.log, FWIW:

*** GLOBAL HANDLER (snmptrapsender)...
Array
(
    [event_id] => 3373480
    [event_source] => 2
    [event_type] => 1
    [event_time] => 2019-03-12 22:39:28
    [event_meta] => Array
        (
            [handler-type] => service
            [host] => <hostname>
            [service] => Puppet-Agent
            [hostaddress] => <IP address>
            [hoststate] => UP
            [hoststateid] => 0
            [hosteventid] => 1605924
            [hostproblemid] => 0
            [servicestate] => CRITICAL
            [servicestateid] => 2
            [lastservicestate] => OK
            [lastservicestateid] => 0
            [servicestatetype] => SOFT
            [currentattempt] => 1
            [maxattempts] => 5
            [serviceeventid] => 1605927
            [serviceproblemid] => 703058
            [serviceoutput] => CHECK_NRPE: Socket timeout after 10 seconds.
            [longserviceoutput] =>
            [servicedowntime] => 0
        )

    [logging_enabled] => 1
)
SNMP TRAP SENDER NOT CONFIGURED!
Not sure what those are about, or if they are of any importance.
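
To get a feel for how much of the log these entries account for, counting the status lines is a rough first check. A minimal sketch: `count_trap_sender_lines` is a hypothetical helper name, and the log path is passed in as an argument because it is not stated in this thread.

```shell
# Hypothetical helper: count log lines that report the SNMP trap sender status.
# $1 is the path to eventman.log (location is an assumption; pass your own path).
count_trap_sender_lines() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'SNMP TRAP SENDER NOT' "$1"
}

# Usage:
#   count_trap_sender_lines /path/to/eventman.log
```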

Re: Nagios XI sending very delayed emails after network outa

Posted: Wed Mar 13, 2019 8:56 am
by yo_marc
I have the event queue cleared out. I manually deleted the entries from the DB. (Desperate times... desperate measures. Plan B was to restore from yesterday's backup to accomplish the same task.)

If anyone could give any insight as to what those SNMP messages are about, that would be great. I would also appreciate any info on why our mail queue got so backed up. We had over 144,000 events waiting to be processed from a 3-hour network-flapping outage affecting about 250 hosts (corrected from the 100 originally posted).

Our server has about 800 hosts and 4,000 services, with "5, 1, 5" for check-interval, retry-interval, and max-check-attempts, respectively.
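
To put those numbers in perspective, here is a quick back-of-the-envelope calculation using the figures above (144,000 events, 250 hosts, 3 hours). The function names are hypothetical, and the drain rate of 200 events per minute in the second call is purely illustrative, not something measured on this server.

```shell
# events_per_host_hour EVENTS HOSTS HOURS
# Integer average of queued events per affected host per hour of outage.
events_per_host_hour() {
    echo $(( $1 / $2 / $3 ))
}

# drain_minutes EVENTS RATE_PER_MINUTE
# Minutes needed to work through a backlog at a constant processing rate.
drain_minutes() {
    echo $(( $1 / $2 ))
}

events_per_host_hour 144000 250 3   # prints 192
drain_minutes 144000 200            # prints 720 (12 hours) at the ASSUMED rate
```

Under that assumed rate the backlog alone would take half a day to drain, which would be consistent with notifications arriving many hours after the outage ended.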

Re: Nagios XI sending very delayed emails after network outa

Posted: Wed Mar 13, 2019 2:18 pm
by npolovenko
Hello, @yo_marc. One thing you can do next time to stop spooled email notifications is to run this query to clear out the mailing queue:
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi nagiosxi
Your global SNMP trap sender was attempting to send SNMP traps with critical service check results, but it looks like it is not fully configured on your system. If you want to disable it, go to the components menu, click on settings, and check the box to disable SNMP trap sender integration.
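
Since truncating discards every pending notification, it may be worth dumping the three tables first. The following is a minimal sketch under assumptions, not an official procedure: `build_truncate_sql` and `clear_event_queue` are hypothetical helper names, the backup filename is made up for illustration, and `-p` is used so the password is prompted for instead of being typed on the command line.

```shell
# Hypothetical helper: print one "truncate table" statement per table name given.
build_truncate_sql() {
    printf 'truncate table %s;\n' "$@"
}

# Hypothetical helper: dump the named tables, then truncate them.
# Stops without truncating if the dump fails.
clear_event_queue() {
    backup="xi_event_tables_$(date +%Y%m%d%H%M%S).sql"   # illustrative filename
    mysqldump -uroot -p nagiosxi "$@" > "$backup" || return 1
    build_truncate_sql "$@" | mysql -uroot -p nagiosxi
}

# Usage (not run here), with the table names from the command above:
#   clear_event_queue xi_events xi_meta xi_eventqueue
```

Running `build_truncate_sql` on its own prints the statements without touching the database, which makes it easy to review them before piping anything to `mysql`.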

Re: Nagios XI sending very delayed emails after network outa

Posted: Thu Mar 14, 2019 10:40 am
by yo_marc
Perfect! Thank you very much for the help.

Re: Nagios XI sending very delayed emails after network outa

Posted: Thu Mar 14, 2019 10:48 am
by yo_marc
Strangely, I don't have the SNMP Trap Sender enabled...
I tried enabling, then disabling, but I am still seeing those same messages in the eventman.log.

I did notice that with it 'enabled', the last line below (SNMP TRAP SENDER NOT ENABLED!) was absent.

*** GLOBAL HANDLER (snmptrapsender)...
Array
(
    [event_id] => 3527999
    [event_source] => 2
    [event_type] => 1
    [event_time] => 2019-03-14 11:44:31
    [event_meta] => Array
        (
            [handler-type] => service
            [host] => <host>
            [service] => Scheduled Tasks: Task Scheduler Library
            [hostaddress] => <host> 
            [hoststate] => UP
            [hoststateid] => 0
            [hosteventid] => 1693738
            [hostproblemid] => 0
            [servicestate] => CRITICAL
            [servicestateid] => 2
            [lastservicestate] => CRITICAL
            [lastservicestateid] => 2
            [servicestatetype] => SOFT
            [currentattempt] => 5
            [maxattempts] => 5
            [serviceeventid] => 1697027
            [serviceproblemid] => 747958
            [serviceoutput] => 2 / 23 tasks failed! <info>
            [longserviceoutput] =>
            [servicedowntime] => 0
        )

    [logging_enabled] => 1
)
SNMP TRAP SENDER NOT ENABLED! VALUE='0'
Is there something wrong with the Component, perhaps?

Re: Nagios XI sending very delayed emails after network outa

Posted: Thu Mar 14, 2019 12:02 pm
by npolovenko
@yo_marc, can you check the global event handlers component? It should be in the same menu. Please make sure you don't have any global event handlers enabled.

Re: Nagios XI sending very delayed emails after network outa

Posted: Fri Mar 15, 2019 8:55 am
by yo_marc
Nothing is enabled there either...

I did go ahead and remove the 'SNMP Trap Sender' component from a test system, and that did remove those SNMP-specific entries from the eventman.log.

I don't think we plan on using that component, so I am probably OK with that 'fix'. But just to be sure it's necessary: in an "alert storm" such as the one we experienced, I assume there will be some performance hit if we are unnecessarily trying to process SNMP forwards... is that correct?

Re: Nagios XI sending very delayed emails after network outa

Posted: Fri Mar 15, 2019 11:56 am
by npolovenko
@yo_marc, I spoke to my colleagues and was told that these event messages are normal. The log will say "SNMP TRAP SENDER NOT ENABLED" even if you don't use the component, and there is no way to disable these messages. Running the query I provided should clear out the event queue. You can use it when something major goes wrong and there are lots of notifications.