Delay in Email Alerts

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
bsanjay
Posts: 86
Joined: Mon Apr 29, 2019 9:38 am

Delay in Email Alerts

Post by bsanjay »

Hello Team,
We are receiving email alerts almost 12 hours late. I have sent the system profile by PM.

Best Regards,
bsanjay
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: Delay in Email Alerts

Post by gsmith »

Hi

Who did you PM the System Profile to?

Thanks
bsanjay
Posts: 86
Joined: Mon Apr 29, 2019 9:38 am

Re: Delay in Email Alerts

Post by bsanjay »

Hello gsmith,
Please check your PM.


Best Regards,
bsanjay
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: Delay in Email Alerts

Post by gsmith »

Hey bsanjay,

When you go to Admin > Email Settings and click the "Send A Test Email" button,
do you get that email pretty quickly? Or does it take 12 hours as well?

Thanks
bsanjay
Posts: 86
Joined: Mon Apr 29, 2019 9:38 am

Re: Delay in Email Alerts

Post by bsanjay »

Hello gsmith,
I checked, and the test email arrived at my address immediately with the message below.

This is a test email from Nagios XI

Best Regards,
bsanjay
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: Delay in Email Alerts

Post by gsmith »

Hi

Please make a copy of /etc/php.ini:

Code: Select all

cp /etc/php.ini /etc/php.ini.orig

Now edit /etc/php.ini and update to the following values:

max_input_vars = 50000
memory_limit = 1024M
max_execution_time = 120
max_input_time = 300

Restart the Apache service using one of the following (depending on the OS):

Code: Select all

systemctl restart httpd.service
-or-

Code: Select all

systemctl restart apache2.service
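
To confirm the edits took effect in the file, something like the sketch below could be used. It only reads the file and compares the four values listed above; the function names `php_ini_value` and `check_php_ini` are made up for this example, and it assumes the php.ini that Apache actually loads is /etc/php.ini.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active (uncommented) value of one php.ini directive.
php_ini_value() {
    local key="$1" file="${2:-/etc/php.ini}"
    awk -F'=' -v k="$key" '
        /^[ \t]*;/ { next }                      # skip commented-out lines
        {
            name = $1
            gsub(/[ \t\r]/, "", name)
            if (name == k) {
                val = $2
                sub(/[ \t]*;.*$/, "", val)       # drop a trailing "; comment"
                gsub(/^[ \t]+|[ \t\r]+$/, "", val)
                found = val                      # the last matching line wins
            }
        }
        END { if (found != "") print found }
    ' "$file"
}

# Hypothetical helper: compare the file against the values suggested in this thread.
# Returns non-zero if any directive differs or is missing.
check_php_ini() {
    local file="${1:-/etc/php.ini}" rc=0 pair key want got
    for pair in max_input_vars=50000 memory_limit=1024M \
                max_execution_time=120 max_input_time=300; do
        key="${pair%%=*}"
        want="${pair#*=}"
        got="$(php_ini_value "$key" "$file")"
        if [ "$got" = "$want" ]; then
            echo "OK   $key = $got"
        else
            echo "DIFF $key = ${got:-<unset>} (expected $want)"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_php_ini` with no argument checks /etc/php.ini; pass another path if your distribution keeps the Apache php.ini elsewhere.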

Thanks
bsanjay
Posts: 86
Joined: Mon Apr 29, 2019 9:38 am

Re: Delay in Email Alerts

Post by bsanjay »

Hi gsmith,
As you suggested, we changed the php.ini file and restarted the httpd service. But today we again received delayed email alerts, and we also got multiple alerts for the same host/service at short intervals of 10-15 minutes.

I have attached the system profile in PM for your reference.

Best Regards,
Sanjay Batkura
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: Delay in Email Alerts

Post by gsmith »

Hi,

Looking through the logs, there are a lot of errors/failures. We'll need to clean them up.

1. First in database_log.txt:

Code: Select all

210728 14:23:44  InnoDB: Error: trying to open a table, but could not
InnoDB: open the tablespace file './nagiosxi/#sql-611e_d2c08.ibd'!
InnoDB: Have you moved InnoDB .ibd files around without using the
InnoDB: commands DISCARD TABLESPACE and IMPORT TABLESPACE?
InnoDB: It is also possible that this is a temporary table #sql...,
InnoDB: and MySQL removed the .ibd file for this.
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/innodb-troubleshooting-datadict.html
InnoDB: for how to resolve the issue.
210728 14:23:44  InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.

Not sure what is going on there, but for starters please look at:
https://assets.nagios.com/downloads/nag ... tabase.pdf
to see if repairing the database helps.
Please send me any output from the repair operation.

Once you have restarted Nagios, please wait about half an hour and then take
another System Profile and send it to me. I'll take a look at it on the off chance that
it solves issues 2 and 3 below. So please wait to hear back from me before going
on to steps 2 and 3.

2. Next look at the attached errors.pdf
Where you see items like (key on status -1):

Code: Select all

00017: Jul 29 03:10:01 brnagios1 rrdcached[31059]: queue_thread_main: rrd_update_r
(/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.rrd) failed with status -1.
(/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.rrd: found extra data on update argument: 159976.21:160978.44)

That means either the service command changed or what is being monitored has changed. For example, with
the Disk Usage checks, was another drive (D:\) added? A quick way to clean this up is to remove the files:

/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.rrd
/usr/local/nagios/share/perfdata/dev01brwfaweb01.ux.corp.local/Disk_Usage_on__apps.xml

The downside of this is that you will lose that performance data.
If you want to try and save the data, please let me know and I will track down how to do that.

3. Within the attached file there are 159 instances of:

Code: Select all

00138: Jul 29 03:12:30 brnagios1 nagios: SERVICE ALERT: stg01autabweb01.ux.corp.local;Swap Usage;CRITICAL;SOFT;2;(Service Check Timed Out On
Worker: brnagios1.ux.corp.local)

These might go away once the perf data issue (#2) is cleaned up.
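
For the cleanup in step 2, a rough sketch like the one below could list the affected .rrd files from a log and move each .rrd/.xml pair into a backup directory rather than deleting it outright. The function names, the backup directory, and the log path are placeholders for this example, and it assumes each rrdcached error sits on a single line in the log, as it would in syslog.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the unique .rrd paths that rrdcached rejected with
# "found extra data on update argument" in the given log file.
failing_rrds() {
    sed -n 's|^.*(\(/[^:)]*\.rrd\): found extra data.*$|\1|p' "$1" | sort -u
}

# Hypothetical helper: move one .rrd file and its matching .xml file into a
# backup directory instead of deleting them, so the old data stays available.
# Note: files with the same name from different hosts would collide, so use
# one backup directory per host.
set_aside_perfdata() {
    local rrd="$1" backup_dir="$2"
    local xml="${rrd%.rrd}.xml"
    local f
    mkdir -p "$backup_dir" || return 1
    for f in "$rrd" "$xml"; do
        if [ -e "$f" ]; then
            mv -- "$f" "$backup_dir/" && echo "moved $f"
        fi
    done
}
```

For example, `failing_rrds /var/log/messages` would print the candidate files (the log path is an assumption; use whichever log holds the rrdcached errors), and `set_aside_perfdata <file>.rrd /root/perfdata-backup/<host>` would set one pair aside. Review the list before moving anything.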

Thanks