Nagios XI appears to be queueing notifications?

askewdread · Post by **askewdread** » Thu Dec 01, 2016 1:13 pm

Hi,

We are currently trialling Nagios XI and have come across an issue multiple times now.

If we have a massive outage, all the notifications appear to queue up and send very slowly, sometimes taking over 24 hours. The emails that get sent carry the date/time of the actual outage, which is currently around 14 hours ago.

I've tried both sendmail and SMTP and both show the same issue. I've got it using sendmail at the moment to make it easy to see that the emails are actually coming from this server.
The Nagios XI GUI and the nagios.log file both show the alerts being sent at the time of the outage, and the mailq is empty.

It eventually gets through the backlog and stops, but it's a bit annoying for whoever is on call at the time.
I've used Nagios Core personally for years and never come across anything like that.
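For anyone comparing the log against mail arrival times: nagios.log prefixes each entry with a Unix epoch in square brackets, which is awkward to read next to email headers. A minimal sketch, assuming GNU `date` and a POSIX shell; the function name `notif_times` is made up for illustration, and the log path in the usage comment is the usual XI default rather than something confirmed in this thread.

```shell
#!/bin/sh
# Print NOTIFICATION entries from a Nagios log with the leading
# [epoch] replaced by a human-readable local timestamp.
# Usage: notif_times <path-to-nagios.log>
notif_times() {
    grep 'NOTIFICATION:' "$1" | while IFS= read -r line; do
        epoch=${line#\[}        # drop the leading "["
        epoch=${epoch%%\]*}     # keep only the digits before "]"
        # date -d "@N" is a GNU date feature (assumption about the platform)
        printf '%s %s\n' \
            "$(date -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" \
            "${line#*\] }"
    done
}

# Example (path is an assumption, adjust to your install):
# notif_times /usr/local/nagios/var/nagios.log | tail -n 20
```

Comparing these times with the `Date:` and `Received:` headers of the delayed emails should show whether the lag is introduced before or after the message is handed to the mailer.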

tmcdonald · Post by **tmcdonald** » Thu Dec 01, 2016 1:58 pm

XI definitely has more moving parts than a pure Core setup, so there are some considerations to make. Primarily, what sort of hardware (CPU, RAM, disk speed) does this server have, and how many total hosts + services are you monitoring?

askewdread · Post by **askewdread** » Thu Dec 01, 2016 3:22 pm

Hi

The server is a VM with 8 CPU cores and currently 8 GB of memory.

Its load sits at around 4 out of 5. The primary disk gives:

1073741824 bytes (1.1 GB) copied, 0.759818 s, 1.4 GB/s

but the DRBD disk (where Nagios sits) gives:

1073741824 bytes (1.1 GB) copied, 95.8859 s, 11.2 MB/s

So that is possibly the issue. Now I have to work out why that's happening, as the physical disk sits on the same storage as the one above.
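The two figures above look like `dd` summary lines for a 1 GiB write (1073741824 bytes is 1024 blocks of 1 MiB). The exact command was not posted, so the following is only a sketch of how such a comparison could be repeated, assuming GNU `dd`; the function name `disk_write_test` and the mount path in the usage comment are placeholders.

```shell
#!/bin/sh
# Time a sequential write into a directory and print dd's summary line.
# Usage: disk_write_test <directory> [size_in_MiB]   (default 1024 = 1 GiB)
disk_write_test() {
    dir=$1
    size_mb=${2:-1024}
    tmpfile="$dir/ddtest.$$"
    # conv=fdatasync makes dd flush the data to disk before it reports,
    # so the page cache does not inflate the throughput figure.
    LC_ALL=C dd if=/dev/zero of="$tmpfile" bs=1M count="$size_mb" \
        conv=fdatasync 2>&1 | tail -n 1
    rm -f "$tmpfile"
}

# Example (the second path is a placeholder for the DRBD mount):
# disk_write_test /root
# disk_write_test /path/to/drbd/mount
```

Running it against both filesystems with the same size keeps the comparison like for like; a large gap that persists with `conv=fdatasync` points at the replicated device rather than at caching.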

rkennedy · Post by **rkennedy** » Thu Dec 01, 2016 5:49 pm

Assuming you're offloading the database, this could affect it. I've seen it when the timestamps are different, because the filesystem tries to line up with SQL, which is different, and that then causes a delay. Could you run the following on both the DB machine and the XI machine? (Replace the credentials on the offloaded DB with the proper information; we just need SELECT NOW(); to run.)

grep "date.timezone" /etc/php.ini
ls -l /etc/localtime
php -r 'echo date("D M j G:i:s T Y")."\n";'
date
mysql -unagiosxi -pn@gweb -e "SELECT NOW();"

askewdread · Post by **askewdread** » Thu Dec 01, 2016 5:56 pm

hi,

It's not offloaded. It's just clustered, with its storage on the DRBD-synced filesystem, and it starts on whichever node is active.

[root@drabsglnx10 ~]# grep "date.timezone" /etc/php.ini
; http://php.net/date.timezone
date.timezone = UTC
[root@drabsglnx10 ~]# ls -l /etc/localtime
lrwxrwxrwx. 1 root root 38 Nov 11 08:25 /etc/localtime -> /usr/share/zoneinfo/Pacific/Auckland
[root@drabsglnx10 ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Thu Dec 1 22:52:55 UTC 2016
[root@drabsglnx10 ~]# date
Fri Dec  2 11:52:55 NZDT 2016



[root@dnzbsglnx10 ~]# grep "date.timezone" /etc/php.ini
; http://php.net/date.timezone
date.timezone = Pacific/Auckland
[root@dnzbsglnx10 ~]# ls -l /etc/localtime
lrwxrwxrwx. 1 root root 36 Nov 11 09:51 /etc/localtime -> /usr/share/zoneinfo/Pacific/Auckland
[root@dnzbsglnx10 ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Fri Dec 2 11:52:44 NZDT 2016
[root@dnzbsglnx10 ~]# date
Fri Dec  2 11:52:44 NZDT 2016
[root@dnzbsglnx10 ~]# mysql -unagiosxi -pn@gweb -e "SELECT NOW();"
+---------------------+
| NOW()               |
+---------------------+
| 2016-12-02 11:53:20 |
+---------------------+

rkennedy · Post by **rkennedy** » Thu Dec 01, 2016 6:02 pm

I believe this explains the delay:

[root@drabsglnx10 ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Thu Dec 1 22:52:55 UTC 2016
[root@drabsglnx10 ~]# date
Fri Dec  2 11:52:55 NZDT 2016

You'll want to update /etc/php.ini so that date.timezone matches the system timezone - https://www.devside.net/wamp-server/set ... php-to-use

Then just run service httpd restart
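A quick way to confirm the mismatch before and after the change is to compare the UTC offset PHP reports with the one the system reports. This is only a sketch, assuming the `php` CLI reads the same /etc/php.ini as the web server; the function names are made up for illustration.

```shell
#!/bin/sh
# Return success when two UTC offsets (e.g. +1300, +0000) are identical.
tz_offsets_match() {
    [ "$1" = "$2" ]
}

# Compare the system's current UTC offset with PHP's.
check_php_timezone() {
    sys_off=$(date +%z)                   # e.g. +1300
    php_off=$(php -r 'echo date("O");')   # PHP "O" format, e.g. +0000
    if tz_offsets_match "$sys_off" "$php_off"; then
        echo "OK: system and PHP are both at UTC$sys_off"
    else
        echo "MISMATCH: system is UTC$sys_off, PHP is UTC$php_off"
        return 1
    fi
}

# Example:
# check_php_timezone
```

On drabsglnx10 this would report a mismatch until date.timezone is set to Pacific/Auckland, as it already is on dnzbsglnx10, and httpd is restarted.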

askewdread · Post by **askewdread** » Thu Dec 01, 2016 6:06 pm

Would that come into it, seeing that the server sits there with all services stopped, doing nothing unless we fail over to it?

rkennedy · Post by **rkennedy** » Fri Dec 02, 2016 10:50 am

I wouldn't think so, but it's hard to say, as our basic installs do not take DRBD into account; this is generally handled by one of our partners, Linbit.

The PHP time on that machine appears to be UTC, whereas its system time and your other node are NZDT. The time definitely will affect notifications.

askewdread wrote: "I've tried both sendmail and SMTP and both show the same issue. I've got it using sendmail at the moment to make it easy to see that the emails are actually coming from this server."

I could see sendmail delaying due to a mail queue, but since you've tried both with the same occurrence, it leads me to believe this is an OS-level issue.
