Hi,
We are currently trialing Nagios XI and have come across an issue multiple times now.
If we have a massive outage, all the notifications appear to queue up and send very slowly, sometimes taking over 24 hours. The emails that do get sent have the date/time of the actual outage, which is currently around 14 hours ago.
I've tried both sendmail and SMTP and both have the same issue. I've got it using sendmail at the moment to make it easy to see that the mails are actually coming from this server.
The Nagios XI GUI and the nagios.log file both show the alerts being sent at the time of the outage, and the mailq is empty.
It eventually gets through the backlog and stops, but it's a bit annoying for whoever is on call at the time.
I've used Nagios Core personally for years and never come across anything like that.
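To compare when Nagios logged each notification with when the email actually arrived, it can help to list the notification entries from nagios.log with readable timestamps. This is a minimal sketch, assuming the log uses Nagios Core's usual "[epoch] MESSAGE" line format and that GNU date is available; the function name show_notifications and the default path /usr/local/nagios/var/nagios.log are assumptions, so pass your own log path if it differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list NOTIFICATION entries from a Nagios log with the
# leading [epoch] field converted to a readable time (honours $TZ).
# Assumes GNU date (for "date -d @EPOCH") and the "[epoch] MESSAGE" format.
show_notifications() {
    # Default path is an assumption for a standard XI install
    local log="${1:-/usr/local/nagios/var/nagios.log}"
    local line epoch rest
    grep 'NOTIFICATION:' "$log" | while IFS= read -r line; do
        epoch="${line#\[}"      # drop the leading "["
        epoch="${epoch%%]*}"    # keep everything before the first "]"
        rest="${line#*] }"      # message text after "] "
        printf '[%s] %s\n' "$(date -d "@$epoch" '+%Y-%m-%d %H:%M:%S %Z')" "$rest"
    done
}
```

For example, running show_notifications | tail -n 20 on the XI server and comparing those times with the Received headers on the delayed emails shows whether the delay happens before or after Nagios hands the message off.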
Nagios XI appears to be queueing notifications?
-
askewdread
- Posts: 69
- Joined: Wed Nov 16, 2016 4:54 pm
Re: Nagios XI appears to be queueing notifications?
XI definitely has more moving parts than a pure Core setup, so there are some considerations to make. Primarily, what sort of hardware (CPU, RAM, disk speed) does this server have, and how many total hosts + services are you monitoring?
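For the disk-speed part of the question, a rough sequential-write test with dd gives a comparable number for each disk. This is a minimal sketch, assuming GNU dd; the function name disk_write_test is hypothetical, and conv=fdatasync is used so dd flushes the data to disk before reporting, which keeps the page cache from inflating the figure. Point it at a directory on the disk you want to measure; it writes, then deletes, a temporary file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough sequential write test for the disk behind DIR.
# Writes COUNT_MB x 1 MiB of zeros (default 1024 = 1073741824 bytes),
# prints dd's summary line, then removes the test file.
disk_write_test() {
    local dir="${1:?usage: disk_write_test DIR [COUNT_MB]}"
    local count="${2:-1024}"
    local f
    f=$(mktemp "$dir/ddtest.XXXXXX") || return 1
    # conv=fdatasync: flush file data to disk before dd reports its timing
    dd if=/dev/zero of="$f" bs=1M count="$count" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$f"
}
```

With the default size this writes 1 GiB, so run it when the extra I/O won't hurt anything.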
Former Nagios employee
askewdread
Re: Nagios XI appears to be queueing notifications?
Hi
The server is a VM with 8 CPU cores and currently 8 GB of memory.
Its load sits at around 4 out of 5. The primary disk is:
Code: Select all
1073741824 bytes (1.1 GB) copied, 0.759818 s, 1.4 GB/s
but the DRBD disk (where Nagios sits) is:
Code: Select all
1073741824 bytes (1.1 GB) copied, 95.8859 s, 11.2 MB/s
So that is possibly the issue, and now I have to work out why that's happening :/ since the physical disk sits on the same storage as the primary one.
Re: Nagios XI appears to be queueing notifications?
Assuming you're offloading the database, this could affect it. I've seen this happen when the timestamps differ: the filesystem time tries to line up with SQL's, which is different, and that causes a delay. Could you run the following on both the DB machine and the XI machine? (Replace the credentials for the offloaded DB with the proper information; we just need SELECT NOW(); to run.)
Code: Select all
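grep "date.timezone" /etc/php.ini              # timezone PHP is configured to use
ls -l /etc/localtime                           # timezone the OS is using
php -r 'echo date("D M j G:i:s T Y")."\n";'    # current time as PHP reports it
date                                           # current time as the OS reports it
mysql -unagiosxi -pn@gweb -e "SELECT NOW();"   # current time as MySQL reports it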
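The PHP-versus-OS part of that comparison can also be scripted: if PHP's date.timezone and the zone behind /etc/localtime disagree, PHP and the OS will report different times. This is a minimal sketch; the function names php_tz, os_tz and check_tz are hypothetical, it assumes /etc/localtime is a symlink into /usr/share/zoneinfo (as on typical CentOS/RHEL installs), and it only compares zone names, so equivalent names such as UTC and Etc/UTC would be reported as a mismatch.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare PHP's date.timezone with the OS timezone.

# Print the value of the last uncommented date.timezone line (quotes stripped).
php_tz() {
    sed -n 's/^[[:space:]]*date\.timezone[[:space:]]*=[[:space:]]*"\{0,1\}\([^";[:space:]]*\).*/\1/p' \
        "${1:-/etc/php.ini}" | tail -n 1
}

# Print the zone name behind the /etc/localtime symlink, e.g. Pacific/Auckland.
# Prints nothing if the file is not a symlink.
os_tz() {
    readlink "${1:-/etc/localtime}" | sed 's|.*/zoneinfo/||'
}

# Report OK or MISMATCH; returns 1 on a mismatch.
check_tz() {
    local p o
    p=$(php_tz "${1:-}")
    o=$(os_tz "${2:-}")
    if [ -n "$p" ] && [ "$p" = "$o" ]; then
        echo "OK: $p"
    else
        echo "MISMATCH: php=${p:-unset} os=${o:-unknown}"
        return 1
    fi
}
```

Run check_tz with no arguments on each node to check the default /etc/php.ini and /etc/localtime.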
Former Nagios Employee
askewdread
Re: Nagios XI appears to be queueing notifications?
Hi,
It's not offloaded; it's just clustered, with its storage on the DRBD-synced filesystem, and it just starts on whichever node is active.
Code: Select all
[root@drabsglnx10 ~]# grep "date.timezone" /etc/php.ini
; http://php.net/date.timezone
date.timezone = UTC
[root@drabsglnx10 ~]# ls -l /etc/localtime
lrwxrwxrwx. 1 root root 38 Nov 11 08:25 /etc/localtime -> /usr/share/zoneinfo/Pacific/Auckland
[root@drabsglnx10 ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Thu Dec 1 22:52:55 UTC 2016
[root@drabsglnx10 ~]# date
Fri Dec 2 11:52:55 NZDT 2016
[root@dnzbsglnx10 ~]# grep "date.timezone" /etc/php.ini
; http://php.net/date.timezone
date.timezone = Pacific/Auckland
[root@dnzbsglnx10 ~]# ls -l /etc/localtime
lrwxrwxrwx. 1 root root 36 Nov 11 09:51 /etc/localtime -> /usr/share/zoneinfo/Pacific/Auckland
[root@dnzbsglnx10 ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Fri Dec 2 11:52:44 NZDT 2016
[root@dnzbsglnx10 ~]# date
Fri Dec 2 11:52:44 NZDT 2016
[root@dnzbsglnx10 ~]# mysql -unagiosxi -pn@gweb -e "SELECT NOW();"
+---------------------+
| NOW()               |
+---------------------+
| 2016-12-02 11:53:20 |
+---------------------+
Re: Nagios XI appears to be queueing notifications?
I believe this explains the delay. On drabsglnx10, PHP is on UTC while the OS is on NZDT:
Code: Select all
[root@drabsglnx10 ~]# php -r 'echo date("D M j G:i:s T Y")."\n";'
Thu Dec 1 22:52:55 UTC 2016
[root@drabsglnx10 ~]# date
Fri Dec 2 11:52:55 NZDT 2016
You'll want to update date.timezone in /etc/php.ini on that node so PHP uses the same timezone as the system (Pacific/Auckland): https://www.devside.net/wamp-server/set ... php-to-use
Then just run service httpd restart.
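The php.ini change can be scripted along these lines. This is a minimal sketch, assuming a sed that supports -i.bak; set_php_tz is a hypothetical helper that rewrites an existing, uncommented date.timezone line (keeping a .bak copy of the file) and stops with an error rather than guessing where to add one if no such line exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set date.timezone in a php.ini to ZONE.
# Only rewrites an existing uncommented "date.timezone =" line; commented
# lines such as "; http://php.net/date.timezone" are left alone.
set_php_tz() {
    local tz="${1:?usage: set_php_tz ZONE [PHP_INI]}"
    local ini="${2:-/etc/php.ini}"
    if ! grep -q '^[[:space:]]*date\.timezone[[:space:]]*=' "$ini"; then
        echo "no uncommented date.timezone line in $ini; add one under [Date]" >&2
        return 1
    fi
    # -i.bak edits in place and keeps the original as $ini.bak
    sed -i.bak "s|^[[:space:]]*date\.timezone[[:space:]]*=.*|date.timezone = $tz|" "$ini"
}

# Usage (as root on the node where PHP reports UTC):
#   set_php_tz Pacific/Auckland && service httpd restart
```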
Former Nagios Employee
askewdread
Re: Nagios XI appears to be queueing notifications?
Would that come into it, seeing as that server sits there with all services stopped, doing nothing unless we fail over to it?
Re: Nagios XI appears to be queueing notifications?
I wouldn't think so, but it's hard to say, as our basic installs do not take DRBD into account; this is generally handled by one of our partners, Linbit.
The PHP time on that machine appears to be UTC, whereas your other one is NZDT. The time definitely will affect notifications.
I could understand sendmail delaying things if mail were sitting in a queue, but since you've tried both sendmail and SMTP with the same result, it leads me to believe this is an OS-level issue.
Former Nagios Employee