Notifications not happening on certain devices
Posted: Thu Jul 30, 2020 5:22 am
by vijilants
Nagios XI - System Info
System
Nagios XI version: 5.7.2
Release info: nms1 3.10.0-957.5.1.el7.x86_64 x86_64
CentOS Linux release 7.6.1810 (Core)
Gnome is not installed
Hi,
Can you please advise...
Yesterday the Nagios screen lit up with a mass of alarms. However there were no email notifications for any of the alarming devices.
Notifications are working, and there were email notifications before and after the event, but not for the major event itself.
I have run notification and state history reports for the times in question, and they confirm that the events occurred but that no notifications were sent for these specific events.
This was the first event where there was no notification...
Code:
2020-07-29 13:52:54 CHCCITMWR02 Ping CRITICAL SOFT 1 of 5 CRITICAL - 10.40.2.163: rta nan, lost 100%
Here are two earlier alarms from last month, which show that notifications were previously working for this device:
Code:
2020-06-30 17:04:28 CHCCITMWR02 Ping Flapping Stopped No OK operations Nagios XI OK - 10.40.2.163: rta 22.596ms, lost 20%
2020-06-30 16:44:35 CHCCITMWR02 Ping Flapping Started No OK operations Nagios XI OK - 10.40.2.163: rta 22.344ms, lost 60%
Is there a way of investigating this? I'm not sure what has happened: out of 17 pages of alarms across several devices, no notifications were generated by the system for the time period.
Many Thanks
Re: Notifications not happening on certain devices
Posted: Thu Jul 30, 2020 3:02 pm
by benjaminsmith
Hi vijilants,
Typically it's a configuration or user preference setting. Also, if the host or service is flapping, notifications are suppressed when flap detection is enabled.
Code:
2020-06-30 17:04:28 CHCCITMWR02 Ping Flapping Stopped No OK operations Nagios XI OK - 10.40.2.163: rta 22.596ms, lost 20%
2020-06-30 16:44:35 CHCCITMWR02 Ping Flapping Started No OK operations Nagios XI OK - 10.40.2.163: rta 22.344ms, lost 60%
Also, notifications are only sent when the host or service enters a hard non-OK state, so this soft-state alert would not generate a notification.
Code:
2020-07-29 13:52:54 CHCCITMWR02 Ping CRITICAL SOFT 1 of 5 CRITICAL - 10.40.2.163: rta nan, lost 100%
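If it helps when going through a long state history report, here is a minimal Python sketch that splits a row like the one above into fields and flags whether it is a hard problem state. The field layout is inferred only from the sample row in this thread, and the names `parse_state_row` and `is_hard_problem` are made up for illustration. It deliberately ignores flap detection, downtime, notification periods and contact settings, all of which can also suppress a notification.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the sample row in this thread:
# "<date> <time> <host> <service> <state> <SOFT|HARD> <n> of <max> <output>"
ROW_RE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<service>.+?) "
    r"(?P<state>OK|WARNING|CRITICAL|UNKNOWN) "
    r"(?P<state_type>SOFT|HARD) "
    r"(?P<attempt>\d+) of (?P<max_attempts>\d+) "
    r"(?P<output>.*)$"
)


class StateRow(NamedTuple):
    when: str
    host: str
    service: str
    state: str
    state_type: str
    attempt: int
    max_attempts: int
    output: str


def parse_state_row(line: str) -> Optional[StateRow]:
    """Return the parsed row, or None if the line does not fit the layout."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return StateRow(
        d["when"], d["host"], d["service"], d["state"], d["state_type"],
        int(d["attempt"]), int(d["max_attempts"]), d["output"],
    )


def is_hard_problem(row: StateRow) -> bool:
    """True only for HARD non-OK rows, the ones that can trigger a problem notification."""
    return row.state_type == "HARD" and row.state != "OK"


if __name__ == "__main__":
    sample = ("2020-07-29 13:52:54 CHCCITMWR02 Ping CRITICAL SOFT 1 of 5 "
              "CRITICAL - 10.40.2.163: rta nan, lost 100%")
    row = parse_state_row(sample)
    print(row.host, row.state, row.state_type, is_hard_problem(row))
```

Rows that come back as soft states are retries still in progress, so the rows worth comparing against the notifications report are the ones where `is_hard_problem` is true.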
Was this the case for every Nagios XI user? Make sure notifications are enabled for the user account.
notifications.png
Then go to Admin > System Config > Email Settings, and try sending a test email message. Is it received? If not, then there's an issue with the email settings. Next, check the configurations following the steps in the guide below.
Nagios XI - Notification Problems
If you are unable to resolve the issue, please send me a system profile and the names of the hosts or services that should have sent the notification.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, and then reply to this post to bring it up in the queue.
Re: Notifications not happening on certain devices
Posted: Fri Jul 31, 2020 9:40 am
by vijilants
Note that there is nothing wrong with the email functionality. We are receiving emails for all the other events.
Also, your system will not allow me to attach more than one file, so I have sent you 3 PMs: one with the profile, the next with the events in question which did not raise notifications, and another with the notifications that were raised during the time period.
Many Thanks
Re: Notifications not happening on certain devices
Posted: Mon Aug 03, 2020 4:10 am
by vijilants
Is there an update on this? There definitely appears to be some sort of problem since our upgrade to v5.7.2.
We currently have critical alarms up for an object, and the system is supposed to send out a notification every 4 hours if it is not cleared. However, it sent out the first notification and none after that. The object has been down for 4 days.
Thanks
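To make the expectation concrete, here is a small Python sketch that lists when re-notifications would be due for a problem that stays uncleared, assuming a fixed 4-hour interval as described above. The function name `expected_notifications` and the start time in the demo are hypothetical (the real first notification time is not given in this post), and escalations, downtime and notification periods are ignored.

```python
from datetime import datetime, timedelta
from typing import List


def expected_notifications(first: datetime, interval: timedelta,
                           until: datetime) -> List[datetime]:
    """List the times a notification would be due between `first` and `until`, inclusive."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    due = []
    current = first
    while current <= until:
        due.append(current)
        current += interval
    return due


if __name__ == "__main__":
    # Hypothetical first notification time, for illustration only.
    first = datetime(2020, 7, 30, 4, 0, 0)
    due = expected_notifications(first, timedelta(hours=4), first + timedelta(days=4))
    print(len(due), "notifications expected, last at", due[-1])
```

Any times in the returned list that are missing from the notifications report are the gaps to look into.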
Re: Notifications not happening on certain devices
Posted: Mon Aug 03, 2020 1:11 pm
by benjaminsmith
Hi vijilants,
I appreciate the system profile. To compare the nagios log with the notifications report, could you send over the nagios log from 7-29-20? You'll find this in the following folder:
It will be labeled with the date on which it was archived, such as 7-30. I'm not able to find any status updates for this host in the current log from 7/31.
Also, please go to Admin > System Config > Email Settings and enable logging for phpmailer. Then go to this host (LG2RDMWR03) from Home > Details > Host Status, select the host, click on the Advanced tab (+), and send a Custom Notification from the Commands menu. Then upload the phpmailer log.
Thanks, Benjamin
Re: Notifications not happening on certain devices
Posted: Tue Aug 04, 2020 5:14 am
by vijilants
Hi Benjamin,
The devices that did not send out notifications were the ones with CHC at the start of the name.
I have done as you requested and attached are the files. I also did the custom notification for CHC01.
Many Thanks
Vij
Re: Notifications not happening on certain devices
Posted: Tue Aug 04, 2020 3:55 pm
by benjaminsmith
Hi Vij,
It will help to narrow this down to a specific host or service. The notification report uploaded was for LG2RDMWR03, so I assumed that was the host with the issue.
It does look like it's an issue with SNMP traps, is that correct? The nagios log is from 7/28, as it gets archived on 7/29, but I found this host starting with CH.
Code:
[1595992394] SERVICE NOTIFICATION: operations;CHCBNYTNMWR01;SNMP Traps;WARNING;xi_service_notification_handler;A rttMonThresholdNotification indicates the 0A 28 03 3A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 00 00 1 / enterprises.9.9.42.1.2.1.1.3.1 (): enterprises.9.9.42.1.4.1.1.5.1 ():0A 28 03 3A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 00 00 enterprises.9.9.42.1.2.9.1.7.1 ():1
[1595992394] SERVICE NOTIFICATION: operations;CHCBNYTNMWR01;SNMP Traps;OK;xi_service_notification_handler;A rttMonNotification indicates the occurrence of a 0A 28 03 3A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 00 00 1 1 64 60 55 00 00 00 00 13 C8 C7 90 00 00 00 04 24 00 00 44 00 00 00 00 00 00 00 00 08 B6 00 00 / enterprises.9.9.42.1.2.1.1.3.1 (): enterprises.9.9.42.1.4.1.1.5.1 ():0A 28 03 3A 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 04 00 00 enterprises.9.9.42.1.2.19.1.2.1 ():1 enterprises.9.9.42.1.2.19.1.10.1 ():1 enterprises.9.9.42.1.2.19.1.9.1 ():64 enterprises.9.9.42.1.2.19.1.5.1 ():60 enterprises.9.9.42.1.2.19.1.6.1 ():55 enterprises.9.9.42.1.2.2.1.33.1 ():00 00 00 00 13 C8 C7 90 00 00 00 04 24 00 00 44 00 00 00 00 00 00 00 00 08 B6 00 00
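The log entries above start with a Unix epoch timestamp, followed by semicolon-separated fields (contact, host, service, state, notification command, output). As a rough sketch for searching an archived nagios.log for notifications on hosts with a given prefix, something like the following Python could be used. The names `parse_service_notification` and `notifications_for_prefix` are hypothetical, only SERVICE NOTIFICATION lines are handled, and times are converted to UTC because the server's timezone is not stated in this thread.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Optional

LINE_RE = re.compile(r"^\[(?P<epoch>\d+)\] SERVICE NOTIFICATION: (?P<rest>.*)$")


class ServiceNotification(NamedTuple):
    when_utc: datetime
    contact: str
    host: str
    service: str
    state: str
    command: str
    output: str


def parse_service_notification(line: str) -> Optional[ServiceNotification]:
    """Parse one SERVICE NOTIFICATION line; return None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Split at most 5 times so semicolons inside the plugin output are kept.
    parts = m.group("rest").split(";", 5)
    if len(parts) != 6:
        return None
    contact, host, service, state, command, output = parts
    when = datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc)
    return ServiceNotification(when, contact, host, service, state, command, output)


def notifications_for_prefix(lines: Iterable[str], prefix: str) -> List[ServiceNotification]:
    """Return the parsed notifications whose host name starts with `prefix`."""
    parsed = (parse_service_notification(line) for line in lines)
    return [n for n in parsed if n is not None and n.host.startswith(prefix)]


if __name__ == "__main__":
    # Abbreviated copy of a line from this thread (plugin output shortened).
    sample = ("[1595992394] SERVICE NOTIFICATION: operations;CHCBNYTNMWR01;"
              "SNMP Traps;WARNING;xi_service_notification_handler;"
              "A rttMonThresholdNotification indicates the ...")
    for n in notifications_for_prefix([sample], "CHC"):
        print(n.when_utc.isoformat(), n.host, n.service, n.state)
```

Feeding it the lines of the archived log file and filtering on the "CHC" prefix gives a list that can be compared row by row with the notifications report.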
The notification handler was executed and the contact configuration (operations) looks good, but the phpmailer log does not have any entries from that time period. There are a few entries from 8/4, and everything else is from March.
Code:
[03-08-2017 03:26:25] Message sent! (method=smtp;host=smtp.gmail.com;port=465;smtpauth=true;security=ssl), Referer: admin/testemail.php
[03-08-2017 03:26:45] Message sent! (method=smtp;host=smtp.gmail.com;port=465;smtpauth=true;security=ssl), Referer: admin/testemail.php
[03-08-2017 03:33:27] Message sent! (method=smtp;host=smtp.gmail.com;port=465;smtpauth=true;security=ssl), Referer: includes/components/xicore/xicore.inc.php > Event Handler Notification Email
[03-08-2017 03:39:36] Message sent! (method=smtp;host=smtp.gmail.com;port=465;smtpauth=true;security=ssl), Referer: admin/users.php > Email All Users
[03-08-2017 03:39:36] Message sent! (method=smtp;host=smtp.gmail.com;port=465;smtpauth=true;security=ssl), Referer: admin/users.php > Email All Users
[08-04-2020 06:03:28] Message sent! (method=smtp;host=smtp.gmail.com;port=465;smtpauth=true;security=ssl), Referer: includes/components/xicore/xicore.inc.php > Event Handler Notification Email
[08-04-2020 06:04:58] Message sent! (method=smtp;host=smtp.gmail.com;port=465;smtpauth=true;security=ssl), Referer: includes/components/xicore/xicore.inc.php > Event Handler Notification Email
Let's keep logging enabled so we can track down the notification in the mail log to confirm that it was sent. Can you pull a notification report on this host and post it to the ticket?
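With logging left on, a quick way to check whether the mail log has any entries for a given window is a filter like the Python sketch below. It assumes the timestamp in brackets is month-day-year, which matches how the entries above are read in this post, and the names `parse_entry` and `entries_between` are made up for illustration.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

ENTRY_RE = re.compile(r"^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] (.*)$")

# Assumption: the log writes dates as MM-DD-YYYY.
TIMESTAMP_FORMAT = "%m-%d-%Y %H:%M:%S"


def parse_entry(line: str) -> Optional[Tuple[datetime, str]]:
    """Return (timestamp, message) for a log line, or None if it does not parse."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    try:
        when = datetime.strptime(m.group(1), TIMESTAMP_FORMAT)
    except ValueError:
        return None
    return when, m.group(2)


def entries_between(lines: Iterable[str], start: datetime,
                    end: datetime) -> List[Tuple[datetime, str]]:
    """Return the entries with start <= timestamp < end, in file order."""
    hits = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and start <= entry[0] < end:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    log = [
        "[03-08-2017 03:26:25] Message sent! (method=smtp;host=smtp.gmail.com;"
        "port=465;smtpauth=true;security=ssl), Referer: admin/testemail.php",
        "[08-04-2020 06:03:28] Message sent! (method=smtp;host=smtp.gmail.com;"
        "port=465;smtpauth=true;security=ssl), Referer: "
        "includes/components/xicore/xicore.inc.php > Event Handler Notification Email",
    ]
    window = entries_between(log, datetime(2020, 7, 28), datetime(2020, 7, 30))
    print(len(window), "entries in the window")
```

An empty result for the window of the outage means the log cannot confirm either way whether mail was attempted, which is the situation described above.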
Re: Notifications not happening on certain devices
Posted: Fri Aug 07, 2020 6:11 am
by vijilants
Hi,
When I opened this fault, I sent you a state history report and a notification report. It was not the SNMP trap alarms that had an issue, but the devices that Nagios was unable to ping.
I sent you a state history report showing all the devices that went into alarm because Nagios was unable to ping them. However, for the same period, the notifications report showed that none of the devices in alarm had any notifications associated with them, for some odd reason.
Did you review these reports? This had nothing to do with SNMP traps.
Regards
Re: Notifications not happening on certain devices
Posted: Fri Aug 07, 2020 10:33 am
by benjaminsmith
Hi @vijilants,
Can you send me the nagios.log with the data from 7-29? I requested this earlier because the log sent over contains the data from 7-28. With the right log, I can review notifications for hosts beginning with CHC for that period.
The configuration settings do look good; however, there is not enough data in the mail log to determine whether a notification went out or not. I would recommend keeping the logging on, so that if this happens again in the near future, it will be easier to determine what might have gone wrong.
Re: Notifications not happening on certain devices
Posted: Mon Aug 10, 2020 9:15 am
by vijilants
Hi,
I did PM you the file you requested when you requested it.
Anyway, I have PM'd it to you again.
Thanks