Multiple events created for the same alert.

typer100 · Post by **typer100** » Thu Jul 02, 2020 10:28 am

I get multiple events in the eventman.log for the same error, resulting in multiple notifications (see attachment).
In the console I see only one event.
Current XI version is: 5.6.14

benjaminsmith · Post by **benjaminsmith** » Thu Jul 02, 2020 2:26 pm

Hi,

This is most likely caused by multiple Nagios processes running on the server. To fix it, let's kill all the Nagios processes and restart all the XI services. The following commands work on CentOS 7 and may need to be adjusted for other operating systems.

Code: Select all

systemctl stop crond
systemctl stop npcd
systemctl stop nagios
systemctl stop ndo2db
pkill -9 -u nagios
for i in $(ipcs -q | grep nagios |awk '{print $2}'); do ipcrm -q $i; done
rm -rf /usr/local/nagiosxi/var/dbmaint.lock
rm -rf /usr/local/nagiosxi/var/event_handler.lock
rm -rf /usr/local/nagiosxi/scripts/reconfigure_nagios.lock
systemctl restart mariadb
systemctl start ndo2db
systemctl start nagios
systemctl start npcd
systemctl start crond
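
To confirm whether more than one Nagios daemon is actually running (before or after the steps above), something like the following can help. This is a minimal sketch, not an official XI tool: the function name `count_top_level_nagios` is made up for illustration, and it assumes the daemon is started as `.../bin/nagios -d <config>` with PID 1 as its parent. As far as I know, Core's worker processes and forked child have the main daemon as their parent, so they are not counted here.

```shell
# Count Nagios Core daemons whose parent is PID 1 (init/systemd).
# Reads `ps -eo pid,ppid,args` style output on stdin.
# Assumption: the daemon command line contains "bin/nagios -d ".
count_top_level_nagios() {
    awk '$2 == 1 && $0 ~ /bin\/nagios +-d / { n++ } END { print n + 0 }'
}

# Usage against the live process table:
#   ps -eo pid,ppid,args | count_top_level_nagios
```

A result greater than 1 would suggest that a stray daemon is still running alongside the one started by systemd.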

Hopefully that will take care of the problem, but if the issue persists, please send us your system profile and we'll take a closer look at the logs. Thanks, Benjamin

To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and share it in a private message or upload it to the post/ticket, then reply to this post to bring it up in the queue.

typer100 · Post by **typer100** » Mon Jul 06, 2020 9:27 am

I've uploaded the system profile since the quick fix didn't work.

Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.

benjaminsmith · Post by **benjaminsmith** » Tue Jul 07, 2020 1:46 pm

Hi,

Thanks for the profile. There are several crashed database tables. Please run the following command to repair them.

Code: Select all

mysqlcheck -r -f -uroot -pnagiosxi --all-databases --use_frm

Then check the database logs to see whether the tables have been repaired, or send over a fresh system profile. If the database is successfully repaired, let me know whether the original issue is resolved as well. Thanks, Benjamin

References
Repairing The Nagios XI Databases
Log Locations and Descriptions

typer100 · Post by **typer100** » Tue Jul 07, 2020 2:02 pm

I did the repair.

===============
REPAIR COMPLETE
===============

=======================
nagios database repair succeeded
nagiosql database repair succeeded
nagiosxi database repair succeeded

Sending a new profile.

But it didn't fix my issue.

benjaminsmith · Post by **benjaminsmith** » Wed Jul 08, 2020 1:05 pm

Hi @Typer100,

The database is looking good now. Looking over the profile, I'm not seeing duplicate alerts in the logs. However, the profile is just the tail output of recent events.

If the issue happens again, please retrieve the full /usr/local/nagios/var/nagios.log along with the mail log (which one depends on whether you are using SMTP or Sendmail) from the server, and let me know the exact name of the service that is sending duplicates. Thanks, Benjamin

Log Locations and Descriptions

typer100 · Post by **typer100** » Wed Jul 08, 2020 1:34 pm

Pretty much all alerts are sending 10 emails.

Service: Disk Usage on /dbawork
Host: sldgbd0065
Address: 172.26.14.39
State: OK
Info:
OK: Used disk space was 34.90 % (Used: 12923.24 MB, Total_size: 38970.85 MB, Free: 24061.47 MB)
Date/Time: 2020-07-08 11:31:47

benjaminsmith · Post by **benjaminsmith** » Wed Jul 08, 2020 5:16 pm

Hi,

Here is the full configuration for that service in Core "speak". It is going to send notifications to opsgenie and the xi_unix_contact_group every hour while the service is critical, and also on recovery.

Code: Select all

define service {
	host_name	sldgbd0065
	service_description	Disk Usage on /dbawork
	check_period	24x7
	check_command	check_xi_ncpa!-t 'etb00W7XKrj79dpj3xx158nQ0yCG8Ho1' -P 5693 -M 'disk/logical/|dbawork' -u M -w 90 -c 95!!!!!!!
	contacts	opsgenie
	contact_groups	xi_unix_contact_group
	notification_period	xi_timeperiod_24x7
	initial_state	o
	importance	0
	check_interval	5.000000
	retry_interval	1.000000
	max_check_attempts	5
	is_volatile	0
	parallelize_check	1
	active_checks_enabled	1
	passive_checks_enabled	1
	obsess	1
	event_handler_enabled	1
	low_flap_threshold	0.000000
	high_flap_threshold	0.000000
	flap_detection_enabled	1
	flap_detection_options	a
	freshness_threshold	0
	check_freshness	0
	notification_options	r,c
	notifications_enabled	1
	notification_interval	60.000000
	first_notification_delay	0.000000
	stalking_options	n
	process_perf_data	1
	retain_status_information	1
	retain_nonstatus_information	1
	_OPSGENIETEAMS	Unix
	}

Here is the log output for the service:

Code: Select all

[1594183783] SERVICE NOTIFICATION: gaujf010;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594183783] SERVICE NOTIFICATION: support.aix;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594183783] SERVICE NOTIFICATION: opsgenie;sldgbd0065;Disk Usage on /dbawork;CRITICAL;notify-service-by-opsgenie;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594187666] SERVICE NOTIFICATION: gaujf010;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594187666] SERVICE NOTIFICATION: support.aix;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594187666] SERVICE NOTIFICATION: opsgenie;sldgbd0065;Disk Usage on /dbawork;CRITICAL;notify-service-by-opsgenie;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594191549] SERVICE NOTIFICATION: gaujf010;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594191549] SERVICE NOTIFICATION: support.aix;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594191549] SERVICE NOTIFICATION: opsgenie;sldgbd0065;Disk Usage on /dbawork;CRITICAL;notify-service-by-opsgenie;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594195434] SERVICE NOTIFICATION: gaujf010;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594195434] SERVICE NOTIFICATION: support.aix;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594195434] SERVICE NOTIFICATION: opsgenie;sldgbd0065;Disk Usage on /dbawork;CRITICAL;notify-service-by-opsgenie;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)
[1594199319] SERVICE NOTIFICATION: gaujf010;sldgbd0065;Disk Usage on /dbawork;CRITICAL;xi_service_notification_handler;CRITICAL: Used disk space was 100.00 % (Used: 36967.92 MB, Total_size: 38970.85 MB, Free: 16.78 MB)

If you look at the timestamps, you'll see that it is notifying the contacts about every hour, which is expected with notification_interval set to 60. If you do not want to receive repeat notifications, you can set the notification_interval to 0 and Nagios will send out only one notification. Otherwise, I would increase the interval to a longer time period.
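
To check the spacing yourself, the epoch timestamps in the log can be subtracted per contact. A minimal sketch, assuming the standard `[epoch] SERVICE NOTIFICATION: contact;host;service;...` line layout shown above; the function name `notification_gaps` is hypothetical.

```shell
# Print the gap between successive SERVICE NOTIFICATION lines for one contact.
# Reads nagios.log lines on stdin; the contact name is the first argument.
notification_gaps() {
    awk -v contact="$1" '
        /SERVICE NOTIFICATION: / {
            ts = substr($1, 2, length($1) - 2)   # strip the [ and ] around the epoch
            split($4, f, ";")                    # f[1] is the contact name
            if (f[1] != contact) next
            if (prev != "") printf "%d s (%.1f min)\n", ts - prev, (ts - prev) / 60
            prev = ts
        }'
}

# Usage:
#   notification_gaps support.aix < /usr/local/nagios/var/nagios.log
```

For the excerpt above, the gaps for support.aix come out at 3883 to 3885 seconds, a little over an hour each.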

Hope that helps and let me know if you have any questions.

typer100 · Post by **typer100** » Thu Jul 09, 2020 5:52 am

Hi. I wish it were only that, or maybe I just don't get it. You see, support.aix received 10 emails for the same alert within 2-3 seconds.
I've included a screenshot of that inbox. Not exactly the same alert for the same host, but same problem.

benjaminsmith · Post by **benjaminsmith** » Thu Jul 09, 2020 5:34 pm

Hi,

Right now, I'm not seeing multiple service notifications in the Nagios log, so it's likely an issue with the mail setup or the event queue in XI. When you open those emails, do they have the exact same Date/Time stamp (are they duplicates)? Also, is this issue affecting all XI user accounts?
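
If you want to run the same check against the full nagios.log, here is a rough sketch that lists any contact notified more than once for the same host and service at the same log timestamp. The function name `duplicate_notifications` is made up for illustration, and it assumes the usual semicolon-separated SERVICE NOTIFICATION line layout.

```shell
# Print "count [timestamp] contact host service" for every contact that was
# notified more than once for the same service at the same log timestamp.
# Reads nagios.log lines on stdin.
duplicate_notifications() {
    awk -F';' '
        /SERVICE NOTIFICATION: / {
            split($1, head, " ")     # head[1] = [timestamp], head[4] = contact
            key = head[1] " " head[4] " " $2 " " $3
            seen[key]++
        }
        END { for (k in seen) if (seen[k] > 1) print seen[k], k }'
}

# Usage:
#   duplicate_notifications < /usr/local/nagios/var/nagios.log
```

No output means Core logged each notification only once, which would point at the mail setup or the XI event queue rather than Core itself.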

Can you try to send a custom notification for this host, and let me know if you receive more than one message?

custom-notification.png
