Recovery alert never stops
Posted: Tue Jul 10, 2018 11:38 pm
by vazudevan
We are facing a strange issue: host recovery notifications for 4 hosts keep repeating even after notifications are disabled. I checked the alert settings and they are set to send out only one alert. I wondered if it was some kind of cache and rebooted the Nagios server.
The alerts continued. Looking at tail -f /usr/local/nagiosxi/var/eventman.log showed alerts being processed and sent out. I tried deactivating the hosts and the alerts still did not stop. I deleted the hosts, and the alerts kept flowing. A second reboot has not solved the problem either.
Now the hosts are not in CCM at all, and the alerts are still being sent every 2 mins (the check interval). How can we address this? It has been very annoying and frustrating.
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 3:14 am
by vazudevan
I checked further and see that we are not receiving service notifications, and for the host notifications that do get triggered, the email template is not taking effect. Instead, a static email appears to be sent in place of the actual one.
I.e. what we see in Home -> Incident Management -> Notifications does not match the actual notifications received.
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 8:56 am
by scottwilkerson
If these are all recovery notifications from passive checks, this is a known issue in 5.5 and will be resolved in 5.5.1, to be released in the next few days.
However it appears you somehow have a host that is configured and not in the CCM, which would be different.
Please do the following
CCM -> Tools -> Config File Management
Delete Files
Then
Apply Configuration
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 10:20 am
by vazudevan
Hi, I tried deleting the files and applying the configuration as suggested. This did not help either.
Maybe I have not explained the problem well. We are frequently getting repeated alerts showing host X1, X2, X3 or X4 as UP. We are not receiving any alerts for other hosts / services, viz. A, B, C ... Z.
Home -> Incident Management -> Notifications shows that notifications are sent for other hosts A, B, C. However, what ends up in the mailbox is alerts for hosts X1, X2, X3 or X4 only.
For testing purposes, I changed the notification commands to the core versions, notify-host-by-email and notify-service-by-email, for a contact or two, and they get notifications as desired.
However, all contacts with the commands xi_host_notification_handler and xi_service_notification_handler receive notifications for X1, X2, X3, X4 only and no other notifications.
This is happening even after deleting the hosts X1, X2, X3 and X4 from core and applying the configuration. We checked the folder and these hosts' config files are indeed removed. It appears something is wrong with the database and event handling.
We ran repair_database.sh as well, and it did not help. Not sure if this is a bug or something else. If the resolution is not simple, is there a way I can go back to 5.4.13?
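A minimal sketch of the "check the folder" step described above: it lists any file under a directory that still mentions one of the removed hosts. The function name find_host_refs is hypothetical, X1 to X4 are the placeholder host names used in this post, and the example paths assume a default install layout; adjust them to your system.

```shell
# Hypothetical helper: print every file under DIR that still mentions
# one of the given host names as a whole word.
# Usage: find_host_refs DIR HOST...
find_host_refs() {
    dir=$1
    shift
    for host in "$@"; do
        # -r recurse, -l print file names only, -w whole word, -F fixed string
        grep -rlwF -- "$host" "$dir" 2>/dev/null
    done | sort -u
}

# Example calls (paths are assumptions based on a default install):
# find_host_refs /usr/local/nagios/etc X1 X2 X3 X4
# find_host_refs /usr/local/nagios/var/objects.cache X1 X2 X3 X4
```

Empty output means no file under that path references the hosts any more. Any file name printed is a place where a stale definition may survive.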
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 11:26 am
by scottwilkerson
By chance are the notifications you are seeing for X1, X2, X3, X4 old?
Can you post the output of
vazudevan wrote: If the resolution is not simple, is there a way I can go back to 5.4.13?
You can restore the backup that was made before the upgrade, which will bring you back to 5.4.13
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 11:40 am
by vazudevan
The date in the notification is current; however, the date that shows up in eventman.log is an old date and timestamp (July 10th).
Code:
ps -ef|grep nagios.cfg
root 20709 20457 0 12:38 pts/1 00:00:00 grep --color=auto nagios.cfg
nagios 27489 1 2 11:58 ? 00:00:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 27553 27489 0 11:58 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
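One way to read output like the above, assuming the intent is to check for more than one Nagios daemon running at once (the thread does not say why the output was requested): count the nagios -d processes whose parent PID is 1. The function name count_top_level_daemons is hypothetical, and the sketch assumes the `ps -ef` column layout shown above, with the parent PID in the third column.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and count the
# "nagios -d" processes whose parent PID (column 3) is 1.
count_top_level_daemons() {
    awk '$3 == 1 && /\/nagios -d / { n++ } END { print n + 0 }'
}

# Usage:
# ps -ef | count_top_level_daemons
```

For the three lines posted above this counts one process (PID 27489). The second nagios line has 27489 as its parent, so it is a child of that daemon, not a separate instance.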
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 11:42 am
by scottwilkerson
scottwilkerson wrote:By chance are the notifications you are seeing for X1, X2, X3, X4 old?
Do they have a date in the message from before you removed the hosts?
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 11:55 am
by vazudevan
The alerts carry the current time; the %datetime% variable is used in the template.
Please refer to ticket #433842 for the profile and other details.
Re: Recovery alert never stops
Posted: Wed Jul 11, 2018 2:15 pm
by scottwilkerson
Going to close thread and continue in ticket support