Recovery alert never stops

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
vazudevan
Posts: 36
Joined: Fri Oct 21, 2016 4:52 am

Recovery alert never stops

Post by vazudevan »

We are facing a strange issue: host recovery notifications for 4 hosts keep repeating even after notifications are disabled. I checked the alert settings, and they are set to send only one alert. I wondered if it was some kind of cache and rebooted the Nagios server.

The alerts continued. Running tail -f /usr/local/nagiosxi/var/eventman.log showed alerts being processed and sent out. I tried deactivating the hosts, and the alerts still did not stop. I then deleted the hosts, and the alerts kept flowing. A second reboot has not solved the problem either.

Now the hosts are not in CCM at all, and the alerts are still being sent every 2 minutes (the check interval). How can we address this? It has been very annoying and frustrating.
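
For anyone following along, here is a minimal sketch of how the log check above could be narrowed to the affected hosts. The log path is the one quoted in the post; X1 to X4 are the placeholder host names used later in this thread, and count_host_mentions is a hypothetical helper, not part of Nagios XI. It only counts matching lines and assumes nothing about the log's format.

```shell
#!/bin/sh
# Count the lines in a log file that mention a given host name.
# $1 = log file, $2 = host name (matched as a fixed string, not a regex)
# Note: this is a substring match, so "X1" would also count "X10".
count_host_mentions() {
    grep -c -F -- "$2" "$1"
}

# Path taken from the post; override with LOG=... if yours differs.
LOG="${LOG:-/usr/local/nagiosxi/var/eventman.log}"

if [ -r "$LOG" ]; then
    for host in X1 X2 X3 X4; do    # placeholder host names
        printf '%s: %s\n' "$host" "$(count_host_mentions "$LOG" "$host")"
    done
fi
```

Running it twice a few minutes apart would show whether the counts for the deleted hosts are still growing.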
vazudevan
Posts: 36
Joined: Fri Oct 21, 2016 4:52 am

Re: Recovery alert never stops

Post by vazudevan »

I checked further and see that we are not receiving service notifications, and for the host notifications that do get triggered, the email template is not taking effect. Instead, what arrives appears to be a static email in place of the actual one.

That is, what we see in Home -> Incident Management -> Notifications does not match the notifications actually received.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Recovery alert never stops

Post by scottwilkerson »

If these are all recovery notifications from passive checks, this is a known issue in 5.5 and will be resolved in 5.5.1, to be released in the next few days.

However it appears you somehow have a host that is configured and not in the CCM, which would be different.

Please do the following:
1. Go to CCM -> Tools -> Config File Management
2. Delete Files
3. Then Apply Configuration
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
vazudevan
Posts: 36
Joined: Fri Oct 21, 2016 4:52 am

Re: Recovery alert never stops

Post by vazudevan »

Hi, I tried deleting the files and applying the configuration as suggested. This did not help either.

Maybe I have not explained the problem well. We are frequently getting repeated alerts showing host X1, X2, X3, or X4 as UP. We are not receiving any alerts for other hosts or services, viz. A, B, C ... Z.

Home -> Incident Management -> Notifications shows that notifications are sent for other hosts A, B, C. However, what ends up in the mailbox is an alert for host X1, X2, X3, or X4 only.

As a test, I changed the notification commands to the Core versions, notify-host-by-email and notify-service-by-email, for a contact or two, and those contacts receive notifications as expected.

However, all contacts using the commands xi_host_notification_handler and xi_service_notification_handler receive notifications for X1, X2, X3, and X4 only, and no other notifications.

This is happening even after deleting hosts X1, X2, X3, and X4 from Core and applying the configuration. We checked the folder, and the config files for these hosts are indeed removed. It appears something is wrong with the database and event handling.

We also ran repair_database.sh, and it did not help. We are not sure if this is a bug or something else. If the resolution is not simple, is there a way I can go back to 5.4.13?
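
The "config files are indeed removed" check above can be double-checked from the shell. This is only a sketch: find_host_refs is a hypothetical helper, X1 is the placeholder host name from this post, and the directory is assumed from the nagios.cfg location shown elsewhere in this thread. It looks only for host_name lines, so references made in other ways (for example, host group members) would not be found.

```shell
#!/bin/sh
# List config files under a directory that still contain a host_name line
# for the given host.
# $1 = config directory, $2 = host name (interpreted as part of a regex,
#      so a dot in the name matches any character)
find_host_refs() {
    grep -r -l -E "host_name[[:space:]]+$2([[:space:]]|\$)" "$1" 2>/dev/null
}

# Assumed location of the Nagios Core config tree; adjust as needed.
CFG_DIR="${CFG_DIR:-/usr/local/nagios/etc}"

if [ -d "$CFG_DIR" ]; then
    find_host_refs "$CFG_DIR" X1    # placeholder host name
fi
```

No output means no file under that directory still carries a host_name line for the host.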
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Recovery alert never stops

Post by scottwilkerson »

By chance are the notifications you are seeing for X1, X2, X3, X4 old?

Can you post the output of

Code:

ps -ef|grep nagios.cfg
vazudevan wrote: If the resolution is not simple, is there a way I can go back to 5.4.13?
You can restore the backup that was made before the upgrade, which will bring you back to 5.4.13.

https://assets.nagios.com/downloads/nag ... ios-XI.pdf
vazudevan
Posts: 36
Joined: Fri Oct 21, 2016 4:52 am

Re: Recovery alert never stops

Post by vazudevan »

The date in the notification is current; however, the date that shows up in eventman.log is an old date and timestamp (July 10th).

Code:

ps -ef|grep nagios.cfg
root     20709 20457  0 12:38 pts/1    00:00:00 grep --color=auto nagios.cfg
nagios   27489     1  2 11:58 ?        00:00:52 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   27553 27489  0 11:58 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Last edited by vazudevan on Wed Jul 11, 2018 11:42 am, edited 1 time in total.
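
The thread does not say why this output was requested. One thing such output can show, and this is an assumption, is whether more than one top-level Nagios daemon is running. A minimal sketch of that check follows; count_top_level_daemons is a hypothetical helper that reads ps -ef style output on standard input and counts lines whose parent PID is 1 and whose command mentions nagios.cfg.

```shell
#!/bin/sh
# Count top-level Nagios daemons in `ps -ef` output read from stdin.
# Column 3 of `ps -ef` is the parent PID; a daemon started normally has
# parent PID 1, while its worker child has the daemon's PID as parent.
count_top_level_daemons() {
    awk '$3 == 1 && /nagios\.cfg/ && !/grep/ { n++ } END { print n + 0 }'
}

# Run against the live process table when ps is available.
command -v ps >/dev/null 2>&1 && ps -ef | count_top_level_daemons
```

Fed the three lines posted above, the helper reports 1: a single daemon (PID 27489) plus its child process.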
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Recovery alert never stops

Post by scottwilkerson »

scottwilkerson wrote:By chance are the notifications you are seeing for X1, X2, X3, X4 old?
Do they have a date in the message from before you removed the hosts?
vazudevan
Posts: 36
Joined: Fri Oct 21, 2016 4:52 am

Re: Recovery alert never stops

Post by vazudevan »

The alerts carry the current time, because the %datetime% variable is used in the template.
Please refer to ticket #433842 for the profile and other details.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Recovery alert never stops

Post by scottwilkerson »

Going to close this thread and continue in ticket support.