
Primary and Failover both are active

Posted: Thu Jan 18, 2018 7:45 am
by rtsupport
Current version : Nagios XI 2014R2.7
OS : Unix

We are currently facing an issue where both our PRD and DR servers are in an active state. When the PRD server is active, active host checks, active service checks, and notifications on the DR server should be disabled, but they are currently showing as active.

As a result, both the PRD and DR servers are sending alerts, and customers are receiving the same alert twice.

NOTABLE: Earlier, when DR was in an active state, the URL in the alert (used to acknowledge or disable the alert) pointed to DR, but now alerts from the DR server contain the same URL as PRD alerts.

Steps done:
> We recently synced DR with PRD manually, using the steps below:
________________________________________________________________________
1) Log into the primary Nagios XI system
2) cd /usr/local/nagiosxi/scripts
3) sudo ./backup_xi.sh
4) Copy the file the backup script created to /tmp (I'll refer to it as <BACKUP_SCRIPT> and it will have a name like 134544321.tgz)
5) scp the backup file to /tmp on the Nagios XI Failover system
6) Log into the Secondary Nagios XI system
7) cd /usr/local/nagiosxi/scripts
8) sudo ./restore_xi.sh /tmp/<BACKUP_SCRIPT>
_______________________________________________________________________
> Did a full Nagios restart on PRD and DR
> Repaired the DB
> Reconfigured DR
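
The copy and restore part of the manual sync above (steps 5 to 8) could be scripted roughly as follows. This is only a sketch under assumptions: the function names `run` and `sync_to_failover` and the `DRY_RUN` switch are hypothetical, and it presumes SSH access to the failover system with sudo usable non-interactively there. The backup file path is passed in as an argument rather than guessed.

```shell
#!/bin/sh
# Sketch only: "run", "sync_to_failover" and DRY_RUN are hypothetical names.
# Assumes the backup file already exists on the primary (steps 1-4 done)
# and that ssh/scp to the failover host work for the current user.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = path to the backup .tgz on the primary
# $2 = failover host (assumed reachable over SSH)
sync_to_failover() {
    backup_file=$1
    failover_host=$2
    remote_file="/tmp/$(basename "$backup_file")"

    # Step 5: copy the backup file to /tmp on the failover system.
    run scp "$backup_file" "$failover_host:/tmp/" || return 1

    # Steps 6-8: run the restore script on the failover system.
    run ssh "$failover_host" \
        "cd /usr/local/nagiosxi/scripts && sudo ./restore_xi.sh $remote_file"
}
```

Calling it with `DRY_RUN=1` first, for example `DRY_RUN=1 sync_to_failover /tmp/<BACKUP_SCRIPT> <failover host>`, prints the two commands without touching either system.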

Please advise what else we can check and do to resolve the issue.

Re: Primary and Failover both are active

Posted: Thu Jan 18, 2018 5:33 pm
by tgriep
When you restore XI to the Backup server, it overwrites the Program URL: and the External URL: with the values from the Primary server. To fix this, after the restore to the backup server, log in, go to the Admin > System Settings menu, and update both of the URLs.

Then, if the notifications do get enabled, they will show the correct server.

Re: Primary and Failover both are active

Posted: Thu Jan 18, 2018 5:35 pm
by cdienger
What is in place to control failover? Did you follow anything (https://assets.nagios.com/downloads/nag ... ios-XI.pdf or https://www.linbit.com/en/576-ha-nagios ... -on-rhel7/ for example) to set up failover, or is this a new setup?

As an immediate fix, you can disable the DR Nagios server by running service nagios stop, if you haven't already.

Re: Primary and Failover both are active

Posted: Fri Jan 19, 2018 8:08 am
by rtsupport
Team,

Thank you for your response. Just to let you know, our issue has been resolved.

Upon checking, we found that when we do a full Nagios restart, it returns to its previous state. So we disabled active host checks, active service checks, and notifications, and after a full restart it came back in the state where these three services are stopped.
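
For reference, the same three settings can also be switched off from the shell with Nagios Core external commands instead of the web interface. The sketch below is an assumption-laden example, not the procedure we actually used: the function name `disable_dr_activity` is hypothetical, and the default command file path is the one commonly seen on Nagios XI installs and may differ on your system. With state retention enabled, these program-wide settings normally persist across restarts, which would match the behaviour described above.

```shell
#!/bin/sh
# Sketch only: "disable_dr_activity" is a hypothetical helper name.
# Assumed default location of the Nagios external command file.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# Write the three program-wide "disable" external commands to the file
# given as $1 (defaults to CMD_FILE). Each line has the standard
# "[unix_timestamp] COMMAND" format.
disable_dr_activity() {
    target=${1:-$CMD_FILE}
    now=$(date +%s)
    for cmd in STOP_EXECUTING_HOST_CHECKS \
               STOP_EXECUTING_SVC_CHECKS \
               DISABLE_NOTIFICATIONS; do
        printf '[%s] %s\n' "$now" "$cmd" >> "$target"
    done
}
```

On the DR server this would be run as a user allowed to write to the command file, for example `disable_dr_activity` with no argument to use the assumed default path.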

@tgriep

As per your suggestion, I checked "System Settings" and found that the URL had been replaced by the PRD URL. I assume this is the reason that alerts sent from DR carry the PRD URL at the end for the user to respond. We still need to verify whether replacing it with the DR URL fixes this as well.

Re: Primary and Failover both are active

Posted: Fri Jan 19, 2018 3:39 pm
by lmiltchev
@rtsupport Is it safe to close this topic and mark it as "resolved"?

Re: Primary and Failover both are active

Posted: Tue Jan 23, 2018 5:39 am
by rtsupport
Yes, please, you may close the topic. Thank you!