Nagios Failover Issue
Posted: Mon Feb 12, 2018 9:11 am
by rtsupport
Team,
We have a setup with one master and one DR Nagios server, both running Nagios XI 2014R2.7. For the last few days we have observed that failover happens on weekends only. This makes the DR server active, and we then have to start Nagios manually on the master server.
I am watching the Nagios logs but am unable to find the exact reason for the failover. Please confirm which log file would show its traces, and what can cause a failover to happen in general. Thanks.
Re: Nagios Failover Issue
Posted: Mon Feb 12, 2018 4:25 pm
by npolovenko
Hello,
@rtsupport, I'd like to get some more information before I can answer your question. How did you set up the DR server, and what are the criteria for it to start working?
Could you send in your Nagios XI System Profile so I can review it?
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and attach it to your next post. Otherwise, you could upload it to a cloud storage of your choice and share a link with me via PM. Please post something in this thread to bring it back up in the support queue.
Profile was received and shared with the support team.
Re: Nagios Failover Issue
Posted: Tue Feb 13, 2018 7:08 am
by rtsupport
The backup server checks the primary server every minute to see if the Nagios logs have been stale for 20 minutes or if there are no Nagios processes running. If either of these conditions is true five consecutive times, the secondary/failover server becomes active. I have sent the profile download link by PM.
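The check described above can be sketched roughly as follows. This is only an illustration of the stated rule (log stale for 20 minutes or no Nagios processes, five consecutive times), not the actual failover script, which was not posted. The names `check_primary` and `FailoverTracker` are hypothetical, and how the DR server obtains the log timestamp and process count from the primary is left out.

```python
from typing import List, Optional

STALE_AFTER_SECONDS = 20 * 60   # log counts as stale after 20 minutes
FAILURES_BEFORE_FAILOVER = 5    # consecutive failed checks required


def check_primary(log_mtime: float, nagios_process_count: int, now: float) -> Optional[str]:
    """Return a reason string if the check fails, or None if the primary looks healthy.

    log_mtime and now are Unix timestamps; how they are collected from the
    primary is an assumption left to the caller.
    """
    if nagios_process_count == 0:
        return "no nagios processes"
    if now - log_mtime >= STALE_AFTER_SECONDS:
        return "log stale"
    return None


class FailoverTracker:
    """Counts consecutive failed checks and remembers why each one failed."""

    def __init__(self, threshold: int = FAILURES_BEFORE_FAILOVER) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.reasons: List[str] = []

    def record(self, reason: Optional[str]) -> bool:
        """Record one check result; return True when failover should trigger."""
        if reason is None:
            # A healthy check resets the streak.
            self.consecutive_failures = 0
            self.reasons.clear()
            return False
        self.consecutive_failures += 1
        self.reasons.append(reason)
        return self.consecutive_failures >= self.threshold
```

Keeping the reason for each failed check, and writing it out with a timestamp when the threshold is reached, would make it possible to tell afterwards whether a failover was caused by stale logs or by missing processes.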
Re: Nagios Failover Issue
Posted: Tue Feb 13, 2018 4:29 pm
by npolovenko
@rtsupport, unfortunately the profile did not contain the log files I was looking for because the XI version is too old. I would need /var/log/messages (if the server crashed yesterday, then the log I need is /var/log/messages-20180212). I also need /var/log/httpd/access_log and /var/log/httpd/error_log (same thing: if the server crashed yesterday, look for the logs with yesterday's timestamp), and /var/log/mysqld.log. You can use WinSCP or FileZilla to extract the logs and put them in a zip file.
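If it is easier to collect the files on the server itself, a minimal sketch like the one below could bundle them into one zip. The function names are hypothetical, and the assumption that the rotated httpd logs carry the same `-YYYYMMDD` suffix as /var/log/messages-20180212 depends on the server's logrotate configuration, so check the actual file names first.

```python
import zipfile
from datetime import date
from pathlib import Path
from typing import List, Optional


def candidate_logs(rotated_on: Optional[date] = None) -> List[Path]:
    """List the requested log paths, optionally with a -YYYYMMDD rotation suffix.

    The suffix format is an assumption based on /var/log/messages-20180212.
    """
    rotated = ["/var/log/messages", "/var/log/httpd/access_log", "/var/log/httpd/error_log"]
    suffix = f"-{rotated_on:%Y%m%d}" if rotated_on else ""
    paths = [Path(p + suffix) for p in rotated]
    paths.append(Path("/var/log/mysqld.log"))  # requested without a date suffix
    return paths


def bundle_logs(paths: List[Path], archive: Path) -> List[Path]:
    """Zip every path that exists; return the paths that were missing."""
    missing = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            if p.is_file():
                zf.write(p, arcname=p.name)
            else:
                missing.append(p)
    return missing
```

For example, `bundle_logs(candidate_logs(date(2018, 2, 12)), Path("/tmp/failover_logs.zip"))` would write whichever of those files exist and return the ones it could not find. Reading /var/log usually requires root.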
Re: Nagios Failover Issue
Posted: Wed Feb 14, 2018 8:15 am
by rtsupport
The last failover happened about 10 days ago, and the logs for it are no longer available on the server. Let me watch this weekend, and I will send you all the logs once a failover happens.
Re: Nagios Failover Issue
Posted: Wed Feb 14, 2018 9:55 am
by scottwilkerson
Thanks
Re: Nagios Failover Issue
Posted: Tue Feb 27, 2018 7:57 am
by rtsupport
Hi
I am sending the logs from the date when the failover happened. I am sending the log file link by PM. mysqld.log was blank, so I am not sending it.
Re: Nagios Failover Issue
Posted: Tue Feb 27, 2018 9:21 am
by scottwilkerson
I have the logs; however, I don't know which machine I am looking at or what time the failure occurred, as there are active Nagios logs all the way to the bottom.
When your DR is checking the main machine, how is it doing that? What is it reporting that causes the switch to happen?
Re: Nagios Failover Issue
Posted: Tue Feb 27, 2018 10:09 am
by rtsupport
The DR server checks the primary server every minute to see if the Nagios logs have been stale for 20 minutes or if there are no Nagios processes running. If either of these conditions is true five consecutive times, the secondary/failover/DR server becomes active. There is a script that performs this check. Let me know if you need that script. I have sent you the logs of the master server, not of the DR server.
Re: Nagios Failover Issue
Posted: Tue Feb 27, 2018 10:37 am
by scottwilkerson
What time was the failure?
Does the DR server know which problem it was (no logs, or no processes)?