Nagios Log Server HA setup
Posted: Thu Oct 03, 2024 4:19 am
by erroltanner
Hello,
I have spun up 2 servers, installed Nagios Log Server on each and configured the cluster. This is where things fall apart: when configuring a client to use our log server, we run the setup commands that tell the client which logs to send to Nagios Log Server and the IP address of the log server.
Say we set up the client to point to host 1 and host 1 dies: the client will not know to start sending logs to host 2. Am I missing something here?
Should we be setting up a load balancer to direct traffic to the 2 hosts?
Am I missing the whole point?
Re: Nagios Log Server HA setup
Posted: Thu Oct 03, 2024 9:25 am
by DoubleDoubleA
Hi @erroltanner,
It is definitely wise to consider these questions. You are correct that we point all the logs to the cluster leader, which directs the load balancing for the cluster. If the cluster leader goes down, where do the logs go, and what happens to the cluster?
If the cluster leader dies, the cluster has a mechanism to elect a new leader, but of course the new leader will have a different IP address. In a 2-instance cluster, it would be easy to have a load balancer notice the failure and point logs to the other instance. In a cluster of 3 or more instances, the load balancer wouldn't know which instance to send to.
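For the 2-instance case, "notice and point logs to the other one" boils down to a health check plus an ordered fallback. Below is a minimal Python sketch of that idea, assuming a plain TCP connect to the log input port is a good-enough health signal (real load balancers do considerably more). The addresses, port 5544 and function names are hypothetical, for illustration only.

```python
import socket
from typing import Callable, Optional, Sequence

# Hypothetical instance addresses; port 5544 is an assumed syslog input port.
BACKENDS = [("10.0.0.1", 5544), ("10.0.0.2", 5544)]


def tcp_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Treat an instance as healthy if it accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_backend(
    backends: Sequence[tuple[str, int]],
    is_healthy: Callable[[str, int], bool] = tcp_healthy,
) -> Optional[tuple[str, int]]:
    """Return the first healthy backend in priority order, or None if all are down."""
    for host, port in backends:
        if is_healthy(host, port):
            return (host, port)
    return None
```

A balancer would run this kind of check on a timer and send new connections to whatever pick_backend returns, so with host 1 down it falls through to host 2. The check only says whether a node is up, not whether it is the current leader, which is exactly the ambiguity described for larger clusters.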
And even so, putting in a load balancer only shifts the point of failure from the cluster leader to the load balancer itself. If the load balancer goes down, the logs don't get to the cluster either.
Log Server ships with the NCPA monitoring agent on it, so it is simple to set up monitoring for your Log Server instances through Nagios XI, so you'll know if there are problems with an instance. And if you had a load balancer, you'd want to monitor that as well.
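NCPA with Nagios XI is the route described above. Purely to show the kind of decision such a check makes, here is a sketch of a standalone check following the usual Nagios plugin convention (exit 0 = OK, 1 = WARNING, 2 = CRITICAL, one status line on stdout). This is not NCPA. The instance list, port 5544 and the "some down = WARNING" threshold are all assumptions.

```python
import socket
from typing import Callable, Sequence

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical cluster members and assumed input port; adjust to your setup.
INSTANCES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
PORT = 5544


def accepts_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if the host accepts a TCP connection on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def evaluate(
    instances: Sequence[str],
    port: int,
    probe: Callable[[str, int], bool] = accepts_tcp,
) -> tuple[int, str]:
    """Map how many instances are reachable to a Nagios state and message."""
    down = [h for h in instances if not probe(h, port)]
    if not down:
        return OK, f"OK - all {len(instances)} log server instances reachable"
    if len(down) < len(instances):
        return WARNING, f"WARNING - unreachable: {', '.join(down)}"
    return CRITICAL, "CRITICAL - no log server instance reachable"


def main() -> int:
    """Print the status line and return the plugin exit code."""
    state, message = evaluate(INSTANCES, PORT)
    print(message)
    return state
```

As a plugin, the script's entry point would be `sys.exit(main())`. The same check pointed at a load balancer's address would cover the "monitor that as well" case.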
Hopefully this context is helpful.
Aaron
Re: Nagios Log Server HA setup
Posted: Thu Oct 03, 2024 1:55 pm
by jmichaelson
I wonder if this could be worked around with a DNS name that has one A record per log server instance and a short TTL, so that if the sender loses its connection it has to re-resolve the name and pick a new IP address. Something like:
logingester  300  IN  A  10.0.0.1
logingester  300  IN  A  10.0.0.2
logingester  300  IN  A  10.0.0.3
The 300 is the TTL in seconds, so a local resolver may cache these records for at most 5 minutes before it has to look them up again; that might be enough to make it work. Adjust the TTL to your liking.
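This only helps if the sender actually looks the name up again after a failure instead of holding on to the first IP it got. A minimal Python sketch of such a sender, assuming a hypothetical name logingester.example.com carrying the records above, a plain newline-delimited TCP input on an assumed port 5544, and that the OS resolver is what honours the TTL (Python's getaddrinfo does no caching of its own):

```python
import socket
from typing import Optional

# Hypothetical DNS name with one A record per log server instance.
LOG_NAME = "logingester.example.com"
LOG_PORT = 5544  # assumed TCP input port


def resolve_all(name: str, port: int) -> list[str]:
    """Return every IPv4 address the resolver gives for `name`, in order, without duplicates."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addrs: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addrs:
            addrs.append(sockaddr[0])
    return addrs


def connect_any(name: str, port: int, timeout: float = 3.0) -> socket.socket:
    """Re-resolve `name` and return a socket to the first address that accepts a connection."""
    last_error: Optional[OSError] = None
    for addr in resolve_all(name, port):
        try:
            return socket.create_connection((addr, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # this instance is down; try the next A record
    raise ConnectionError(f"no address for {name} accepted a connection") from last_error


class FailoverSender:
    """Keeps one TCP connection open and re-resolves the name whenever it breaks."""

    def __init__(self, name: str, port: int) -> None:
        self.name = name
        self.port = port
        self.sock: Optional[socket.socket] = None

    def send(self, line: str) -> None:
        # One retry: if the current connection fails, drop it, re-resolve and reconnect.
        for _ in range(2):
            if self.sock is None:
                self.sock = connect_any(self.name, self.port)
            try:
                self.sock.sendall(line.encode("utf-8") + b"\n")
                return
            except OSError:
                self.sock.close()
                self.sock = None
        raise ConnectionError(f"could not deliver log line to {self.name}")
```

Usage would be `FailoverSender(LOG_NAME, LOG_PORT).send("some log line")`. Note that socket.create_connection already walks all resolved addresses when given a hostname; the loop in connect_any just makes the failover visible. A TCP write to a host that has just died often succeeds into the kernel buffer and only fails on a later write, so a few lines can still be lost. Log shippers such as rsyslog have their own retry settings, which are usually a better place for this than custom code.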