Redundancy or load balancing on log server

gormank · Post by **gormank** » Wed Nov 16, 2016 3:14 pm

Hi,
I'm looking for a solution to the redundancy problem (as I understand it) for the log server: if I'm sending data to a single address, say node1, and that node dies, my logs go nowhere and are lost. Here's a discussion on the topic: https://support.nagios.com/forum/search ... 8&start=30

I've recently learned that there won't be an LTM in the subnet/VLAN where the log servers will exist.

What's the solution?

rkennedy · Post by **rkennedy** » Wed Nov 16, 2016 4:06 pm

There are a couple of routes without an LTM (Google tells me that's the F5 Local Traffic Manager):
- Round-robin DNS, though this will not help when a node is completely 'down'. Some companies have 'smart' DNS options, so this might work.
- A free load balancer. For example, I used HAProxy and set it to push data to both members of my cluster (all clients forward to haproxy:3031, and HAProxy then forwards to either machine on ip:3031).
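To illustrate the idea behind both options, here is a minimal Python sketch of round-robin selection that skips nodes failing a TCP connect check. It is not how HAProxy or any DNS service is implemented; the class name `RoundRobin`, the helper `is_listening`, and the node names are hypothetical, and only port 3031 comes from the post above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RoundRobin:
    """Rotate through backends, skipping any that fail the health check."""

    def __init__(self, backends, check=is_listening):
        self.backends = list(backends)  # e.g. [("node1", 3031), ("node2", 3031)]
        self.check = check              # hypothetical hook, swappable for tests
        self._next = 0                  # index of the backend to try first

    def pick(self):
        """Return the next healthy (host, port), or None if all are down."""
        count = len(self.backends)
        for offset in range(count):
            index = (self._next + offset) % count
            host, port = self.backends[index]
            if self.check(host, port):
                self._next = (index + 1) % count
                return host, port
        return None
```

Plain round-robin DNS has no equivalent of the `check` step, which is why it does not help when a node is fully down.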

gormank · Post by **gormank** » Wed Nov 16, 2016 5:26 pm

Yeah,
That's what I got from the other thread I read with 'balance' in the title... I was hoping that over time a better, more useful out-of-the-box solution would have appeared.
Another idea that was presented was RHEL clustering.

rkennedy · Post by **rkennedy** » Thu Nov 17, 2016 10:14 am

How does RHEL clustering work?

Keep in mind, most of the agents will keep a backlog of logs to send if they cannot connect to x port.

gormank · Post by **gormank** » Thu Nov 17, 2016 11:04 am

I'm not certain, but it looks like old school clustering in that it would have a package, or floating IP riding on an interface on the primary (h1) box. If the primary stops listening on x port, or the host IP or package IP becomes unreachable, the package IP is activated on the secondary/failover (h2) box. Logs are sent to the package IP so if there's a failure, logs are cached until failover. With a floating address, the risk is that it will become reachable again on h1, or can't be removed from h1. The usual approach is to (forcibly) halt h1, but I'd prefer to avoid that, since it risks breaking the DB, and disables data redundancy. There would be a script to ifdown/ifup the package IP on the hosts.

On at least some systems it's also possible to configure a secondary address/host to send logs to in the event the primary is unreachable. I think this works on Linux, but I haven't looked too far into it, or verified Windows or hardware. See https://access.redhat.com/solutions/59705
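As a rough illustration of that agent-side idea, the Python sketch below tries a primary address first, falls back to a secondary, and keeps a local backlog while neither is reachable. It is not the rsyslog or nxlog mechanism and does not reproduce the linked article; `FailoverSender`, `tcp_send`, and the backlog size are hypothetical names and values chosen for the example.

```python
import socket
from collections import deque


def tcp_send(address, line):
    """Send one newline-terminated log line over a fresh TCP connection."""
    with socket.create_connection(address, timeout=2.0) as conn:
        conn.sendall(line.encode("utf-8") + b"\n")


class FailoverSender:
    """Send to the first reachable address; buffer lines while all are down."""

    def __init__(self, addresses, send=tcp_send, max_backlog=10000):
        self.addresses = list(addresses)  # primary first, then fallbacks
        self.send = send                  # hypothetical hook, swappable for tests
        # Bounded queue: once full, the oldest buffered lines are dropped.
        self.backlog = deque(maxlen=max_backlog)

    def _deliver(self, line):
        """Try each address in order; return True on the first success."""
        for address in self.addresses:
            try:
                self.send(address, line)
                return True
            except OSError:
                continue
        return False

    def emit(self, line):
        """Queue the line, then flush the backlog oldest-first."""
        self.backlog.append(line)
        while self.backlog:
            if not self._deliver(self.backlog[0]):
                return False  # everything is down; keep the backlog for later
            self.backlog.popleft()
        return True
```

A caller would build it with something like `FailoverSender([("node1", 3031), ("node2", 3031)])` and call `emit()` per log line. Note that a line can be delivered twice if a connection drops after the data was received, so duplicates remain possible.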

I'm trying to provide a redundant path for log collection, and make failovers automatic. The point is while the received data is dispersed across multiple hosts, receiving the data isn't redundant, which seems a design flaw.

I can't say either of the above solutions is practical, just that they're theories I'm investigating.

mcapra · Post by **mcapra** » Thu Nov 17, 2016 1:32 pm

gormank wrote: The point is while the received data is dispersed across multiple hosts, receiving the data isn't redundant, which seems a design flaw.

I tend to agree, but a fix is hardly trivial to deliver out of the box in this case. One option we could bake into our rsyslog/nxlog configs is the approach in the Red Hat article you mentioned (https://access.redhat.com/solutions/59705). This is a fairly graceful way to configure failovers for Logstash at the agent level (though not a solution for load balancing).

gormank wrote: The usual approach is to (forcibly) halt h1, but I'd prefer to avoid that, since it risks breaking the DB, and disables data redundancy.

With Elasticsearch specifically, "breaking" the DB is a bit less of a concern in this case. You definitely run the risk of split-brain, though, depending on the nature of the outage, and that can be a pain to resolve.

gormank · Post by **gormank** » Thu Nov 17, 2016 1:52 pm

Trivial seems a bit relative. My company is paying Nagios $10k or more a year for this product and support. It claims to be a redundant cluster, yet I find even before installing it that it's not redundant out of the box. I need additional load balancers, which also need to be redundant, and I have to resolve the problem myself. This is hardly an enterprise solution. I assume others are paying for NLS as well. Rant mode off.

I like the simple solution of having the host agents/clients decide to send to an alternate address as well. Simple.

As far as split brain, you mean 2 boxes receiving streams of the same log data and having duplicates? Or the 2 hosts w/ the same IP issue?

Thanks

mcapra · Post by **mcapra** » Thu Nov 17, 2016 4:54 pm

gormank wrote: I like the simple solution of having the host agents/clients decide to send to an alternate address as well. Simple.

I've brought this up with the developers. I agree that offering a simple solution via the agents themselves might be preferable since most people copy+paste the configs we provide for initial setup anyway.

gormank wrote: As far as split brain, you mean 2 boxes receiving streams of the same log data and having duplicates?

That's one possible outcome. The worse outcome would be a case where both nodes elect themselves master (due to some network-related outages in most cases) and both receive events without properly reconciling the data. Then you'd have different working data sets on each node. Very messy to clean up.
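The usual guard against two masters is a majority (quorum) rule: a node may only be elected master if it can see more than half of the master-eligible nodes. The short Python sketch below shows the arithmetic only; the function names are hypothetical, and whether and how NLS exposes such a setting is not covered in this thread.

```python
def quorum(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """True if this side of a network partition holds a majority."""
    return visible_nodes >= quorum(master_eligible_nodes)
```

With two nodes the quorum is 2, so a node cut off from its peer cannot elect itself, which prevents split-brain but also means neither side accepts writes on its own. With three nodes the quorum is still 2, so the cluster tolerates losing one node.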

gormank · Post by **gormank** » Thu Nov 17, 2016 5:53 pm

Thanks for making the request.

Well, is that split brain issue possible w/ NLS, or are you speaking of the clustering that I'm thinking of? Linux clustering will halt the server if it has to.

mcapra · Post by **mcapra** » Thu Nov 17, 2016 5:58 pm

I'm speaking in terms of the clustering done by NLS on the back end, specifically within Elasticsearch.

Nagios Support Forum
