Redundancy or load balancing on log server

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Redundancy or load balancing on log server

Post by gormank »

Hi,
I'm looking for a solution to the problem (as I understand it) of redundancy for Log Server: if I'm sending data to a single address, say node1, and it dies, my logs go nowhere and are lost. Here's a discussion on the topic: https://support.nagios.com/forum/search ... 8&start=30

I've recently learned that there won't be an LTM in the subnet/VLAN where the log servers will exist.

What's the solution?
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Redundancy or load balancing on log server

Post by rkennedy »

There are a couple of options for this. Without an LTM (Google tells me that's the F5 Local Traffic Manager), there are two routes:
- Round-robin DNS. This will not help when a node is completely 'down', since clients can still be handed its address. Some companies have 'smart' DNS options though, so this might work.
- A free load balancer. For example, I used HAProxy and set it to push data to both members of my cluster (all clients forward to haproxy:3031, and HAProxy then forwards to either machine on ip:3031).
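To make the load-balancer option more concrete, here is a minimal Python sketch of the selection logic a round-robin balancer applies: rotate through the backends and skip any that a health check has marked down. It is not HAProxy's code or configuration; the `RoundRobinPool` class and its methods are hypothetical, and node1/node2 on port 3031 just reuse the example above.

```python
class RoundRobinPool:
    """Hypothetical sketch: rotate across backends, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)               # e.g. [("node1", 3031), ("node2", 3031)]
        self.healthy = {b: True for b in self.backends}
        self._next = 0                               # index to try first on the next pick

    def mark(self, backend, is_up):
        # In a real balancer this would be driven by periodic health checks.
        self.healthy[backend] = is_up

    def pick(self):
        # Try each backend at most once, starting where the last pick left off.
        n = len(self.backends)
        for i in range(n):
            candidate = self.backends[(self._next + i) % n]
            if self.healthy[candidate]:
                self._next = (self._next + i + 1) % n
                return candidate
        raise RuntimeError("no healthy backends")
```

With both nodes up, picks alternate node1, node2, node1; once node1 is marked down, every pick goes to node2 until it is marked up again. That health-check step is what plain round-robin DNS lacks.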
Former Nagios Employee
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Redundancy or load balancing on log server

Post by gormank »

Yeah,
That's what I got from the other thread I read with "balance" in the title... I was hoping that by now there would be a better, more useful solution out of the box.
Another idea that was presented was RHEL clustering.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Redundancy or load balancing on log server

Post by rkennedy »

How does RHEL clustering work?

Keep in mind, most of the agents keep a backlog and will resend logs later if they cannot connect to the target port.
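A minimal Python sketch of the kind of backlog described above, assuming an agent that queues events in memory and retries them in order once the server is reachable again. `BufferedSender` and its `send` callable are hypothetical stand-ins, not the API of rsyslog, nxlog, or any other shipper; real agents implement their own (often disk-backed) queues.

```python
from collections import deque


class BufferedSender:
    """Hypothetical sketch: queue log events while the server is unreachable."""

    def __init__(self, send, max_backlog=10000):
        self.send = send                               # callable(msg); raises OSError when the server is down
        self.backlog = deque(maxlen=max_backlog)       # when full, the oldest events are dropped

    def log(self, msg):
        self.backlog.append(msg)
        return self.flush()

    def flush(self):
        # Send queued events oldest-first; stop at the first failure and keep the rest.
        while self.backlog:
            try:
                self.send(self.backlog[0])
            except OSError:
                return False
            self.backlog.popleft()
        return True
```

The bounded queue is the important design choice: an outage longer than the backlog can hold still loses the oldest events, which is why a backlog alone does not replace a redundant receiver.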
Former Nagios Employee
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Redundancy or load balancing on log server

Post by gormank »

I'm not certain, but it looks like old-school clustering: there would be a package, or floating IP, riding on an interface on the primary (h1) box. If the primary stops listening on port x, or the host IP or package IP becomes unreachable, the package IP is activated on the secondary/failover (h2) box. Logs are sent to the package IP, so if there's a failure, logs are cached until failover. With a floating address, the risk is that the address becomes reachable again on h1, or can't be removed from h1. The usual approach is to (forcibly) halt h1, but I'd prefer to avoid that, since it risks breaking the DB, and disables data redundancy. There would be a script to ifdown/ifup the package IP on the hosts.
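The health-check side of that failover could be sketched in Python as follows. This is an illustration under assumptions: `port_is_open`, `FailoverMonitor`, the three-failure threshold, and port 3031 are hypothetical, and the sketch only decides when h2 should take over. Actually fencing h1 and running the ifdown/ifup script is left to the cluster software.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                      # refused, unreachable, or timed out
        return False


class FailoverMonitor:
    """Hypothetical sketch: decide when h2 should claim the package IP."""

    def __init__(self, probe, threshold=3):
        self.probe = probe               # callable() -> bool, e.g. lambda: port_is_open("h1", 3031)
        self.threshold = threshold       # consecutive failed probes required before failing over
        self.failures = 0

    def check(self):
        # Returns True once h1 has failed `threshold` probes in a row.
        if self.probe():
            self.failures = 0            # any success resets the count
            return False
        self.failures += 1
        return self.failures >= self.threshold
```

Requiring several consecutive failures avoids moving the address on a single dropped probe, but it does not remove the risk described above of h1 coming back while still holding the IP.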

There's also the option, on at least some systems, of configuring a secondary address/host to send logs to in the event the primary is unreachable. I think this works on Linux, but I haven't looked too far into it or verified it on Windows or hardware. See https://access.redhat.com/solutions/59705
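A minimal Python sketch of that agent-side idea: try the primary log server first and fall back to the next address only if the connection fails. The function name `send_with_failover` and the target list are hypothetical; rsyslog implements this in its own configuration (for example with the legacy `$ActionExecOnlyWhenPreviousIsSuspended` directive), not through code like this.

```python
import socket


def send_with_failover(targets, message, timeout=2.0):
    """Send one newline-terminated log line to the first reachable (host, port)."""
    last_error = None
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(message.encode("utf-8") + b"\n")
                return (host, port)      # report which server accepted the line
        except OSError as exc:
            last_error = exc             # unreachable: try the next target in order
    raise ConnectionError(f"no log server reachable: {last_error}")
```

For example, `send_with_failover([("node1", 3031), ("node2", 3031)], "test event")` only uses node2 when node1 refuses or times out. Note that a successful plain-TCP send only means the bytes reached the remote socket, not that the log server processed them; protocols with application-level acknowledgements, such as RELP in rsyslog, close that gap.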

I'm trying to provide a redundant path for log collection, and make failovers automatic. The point is while the received data is dispersed across multiple hosts, receiving the data isn't redundant, which seems a design flaw.

I can't say either of the above solutions is practical, just that they're theories I'm investigating.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Redundancy or load balancing on log server

Post by mcapra »

gormank wrote:The point is while the received data is dispersed across multiple hosts, receiving the data isn't redundant, which seems a design flaw.
I tend to agree, but building this into an out-of-the-box solution is hardly trivial. One possible approach that we could bake into our rsyslog/nxlog configs is the Red Hat article you mentioned (https://access.redhat.com/solutions/59705). This is a fairly graceful way to configure failover for Logstash at the agent level (though it is not a solution for load balancing).
gormank wrote:The usual approach is to (forcibly) halt h1, but I'd prefer to avoid that, since it risks breaking the DB, and disables data redundancy.
With Elasticsearch specifically, "breaking" the DB is less of a concern in this case. Depending on the nature of the outage, though, you definitely run the risk of split-brain, which can be a pain to resolve.
Former Nagios employee
https://www.mcapra.com/
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Redundancy or load balancing on log server

Post by gormank »

Trivial seems a bit relative. My company is paying Nagios $10k or more a year for this product and support. It claims to be a redundant cluster, yet I find even before installing it that it's not redundant out of the box. I need additional load balancers, which also need to be redundant, and have to solve the problem myself. This is hardly an enterprise solution. I assume others are paying for NLS as well. Rant mode off.

I like the simple solution of having the host agents/clients decide to send to an alternate address as well. Simple.

As far as split brain, you mean 2 boxes receiving streams of the same log data and having duplicates? Or the 2 hosts w/ the same IP issue?

Thanks
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Redundancy or load balancing on log server

Post by mcapra »

gormank wrote:I like the simple solution of having the host agents/clients decide to send to an alternate address as well. Simple.
I've brought this up with the developers. I agree that offering a simple solution via the agents themselves might be preferable since most people copy+paste the configs we provide for initial setup anyway.
gormank wrote:As far as split brain, you mean 2 boxes receiving streams of the same log data and having duplicates?
That's one possible outcome. The worse outcome would be a case where both nodes elect themselves master (due to some network-related outages in most cases) and both receive events without properly reconciling the data. Then you'd have different working data sets on each node. Very messy to clean up.
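The usual guard against both nodes electing themselves master is a majority quorum: a side of a network split may only elect a master if it can see more than half of the master-eligible nodes. Elasticsearch releases before 7.0 exposed this as `discovery.zen.minimum_master_nodes`, with a recommended value of (master-eligible nodes / 2) + 1; how NLS sets it is not covered in this thread. The Python sketch below uses hypothetical helper names to show why the threshold matters.

```python
def majority(master_eligible):
    """Smallest node count that is more than half of the master-eligible nodes."""
    return master_eligible // 2 + 1


def split_brain_possible(partitions, min_master_nodes):
    """True if more than one side of a network split could elect its own master.

    `partitions` lists how many master-eligible nodes each side can still see.
    """
    electing = [size for size in partitions if size >= min_master_nodes]
    return len(electing) > 1


# Two nodes split 1/1 with a threshold of 1: both sides elect a master (split-brain).
# The same split with majority(2) == 2: neither side can elect, so there is no divergence.
```

The trade-off for a two-node cluster is visible here: with a majority of 2, losing either node leaves no side able to elect a master, so the cluster becomes unavailable rather than diverging. This is why three master-eligible nodes are generally recommended.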
Former Nagios employee
https://www.mcapra.com/
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Redundancy or load balancing on log server

Post by gormank »

Thanks for making the request.

Well, is that split brain issue possible w/ NLS, or are you speaking of the clustering that I'm thinking of? Linux clustering will halt the server if it has to.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Redundancy or load balancing on log server

Post by mcapra »

I'm speaking in terms of the clustering done by NLS on the back-end. Specifically within Elasticsearch.
Former Nagios employee
https://www.mcapra.com/