Log Distribution with a cluster best practices

2evanowen · Post by **2evanowen** » Thu Jul 16, 2015 6:31 pm

Hey guys,
I have a few questions that I ran into while patching...

When I take down my Master in the cluster (the one that I am sending all of the logs to) none of the other systems in the cluster receive logs because they are inheriting the logs from the master.
1) Will Nagios Log Server go back and get the logs it missed while it was down?
2) Is there a way for another system in the cluster to take over being the master while it's down (so I can still see current log files even while patching the original master)?
3) What is the best way for me to ensure that I don't lose logs when I am patching a server during that down time? Do you guys have a best practice for that?

polarbear1 · Post by **polarbear1** » Fri Jul 17, 2015 9:52 am

For #1 and #3 -- Long story short, you want to set up nxlog and rsyslog to queue your messages if they are unable to send.

Rsyslog --

Check the bottom section of your files at /etc/rsyslog.d/90-nagioslogserver_*.conf and make them look something like this:

Code: Select all

# Forward to Nagios Log Server and then discard, otherwise these messages
# will end up in the syslog file (/var/log/messages) unless there are other
# overriding rules.
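# Buffer settings
# Seconds to wait before retrying after the server becomes unreachable
$ActionResumeInterval 10
# Maximum number of messages the queue may hold
$ActionQueueSize 100000
# Start discarding messages once the queue holds this many
$ActionQueueDiscardMark 97500
# Start spilling messages to disk once the in-memory queue holds this many
$ActionQueueHighWaterMark 80000
$ActionQueueType LinkedList
# Setting a file name enables disk-assisted queueing; use a unique name per queue
$ActionQueueFileName each_queue_should_have_a_unique_queue_file_name
$ActionQueueCheckpointInterval 100
$ActionQueueMaxDiskSpace 500m
# -1 means retry forever instead of giving up on the action
$ActionResumeRetryCount -1
# Write any queued messages to disk when rsyslog shuts down
$ActionQueueSaveOnShutdown on
# 0 means do not wait for space when the queue is full
$ActionQueueTimeoutEnqueue 0
$ActionQueueDiscardSeverity 0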
if $programname == 'my_log_file' then @@my_NLS_server:5544
if $programname == 'my_log_file' then ~

Also note the $WorkDirectory variable higher up in the config. This is where your disk buffer files will be dumped, so make sure you have enough space available. By default it's going somewhere in the /var partition. It's OK to change it to something else. In my case I made a folder for it on /home because that's where I have the most room to spare on my build.

Related reading:
http://www.rsyslog.com/doc/v8-stable/co ... ueues.html
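To get a feel for how long an outage the disk buffer can ride out, here is a rough back-of-the-envelope sketch. It assumes that "500m" in $ActionQueueMaxDiskSpace means 500 MiB, and the message size and rate are made-up numbers you would replace with your own. It ignores on-disk overhead and the message-count limits in the config above, so treat the result as an optimistic upper bound.

```python
def outage_coverage_seconds(max_disk_bytes: int,
                            avg_message_bytes: int,
                            messages_per_second: float) -> float:
    """Estimate how many seconds of logs fit in the disk queue.

    Simplification: every message costs exactly avg_message_bytes on disk.
    """
    if avg_message_bytes <= 0 or messages_per_second <= 0:
        raise ValueError("message size and rate must be positive")
    return max_disk_bytes / (avg_message_bytes * messages_per_second)


# Assumption: "500m" is read as 500 MiB.
MAX_DISK_BYTES = 500 * 1024 * 1024

# Hypothetical workload: 300-byte messages at 200 messages per second.
seconds = outage_coverage_seconds(MAX_DISK_BYTES, 300, 200)
print(f"{seconds / 60:.1f} minutes of buffered logs")
```

If the estimate is shorter than your usual patch window, raise $ActionQueueMaxDiskSpace and make sure the $WorkDirectory partition has the room.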

NXlog -- Check your C:\Program Files (x86)\nxlog\conf\nxlog.conf files...

Add these sections after the <Extension> sections. Note the sizes are in kilobytes; change them based on what fits your situation.

Code: Select all

<Processor membuffer>
    Module  pm_buffer
    MaxSize 512000
    Type    Mem
</Processor>

<Processor diskbuffer>
    Module  pm_buffer
    MaxSize 5242880
    Type    Disk
    File    "C:\My\Working\Directory"
    WarnLimit   3932160
</Processor>

Then at the bottom, tweak your route to something like this

Code: Select all

<Route 1>
    Path my_file_1, my_file_2 => diskbuffer => membuffer => out
</Route>

I know the Route logic looks a little weird, but that's because it works backwards. It will try to send to out first; if it can't, messages go to membuffer; if membuffer is full, they go to diskbuffer; and if diskbuffer is full, it will just start discarding.
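The fallback order described above can be sketched as a toy model. This is not how nxlog is implemented internally; the class name, the byte-based limits, and the `send` callback are all hypothetical and only illustrate the "send, else memory, else disk, else discard" order.

```python
from collections import deque
from typing import Callable, Deque


class FallbackBuffer:
    """Toy model: try to send, else buffer in memory, else on disk, else discard."""

    def __init__(self, mem_limit: int, disk_limit: int) -> None:
        self.mem_limit = mem_limit      # bytes allowed in the memory buffer
        self.disk_limit = disk_limit    # bytes allowed in the "disk" buffer
        self.mem: Deque[bytes] = deque()
        self.disk: Deque[bytes] = deque()  # stands in for a file on disk
        self.discarded = 0

    @staticmethod
    def _used(queue: Deque[bytes]) -> int:
        return sum(len(m) for m in queue)

    def offer(self, message: bytes, send: Callable[[bytes], bool]) -> str:
        """Handle one message and report where it ended up."""
        # Only send directly when nothing is waiting, to keep messages in order.
        if not self.mem and not self.disk and send(message):
            return "sent"
        if self._used(self.mem) + len(message) <= self.mem_limit:
            self.mem.append(message)
            return "memory"
        if self._used(self.disk) + len(message) <= self.disk_limit:
            self.disk.append(message)
            return "disk"
        self.discarded += 1
        return "discarded"

    def flush(self, send: Callable[[bytes], bool]) -> int:
        """Drain memory first (oldest messages), then disk; stop on first failure."""
        delivered = 0
        for queue in (self.mem, self.disk):
            while queue:
                if not send(queue[0]):
                    return delivered
                queue.popleft()
                delivered += 1
        return delivered
```

While the output is down, call `offer(msg, send)` with a `send` that returns False and watch messages land in memory, then disk, then get discarded. Once the output is back, `flush(send)` delivers the backlog oldest first.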

When you reconnect, all your queues will be sent to NLS (but it may take a while to catch up). The rsyslog queues will even have the correct timestamps (you can watch it backfill the gap in your dashboard). I am still trying to figure out how to make nxlog do this. Right now it just dumps the whole queue with the current timestamp, not the time the event actually happened.

As for #2, I'm only guessing here, so I'd wait for the official answer. If you set up DNS round robin and point your client servers to send logs to @@MY_NLS_CLUSTER:5544 (instead of @@MY_NLS_BOX_1:5544), then if NLS_BOX_1 goes down, logs should relatively transparently go to NLS_BOX_X.

Hopefully jolson will chime in with a better answer, but this should give you something to think about for now. Cheers.

jolson · Post by **jolson** » Fri Jul 17, 2015 10:11 am

polarbear1,

Thanks for taking the time to write out all of this - it's very helpful.

I just wanted to add some information regarding question #2.

2) Is there a way for another system in the cluster to take over being the master while it's down (so I can still see current log files even while patching the original master)?

Being the master instance of a cluster does not have any attachment to how logs are received by the cluster. You can send your logs to any instance (master or non master) and the logs will be processed the same.

That being said, the way to get around this is to use either a hardware load balancer (F5 or similar) or DNS Round Robin balancing. There are pros and cons to each of these methods, and a great discussion took place here: https://support.nagios.com/forum/viewto ... 38&t=33005 (it's on the customer-only forum).
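To illustrate what DNS round robin buys a client, here is a minimal sketch of a sender that looks up every address behind one name and tries each in turn. It is only an illustration of the idea, not what rsyslog or nxlog do internally; the function names are hypothetical, and MY_NLS_CLUSTER is the placeholder name used earlier in this thread.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Address = Tuple[str, int]


def resolve_all(hostname: str, port: int) -> List[Address]:
    """Return every address the name resolves to, in resolver order."""
    addresses: List[Address] = []
    for info in socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM):
        sockaddr = info[4]
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def send_with_failover(addresses: Iterable[Address],
                       payload: bytes,
                       connect: Callable = socket.create_connection,
                       timeout: float = 3.0) -> Address:
    """Send payload to the first instance that accepts a TCP connection."""
    last_error = None
    for addr in addresses:
        try:
            with connect(addr, timeout) as conn:
                conn.sendall(payload)
            return addr  # this instance took the message
        except OSError as exc:
            last_error = exc  # instance is down or unreachable, try the next
    raise ConnectionError(f"no instance accepted the connection: {last_error}")
```

A caller would do something like `send_with_failover(resolve_all("MY_NLS_CLUSTER", 5544), b"test message\n")`. If the first instance is down for patching, the message goes to the next address in the list instead.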

2evanowen · Post by **2evanowen** » Fri Jul 17, 2015 5:25 pm

Thanks guys this should be enough to get me rolling with a solution.

tmcdonald · Post by **tmcdonald** » Mon Jul 20, 2015 9:29 am

Great! I'll be closing this thread now, but feel free to open another if you need anything in the future!

And thanks again to @polarbear1!

jolson · Post by **jolson** » Mon Jul 20, 2015 9:33 am

Sounds good! I'll lock the thread.
