Hey guys,
I have a few questions that I ran into while patching...
When I take down my Master in the cluster (the one that I am sending all of the logs to) none of the other systems in the cluster receive logs because they are inheriting the logs from the master.
1) Will Nagios Log Server go back and get the logs it missed while it was down?
2) Is there a way for another system in the cluster to take over being the master while it's down (so I can still see current log files even while patching the original master)?
3) What is the best way for me to ensure that I don't lose logs when I am patching a server during that downtime? Do you guys have a best practice for that?
Log Distribution with a cluster best practices
-
polarbear1
- Posts: 73
- Joined: Mon Apr 13, 2015 4:26 pm
Re: Log Distribution with a cluster best practices
For #1 and #3 -- Long story short, you want to set up nxlog and rsyslog to queue your messages if they are unable to send.
Rsyslog --
Check the bottom section of your files at /etc/rsyslog.d/90-nagioslogserver_*.conf and make them look something like this:
Code:
# Buffer settings: queue messages while Nagios Log Server is unreachable.
$ActionResumeInterval 10
$ActionQueueSize 100000
$ActionQueueDiscardMark 97500
$ActionQueueHighWaterMark 80000
$ActionQueueType LinkedList
# Setting a queue file name lets the queue spill to disk under $WorkDirectory.
$ActionQueueFileName each_queue_should_have_a_unique_queue_file_name
$ActionQueueCheckpointInterval 100
$ActionQueueMaxDiskSpace 500m
# -1 means keep retrying forever instead of giving up on the action.
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
$ActionQueueTimeoutEnqueue 0
$ActionQueueDiscardSeverity 0
# Forward to Nagios Log Server and then discard, otherwise these messages
# will end up in the syslog file (/var/log/messages) unless there are other
# overriding rules.
if $programname == 'my_log_file' then @@my_NLS_server:5544
if $programname == 'my_log_file' then ~

Also note the $WorkDirectory variable higher up in the config. This is where your disk buffer files will be written, so make sure you have enough space available. By default it goes somewhere in the /var partition. It's OK to change it to something else. In my case I made a folder for it on /home because that's where I have the most room to spare on my build.
Related reading:
http://www.rsyslog.com/doc/v8-stable/co ... ueues.html
NXlog -- Check your C:\Program Files (x86)\nxlog\conf\nxlog.conf files...
Add these sections after the <Extension> sections. Note that the NXLog pm_buffer documentation gives the sizes in kilobytes (not bits), so change them based on what fits your situation.
Code:
<Processor membuffer>
    Module pm_buffer
    MaxSize 512000
    Type Mem
</Processor>
<Processor diskbuffer>
    Module pm_buffer
    MaxSize 5242880
    Type Disk
    File "C:\My\Working\Directory"
    WarnLimit 3932160
</Processor>

Then at the bottom, tweak your route to something like this:
Code:
<Route 1>
    Path my_file_1, my_file_2 => diskbuffer => membuffer => out
</Route>

I know the Route logic looks a little weird, but that's because it works backwards. It will try to send out, but if it can't, it will go to membuffer; if the membuffer is full, it will go to diskbuffer; and if diskbuffer is full, it will just start discarding.
When you reconnect, all your queues will be sent to NLS (but it may take a while to catch up). The rsyslog queues will even have the correct timestamps (you can watch it backfill the gap in your dashboard). I am still trying to figure out how to make nxlog do this - right now it just dumps the whole queue with the current timestamp, not when the event actually happened.
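If it helps with picking buffer sizes, here is a rough back-of-the-envelope sketch in Python for checking whether a disk queue limit would cover a patch window. The function names, the message rate, and the average message size are all made-up illustration values, and it assumes "500m" means 500 MiB and ignores any on-disk overhead, so treat the result as an estimate only.

```python
# Rough sizing sketch: will the disk queue hold everything produced
# during an outage? All numbers below are illustrative assumptions.

# Assumption: "$ActionQueueMaxDiskSpace 500m" is read as 500 MiB.
MAX_DISK_BYTES = 500 * 1024 * 1024


def disk_needed_bytes(msgs_per_second, avg_msg_bytes, outage_seconds):
    """Bytes of raw log data produced while the receiver is down."""
    if msgs_per_second < 0 or avg_msg_bytes < 0 or outage_seconds < 0:
        raise ValueError("rates, sizes and durations must be non-negative")
    return msgs_per_second * avg_msg_bytes * outage_seconds


def fits_in_queue(msgs_per_second, avg_msg_bytes, outage_seconds,
                  max_disk_bytes=MAX_DISK_BYTES):
    """True if the estimated backlog stays within the disk limit."""
    needed = disk_needed_bytes(msgs_per_second, avg_msg_bytes, outage_seconds)
    return needed <= max_disk_bytes
```

For example, fits_in_queue(200, 300, 30 * 60) asks whether a 30 minute patch window at 200 messages per second of about 300 bytes each stays under the limit. Measure your own rate and message size before trusting the answer.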
As for #2, I'm only guessing here, so I'd wait for the official answer: if you set up DNS round robin and point your client servers to send logs to @@MY_NLS_CLUSTER:5544 (instead of @@MY_NLS_BOX_1:5544), then if NLS_BOX_1 goes down, traffic should relatively transparently just go to NLS_BOX_X.
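One thing to keep in mind with plain DNS round robin is that the name keeps resolving to a box even while that box is down, so it can be handy to see which addresses sit behind the name and which of them accept connections. Below is a minimal Python sketch of that check. The function names are hypothetical, MY_NLS_CLUSTER is just the placeholder from above, and the resolver and probe are passed in as parameters only so the logic can be tried without a real cluster.

```python
import socket


def resolve_all(hostname, port, resolver=socket.getaddrinfo):
    """Return the unique IP addresses behind a name, in answer order."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in resolver(
            hostname, port, type=socket.SOCK_STREAM):
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def tcp_reachable(ip, port, timeout=3.0):
    """True if a TCP connection to ip:port can be opened."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster(hostname, port, resolver=socket.getaddrinfo,
                  probe=tcp_reachable):
    """Map each address behind the name to whether it accepts connections."""
    return {ip: probe(ip, port) for ip in resolve_all(hostname, port, resolver)}
```

Calling check_cluster("MY_NLS_CLUSTER", 5544) during a patch window should then show the patched box as False and the others as True, assuming the name really carries one record per instance.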
Hopefully jolson will chime in with a better answer, but this should give you something to think about for now. Cheers.
Re: Log Distribution with a cluster best practices
polarbear1,
Thanks for taking the time to write out all of this - it's very helpful.
I just wanted to add some information regarding question #2.
"2) Is there a way for another system in the cluster to take over being the master while it's down (so I can still see current log files even while patching the original master)?"
Being the master instance of a cluster has no bearing on how logs are received by the cluster. You can send your logs to any instance (master or non-master) and the logs will be processed the same.
That being said, the way to keep logs flowing while one instance is down is to use either a hardware load balancer (F5 or similar) or DNS Round Robin balancing. There are pros and cons to each of these methods, and a great discussion took place here: https://support.nagios.com/forum/viewto ... 38&t=33005 (It's on the customer only forum).
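To convince yourself that a non-master instance accepts logs the same way, one option is to push a single hand-made test line at each instance and look for it in the dashboard. The sketch below is only an illustration: the function names are hypothetical, and it assumes the input on port 5544 accepts a traditional BSD-style syslog line over TCP, as the rsyslog "@@" forwarding above suggests.

```python
import socket
from datetime import datetime


def format_syslog_line(tag, message, hostname, when, facility=1, severity=6):
    """Build a BSD-style syslog line; PRI is facility * 8 + severity."""
    pri = facility * 8 + severity
    # Traditional timestamps pad a single-digit day with a space.
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {hostname} {tag}: {message}\n"


def send_test_line(instance, port, line, timeout=5.0):
    """Open a TCP connection to one instance and send a single line."""
    with socket.create_connection((instance, port), timeout=timeout) as conn:
        conn.sendall(line.encode("utf-8"))
```

For example, build a line with format_syslog_line("nls_test", "hello from patch check", "web01", datetime.now()) and pass it to send_test_line once per instance name, then search for "nls_test" in the dashboard.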
Re: Log Distribution with a cluster best practices
Thanks guys, this should be enough to get me rolling with a solution.

Re: Log Distribution with a cluster best practices
Great! I'll be closing this thread now, but feel free to open another if you need anything in the future!
And thanks again to @polarbear1!
Former Nagios employee
Re: Log Distribution with a cluster best practices
Sounds good! I'll lock the thread.