For #1 and #3 -- Long story short, you want to set up nxlog and rsyslog to queue your messages whenever they can't be sent.
Rsyslog --
Check the bottom section of your /etc/rsyslog.d/90-nagioslogserver_*.conf files. You want them to look something like this:
# Forward to Nagios Log Server and then discard, otherwise these messages
# will end up in the syslog file (/var/log/messages) unless there are other
# overriding rules.
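# Buffer settings for the forwarding action below
# Initially wait 10 seconds between retries of a failed destination
$ActionResumeInterval 10
# Queue limits, counted in messages: spill to disk at the high water mark,
# start discarding at the discard mark
$ActionQueueSize 100000
$ActionQueueDiscardMark 97500
$ActionQueueHighWaterMark 80000
# A LinkedList (memory) queue with a file name becomes disk-assisted
$ActionQueueType LinkedList
$ActionQueueFileName each_queue_should_have_a_unique_queue_file_name
$ActionQueueCheckpointInterval 100
$ActionQueueMaxDiskSpace 500m
# Retry forever (-1) and save unsent messages to disk on shutdown
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
# When full, drop new messages instead of blocking; severity 0 means every
# message is eligible for discard once the discard mark is reached
$ActionQueueTimeoutEnqueue 0
$ActionQueueDiscardSeverity 0
# @@ forwards over TCP; ~ then discards the message (newer rsyslog prefers "stop")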
if $programname == 'my_log_file' then @@my_NLS_server:5544
if $programname == 'my_log_file' then ~
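The three queue marks only make sense in a certain order: disk spilling (the high water mark) should start before discarding (the discard mark), and both have to fit inside the queue size. Here is a small Python sanity check for that, a sketch that assumes exactly this ordering rule. The function name is made up for illustration and is not part of rsyslog.

```python
def queue_thresholds(queue_size: int, high_water_mark: int, discard_mark: int) -> dict:
    """Check the action queue marks and report them as a % of the queue size."""
    # Assumption: for the disk buffer to help, spilling to disk must begin
    # before discarding starts, and both marks must fit inside the queue.
    if not 0 < high_water_mark < discard_mark <= queue_size:
        raise ValueError("expected 0 < HighWaterMark < DiscardMark <= QueueSize")
    return {
        "spill_to_disk_at_pct": 100 * high_water_mark / queue_size,
        "discard_at_pct": 100 * discard_mark / queue_size,
    }


# Values from the config: spill at 80% full, discard at 97.5% full
print(queue_thresholds(100000, 80000, 97500))
```

With these settings you get about 17,500 messages of headroom between "start writing to disk" and "start throwing things away".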
Also note the $WorkDirectory directive higher up in the config. This is where your disk buffer files will be written, so make sure you have enough space available. By default it points somewhere on the /var partition, but it's OK to change it. In my case I made a folder for it on /home, because that's where I have the most room to spare on my build.
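Each forwarding rule gets its own queue file, and each one can grow up to $ActionQueueMaxDiskSpace, so the worst case is roughly that limit times the number of queues. The following is a rough Python check of that before you pick a directory. The directory path and queue count you pass in are placeholders. It also assumes k/m/g mean powers of 1024. If rsyslog reads lowercase suffixes as powers of 1000, this slightly overestimates, which is the safe direction for a free-space check.

```python
import shutil

# Assumption: k/m/g are treated as powers of 1024 (see note above)
_SUFFIX = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size such as '500m' into bytes."""
    value = value.strip().lower()
    if value and value[-1] in _SUFFIX:
        return int(value[:-1]) * _SUFFIX[value[-1]]
    return int(value)


def work_dir_has_room(work_dir: str, max_disk_space: str, queue_count: int) -> bool:
    """True if work_dir can hold queue_count full disk queues of max_disk_space each."""
    needed = parse_size(max_disk_space) * queue_count
    return shutil.disk_usage(work_dir).free >= needed


# Placeholder path and count: three forwarding rules, 500m each
# print(work_dir_has_room("/home/rsyslog-queues", "500m", 3))
```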
Related reading:
http://www.rsyslog.com/doc/v8-stable/co ... ueues.html
NXLog -- Check your C:\Program Files (x86)\nxlog\conf\nxlog.conf files.
Add these sections after the <Extension> sections. Note that the sizes are in kilobytes; change them based on what fits your situation.
<Processor membuffer>
Module pm_buffer
MaxSize 512000
Type Mem
</Processor>
<Processor diskbuffer>
Module pm_buffer
MaxSize 5242880
Type Disk
File "C:\My\Working\Directory"
WarnLimit 3932160
</Processor>
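Because MaxSize and WarnLimit are kilobyte figures, the raw numbers are easy to misread. This small Python helper shows what the example values add up to. It assumes that both directives are in kilobytes (KiB) and that WarnLimit is compared against the same buffer's MaxSize. The helper itself is hypothetical, just for checking your own numbers.

```python
# Assumption: pm_buffer MaxSize and WarnLimit are given in kilobytes (KiB)
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def describe_buffer(name: str, max_size_kib: int, warn_limit_kib: int = 0) -> str:
    """Render a pm_buffer size (and optional WarnLimit) in friendlier units."""
    if max_size_kib >= KIB_PER_GIB:
        size = f"{max_size_kib / KIB_PER_GIB:g} GiB"
    else:
        size = f"{max_size_kib / KIB_PER_MIB:g} MiB"
    text = f"{name}: {size}"
    if warn_limit_kib:
        # WarnLimit expressed as a share of the buffer's capacity
        text += f", warns at {100 * warn_limit_kib / max_size_kib:g}% full"
    return text


print(describe_buffer("membuffer", 512000))             # membuffer: 500 MiB
print(describe_buffer("diskbuffer", 5242880, 3932160))  # diskbuffer: 5 GiB, warns at 75% full
```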
Then, at the bottom, tweak your route to something like this:
<Route 1>
Path my_file_1, my_file_2 => diskbuffer => membuffer => out
</Route>
I know the Route logic looks a little weird, but that's because it works backwards: events flow left to right, while backpressure builds up right to left. NXLog tries to send to out; if it can't, events pile up in membuffer; once membuffer is full, they pile up in diskbuffer; and once diskbuffer is full, it will just start discarding.
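To see why the chain is written "backwards", here is a toy Python model of `Path inputs => diskbuffer => membuffer => out`. Events enter on the left and only move right when the next stage has room. This is a simplified sketch, not NXLog code. Capacities are counted in events rather than kilobytes. Dropping events once both buffers are full follows the description in this post. Real inputs may pause instead of dropping.

```python
from collections import deque


class BufferedRoute:
    """Toy model of: Path inputs => diskbuffer => membuffer => out."""

    def __init__(self, disk_capacity: int, mem_capacity: int):
        self.disk = deque()
        self.mem = deque()
        self.disk_capacity = disk_capacity
        self.mem_capacity = mem_capacity
        self.output_up = True
        self.sent = []
        self.dropped = []

    def _pump(self) -> None:
        # Keep moving events rightward (disk -> mem -> out) until nothing can move.
        moved = True
        while moved:
            moved = False
            if self.mem and self.output_up:
                self.sent.append(self.mem.popleft())
                moved = True
            if self.disk and len(self.mem) < self.mem_capacity:
                self.mem.append(self.disk.popleft())
                moved = True

    def receive(self, event) -> None:
        # New events enter at the left end of the chain (diskbuffer).
        if len(self.disk) < self.disk_capacity:
            self.disk.append(event)
        else:
            self.dropped.append(event)  # both buffers full: discard
        self._pump()

    def set_output(self, up: bool) -> None:
        self.output_up = up
        self._pump()


route = BufferedRoute(disk_capacity=2, mem_capacity=3)
route.set_output(False)          # NLS unreachable
for i in range(7):
    route.receive(i)
# membuffer holds 0-2, diskbuffer holds 3-4, events 5 and 6 were dropped
route.set_output(True)           # connection restored
print(route.sent)                # [0, 1, 2, 3, 4]
```

The membuffer fills first because it sits right next to the output, and only the overflow backs up into the slower disk buffer.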
When you reconnect, all your queued messages will be sent to NLS (though it may take a while to catch up). The rsyslog queues even keep the correct timestamps, so you can watch them backfill the gap in your dashboard. I am still trying to figure out how to make nxlog do this; right now it just dumps the whole queue with the current timestamp, not the time the event actually happened.
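The difference comes down to *when* the timestamp is attached: when the event is read (before it is queued), or when it is finally sent. This generic Python sketch shows the two choices. It is not the NXLog fix, and the `QueuedEvent` type and `render` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class QueuedEvent:
    message: str
    captured_at: datetime  # stamped when the event was read, before queueing


def render(event: QueuedEvent, sent_at: datetime, keep_original_time: bool = True) -> str:
    """Pick which timestamp goes on the wire when the queue is flushed."""
    ts = event.captured_at if keep_original_time else sent_at
    return f"{ts.isoformat()} {event.message}"


event = QueuedEvent("disk full", datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc))
flushed = datetime(2016, 1, 1, 13, 30, tzinfo=timezone.utc)  # after reconnecting
print(render(event, flushed))                            # backfills at 12:00
print(render(event, flushed, keep_original_time=False))  # lands at 13:30
```

Only the first behavior lets the dashboard fill in the outage window.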
As for #2 (and I'm only guessing here, so I'd wait for the official answer): if you set up DNS round robin and point your client servers at @@MY_NLS_CLUSTER:5544 (instead of @@MY_NLS_BOX_1:5544), then if NLS_BOX_1 goes down, the logs should go to NLS_BOX_X relatively transparently.
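Round robin only hands out several addresses. DNS itself doesn't know which boxes are up, so failover depends on the client moving on to another address. The following Python sketch shows that idea under those assumptions. It resolves every address behind a hypothetical cluster name and returns the first one accepting TCP on 5544. It does not show how rsyslog itself handles resolution.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct address DNS returns for host (round robin gives several)."""
    addresses: List[str] = []
    for *_, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def tcp_reachable(addr: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to addr:port succeeds within the timeout."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(addresses: Iterable[str], port: int,
                    probe: Callable[[str, int], bool] = tcp_reachable) -> Optional[str]:
    """Try each address in order and return the first that answers, or None."""
    for addr in addresses:
        if probe(addr, port):
            return addr
    return None


# Hypothetical cluster name from the post:
# print(first_reachable(resolve_all("MY_NLS_CLUSTER", 5544), 5544))
```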
Hopefully jolson will chime in with a better answer, but this should give you something to think about for now. Cheers.