daily system backup getting out of control

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
CBoekhuis
Posts: 214
Joined: Tue Aug 16, 2011 4:55 am

Post by CBoekhuis »

Hi,

I hope you can help with this one: the daily system backup is getting larger by the day. At this moment each node produces an 8.5 GB tar.gz, which takes hours to complete.
The real problem is that the backups fail more and more often, for all kinds of reasons, such as "Waiting for available slot." messages or, worse, when "{"acknowledged":true,"persistent":{},"transient":{"plugin":{"knapsack":{"export":{"state":"[]"}}}}}" messages appear in /tmp/backup.log.
The worst case is when it only produces a 26303-byte tar.gz that is broken and empty. At that point I have to restart elasticsearch to get it to function again.

According to the manual, the system backup saves the audit log in addition to dashboards, etc. That made me wonder: what is the retention of the audit log? Especially since we have "save user query to audit" turned on as well.
Digging through the audit log, I found entries going back to 2016, when we started our cluster. That might explain why the backup is so large and always growing.

The question is: is there no retention on the audit log (or on other logging saved in the system backup)? If not, how can I set a retention, or at least clear out some old data? Perhaps something else is going on, but either way I would appreciate some help. Having no backup is never good ;) .

The Nagios Log Server version is 2.1.15, running on CentOS 7.9.

Kind Regards,
Hans Blom
swolf
Developer
Posts: 302
Joined: Tue Jun 06, 2017 9:48 am

Re: daily system backup getting out of control

Post by swolf »

Hi @CBoekhuis, thanks for reaching out.

I agree with you that 1) there isn't proper retention configuration for the NLS audit log, and 2) there should be. I've filed a feature request on your behalf.

To handle the immediate situation, I would determine how much data you want to keep, and find an approximate timestamp (unix epoch in milliseconds) that corresponds to the oldest record you'd like to keep.
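
As a minimal sketch of finding such a timestamp, the snippet below converts a cutoff date to epoch milliseconds and converts a millisecond value back to a date as a sanity check. It assumes GNU date (as shipped with CentOS 7); the cutoff date and the variable names are only examples.

```shell
# Assumption: GNU date; the cutoff date below is only an example.
cutoff_date='2023-01-01 00:00:00'

# date prints seconds since the epoch; multiply by 1000 for milliseconds.
cutoff_ms=$(( $(date -u -d "$cutoff_date" +%s) * 1000 ))
echo "$cutoff_ms"

# Sanity check: convert a millisecond timestamp back to a readable UTC date.
check_ms=1692900851854
date -u -d "@$(( check_ms / 1000 ))" '+%Y-%m-%d %H:%M:%S'
```

Both conversions use UTC (`-u`), so the result does not depend on the server's local timezone.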

Once you're absolutely sure you have the right time, I would take a VM- or server-level backup/snapshot, then run this query on your terminal, replacing my timestamp with yours:

curl -XDELETE 'localhost:9200/nagioslogserver_log/_query' -d '{ "query": { "range" : { "created": { "lte": 1692900851854 } } } }' 
Please be very careful, as we don't have a defined process for fixing a mistake here.
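
One way to reduce the risk, sketched here under assumptions, is to count the matching documents before deleting anything. The helper names `range_query` and `count_matches` are hypothetical, and the sketch assumes the Elasticsearch `_count` endpoint accepts the same query body as the delete command on this version.

```shell
# Hypothetical helper: build the range query body for a cutoff in epoch milliseconds.
range_query() {
  printf '{ "query": { "range": { "created": { "lte": %s } } } }' "$1"
}

# Hypothetical helper: count the documents the delete would match, without deleting.
# Assumption: Elasticsearch listens on localhost:9200, as in the delete command.
count_matches() {
  curl -s -XGET "localhost:9200/$1/_count" -d "$(range_query "$2")"
}

# Example (not run here):
#   count_matches nagioslogserver_log 1692900851854
```

If the reported count looks far larger or smaller than expected, that is a hint to recheck the timestamp before running the delete.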

Hopefully that helps - please let me know if you have any further questions or concerns.

-Sebastian Wolf
CBoekhuis

Re: daily system backup getting out of control

Post by CBoekhuis »

Hi Sebastian,

Thank you for your help! I just tested this on our test cluster and it works. Now I'll gradually reduce the audit log on our production cluster.
Looking forward to seeing this as a feature in a future release. Can you close this topic?

Greetings....Hans
CBoekhuis

Re: daily system backup getting out of control

Post by CBoekhuis »

I was a little too fast. It turns out that on our production cluster the big chunk of data is in the nagioslogserver_history index, which is 47.5 GB.
I take it that I can use the same command that you provided, replacing nagioslogserver_log with nagioslogserver_history?

Maybe the feature request should include this index as well?
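
For anyone wanting to check which index dominates on their own cluster, a rough sketch follows. It assumes the Elasticsearch `_cat/indices` API is available on localhost:9200; the helper name `cat_indices_url` is hypothetical.

```shell
# Hypothetical helper: build a _cat/indices URL for an index name pattern.
# Assumption: Elasticsearch listens on localhost:9200, as in the command above.
cat_indices_url() {
  printf 'localhost:9200/_cat/indices/%s?v&h=index,docs.count,store.size' "$1"
}

# Example (not run here): list document counts and on-disk size per index.
#   curl -s "$(cat_indices_url 'nagioslogserver*')"
```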

Thanks!
CBoekhuis

Re: daily system backup getting out of control

Post by CBoekhuis »

In the meantime I cleared out the entire nagioslogserver_history index.
I don't know whether there is a retention set on this index, but just as with the audit log, it would be valuable if this were a configurable option as well.

Kind Regards...Hans