NLS Backups not ending / Working

Posted: Thu Jun 30, 2016 11:48 am
by keithmcm
Hi there

We have recently enabled backups via the maintenance schedule on our Nagios Log Server. The backup repository is an NFS mount that the Nagios.Nagios user and group have RW access to. However, our backup maintenance job never seems to end (its status is always running). I am not seeing any errors in the jobs.log file, but I can see the job start if I manually reset all the jobs and then kick it to life by setting the next run time to now.
[Attachment: CommandSubsystemNagiosLogServer.jpg]
Our backup config looks as below:
[Attachment: BackupMaintenanceNagioslogServer.jpg]
Any advice on how to diagnose this would be helpful. Our aim is to back up this server and then recover the data to a considerably larger physical server. I am not sure what to look for, but I can see that no data is being written to the backup repository.

thanks
Keith

Re: NLS Backups not ending / Working

Posted: Thu Jun 30, 2016 11:55 am
by hsmith
What's the output of df -h on your logserver?

How much space is available on your backup repository?

Can you su to the 'nagios' user and attempt to enter the backup directory and touch and rm a file?

Re: NLS Backups not ending / Working

Posted: Thu Jun 30, 2016 2:05 pm
by keithmcm
hsmith wrote: What's the output of df -h on your logserver?

[root@ct-nagiosls ~]# df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 99G 6.6G 91G 7% /
devtmpfs 48G 236K 48G 1% /dev
tmpfs 48G 0 48G 0% /dev/shm
/dev/sda1 99G 6.6G 91G 7% /
/dev/sdb1 493G 423G 70G 86% /data
/dev/sdc1 394G 324G 71G 83% /data2
/dev/sdd1 394G 324G 70G 83% /data3
/dev/sde1 394G 324G 71G 83% /data4
/dev/sdf1 394G 324G 70G 83% /data5
/dev/sdh1 493G 398G 70G 86% /data6
/dev/sdi1 493G 398G 71G 86% /data7
/dev/sdj1 493G 398G 70G 86% /data8
172.25.0.39:/ct-nagiosls
22T 17G 22T 1% /backupnas

We have not always had this much free space. We ran out about 6 months ago, and I was forced to delete several indexes to restore operation before I could even add additional storage.


hsmith wrote: How much space is available on your backup repository?

More than 22TB, which is far more than we have in use on the current /data partitions.
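
For a rough check of that comparison, the Used figures for the /data mounts can be totalled and set against the space available on /backupnas. A minimal sketch, assuming POSIX-format `df -P` output (which keeps each filesystem on a single line); `used_kib` is a hypothetical helper name and is not part of Nagios Log Server:

```shell
# Hypothetical helper: sum the "Used" column (KiB) of `df -P` output read
# from stdin, for every mount point that starts with the given prefix.
used_kib() {
    awk -v prefix="$1" '
        NR > 1 && index($6, prefix) == 1 { sum += $3 }   # $6 = mount point, $3 = used
        END { printf "%.0f\n", sum }
    '
}
```

Something like `df -P | used_kib /data` would then print the combined usage of /data, /data2 and so on. Note that the prefix match is deliberately loose, so any other mount point beginning with /data would be counted too.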

hsmith wrote: Can you su to the 'nagios' user and attempt to enter the backup directory and touch and rm a file?
Yep, tested that before posting, but for clarity below:

[root@ct-nagiosls ~]# su - nagios
[nagios@ct-nagiosls ~]$ cd /backupnas/vmserver/
[nagios@ct-nagiosls vmserver]$ touch test
[nagios@ct-nagiosls vmserver]$ ls -al test
-rw-r--r-- 1 nagios users 0 Jun 30 20:01 test
[nagios@ct-nagiosls vmserver]$ rm test
[nagios@ct-nagiosls vmserver]$ ls -lad .
drwxrwxr-x 2 nagios nagios 4096 Jun 30 20:01 .
[nagios@ct-nagiosls vmserver]$ pwd
/backupnas/vmserver
[nagios@ct-nagiosls vmserver]$
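
The manual touch and rm check above could also be scripted so it can be repeated after a remount. This is only a sketch: `can_write` is a hypothetical helper name, and it assumes a `mktemp` that accepts a full path template (as GNU coreutils does):

```shell
# Hypothetical helper: succeed only if the current user can create and
# then remove a file in the directory given as $1.
can_write() {
    local probe
    probe=$(mktemp "$1/.write-test.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}
```

Run as the nagios user, for example after `su - nagios`: `can_write /backupnas/vmserver && echo writable`.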

Re: NLS Backups not ending / Working

Posted: Thu Jun 30, 2016 2:52 pm
by hsmith
Can you su to nagios, and then issue the following command?

curator snapshot --repository VMBackup indices --all-indices
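
While a snapshot like this is running, Elasticsearch's snapshot API can show whether anything is actually progressing. The following is a sketch only: it assumes Elasticsearch answers on localhost:9200 (the thread does not say so), and the function names are made up for illustration. The `_snapshot/<repository>/_all` and `_snapshot/_status` endpoints themselves are part of Elasticsearch's snapshot API.

```shell
# Assumption: Elasticsearch answers on localhost:9200; override ES_URL if not.
ES_URL=${ES_URL:-http://localhost:9200}

# Build a snapshot API URL for a repository and a snapshot name (or _all).
snapshot_url() {
    echo "$ES_URL/_snapshot/$1/$2"
}

# List every snapshot in the repository together with its state.
list_snapshots() {
    curl -s "$(snapshot_url "$1" _all)?pretty"
}

# Show shard-level progress for snapshots that are currently running.
running_snapshots() {
    curl -s "$ES_URL/_snapshot/_status?pretty"
}
```

For example, `list_snapshots VMBackup` should list the snapshots in the repository with a state such as SUCCESS, IN_PROGRESS or FAILED, and `running_snapshots` can be repeated to see whether shard counts are moving.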

Re: NLS Backups not ending / Working

Posted: Fri Jul 01, 2016 11:23 am
by keithmcm
The command ran for several hours before exiting with an error:

[nagios@ct-nagiosls ~]$ curator snapshot --repository VMBackup indices --all-indices
2016-06-30 20:55:11,834 INFO Job starting: snapshot indices
2016-06-30 20:55:11,834 WARNING Overriding default connection timeout. New timeout: 21600
2016-06-30 20:55:11,977 INFO Matching all indices. Ignoring flags other than --exclude.
2016-06-30 20:55:11,977 INFO Action snapshot will be performed on the following indices: [u'kibana-int', u'logstash-2015.11.01', u'logstash-2015.11.02', u'logstash-2015.11.03'

Snip

u'logstash-2016.06.30', u'nagioslogserver', u'nagioslogserver_log']
2016-06-30 20:57:57,991 INFO Snapshot name: curator-20160630195757
2016-07-01 02:57:58,234 ERROR Client raised a TransportError.
2016-07-01 02:57:58,235 WARNING Job did not complete successfully.
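
A quick check on the timestamps in this log: the gap between the "Snapshot name" line and the TransportError can be compared with the 21600-second timeout reported near the top. The sketch below assumes GNU `date` (for `-d`) and drops the millisecond part of the log timestamps; `elapsed_seconds` is a hypothetical helper name.

```shell
# Hypothetical helper: seconds between two "YYYY-MM-DD HH:MM:SS" timestamps.
# Both are read as UTC so daylight-saving changes cannot skew the result.
elapsed_seconds() {
    echo $(( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ))
}

# Snapshot start and error time, copied from the log above.
elapsed_seconds "2016-06-30 20:57:57" "2016-07-01 02:57:58"   # prints 21601
```

That is within about a second of the 21600-second (six-hour) client timeout, so it looks as though the client stopped waiting at that point. The log does not show whether Elasticsearch carried on with the snapshot afterwards.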

I'm guessing that the timeout was insufficient given the volume of data, so I'm now running one backup per daily index, scripted as follows:

[nagios@ct-nagiosls ~]$ cat 2015.12 | xargs -i curator snapshot --repository VMBackup indices --index logstash-{} > backup.log

where 2015.12 looks like:

2015.12.01
2015.12.02
2015.12.03
2015.12.04
2015.12.05
2015.12.06

This seems to be running successfully, but still has several months to go.
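
The per-day approach above could also be driven without hand-written date files. This is a sketch under assumptions rather than a tested procedure: it relies on GNU `date`, reuses the same curator flags shown in this thread, and assumes curator exits non-zero when a job fails. The names `month_days`, `snapshot_month` and the `CURATOR` variable are hypothetical.

```shell
# Hypothetical helper: print every day of a month as YYYY.MM.DD, one per line.
# Usage: month_days 2015 12   (the month must be zero-padded)
month_days() {
    local month=$2 day="$1-$2-01"
    while [ "$(date -u -d "$day" +%m)" = "$month" ]; do
        date -u -d "$day" +%Y.%m.%d
        day=$(date -u -d "$day + 1 day" +%F)
    done
}

# Snapshot each daily index in turn; report a failure on stderr and carry on.
# Set CURATOR=echo for a dry run that only prints the commands.
snapshot_month() {
    local day
    month_days "$1" "$2" | while read -r day; do
        "${CURATOR:-curator}" snapshot --repository VMBackup indices \
            --index "logstash-$day" < /dev/null \
            || echo "FAILED: logstash-$day" >&2
    done
}
```

A dry run such as `CURATOR=echo snapshot_month 2015 12` prints the 31 commands without touching the repository; dropping `CURATOR=echo` runs them for real. Reporting each failed index by name makes it easier to retry only the days that did not complete.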

Re: NLS Backups not ending / Working

Posted: Tue Jul 05, 2016 9:55 am
by hsmith
We just got back from a four-day weekend - how's the backup process going?