backups not functioning after update to 1.4.0

Posted: Thu Feb 04, 2016 7:57 am
by krobertson71
I found the other thread where the old backups were showing as n/a. I created an 'oldbackups' dir and copied everything into it.

Last night I then scheduled the backup job to run about an hour later. There is still nothing in the backup snapshots. Screenshot below.
[Image: nlsbackups-1.png]
I have verified that nagios can write to that directory by touching a file and then deleting it.
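The touch-and-delete check described above can be sketched as a small shell function. This is only an illustration: the function name `can_write` and the probe file name are made up here, and the check only says something about the user it is run as (nagios in this thread).

```shell
# Hypothetical helper: succeed only if the current user can create and
# then remove a file inside the given directory.
can_write() {
    dir="$1"
    probe="$dir/.write_probe.$$"   # illustrative probe name, not part of Nagios Log Server
    touch "$probe" 2>/dev/null || return 1
    rm -f "$probe"
}

# Example (assumes the repository path is /backups, as in this thread):
#   can_write /backups && echo "writable" || echo "NOT writable"
```

Note that this only tests the local directory on the node where it is run; it says nothing about whether the other nodes see the same directory.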

Re: backups not functioning after update to 1.4.0

Posted: Thu Feb 04, 2016 3:11 pm
by jolson
Try running the following on your command line:

Code:

which curator
ls -l /usr/lib/python2.6/site-packages/curator/curator.py
curator --help
curator snapshot --repository nlsback indices --prefix logstash
I'm interested in seeing the error output. Thanks!

Re: backups not functioning after update to 1.4.0

Posted: Thu Feb 04, 2016 3:22 pm
by krobertson71
Here you go:

Code:

[nagios@nagilgp01 ~]$ which curator
/usr/bin/curator

Code:

[nagios@nagilgp01 ~]$ ls -l /usr/lib/python2.6/site-packages/curator/curator.py
-rw-r--r-- 1 root root 80 Feb  2 11:49 /usr/lib/python2.6/site-packages/curator/curator.py

Code:

[nagios@nagilgp01 ~]$ curator --help
Usage: curator [OPTIONS] COMMAND [ARGS]...

  Curator for Elasticsearch indices.

  See http://elastic.co/guide/en/elasticsearch/client/curator/current

Options:
  --host TEXT         Elasticsearch host.
  --url_prefix TEXT   Elasticsearch http url prefix.
  --port INTEGER      Elasticsearch port.
  --use_ssl           Connect to Elasticsearch through SSL.
  --certificate TEXT  Path to certificate to use for SSL validation.
                      (OPTIONAL)
  --ssl-no-validate   Do not validate SSL certificate
  --http_auth TEXT    Use Basic Authentication ex: user:pass
  --timeout INTEGER   Connection timeout in seconds.
  --master-only       Only operate on elected master node.
  --dry-run           Do not perform any changes.
  --debug             Debug mode
  --loglevel TEXT     Log level
  --logfile TEXT      log file
  --logformat TEXT    Log output format [default|logstash].
  --quiet             Suppress command-line output.
  --version           Show the version and exit.
  --help              Show this message and exit.

Commands:
  alias       Index Aliasing
  allocation  Index Allocation
  bloom       Disable bloom filter cache
  close       Close indices
  delete      Delete indices or snapshots
  open        Open indices
  optimize    Optimize Indices
  replicas    Replica Count Per-shard
  seal        Seal indices (Synced flush: ES 1.6.0+ only)
  show        Show indices or snapshots
  snapshot    Take snapshots of indices (Backup)

Code:

[nagios@nagilgp01 ~]$ curator snapshot --repository nlsback indices --prefix logstash
2016-02-04 15:21:02,522 INFO      Job starting: snapshot indices
2016-02-04 15:21:02,522 WARNING   Overriding default connection timeout.  New timeout: 21600
2016-02-04 15:21:02,581 INFO      Action snapshot will be performed on the following indices: [u'logstash-2016.01.26', u'logstash-2016.01.27', u'logstash-2016.01.28', u'logstash-2016.01.29', u'logstash-2016.01.30', u'logstash-2016.01.31', u'logstash-2016.02.01', u'logstash-2016.02.02', u'logstash-2016.02.03', u'logstash-2016.02.04', u'logstash-2016.02.05']
2016-02-04 15:21:02,905 ERROR     Failed to verify all nodes have repository access.
2016-02-04 15:21:02,906 WARNING   Job did not complete successfully.
That last command seems to be the issue. I never had a problem with this before; we only had the "old version" backup issues with partials, etc.

Re: backups not functioning after update to 1.4.0

Posted: Thu Feb 04, 2016 7:55 pm
by krobertson71
I feel like a moron. I've been busy today and just noticed that I did not put in the name of my own repository. It made no difference, though: still the same error.

The /backups directory is owned by nagios:nagios. Group nagios contains apache and nagios.

Code:

[nagios@nagilgp01 ~]$ curator snapshot --repository backups indices --prefix logstash
2016-02-04 19:51:43,604 INFO      Job starting: snapshot indices
2016-02-04 19:51:43,605 WARNING   Overriding default connection timeout.  New timeout: 21600
2016-02-04 19:51:43,627 INFO      Action snapshot will be performed on the following indices: [u'logstash-2016.01.26', u'logstash-2016.01.27', u'logstash-2016.01.28', u'logstash-2016.01.29', u'logstash-2016.01.30', u'logstash-2016.01.31', u'logstash-2016.02.01', u'logstash-2016.02.02', u'logstash-2016.02.03', u'logstash-2016.02.04', u'logstash-2016.02.05']
2016-02-04 19:51:43,987 ERROR     Failed to verify all nodes have repository access.
2016-02-04 19:51:43,988 WARNING   Job did not complete successfully.
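The "Failed to verify all nodes have repository access" error can also be checked directly against Elasticsearch, independent of curator. The sketch below is an assumption-laden example: it assumes Elasticsearch answers on localhost:9200, that the repository is named `backups` as in the command above, and that the installed Elasticsearch version supports the snapshot `_verify` endpoint. The function names are made up for illustration.

```shell
# Assumed default: Elasticsearch reachable on localhost:9200.
ES_HOST="${ES_HOST:-localhost:9200}"

# Build the URL of the snapshot repository verification endpoint.
verify_url() {
    printf 'http://%s/_snapshot/%s/_verify' "$ES_HOST" "$1"
}

# Ask the cluster to verify that its nodes can access the repository.
verify_repo() {
    curl -s -XPOST "$(verify_url "$1")?pretty"
}

# Example:
#   curl -s "http://$ES_HOST/_snapshot/_all?pretty"   # list registered repositories
#   verify_repo backups
```

If verification fails here as well, the problem lies with how the nodes reach the repository location rather than with curator itself.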

Re: backups not functioning after update to 1.4.0

Posted: Thu Feb 04, 2016 9:12 pm
by Box293
Can I get you to confirm that /backups is an NFS share (or a similar common share) mounted on all log server nodes?

Re: backups not functioning after update to 1.4.0

Posted: Fri Feb 05, 2016 10:32 am
by krobertson71
I can confirm they are not, but they were not in the past either. Each node has its own /backups directory. In the past each node would back up to its local /backups directory.

Re: backups not functioning after update to 1.4.0

Posted: Fri Feb 05, 2016 2:00 pm
by hsmith
Have you verified the permissions on the "/backups" directory across all of the nodes?

Re: backups not functioning after update to 1.4.0

Posted: Fri Feb 05, 2016 4:45 pm
by krobertson71
Yes, they are the same on both nodes.

node 1

Code:

drwxrwxr-x    2 nagios nagios  4096 Feb  4 22:40 backups
The nagios group has 'apache' and 'nagios' as members.

node 2

Code:

drwxrwxr-x    3 nagios nagios  4096 Feb  4 22:42 backups
Same as above.
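To compare the directory on each node without reading `ls -l` output by eye, something like the following could be run on every node. This is a minimal sketch assuming GNU `stat`; the helper name `dir_perms` is hypothetical.

```shell
# Hypothetical helper: print "owner:group mode" for a directory (GNU stat).
dir_perms() {
    stat -c '%U:%G %a' "$1"
}

# Example, run on each node and compare the results:
#   dir_perms /backups      # drwxrwxr-x nagios nagios shows as "nagios:nagios 775"
#   id -nG apache           # groups the apache user belongs to
```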

Re: backups not functioning after update to 1.4.0

Posted: Fri Feb 05, 2016 4:57 pm
by krobertson71
Also, the drive has plenty of free space.

Re: backups not functioning after update to 1.4.0

Posted: Sun Feb 07, 2016 8:05 pm
by Box293
Box293 wrote: Can I get you to confirm that /backups is an NFS share (or a similar common share) mounted on all log server nodes?
krobertson71 wrote: I can confirm they are not, but they were not in the past either. Each node has its own /backups directory. In the past each node would back up to its local /backups directory.
This is the source of your problem.

Here is the information shown when you go to add a backup repository:
[Image: Screenshot.png]
While it may have worked in previous versions, you need to correct this so it is a shared repository. The backup process is executed on ONE of the nodes, not on every node. Hence all nodes need access to a shared repository.
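A quick way to see whether /backups on a given node is a network share or just a local directory is to look at the filesystem type behind it. The following is a rough sketch, assuming GNU `df`; the helper names and the list of filesystem types are illustrative and not exhaustive.

```shell
# Hypothetical helper: succeed if a filesystem type name looks like a
# network/shared filesystem. The list is illustrative, not exhaustive.
is_network_fs() {
    case "$1" in
        nfs|nfs4|cifs|glusterfs|fuse.glusterfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the filesystem type backing a path (GNU df: -P one line per
# filesystem, -T adds the type as the second column).
fs_type() {
    df -PT "$1" | awk 'NR==2 {print $2}'
}

# Example, run on every node:
#   if is_network_fs "$(fs_type /backups)"; then
#       echo "/backups is on a shared filesystem"
#   else
#       echo "/backups is a local directory on this node"
#   fi
```

Every node should report a shared filesystem, and it should be the same share on each of them.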