I have a 3-node NLS cluster running the latest version.
The other day we had an incident in our environment, and the logs on some servers grew far more than anticipated, so my Repository eventually reached 100% usage.
As a result, all snapshots have disappeared from the GUI (image attached).
The first thing I did, since I was able to, was extend the file system, which brought usage down to 91%.
Then, under Backup & Maintenance, I changed the settings so that some snapshots would be deleted, and ran the jobs again (see the log output below).
---------------
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Running command do_maintenance with args ' ' for job id: backup_maintenance
2016-09-01 09:39:01,864 INFO Job starting: optimize indices
2016-09-01 09:39:01,864 WARNING Overriding default connection timeout. New timeout: 21600
2016-09-01 09:39:01,933 INFO Action optimize will be performed on the following indices: [u'logstash-2016.08.12', u'logstash-2016.08.13', u'logstash-2016.08.14', u'logstash-2016.08.15', u'logstash-2016.08.16', u'logstash-2016.08.17', u'logstash-2016.08.18', u'logstash-2016.08.19', u'logstash-2016.08.20', u'logstash-2016.08.21', u'logstash-2016.08.22', u'logstash-2016.08.23', u'logstash-2016.08.24', u'logstash-2016.08.25', u'logstash-2016.08.26', u'logstash-2016.08.27', u'logstash-2016.08.28', u'logstash-2016.08.29', u'logstash-2016.08.30']
2016-09-01 09:39:02,326 INFO Job completed successfully.
ine 889, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python2.6/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib/python2.6/site-packages/click/core.py", line 86, in augment_usage_errors
yield
File "/usr/lib/python2.6/site-packages/click/core.py", line 534, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python2.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/lib/python2.6/site-packages/curator/cli/index_selection.py", line 167, in indices
retval = do_command(client, ctx.parent.info_name, working_list, ctx.parent.params, master_timeout)
File "/usr/lib/python2.6/site-packages/curator/cli/utils.py", line 250, in do_command
skip_repo_validation=params['skip_repo_validation'],
File "/usr/lib/python2.6/site-packages/curator/api/snapshot.py", line 72, in create_snapshot
if name in all_snaps:
TypeError: argument of type 'bool' is not iterable
2016-09-01 09:39:02,424 INFO Job starting: snapshot indices
2016-09-01 09:39:02,424 WARNING Overriding default connection timeout. New timeout: 21600
2016-09-01 09:39:02,436 INFO Action snapshot will be performed on the following indices: [u'logstash-2016.08.12', u'logstash-2016.08.13', u'logstash-2016.08.14', u'logstash-2016.08.15', u'logstash-2016.08.16', u'logstash-2016.08.17', u'logstash-2016.08.18', u'logstash-2016.08.19', u'logstash-2016.08.20', u'logstash-2016.08.21', u'logstash-2016.08.22', u'logstash-2016.08.23', u'logstash-2016.08.24', u'logstash-2016.08.25', u'logstash-2016.08.26', u'logstash-2016.08.27', u'logstash-2016.08.28', u'logstash-2016.08.29', u'logstash-2016.08.30', u'logstash-2016.08.31']
2016-09-01 09:39:03,140 INFO Snapshot name: curator-20160901063903
2016-09-01 09:39:03,297 ERROR Unable to find all snapshots in repository: NLSSnaps
2016-09-01 09:39:03,756 INFO Job starting: delete snapshots
2016-09-01 09:39:03,917 ERROR Unable to find all snapshots in repository: NLSSnaps
2016-09-01 09:39:03,917 ERROR No snapshots found in Elasticsearch.
No snapshots found in Elasticsearch.
2016-09-01 09:39:04,004 INFO Job starting: delete indices
2016-09-01 09:39:04,016 INFO Pruning Kibana-related indices to prevent accidental deletion.
2016-09-01 09:39:04,016 INFO Action delete will be performed on the following indices: [u'logstash-2016.08.12', u'logstash-2016.08.13', u'logstash-2016.08.14', u'logstash-2016.08.15', u'logstash-2016.08.16', u'logstash-2016.08.17']
2016-09-01 09:39:04,017 INFO Deleting indices as a batch operation:
2016-09-01 09:39:04,017 INFO ---deleting index logstash-2016.08.12
2016-09-01 09:39:04,017 INFO ---deleting index logstash-2016.08.13
2016-09-01 09:39:04,018 INFO ---deleting index logstash-2016.08.14
2016-09-01 09:39:04,018 INFO ---deleting index logstash-2016.08.15
2016-09-01 09:39:04,018 INFO ---deleting index logstash-2016.08.16
2016-09-01 09:39:04,018 INFO ---deleting index logstash-2016.08.17
2016-09-01 09:39:07,130 INFO Job completed successfully.
tail: /usr/local/nagioslogserver/var/jobs.log: file truncated
Running command run_alerts with args ' ' for job id: run_all_alerts
SUCCESS
Running command run_alerts with args ' ' for job id: run_all_alerts
SUCCESS
Running command run_alerts with args ' ' for job id: run_all_alerts
SUCCESS
I was also getting the message:
"[2016-09-01 09:22:37,824][INFO ][cluster.routing.allocation.decider] [845bc07c-ed91-4920-8e23-747c9cc699f5] low disk watermark [85%] exceeded on [lXjC93b5QMm2hsgX9odwZA][845bc07c-ed91-4920-8e23-747c9cc699f5] free: 11.5gb[11.7%], replicas will not be assigned to this node"
I was not aware that there was a threshold of 85%, so I had to delete some indices to restore cluster health.
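For reference, the check behind that message can be approximated as follows. This is only a minimal Python sketch, not how Elasticsearch itself computes it: the function names are my own, and the 85% default is simply the low watermark quoted in the log message above.

```python
import shutil


def disk_used_percent(path):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def exceeds_watermark(used_percent, watermark_percent=85.0):
    """True when usage is above the watermark (85% is the value from the log)."""
    return used_percent > watermark_percent


if __name__ == "__main__":
    # "." is a placeholder; point it at the Elasticsearch data path instead.
    used = disk_used_percent(".")
    print("used: %.1f%%, above 85%% watermark: %s" % (used, exceeds_watermark(used)))
```

With the 11.7% free space reported in the message, usage is about 88.3%, which is above the 85% watermark.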
Anyway, the Health Status is now green, but I still cannot see my snapshots.
From the CLI, the Repository directory looks like this:
----------------------
total 208
drwx------ 2 root root 4096 Sep 18 2015 lost+found
-rw-r--r-- 1 nagios users 22 May 1 17:04 tests-kZZz9XEMSSCVjbBJDvEDXw-mfeDlC7lScWV9EXNnkLcdQ
-rw-r--r-- 1 nagios users 22 May 1 17:04 tests-kZZz9XEMSSCVjbBJDvEDXw-agVsrjUBQvSfPLFQeY2p1A
-rw-r--r-- 1 nagios users 22 May 10 17:24 tests-XZeYQA6NQmGZsZZj8djxbQ-siq0YOCBQ4yCWb2epxpTqA
-rw-r--r-- 1 nagios users 22 May 10 17:24 tests-XZeYQA6NQmGZsZZj8djxbQ-jGfhi6kTQV-7fA4ps7nVQw
-rw-r--r-- 1 nagios users 22 Jul 29 17:26 tests-HzSOPt_iQJKjiUvdH5qD6g-P9hEYLDnQxqlb97xb2P9Qw
-rw-r--r-- 1 nagios users 443 Aug 11 17:26 metadata-curator-20160811142626
-rw-r--r-- 1 nagios users 506 Aug 11 17:27 snapshot-curator-20160811142626
-rw-r--r-- 1 nagios users 443 Aug 12 17:25 metadata-curator-20160812142557
-rw-r--r-- 1 nagios users 312 Aug 12 17:28 snapshot-curator-20160812142557
-rw-r--r-- 1 nagios users 443 Aug 13 17:25 metadata-curator-20160813142543
-rw-r--r-- 1 nagios users 313 Aug 13 17:28 snapshot-curator-20160813142543
-rw-r--r-- 1 nagios users 443 Aug 14 17:25 metadata-curator-20160814142552
-rw-r--r-- 1 nagios users 605 Aug 14 17:27 snapshot-curator-20160814142552
-rw-r--r-- 1 nagios users 443 Aug 15 17:25 metadata-curator-20160815142529
-rw-r--r-- 1 nagios users 311 Aug 15 17:26 snapshot-curator-20160815142529
-rw-r--r-- 1 nagios users 443 Aug 16 17:25 metadata-curator-20160816142524
-rw-r--r-- 1 nagios users 312 Aug 16 17:26 snapshot-curator-20160816142524
-rw-r--r-- 1 nagios users 443 Aug 17 17:25 metadata-curator-20160817142536
-rw-r--r-- 1 nagios users 314 Aug 17 17:27 snapshot-curator-20160817142536
-rw-r--r-- 1 nagios users 443 Aug 18 17:26 metadata-curator-20160818142620
-rw-r--r-- 1 nagios users 312 Aug 18 17:27 snapshot-curator-20160818142620
-rw-r--r-- 1 nagios users 447 Aug 19 17:27 metadata-curator-20160819142704
-rw-r--r-- 1 nagios users 312 Aug 19 17:28 snapshot-curator-20160819142704
-rw-r--r-- 1 nagios users 443 Aug 20 17:26 metadata-curator-20160820142630
-rw-r--r-- 1 nagios users 587 Aug 20 17:27 snapshot-curator-20160820142630
-rw-r--r-- 1 nagios users 447 Aug 21 17:26 metadata-curator-20160821142617
-rw-r--r-- 1 nagios users 307 Aug 21 17:27 snapshot-curator-20160821142617
-rw-r--r-- 1 nagios users 443 Aug 22 17:25 metadata-curator-20160822142539
-rw-r--r-- 1 nagios users 625 Aug 22 17:27 snapshot-curator-20160822142539
-rw-r--r-- 1 nagios users 443 Aug 23 19:31 metadata-curator-20160823163132
-rw-r--r-- 1 nagios users 311 Aug 23 19:40 snapshot-curator-20160823163132
-rw-r--r-- 1 nagios users 443 Aug 24 19:31 metadata-curator-20160824163158
-rw-r--r-- 1 nagios users 312 Aug 24 19:36 snapshot-curator-20160824163158
-rw-r--r-- 1 nagios users 443 Aug 25 19:32 metadata-curator-20160825163230
-rw-r--r-- 1 nagios users 320 Aug 25 19:36 snapshot-curator-20160825163230
-rw-r--r-- 1 nagios users 443 Aug 26 19:33 metadata-curator-20160826163330
-rw-r--r-- 1 nagios users 324 Aug 26 19:38 snapshot-curator-20160826163330
-rw-r--r-- 1 nagios users 443 Aug 27 19:33 metadata-curator-20160827163351
-rw-r--r-- 1 nagios users 328 Aug 27 19:36 snapshot-curator-20160827163351
-rw-r--r-- 1 nagios users 443 Aug 28 19:34 metadata-curator-20160828163410
-rw-r--r-- 1 nagios users 550 Aug 28 19:35 snapshot-curator-20160828163410
-rw-r--r-- 1 nagios users 443 Aug 29 19:32 metadata-curator-20160829163222
-rw-r--r-- 1 nagios users 333 Aug 29 19:33 snapshot-curator-20160829163222
-rw-r--r-- 1 nagios users 443 Aug 30 19:31 metadata-curator-20160830163152
-rw-r--r-- 1 nagios users 334 Aug 30 19:40 snapshot-curator-20160830163152
-rw-r--r-- 1 nagios users 443 Aug 31 19:35 metadata-curator-20160831163558
drwxr-xr-x 167 nagios nagios 20480 Aug 31 19:35 indices
-rw-r--r-- 1 nagios users 0 Aug 31 19:42 snapshot-curator-20160831163558
-rw-r--r-- 1 nagios users 0 Aug 31 19:42 index
-rw-r--r-- 1 nagios users 0 Sep 1 08:39 tests-svR0E6cGTkSd_nec4Hj-7g-master
I guess the problem is the index file, which is empty? Why did this happen? How can I see the existing Repository again from the GUI?
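To double-check which files in the repository are empty, something like the following minimal Python sketch could be used. The function name and the directory path are hypothetical placeholders; it only lists regular files whose size is 0 bytes and does not modify anything.

```python
import os


def find_empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes."""
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories such as "indices" and "lost+found".
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Placeholder path: replace with the actual repository mount point.
    repo_dir = "/path/to/repository"
    if os.path.isdir(repo_dir):
        for name in find_empty_files(repo_dir):
            print(name)
```

In the listing above, the zero-byte entries are `snapshot-curator-20160831163558`, `index` and `tests-svR0E6cGTkSd_nec4Hj-7g-master`.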
Thanx a lot.
BR,
Kostas