PARTIAL backups

Posted: Wed Aug 19, 2015 6:25 pm
by stecino
Hello,

I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS.

I found this thread (https://support.nagios.com/forum/viewto ... ps#p141561). Is this still a problem?

Here is a snippet from my Elasticsearch log:

d5d524d3-1f75-4846-aebd-043e13b0bdb9.log
[2015-08-18 05:59:20,314][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsjq_log] (dynamic)
[2015-08-18 05:59:21,582][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nns_log] (dynamic)
[2015-08-18 05:59:22,432][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [email_log] (dynamic)
[2015-08-18 05:59:29,350][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [ws3_log] (dynamic)
[2015-08-18 06:05:57,551][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [recipapp_log] (dynamic)
[2015-08-18 06:30:01,636][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsr_log] (dynamic)
[2015-08-18 07:58:10,364][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [tty_log] (dynamic)
[2015-08-18 07:59:09,958][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [fax_log] (dynamic)
[2015-08-18 11:49:15,237][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [sportal_log] (dynamic)
[2015-08-18 12:34:31,299][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [eb_admincatalina_log] (dynamic)
[2015-08-18 12:41:51,609][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [eb_admin_log] (dynamic)
[2015-08-18 15:01:05,286][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsh_log] (dynamic)
[2015-08-18 15:02:27,686][DEBUG][action.admin.indices.settings.put] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to update settings on indices [logstash-2015.07.19]
org.elasticsearch.ElasticsearchIllegalArgumentException: Can't update non dynamic settings[[index.codec.bloom.load]] for open indices [[logstash-2015.07.19]]
at org.elasticsearch.cluster.metadata.MetaDataUpdateSettingsService$2.execute(MetaDataUpdateSettingsService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-08-18 15:02:28,383][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.07.19] deleting index
[2015-08-18 15:02:57,075][DEBUG][action.admin.indices.close] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to close indices [logstash-2015.07.19]
org.elasticsearch.indices.IndexMissingException: [logstash-2015.07.19] missing
at org.elasticsearch.cluster.metadata.MetaDataIndexStateService$1.execute(MetaDataIndexStateService.java:87)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-08-18 15:05:41,107][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.05] is done
[2015-08-18 15:11:05,224][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.08] is done
[2015-08-18 15:19:32,317][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.09] is done
[2015-08-18 15:23:12,117][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.10] is done
[2015-08-18 15:27:46,122][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.11] is done
[2015-08-18 15:32:29,501][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.12] is done
[2015-08-18 15:38:40,620][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.13] is done
[2015-08-18 15:45:00,522][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.14] is done
[2015-08-18 15:49:09,551][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.15] is done
[2015-08-18 15:54:15,369][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.16] is done
[2015-08-18 15:58:13,939][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.17] is done


Also attached is the administration panel for backups.

I am also seeing this error since I upgraded: org.elasticsearch.repositories.RepositoryException: [NLS_BACKUP] failed to create repository ...

It seems like it can't find the repository?

Thanks in advance

Re: PARTIAL backups

Posted: Wed Aug 19, 2015 6:44 pm
by Box293
stecino wrote:I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS
I just want to confirm: are you talking about version 2015r2.2?

Re: PARTIAL backups

Posted: Wed Aug 19, 2015 6:52 pm
by stecino
Box293 wrote:
stecino wrote:I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS
I just want to confirm that you are talking about version 2015r2.2
Correct

Re: PARTIAL backups

Posted: Wed Aug 19, 2015 6:56 pm
by Box293
What are the permissions on /nls_backup?

Code: Select all

ls -al /nls_backup
I'm assuming that this /nls_backup folder is accessible by all nodes in your cluster?
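
For reference, a minimal sketch of how that access could be confirmed on each node instead of reading it off the `ls` output alone. It assumes the repository path is /nls_backup (as in this thread) and that the script is run as the same user the Elasticsearch process runs under, which is not stated here. The function name `can_write_repo` is hypothetical.

```python
import os
import tempfile


def can_write_repo(path):
    """Return True if `path` is a directory the current user can create files in."""
    if not os.path.isdir(path):
        return False
    try:
        # Create and remove a real file: on NFS the permission bits alone
        # can be misleading (for example when root squashing is enabled).
        with tempfile.NamedTemporaryFile(dir=path, prefix=".write_test_"):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # /nls_backup is the path discussed in this thread; run this on every node.
    repo = "/nls_backup"
    print(repo, "writable" if can_write_repo(repo) else "NOT writable")
```

A snapshot repository on shared storage has to be writable from every node that holds shards, so one node failing this check would be worth investigating first.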

Re: PARTIAL backups

Posted: Fri Aug 21, 2015 3:42 pm
by stecino
Box293 wrote:What are the permissions on /nls_backup

Code: Select all

ls -al /nls_backup
I'm assuming that this /nls_backup folder is accessible by all nodes in your cluster?
Yes, it is accessible from all the cluster nodes.

Re: PARTIAL backups

Posted: Sun Aug 23, 2015 8:50 pm
by Box293
Box293 wrote:What are the permissions on /nls_backup

Code: Select all

ls -al /nls_backup
What type of storage is /nls_backup? NFS? Local disk?

Re: PARTIAL backups

Posted: Tue Aug 25, 2015 3:24 pm
by stecino
Box293 wrote:
Box293 wrote:What are the permissions on /nls_backup

Code: Select all

ls -al /nls_backup
What type of storage is /nls_backup? NFS? Local disk?
It's an NFS mount point. I never had this issue until I did this version update. Since then, all my backups have been PARTIAL. I really need to get this resolved.

Re: PARTIAL backups

Posted: Tue Aug 25, 2015 3:36 pm
by jolson
What happens if you delete all of your PARTIAL backups via the Web GUI, and run the following from the command line:

Code: Select all

curator snapshot --repository nlsback --older-than 1
At what time are your backups scheduled to run?

What is the date of your server?

Code: Select all

date
If you view your command subsystem (Administration -> Command Subsystem) do the job names all look proper?
[Attachment: screenshot of the Command Subsystem page in Nagios Log Server]
If the job names look different than mine, I want you to 'Reset All Jobs'.

In the error log you've provided, I see an interesting stack trace:

Code: Select all

[2015-08-18 15:02:27,686][DEBUG][action.admin.indices.settings.put] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to update settings on indices [logstash-2015.07.19]
org.elasticsearch.ElasticsearchIllegalArgumentException: Can't update non dynamic settings[[index.codec.bloom.load]] for open indices [[logstash-2015.07.19]]
at org.elasticsearch.cluster.metadata.MetaDataUpdateSettingsService$2.execute(MetaDataUpdateSettingsService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Bloom filters were removed in the most recent version of elasticsearch, which is why the above error is troubling to me. Let's check on your elasticsearch version:

Code: Select all

/usr/local/nagioslogserver/elasticsearch/bin/elasticsearch -v
It should be on version 1.6.0.

Finally, if none of the above helps, I want you to follow jobs.log with tail and force a backup command to see where the failure is occurring.

Code: Select all

tail -f /usr/local/nagioslogserver/var/jobs.log
Note that the above tail _must_ be run on all of your instances.

After the tail has begun, start the 'backup_maintenance' job from the Command Subsystem. You should see one of your instances reporting backup information, which will hopefully contain the error we're looking for.
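
Alongside the jobs.log output, the snapshot API itself usually records why a snapshot ended up PARTIAL. The following is a sketch only. It assumes Elasticsearch answers on localhost:9200 and that `GET /_snapshot/<repository>/_all` returns a `snapshots` list whose entries carry `snapshot`, `state`, and `failures` fields (the shape used by Elasticsearch 1.x). The function names are hypothetical, and the repository name has to be replaced with your own.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_snapshots(repository, base_url="http://localhost:9200"):
    """Fetch the snapshot listing for one repository from the snapshot API."""
    # base_url is an assumption; point it at any node in the cluster.
    url = "%s/_snapshot/%s/_all" % (base_url, quote(repository, safe=""))
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def problem_snapshots(listing):
    """Return (snapshot name, failure reasons) for each PARTIAL or FAILED snapshot."""
    problems = []
    for snap in listing.get("snapshots", []):
        if snap.get("state") in ("PARTIAL", "FAILED"):
            reasons = [f.get("reason", "unknown") for f in snap.get("failures", [])]
            problems.append((snap.get("snapshot"), reasons))
    return problems
```

Calling `problem_snapshots(fetch_snapshots("your repository name"))` on one node should then list each affected snapshot together with the per-shard failure reasons, if the server reports any.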

Re: PARTIAL backups

Posted: Tue Aug 25, 2015 6:19 pm
by stecino
I deleted all the partial backups and ran the command:

# curator snapshot --repository nlsback --older-than 1
2015-08-25 23:13:44,413 INFO Job starting...
2015-08-25 23:13:44,413 INFO Default timeout of 30 seconds is too low for command SNAPSHOT. Overriding to 21,600 seconds (6 hours).
2015-08-25 23:13:44,417 INFO Beginning SNAPSHOT operations...
2015-08-25 23:13:44,466 INFO Attempting to create snapshot for index logstash-2015.07.27.
2015-08-25 23:13:44,510 ERROR Error: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/get]]; nested: RepositoryMissingException[[nlsback] missing]; ')
2015-08-25 23:13:44,551 ERROR Error: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/get]]; nested: RepositoryMissingException[[nlsback] missing]; ')
Traceback (most recent call last):
File "/usr/bin/curator", line 9, in <module>
load_entry_point('elasticsearch-curator==1.2.2', 'console_scripts', 'curator')()
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 731, in main
arguments.func(client, **argdict)
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 585, in command_loop
skipped = op(client, index_name, **kwargs)
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 406, in _create_snapshot
client.snapshot.create(repository=repository, snapshot=snap_name, body=body, wait_for_completion=wait_for_completion)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/snapshot.py", line 19, in create
repository, snapshot), params=params, body=body)
File "/usr/lib/python2.6/site-packages/elasticsearch/transport.py", line 284, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/lib/python2.6/site-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/lib/python2.6/site-packages/elasticsearch/connection/base.py", line 97, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/create]]; nested: RepositoryMissingException[[nlsback] missing]; ')

xx.xx.xx.xx:/nls_backup/
690G 114G 541G 18% /nls_backup

/usr/local/nagioslogserver/elasticsearch/bin/elasticsearch -v
Version: 1.6.0, Build: cdd3ac4/2015-06-09T13:36:34Z, JVM: 1.7.0_71

Attaching the systemJob interface. All times are in UTC

Re: PARTIAL backups

Posted: Wed Aug 26, 2015 10:44 am
by jolson
I should have clarified - when you run the following command:

Code: Select all

curator snapshot --repository nlsback --older-than 1
Be certain that 'nlsback' is replaced with the name of your repository (which could be something like 'NLS Backups').

e.g.

Code: Select all

curator snapshot --repository 'NLS Backup Repo' --older-than 1
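
The RepositoryMissingException in the curator output above means that no repository called nlsback is registered. Below is a sketch of how the registered names could be looked up, assuming Elasticsearch listens on localhost:9200 and that `GET /_snapshot/_all` returns a JSON object keyed by repository name. The helper names are hypothetical. The snapshot log lines earlier in the thread (`[NLS_BACKUP:logstash-2015.08.05]`) suggest the name here may be NLS_BACKUP, but that should be confirmed against the API output.

```python
import json
import shlex
from urllib.request import urlopen


def fetch_repositories(base_url="http://localhost:9200"):
    """Ask Elasticsearch for all registered snapshot repositories."""
    # base_url is an assumption; any node in the cluster should answer.
    with urlopen(base_url + "/_snapshot/_all") as response:
        return json.loads(response.read().decode("utf-8"))


def repository_names(listing):
    """Return the sorted repository names from a GET /_snapshot/_all response."""
    return sorted(listing.keys())


def curator_command(repository, older_than_days=1):
    """Build the curator invocation from this thread with the name shell-quoted."""
    return "curator snapshot --repository %s --older-than %d" % (
        shlex.quote(repository), older_than_days)
```

Names that contain spaces come out quoted, so `curator_command("NLS Backup Repo")` produces the same form as the example command above.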