PARTIAL backups

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
stecino
Posts: 248
Joined: Thu Mar 14, 2013 4:42 pm

Post by stecino »

Hello,

I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS.

I found this https://support.nagios.com/forum/viewto ... ps#p141561 thread. Is this still a problem?

Here is a snippet from my Elasticsearch log:

d5d524d3-1f75-4846-aebd-043e13b0bdb9.log
[2015-08-18 05:59:20,314][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsjq_log] (dynamic)
[2015-08-18 05:59:21,582][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nns_log] (dynamic)
[2015-08-18 05:59:22,432][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [email_log] (dynamic)
[2015-08-18 05:59:29,350][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [ws3_log] (dynamic)
[2015-08-18 06:05:57,551][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [recipapp_log] (dynamic)
[2015-08-18 06:30:01,636][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsr_log] (dynamic)
[2015-08-18 07:58:10,364][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [tty_log] (dynamic)
[2015-08-18 07:59:09,958][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [fax_log] (dynamic)
[2015-08-18 11:49:15,237][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [sportal_log] (dynamic)
[2015-08-18 12:34:31,299][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [eb_admincatalina_log] (dynamic)
[2015-08-18 12:41:51,609][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [eb_admin_log] (dynamic)
[2015-08-18 15:01:05,286][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsh_log] (dynamic)
[2015-08-18 15:02:27,686][DEBUG][action.admin.indices.settings.put] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to update settings on indices [logstash-2015.07.19]
org.elasticsearch.ElasticsearchIllegalArgumentException: Can't update non dynamic settings[[index.codec.bloom.load]] for open indices [[logstash-2015.07.19]]
at org.elasticsearch.cluster.metadata.MetaDataUpdateSettingsService$2.execute(MetaDataUpdateSettingsService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-08-18 15:02:28,383][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.07.19] deleting index
[2015-08-18 15:02:57,075][DEBUG][action.admin.indices.close] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to close indices [logstash-2015.07.19]
org.elasticsearch.indices.IndexMissingException: [logstash-2015.07.19] missing
at org.elasticsearch.cluster.metadata.MetaDataIndexStateService$1.execute(MetaDataIndexStateService.java:87)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-08-18 15:05:41,107][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.05] is done
[2015-08-18 15:11:05,224][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.08] is done
[2015-08-18 15:19:32,317][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.09] is done
[2015-08-18 15:23:12,117][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.10] is done
[2015-08-18 15:27:46,122][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.11] is done
[2015-08-18 15:32:29,501][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.12] is done
[2015-08-18 15:38:40,620][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.13] is done
[2015-08-18 15:45:00,522][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.14] is done
[2015-08-18 15:49:09,551][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.15] is done
[2015-08-18 15:54:15,369][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.16] is done
[2015-08-18 15:58:13,939][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.17] is done


Also attached is a screenshot of the backup administration panel.

I am also seeing this error since I upgraded: org.elasticsearch.repositories.RepositoryException: [NLS_BACKUP] failed to create repository ...

It seems like it can't find the repository?

Thanks in advance
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Post by Box293 »

stecino wrote:I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS
I just want to confirm that you are talking about version 2015r2.2
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Post by stecino »

Box293 wrote:
stecino wrote:I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS
I just want to confirm that you are talking about version 2015r2.2
Correct

Post by Box293 »

What are the permissions on /nls_backup?

Code: Select all

ls -al /nls_backup
I'm assuming that this /nls_backup folder is accessible by all nodes in your cluster?
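
Beyond reading the `ls -al` output, one way to confirm access is to try writing to the share from each node. The snippet below is only a sketch: the helper name `check_repo_writable` is made up for illustration, and it is assumed that you run it as whichever account Elasticsearch uses on your nodes (which account that is has not been confirmed in this thread).

```shell
# Hypothetical helper: verify that the current user can create and delete a
# file in the backup directory (defaults to /nls_backup from this thread).
check_repo_writable() {
    dir="${1:-/nls_backup}"
    # Include the node name and PID so probes from other nodes do not collide.
    probe="$dir/.write_test.$(uname -n).$$"
    if touch "$probe" 2>/dev/null && rm -f "$probe"; then
        echo "OK: $dir is writable"
    else
        echo "FAIL: cannot write to $dir"
        return 1
    fi
}

# Example, to be repeated on every node in the cluster:
#   check_repo_writable /nls_backup
```

A failure on only some nodes would fit the PARTIAL result, since each node needs to write its own shards to the shared location.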

Post by stecino »

Box293 wrote:What are the permissions on /nls_backup

Code: Select all

ls -al /nls_backup
I'm assuming that this /nls_backup folder is accessible by all nodes in your cluster?
Yes, it is accessible from all the cluster nodes.

Post by Box293 »

Box293 wrote:What are the permissions on /nls_backup

Code: Select all

ls -al /nls_backup
What type of storage is /nls_backup? NFS? Local disk?

Post by stecino »

Box293 wrote:
Box293 wrote:What are the permissions on /nls_backup

Code: Select all

ls -al /nls_backup
What type of storage is /nls_backup? NFS? Local disk?
It's an NFS mount point. I never had this issue until I did this version update. From that point on, all my backups are PARTIAL. I really need to get this resolved.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Post by jolson »

What happens if you delete all of your PARTIAL backups via the Web GUI, and run the following from the command line:

Code: Select all

curator snapshot --repository nlsback --older-than 1
At what time are your backups scheduled to run?

What is the date of your server?

Code: Select all

date
If you view your command subsystem (Administration -> Command Subsystem) do the job names all look proper?
[Attached screenshot: 2015-08-25 15_27_59-Command Subsystem • Nagios Log Server - Firefox Developer Edition.png]
If the job names look different than mine, I want you to 'Reset All Jobs'.

In the error log you've provided, I see an interesting stack trace:

Code: Select all

[2015-08-18 15:02:27,686][DEBUG][action.admin.indices.settings.put] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to update settings on indices [logstash-2015.07.19]
org.elasticsearch.ElasticsearchIllegalArgumentException: Can't update non dynamic settings[[index.codec.bloom.load]] for open indices [[logstash-2015.07.19]]
at org.elasticsearch.cluster.metadata.MetaDataUpdateSettingsService$2.execute(MetaDataUpdateSettingsService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Bloom filters were removed in the most recent version of Elasticsearch, which is why the above error is troubling to me. Let's check your Elasticsearch version:

Code: Select all

/usr/local/nagioslogserver/elasticsearch/bin/elasticsearch -v
It should be on version 1.6.0.

Finally, if none of the above helps, I want you to tail jobs.log and force a backup to see where the failure is occurring.

Code: Select all

tail -f /usr/local/nagioslogserver/var/jobs.log
Note that the above tail _must_ be run on all of your instances.

After the tail has begun, start the 'backup_maintenance' job from the command Subsystem. You should see one of your instances reporting backup information, which will hopefully contain the error we're looking for.
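
If jobs.log is busy, it may help to filter the tail down to lines that look like failures. This is a rough sketch only: `backup_errors` is a made-up helper name, and the keyword list is a guess at what a failure line might contain, not something taken from the Nagios Log Server documentation.

```shell
# Hypothetical filter: keep only lines that look like failures.
# Reads log lines on stdin and prints the ones matching any keyword,
# ignoring case. The keywords are assumptions; extend them as needed.
backup_errors() {
    grep -i -E 'error|exception|fail|partial'
}

# Assumed usage on each instance, before starting the backup_maintenance job:
#   tail -f /usr/local/nagioslogserver/var/jobs.log | backup_errors
```

If the filter hides too much, fall back to the plain tail so that no context is lost.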
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.

Post by stecino »

jolson wrote:What happens if you delete all of your PARTIAL backups via the Web GUI, and run the following from the command line:

Code: Select all

curator snapshot --repository nlsback --older-than 1

I deleted all the partial backups and ran the command:

# curator snapshot --repository nlsback --older-than 1
2015-08-25 23:13:44,413 INFO Job starting...
2015-08-25 23:13:44,413 INFO Default timeout of 30 seconds is too low for command SNAPSHOT. Overriding to 21,600 seconds (6 hours).
2015-08-25 23:13:44,417 INFO Beginning SNAPSHOT operations...
2015-08-25 23:13:44,466 INFO Attempting to create snapshot for index logstash-2015.07.27.
2015-08-25 23:13:44,510 ERROR Error: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/get]]; nested: RepositoryMissingException[[nlsback] missing]; ')
2015-08-25 23:13:44,551 ERROR Error: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/get]]; nested: RepositoryMissingException[[nlsback] missing]; ')
Traceback (most recent call last):
File "/usr/bin/curator", line 9, in <module>
load_entry_point('elasticsearch-curator==1.2.2', 'console_scripts', 'curator')()
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 731, in main
arguments.func(client, **argdict)
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 585, in command_loop
skipped = op(client, index_name, **kwargs)
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 406, in _create_snapshot
client.snapshot.create(repository=repository, snapshot=snap_name, body=body, wait_for_completion=wait_for_completion)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/snapshot.py", line 19, in create
repository, snapshot), params=params, body=body)
File "/usr/lib/python2.6/site-packages/elasticsearch/transport.py", line 284, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/lib/python2.6/site-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/lib/python2.6/site-packages/elasticsearch/connection/base.py", line 97, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/create]]; nested: RepositoryMissingException[[nlsback] missing]; ')

xx.xx.xx.xx:/nls_backup/
690G 114G 541G 18% /nls_backup

/usr/local/nagioslogserver/elasticsearch/bin/elasticsearch -v
Version: 1.6.0, Build: cdd3ac4/2015-06-09T13:36:34Z, JVM: 1.7.0_71

I am attaching a screenshot of the system jobs interface. All times are in UTC.

Post by jolson »

I should have clarified - when you run the following command:

Code: Select all

curator snapshot --repository nlsback --older-than 1
Be certain that 'nlsback' is replaced with the name of your repository (which could be something like 'NLS Backups').

e.g.

Code: Select all

curator snapshot --repository 'NLS Backup Repo' --older-than 1
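
If you are unsure of the exact repository name, the Elasticsearch log can tell you: the snapshot lines quoted earlier in this thread are printed as `[NLS_BACKUP:logstash-2015.08.05]`, which suggests the repository here is named `NLS_BACKUP`. The sketch below pulls those names out of a log. The helper name `repos_from_es_log` is made up for illustration, and the log path and the `localhost:9200` address in the comments are assumptions about your setup.

```shell
# Hypothetical helper: list the repository names that appear in Elasticsearch
# snapshot log lines such as
#   ... snapshot [NLS_BACKUP:logstash-2015.08.05] is done
# Reads log text on stdin and prints each distinct repository name once.
repos_from_es_log() {
    sed -n 's/.*snapshot \[\([^]:]*\):[^]]*\].*/\1/p' | sort -u
}

# Assumed usage (substitute the real path to your Elasticsearch log):
#   repos_from_es_log < /path/to/your-elasticsearch.log
#
# Alternatively, ask the cluster directly which repositories are registered
# (assumes Elasticsearch is listening on localhost:9200):
#   curl -s 'http://localhost:9200/_snapshot/_all?pretty'
```

Whatever name comes back is the value to pass to `--repository`, quoted if it contains spaces.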