Nagios Support Forum

Posted: **Wed Aug 19, 2015 6:25 pm**

Hello,

I am seeing partial backups in my repository, and this coincides with the upgrade to the latest version of NLS.

I found this thread: https://support.nagios.com/forum/viewto ... ps#p141561. Is this still a problem?

Here is a snapshot from my Elasticsearch log:

d5d524d3-1f75-4846-aebd-043e13b0bdb9.log
[2015-08-18 05:59:20,314][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsjq_log] (dynamic)
[2015-08-18 05:59:21,582][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nns_log] (dynamic)
[2015-08-18 05:59:22,432][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [email_log] (dynamic)
[2015-08-18 05:59:29,350][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [ws3_log] (dynamic)
[2015-08-18 06:05:57,551][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [recipapp_log] (dynamic)
[2015-08-18 06:30:01,636][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsr_log] (dynamic)
[2015-08-18 07:58:10,364][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [tty_log] (dynamic)
[2015-08-18 07:59:09,958][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [fax_log] (dynamic)
[2015-08-18 11:49:15,237][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [sportal_log] (dynamic)
[2015-08-18 12:34:31,299][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [eb_admincatalina_log] (dynamic)
[2015-08-18 12:41:51,609][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [eb_admin_log] (dynamic)
[2015-08-18 15:01:05,286][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.08.18] update_mapping [nnsh_log] (dynamic)
[2015-08-18 15:02:27,686][DEBUG][action.admin.indices.settings.put] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to update settings on indices [logstash-2015.07.19]
org.elasticsearch.ElasticsearchIllegalArgumentException: Can't update non dynamic settings[[index.codec.bloom.load]] for open indices [[logstash-2015.07.19]]
at org.elasticsearch.cluster.metadata.MetaDataUpdateSettingsService$2.execute(MetaDataUpdateSettingsService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-08-18 15:02:28,383][INFO ][cluster.metadata ] [78a3bc74-18f9-46c6-a763-267d4860c047] [logstash-2015.07.19] deleting index
[2015-08-18 15:02:57,075][DEBUG][action.admin.indices.close] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to close indices [logstash-2015.07.19]
org.elasticsearch.indices.IndexMissingException: [logstash-2015.07.19] missing
at org.elasticsearch.cluster.metadata.MetaDataIndexStateService$1.execute(MetaDataIndexStateService.java:87)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-08-18 15:05:41,107][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.05] is done
[2015-08-18 15:11:05,224][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.08] is done
[2015-08-18 15:19:32,317][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.09] is done
[2015-08-18 15:23:12,117][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.10] is done
[2015-08-18 15:27:46,122][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.11] is done
[2015-08-18 15:32:29,501][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.12] is done
[2015-08-18 15:38:40,620][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.13] is done
[2015-08-18 15:45:00,522][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.14] is done
[2015-08-18 15:49:09,551][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.15] is done
[2015-08-18 15:54:15,369][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.16] is done
[2015-08-18 15:58:13,939][INFO ][snapshots ] [78a3bc74-18f9-46c6-a763-267d4860c047] snapshot [NLS_BACKUP:logstash-2015.08.17] is done
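To see at a glance which daily indices actually finished, the "is done" lines can be pulled out of the log and checked for gaps. This is a minimal Python 3 sketch, assuming the log line format shown above; completed_snapshots and missing_days are hypothetical helper names, not part of Nagios Log Server. In the excerpt above, for example, no "is done" line appears for logstash-2015.08.06 or logstash-2015.08.07, though the excerpt alone does not show whether snapshots of those indices were attempted.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# [...][INFO ][snapshots ] [node] snapshot [NLS_BACKUP:logstash-2015.08.05] is done
DONE_RE = re.compile(r"snapshot \[(?P<repo>[^:\]]+):(?P<snap>[^\]]+)\] is done")


def completed_snapshots(log_lines):
    """Group snapshot names from 'is done' lines by repository (hypothetical helper)."""
    done = {}
    for line in log_lines:
        match = DONE_RE.search(line)
        if match:
            done.setdefault(match.group("repo"), []).append(match.group("snap"))
    return done


def missing_days(snapshot_names, prefix="logstash-", fmt="%Y.%m.%d"):
    """Return daily names with no snapshot between the earliest and latest one."""
    days = sorted(
        datetime.strptime(name[len(prefix):], fmt).date()
        for name in snapshot_names
        if name.startswith(prefix)
    )
    if not days:
        return []
    present = set(days)
    gaps = []
    day = days[0]
    while day <= days[-1]:
        if day not in present:
            gaps.append(prefix + day.strftime(fmt))
        day += timedelta(days=1)
    return gaps
```

Feed completed_snapshots the lines of your Elasticsearch log file, then pass one repository's list to missing_days.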

Also attached is a screenshot of the backup administration panel.

I have also been seeing this error since I upgraded: org.elasticsearch.repositories.RepositoryException: [NLS_BACKUP] failed to create repository ...

It seems like it can't find the repository?
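One way to check whether the cluster actually has a repository registered is to ask Elasticsearch through its snapshot REST endpoint (GET /_snapshot), which returns each repository name with its type and settings. The Python 3 sketch below parses that response; the base URL http://localhost:9200 is an assumption (the default HTTP port), and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def repositories(get_snapshot_body):
    """Map each registered repository to (type, location) from a GET /_snapshot body."""
    repos = json.loads(get_snapshot_body)
    return {
        name: (info.get("type"), info.get("settings", {}).get("location"))
        for name, info in repos.items()
    }


def fetch_repositories(base_url="http://localhost:9200"):
    """Ask one node which snapshot repositories the cluster knows about.

    base_url is an assumption; point it at any node's HTTP port.
    """
    with urlopen(base_url.rstrip("/") + "/_snapshot") as response:
        return repositories(response.read().decode("utf-8"))
```

If NLS_BACKUP is absent from fetch_repositories(), that would fit the "failed to create repository" error; if it is present, its location shows which path every node is expected to write to.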

Thanks in advance

Posted: **Wed Aug 19, 2015 6:44 pm**

stecino wrote: I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS

I just want to confirm that you are talking about version 2015r2.2.

Posted: **Wed Aug 19, 2015 6:52 pm**

Box293 wrote:
stecino wrote: I am seeing partial backups to my repository, and it coincides with the upgrade to the latest version of NLS
I just want to confirm that you are talking about version 2015r2.2.

Correct

Posted: **Wed Aug 19, 2015 6:56 pm**

What are the permissions on /nls_backup?

Code:

ls -al /nls_backup

I'm assuming that this /nls_backup folder is accessible by all nodes in your cluster?
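With a shared file system repository, each node writes its own shard data into the shared location, so every node must be able to create files under /nls_backup. A rough Python 3 check, meant to be run on each node as the same user the Elasticsearch process runs as (an assumption about how you would use it); check_repo_dir and the test-file prefix are hypothetical. It does a real write test because, as the Python documentation for os.access warns, permission bits alone can be misleading on network filesystems.

```python
import os
import tempfile


def check_repo_dir(path):
    """Return a list of problems with a snapshot directory; empty means it looks usable."""
    if not os.path.isdir(path):
        return ["%s is not a directory" % path]
    problems = []
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        problems.append("%s lacks read/write/search permission for uid %d" % (path, os.getuid()))
    # Permission bits can mislead on network filesystems (e.g. NFS root squash),
    # so also create, sync and remove a real file.
    try:
        with tempfile.NamedTemporaryFile(dir=path, prefix=".nls-write-test-") as handle:
            handle.write(b"ok")
            handle.flush()
            os.fsync(handle.fileno())
    except OSError as exc:
        problems.append("write test in %s failed: %s" % (path, exc))
    return problems
```

For example, check_repo_dir("/nls_backup") should return an empty list on every node.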

Posted: **Fri Aug 21, 2015 3:42 pm**

Box293 wrote: What are the permissions on /nls_backup?
Code:
ls -al /nls_backup
I'm assuming that this /nls_backup folder is accessible by all nodes in your cluster?

Yes, it is accessible from all of the cluster nodes.

Posted: **Sun Aug 23, 2015 8:50 pm**

Box293 wrote: What are the permissions on /nls_backup?
Code:
ls -al /nls_backup

What type of storage is /nls_backup? NFS? Local disk?
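On Linux, the filesystem type behind a path can be read from /proc/mounts. A minimal Python 3 sketch (the helper name is hypothetical) that picks the longest mount point containing the path; an fstype of nfs or nfs4 indicates an NFS mount. It does not decode the octal escapes /proc/mounts uses for spaces in paths.

```python
def mount_for(path, mounts_text):
    """Return (device, mount_point, fstype) for the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts. When the same mount point is
    listed more than once, the later (visible) entry wins.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[:3]
        # "/" must match everything; "/nls_backup" must not match "/nls_backup2".
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fstype)
    return best
```

For example: mount_for("/nls_backup", open("/proc/mounts").read()).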

Posted: **Tue Aug 25, 2015 3:24 pm**

Box293 wrote:
Box293 wrote: What are the permissions on /nls_backup?
Code:
ls -al /nls_backup
What type of storage is /nls_backup? NFS? Local disk?

It's an NFS mount point. I never had this issue until I did this version update. Since then, all of my backups are PARTIAL. I really need to get this resolved.

Posted: **Tue Aug 25, 2015 3:36 pm**

What happens if you delete all of your PARTIAL backups via the Web GUI and then run the following from the command line:

Code:

curator snapshot --repository nlsback --older-than 1
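As a rough illustration of what --older-than 1 is meant to select, the sketch below picks daily logstash-YYYY.MM.DD indices dated more than N days before a given day. It is a Python 3 approximation written for this thread (indices_older_than is a hypothetical name), not curator's own code; curator's exact cutoff rules may differ.

```python
from datetime import date, datetime, timedelta


def indices_older_than(index_names, days, today, prefix="logstash-", fmt="%Y.%m.%d"):
    """Return the daily indices dated strictly before `today - days`, sorted by name."""
    cutoff = today - timedelta(days=days)
    selected = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            continue  # e.g. a non-daily index that shares the prefix
        if day < cutoff:
            selected.append(name)
    return sorted(selected)
```

For example, indices_older_than(names, 1, date.today()) leaves out today's and yesterday's indices.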

At what time are your backups scheduled to run?

What is the date of your server?

Code:

date
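The server date matters partly because daily index names are, by default, derived from the event timestamp in UTC (Logstash's default index pattern is logstash-%{+YYYY.MM.dd}), so a server whose local clock is not UTC can be on a different "day" than the index it is writing. A small Python 3 sketch (daily_index_for is a hypothetical name) that computes the index name for a given moment, assuming that default pattern:

```python
from datetime import datetime, timedelta, timezone


def daily_index_for(moment=None, prefix="logstash-"):
    """Return the daily index name that covers `moment`, using the UTC calendar day.

    A naive datetime is treated as local time by astimezone().
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return prefix + moment.astimezone(timezone.utc).strftime("%Y.%m.%d")


# 6:19 pm on Aug 25 at UTC-7 is already Aug 26 in UTC.
PDT = timezone(timedelta(hours=-7))
EXAMPLE = daily_index_for(datetime(2015, 8, 25, 18, 19, tzinfo=PDT))  # "logstash-2015.08.26"
```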

When you view your Command Subsystem (Administration -> Command Subsystem), do the job names all look correct?

[Screenshot: Administration -> Command Subsystem job list]

If the job names look different from mine, I want you to 'Reset All Jobs'.

In the error log you've provided, I see an interesting stack trace:

Code:

[2015-08-18 15:02:27,686][DEBUG][action.admin.indices.settings.put] [78a3bc74-18f9-46c6-a763-267d4860c047] failed to update settings on indices [logstash-2015.07.19]
org.elasticsearch.ElasticsearchIllegalArgumentException: Can't update non dynamic settings[[index.codec.bloom.load]] for open indices [[logstash-2015.07.19]]
at org.elasticsearch.cluster.metadata.MetaDataUpdateSettingsService$2.execute(MetaDataUpdateSettingsService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:374)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Bloom filters were removed in the most recent version of elasticsearch, which is why the above error is troubling to me. Let's check on your elasticsearch version:

Code:

/usr/local/nagioslogserver/elasticsearch/bin/elasticsearch -v

It should be on version 1.6.0.
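The trace shows something trying to change index.codec.bloom.load on an open index; the log does not say which component issued that update. To see which indices actually carry the setting, you could save the output of GET /logstash-*/_settings and scan it. A Python 3 sketch, assuming that JSON shape (index name mapped to {"settings": ...}); it handles both nested and flat_settings output, and the helper names are hypothetical.

```python
import json


def flatten(settings, parent=""):
    """Turn nested settings into dotted keys, e.g. index.codec.bloom.load."""
    flat = {}
    for key, value in settings.items():
        full = parent + "." + key if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, full))
        else:
            flat[full] = value
    return flat


def indices_with_setting(get_settings_body, setting="index.codec.bloom.load"):
    """Return {index: value} for every index whose settings contain `setting`."""
    response = json.loads(get_settings_body)
    found = {}
    for index, body in response.items():
        flat = flatten(body.get("settings", {}))
        if setting in flat:
            found[index] = flat[setting]
    return found
```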

Finally, if none of the above helps, I want you to follow jobs.log with tail -f and force a backup to run, so that we can see where the failure is occurring.

Code:

tail -f /usr/local/nagioslogserver/var/jobs.log

Note that the above tail _must_ be run on all of your instances.

After the tail has begun, start the 'backup_maintenance' job from the Command Subsystem. One of your instances should report backup information, which will hopefully contain the error we're looking for.
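If jobs.log is busy, it may help to filter the tail for likely failure lines. The format of jobs.log is not shown in this thread, so the keyword list is only a guess. This Python 3 sketch follows the file roughly like tail -f (it does not handle log rotation, and may yield a partially written last line) and keeps matching lines; the helper names are hypothetical.

```python
import re
import time

# Guessed keywords; adjust once you have seen real jobs.log output.
ERROR_HINTS = re.compile(r"error|exception|fail|partial", re.IGNORECASE)


def interesting(lines, pattern=ERROR_HINTS):
    """Yield only lines that look like errors or failures, without trailing newlines."""
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


def follow(path, poll_seconds=1.0):
    """Yield lines appended to `path` after it is opened, similar to tail -f."""
    with open(path) as handle:
        handle.seek(0, 2)  # start at the end of the file, as tail -f does
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)
```

Usage on each instance: for line in interesting(follow("/usr/local/nagioslogserver/var/jobs.log")): print(line)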

Posted: **Tue Aug 25, 2015 6:19 pm**

jolson wrote: What happens if you delete all of your PARTIAL backups via the Web GUI, and run the following from the command line: [...]

I deleted all the PARTIAL backups and ran the command:

# curator snapshot --repository nlsback --older-than 1
2015-08-25 23:13:44,413 INFO Job starting...
2015-08-25 23:13:44,413 INFO Default timeout of 30 seconds is too low for command SNAPSHOT. Overriding to 21,600 seconds (6 hours).
2015-08-25 23:13:44,417 INFO Beginning SNAPSHOT operations...
2015-08-25 23:13:44,466 INFO Attempting to create snapshot for index logstash-2015.07.27.
2015-08-25 23:13:44,510 ERROR Error: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/get]]; nested: RepositoryMissingException[[nlsback] missing]; ')
2015-08-25 23:13:44,551 ERROR Error: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/get]]; nested: RepositoryMissingException[[nlsback] missing]; ')
Traceback (most recent call last):
File "/usr/bin/curator", line 9, in <module>
load_entry_point('elasticsearch-curator==1.2.2', 'console_scripts', 'curator')()
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 731, in main
arguments.func(client, **argdict)
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 585, in command_loop
skipped = op(client, index_name, **kwargs)
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 406, in _create_snapshot
client.snapshot.create(repository=repository, snapshot=snap_name, body=body, wait_for_completion=wait_for_completion)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/lib/python2.6/site-packages/elasticsearch/client/snapshot.py", line 19, in create
repository, snapshot), params=params, body=body)
File "/usr/lib/python2.6/site-packages/elasticsearch/transport.py", line 284, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/lib/python2.6/site-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/lib/python2.6/site-packages/elasticsearch/connection/base.py", line 97, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'RemoteTransportException[[78a3bc74-18f9-46c6-a763-267d4860c047][inet[/xx.xx.xx.246:9300]][cluster:admin/snapshot/create]]; nested: RepositoryMissingException[[nlsback] missing]; ')
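The key part of this traceback is the nested RepositoryMissingException: Elasticsearch has no repository called nlsback, whereas the snapshot lines earlier in this thread refer to one called NLS_BACKUP. A short Python 3 sketch (missing_repositories is a hypothetical name) that extracts the repository names reported as missing from output like this:

```python
import re

# Matches e.g. "RepositoryMissingException[[nlsback] missing]"
MISSING_REPO = re.compile(r"RepositoryMissingException\[\[([^\]]+)\] missing\]")


def missing_repositories(text):
    """Return the distinct repository names reported as missing, in order seen."""
    seen = []
    for name in MISSING_REPO.findall(text):
        if name not in seen:
            seen.append(name)
    return seen
```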

xx.xx.xx.xx:/nls_backup/
690G 114G 541G 18% /nls_backup

/usr/local/nagioslogserver/elasticsearch/bin/elasticsearch -v
Version: 1.6.0, Build: cdd3ac4/2015-06-09T13:36:34Z, JVM: 1.7.0_71
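To compare this against the expected 1.6.0 in a script, a small Python 3 sketch (es_version is a hypothetical name) can turn the elasticsearch -v output format shown here into a comparable tuple:

```python
import re

VERSION_RE = re.compile(r"Version:\s*(\d+)\.(\d+)\.(\d+)")


def es_version(output):
    """Return (major, minor, patch) parsed from `elasticsearch -v` output, or None."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

The output itself can be captured with subprocess.check_output(["/usr/local/nagioslogserver/elasticsearch/bin/elasticsearch", "-v"], universal_newlines=True) and then checked with es_version(out) >= (1, 6, 0).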

Attached is a screenshot of the systemJob interface. All times are in UTC.

Posted: **Wed Aug 26, 2015 10:44 am**

I should have clarified: when you run the following command,

Code:

curator snapshot --repository nlsback --older-than 1

Be certain that 'nlsback' is replaced with the name of your repository (which could be something like 'NLS Backups').

For example:

Code:

curator snapshot --repository 'NLS Backup Repo' --older-than 1
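When the repository name contains spaces, as in this example, it has to be quoted for the shell, and percent-encoded if you address the repository through the REST API directly. This Python 3 sketch builds both; the helper names and the http://localhost:9200 base URL are assumptions for illustration.

```python
import shlex
from urllib.parse import quote


def curator_snapshot_command(repository, older_than=1):
    """Build the curator command from this thread with the repository name shell-quoted."""
    argv = ["curator", "snapshot", "--repository", repository,
            "--older-than", str(older_than)]
    return " ".join(shlex.quote(arg) for arg in argv)


def snapshot_repo_url(base_url, repository):
    """Build a REST URL for a repository, percent-encoding spaces and slashes."""
    return "%s/_snapshot/%s" % (base_url.rstrip("/"), quote(repository, safe=""))
```

For instance, curator_snapshot_command("NLS Backup Repo") reproduces the command line shown above, while a name such as NLS_BACKUP needs no quoting at all.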

PARTIAL backups
