
Re: backup

Posted: Thu May 07, 2015 10:45 am
by pccwglobalit
thanks.
now the nls1 nagios uid is 8004 and the nls2 nagios uid is 8005.
when nls1 writes something, nls2 will show the owner as 8004. but nls2 can also write to the nfs share - will that affect the backup?
or can every node write to the nfs share without needing the same uid?
thanks for your help.

Re: backup

Posted: Thu May 07, 2015 11:00 am
by jolson
I would get the UIDs synchronized personally - I believe it's the best solution here, and it will help us avoid a lot of potential complications. Any chance you can change the 'nagios' user UIDs on your NLS nodes?

This is a great resource regarding how to do so: https://muffinresearch.co.uk/linux-chan ... -for-user/

I tested the below on one of my NLS nodes, and I cannot see any problems. I changed the 'nagios' UID from 500 to 501:

Gracefully shut down NLS and company:

service elasticsearch stop
service logstash stop
service httpd stop
service crond stop
Kill any hanging 'nagios' processes:

pkill -u nagios
Change the UID and fix file ownership:

usermod -u 501 nagios
find / -user 500 -exec chown -h 501 {} \;
Start up all processes:

service elasticsearch start
service logstash start
service httpd start
service crond start
You will have to fix GID ownership as well if the GIDs are mismatched - but in general, 'nagios' should belong to GID 100 (users).
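As a rough illustration of what the `find / -user 500` step above does, here is a minimal Python sketch. It only *locates* files owned by the old UID; the actual `chown` (shown commented out) requires root, and the paths and UIDs here are placeholders:

```python
import os

def files_owned_by(root, uid):
    """Walk `root` and return paths whose owner matches `uid`.

    Mirrors `find ROOT -user UID`; uses lstat so symlinks are
    matched by their own ownership, as find does by default.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_uid == uid:
                    matches.append(path)
            except OSError:
                pass  # file vanished or permission denied; skip it
    return matches

# The chown step itself would need root, e.g.:
#   for path in files_owned_by('/', 500):
#       os.lchown(path, 501, -1)   # -1 leaves the group unchanged
```

The `find ... -exec chown -h ...` one-liner in the post does both steps at once; the sketch just separates them to make the logic visible.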

Re: backup

Posted: Thu May 07, 2015 11:03 am
by pccwglobalit
thanks. this is a great way to do that, i think.

by the way, will the backup job back up all of the logs?

Re: backup

Posted: Thu May 07, 2015 11:05 am
by jolson
Correct - the backup job should back up all of the logs to the repository. This way, you are able to restore them from backup if you wind up losing some data.

You can read more about the process here: http://assets.nagios.com/downloads/nagi ... enance.pdf

Re: backup

Posted: Tue May 12, 2015 7:57 am
by pccwglobalit
i just started the backup, and only two dates have been showing IN_PROGRESS since last night. it has been 24 hours already.

Name                 State        Indexes              Actions
logstash-2015.02.11  IN_PROGRESS  logstash-2015.02.11  restore delete
logstash-2014.12.18  IN_PROGRESS  logstash-2014.12.18  restore delete


backups             Waiting  SUCCESS  05/11/2015 17:50:13  1 day  05/12/2015 15:50:13  System  Edit
backup_maintenance  Waiting  SUCCESS  05/12/2015 02:14:23  1 day  05/13/2015 02:14:23  System  Edit

is it successful, or do we need to wait more time?

Re: backup

Posted: Tue May 12, 2015 9:36 am
by jolson
How large are the shards that you're backing up? The speed will depend on your storage medium and on the amount of data.

Please run the following command and report the status to us:

curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | grep snapshot -A 100
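If grepping the raw JSON gets unwieldy, the same check can be sketched in Python. Note the field layout below (a `snapshots` list nested under the cluster state's `snapshots` key, each entry carrying `repository`, `snapshot`, and `state`) is an assumption based on Elasticsearch 1.x output - compare it against what your cluster actually returns:

```python
import json

def running_snapshots(cluster_state):
    """Return (repository, snapshot, state) tuples for snapshots that
    the cluster still lists as in flight in its _cluster/state output."""
    entries = cluster_state.get('snapshots', {}).get('snapshots', [])
    return [(e.get('repository'), e.get('snapshot'), e.get('state'))
            for e in entries]

# Trimmed-down example document standing in for the curl output above:
sample = json.loads("""
{"snapshots": {"snapshots": [
  {"repository": "nlsbackup", "snapshot": "logstash-2015.02.13",
   "state": "STARTED"}
]}}
""")
print(running_snapshots(sample))
```

To use it against a live node, feed it the body of the `curl ... /_cluster/state?pretty` call from above instead of the sample document.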

Re: backup

Posted: Tue May 12, 2015 11:46 pm
by pccwglobalit
please find the log

Re: backup

Posted: Wed May 13, 2015 9:16 am
by jolson
It looks like your 'logstash-2015.02.11' snapshot might be stuck initializing. Let's kill the snapshot and re-run your backup while following the logs:

curator delete --prefix logstash-2015.02.11 --older-than 1
Start a tail of jobs.log on all of your nodes:

tail -f /usr/local/nagioslogserver/var/jobs.log
jobs.log may be 0 bytes in size because it is truncated often. This is normal. Please start the tail before continuing.

Navigate to 'Administration -> Command Subsystem' and run the 'Reset all Jobs' command. After that is finished, run the 'Backup and Maintenance' command. Let us know the results. Thanks!

Re: backup

Posted: Wed May 13, 2015 9:35 am
by pccwglobalit
2015-05-13 14:26:55,905 INFO Job starting...
2015-05-13 14:26:55,913 INFO Beginning DELETE operations...
2015-05-13 14:26:55,928 ERROR Could not find a valid timestamp for logstash-2015.02.11 with timestring %Y.%m.%d
2015-05-13 14:26:55,928 INFO DELETE index operations completed.
2015-05-13 14:26:55,928 INFO Done in 0:00:00.044038.


Running command do_maintenance with args ' ' for job id: backup_maintenance
2015-05-13 14:28:47,236 INFO Job starting...
2015-05-13 14:28:47,251 INFO Beginning BLOOM operations...
2015-05-13 14:28:47,287 INFO Attempting to disable bloom filter for index logstash-2014.09.08.
2015-05-13 14:28:47,291 INFO Skipping index logstash-2014.09.08: Already closed.
2015-05-13 14:28:47,291 INFO Attempting to disable bloom filter for index logstash-2014.09.09.
2015-05-13 14:28:47,293 INFO Skipping index logstash-2014.09.09: Already closed.
2015-05-13 14:28:47,293 INFO Attempting to disable bloom filter for index logstash-2014.09.10.
2015-05-13 14:28:47,295 INFO Skipping index logstash-2014.09.10: Already closed.
2015-05-13 14:28:47,295 INFO Attempting to disable bloom filter for index logstash-2014.09.11.
2015-05-13 14:28:47,296 INFO Skipping index logstash-2014.09.11: Already closed.
2015-05-13 14:28:47,297 INFO Attempting to disable bloom filter for index logstash-2014.09.12.
2015-05-13 14:28:47,298 INFO Skipping index logstash-2014.09.12: Already closed.
2015-05-13 14:28:47,298 INFO Attempting to disable bloom filter for index logstash-2014.09.13.
2015-05-13 14:28:47,300 INFO Skipping index logstash-2014.09.13: Already closed.
2015-05-13 14:28:47,300 INFO Attempting to disable bloom filter for index logstash-2014.09.14.
2015-05-13 14:28:47,302 INFO Skipping index logstash-2014.09.14: Already closed.


2015-05-13 14:32:16,692 INFO Attempting to optimize index logstash-2015.02.23.
2015-05-13 14:32:16,985 INFO Skipping index logstash-2015.02.23: Already optimized.
2015-05-13 14:32:16,986 INFO Attempting to optimize index logstash-2015.02.24.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/curator/curator.py", line 736, in <module>
    main()
  File "/usr/lib/python2.7/site-packages/curator/curator.py", line 731, in main
    arguments.func(client, **argdict)
  File "/usr/lib/python2.7/site-packages/curator/curator.py", line 585, in command_loop
    skipped = op(client, index_name, **kwargs)
  File "/usr/lib/python2.7/site-packages/curator/curator.py", line 406, in _create_snapshot
    client.snapshot.create(repository=repository, snapshot=snap_name, body=body, wait_for_completion=wait_for_completion)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/lib/python2.7/site-packages/elasticsearch/client/snapshot.py", line 22, in create
    repository, snapshot), params=params, body=body)
  File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 301, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 82, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 102, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(503, u'ConcurrentSnapshotExecutionException[[nlsbackup:logstash-2015.02.13] a snapshot is already running]')
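For background on the "Could not find a valid timestamp for logstash-2015.02.11 with timestring %Y.%m.%d" line above: curator of this vintage strips the --prefix from each index name and parses the remainder with the timestring. A minimal sketch of that matching logic (a simplification for illustration, not curator's actual code) shows why passing the full index name as the prefix leaves nothing to parse:

```python
from datetime import datetime

def index_timestamp(index, prefix='logstash-', timestring='%Y.%m.%d'):
    """Strip `prefix` from `index` and parse the rest with `timestring`.

    Returns None when no valid timestamp remains - which is what happens
    when --prefix is set to the full index name, leaving an empty string.
    """
    if not index.startswith(prefix):
        return None
    remainder = index[len(prefix):]
    try:
        return datetime.strptime(remainder, timestring)
    except ValueError:
        return None

print(index_timestamp('logstash-2015.02.11'))  # date parses from the suffix
print(index_timestamp('logstash-2015.02.11',
                      prefix='logstash-2015.02.11'))  # nothing left to parse
```

That would explain why the delete command in the earlier post logged the timestamp error rather than removing the stuck index.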

Re: backup

Posted: Wed May 13, 2015 9:39 am
by jolson
[[nlsbackup:logstash-2015.02.13] a snapshot is already running]
Is this where your backup procedure stopped? If so, let's try killing the above snapshot and re-running the backup procedure once more as per my previous post.

curator delete --prefix logstash-2015.02.13 --older-than 1
Start the jobs.log tail, and re-run the backup command from the GUI.