backup
-
pccwglobalit
- Posts: 105
- Joined: Wed Mar 11, 2015 9:00 pm
Re: backup
thanks.
now the nagios uid on nls1 is 8004 and on nls2 it is 8005.
when nls1 writes something, nls2 shows the owner as 8004. but nls2 can also write to the nfs share - will that affect the backup?
or can every node write to nfs without needing the same UID?
thanks for your help.
Re: backup
Personally, I would get the UIDs synchronized - I believe it's the best solution here, and it will help avoid a lot of potential complications. Any chance you can change the 'nagios' user UIDs on your NLS nodes?
This is a great resource regarding how to do so: https://muffinresearch.co.uk/linux-chan ... -for-user/
I tested the below on one of my NLS nodes, and I cannot see any problems. I changed the 'nagios' UID from 500 to 501:
Gracefully shut down NLS and company:

service elasticsearch stop
service logstash stop
service httpd stop
service crond stop

Pkill hanging processes:

pkill -u nagios

Change UID permissions:

usermod -u 501 nagios
find / -user 500 -exec chown -h 501 {} \;

Start up all processes:

service elasticsearch start
service logstash start
service httpd start
service crond start

You will have to change GID permissions if the GID is messed up as well - but in general, 'nagios' should belong to GID 100 (users).

-
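After the chown pass, it's worth confirming the change took on every node. A quick sanity-check sketch - the old UID (500) and the log-server path are taken from the example above; adjust them if yours differ:

```shell
# Print the numeric UID of 'nagios' on this node - run it on every NLS
# node and compare; the numbers must match for NFS ownership to line up.
id -u nagios 2>/dev/null || echo 'user nagios not found on this node'

# List any files still owned by the old numeric UID (500 here) -
# anything printed was missed by the chown pass above.
if [ -d /usr/local/nagioslogserver ]; then
  find /usr/local/nagioslogserver -user 500 | head
fi
```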
pccwglobalit
Re: backup
thanks. this is a great way to do that, i think.
by the way, will the backup job back up all of the logs?
Re: backup
Correct - the backup job should back up all of the logs to the repository. This way, you are able to restore them from backup if you wind up losing some information.
You can read more about the process here: http://assets.nagios.com/downloads/nagi ... enance.pdf
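If you want to verify what has been captured, the snapshots in the repository can be listed directly from Elasticsearch. A sketch - it assumes the backup repository is named 'nlsbackup' (the name that appears in the snapshot errors later in this thread) and that Elasticsearch is listening on localhost:9200:

```shell
# List every snapshot stored in the 'nlsbackup' repository, with its state
# (SUCCESS / IN_PROGRESS / FAILED) and the indexes each one contains.
curl -s -XGET 'http://localhost:9200/_snapshot/nlsbackup/_all?pretty' \
  || echo 'cluster not reachable from this host'
```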
-
pccwglobalit
Re: backup
i just started the backup and only two dates have been showing IN_PROGRESS since last night. it has been 24 hours already.
Name State Indexes Actions
logstash-2015.02.11 IN_PROGRESS logstash-2015.02.11 restore delete
logstash-2014.12.18 IN_PROGRESS logstash-2014.12.18 restore delete

backups Waiting SUCCESS 05/11/2015 17:50:13 1 day 05/12/2015 15:50:13 System Edit
backup_maintenance Waiting SUCCESS 05/12/2015 02:14:23 1 day 05/13/2015 02:14:23 System Edit
is it successful, or do we need to wait more time?
Re: backup
How large are the shards that you're backing up? The speed will depend on your storage medium and on the amount of data being snapshotted.
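As a side note, per-index on-disk sizes can be checked with the cat API, which helps estimate how long a snapshot should take (assuming Elasticsearch is listening on localhost:9200):

```shell
# 'store.size' in the output is the on-disk size of each index - the large
# logstash-* indexes are the ones that make snapshots slow.
curl -s -XGET 'http://localhost:9200/_cat/indices?v' \
  || echo 'cluster not reachable from this host'
```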
Please run the following command and report the status to us:
curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | grep snapshot -A 100

-
pccwglobalit
Re: backup
please find the log
Re: backup
It looks like your 'logstash-2015.02.11' snapshot might be stuck initializing. Let's try killing the snapshot and re-running your backup while following the logs:

curator delete --prefix logstash-2015.02.11 --older-than 1

Start a tail of jobs.log on all of your nodes:

tail -f /usr/local/nagioslogserver/var/jobs.log

jobs.log will be 0b in size because it's truncated often. This is normal. Please run the above command before continuing.

Navigate to 'Administration -> Command Subsystem' and run the 'Reset all Jobs' command. After that is finished, run the 'Backup and Maintenance' command. Let us know the results. Thanks!
-
pccwglobalit
Re: backup
2015-05-13 14:26:55,905 INFO Job starting...
2015-05-13 14:26:55,913 INFO Beginning DELETE operations...
2015-05-13 14:26:55,928 ERROR Could not find a valid timestamp for logstash-2015.02.11 with timestring %Y.%m.%d
2015-05-13 14:26:55,928 INFO DELETE index operations completed.
2015-05-13 14:26:55,928 INFO Done in 0:00:00.044038.
Running command do_maintenance with args ' ' for job id: backup_maintenance
2015-05-13 14:28:47,236 INFO Job starting...
2015-05-13 14:28:47,251 INFO Beginning BLOOM operations...
2015-05-13 14:28:47,287 INFO Attempting to disable bloom filter for index logstash-2014.09.08.
2015-05-13 14:28:47,291 INFO Skipping index logstash-2014.09.08: Already closed.
2015-05-13 14:28:47,291 INFO Attempting to disable bloom filter for index logstash-2014.09.09.
2015-05-13 14:28:47,293 INFO Skipping index logstash-2014.09.09: Already closed.
2015-05-13 14:28:47,293 INFO Attempting to disable bloom filter for index logstash-2014.09.10.
2015-05-13 14:28:47,295 INFO Skipping index logstash-2014.09.10: Already closed.
2015-05-13 14:28:47,295 INFO Attempting to disable bloom filter for index logstash-2014.09.11.
2015-05-13 14:28:47,296 INFO Skipping index logstash-2014.09.11: Already closed.
2015-05-13 14:28:47,297 INFO Attempting to disable bloom filter for index logstash-2014.09.12.
2015-05-13 14:28:47,298 INFO Skipping index logstash-2014.09.12: Already closed.
2015-05-13 14:28:47,298 INFO Attempting to disable bloom filter for index logstash-2014.09.13.
2015-05-13 14:28:47,300 INFO Skipping index logstash-2014.09.13: Already closed.
2015-05-13 14:28:47,300 INFO Attempting to disable bloom filter for index logstash-2014.09.14.
2015-05-13 14:28:47,302 INFO Skipping index logstash-2014.09.14: Already closed.
2015-05-13 14:32:16,692 INFO Attempting to optimize index logstash-2015.02.23.
2015-05-13 14:32:16,985 INFO Skipping index logstash-2015.02.23: Already optimized.
2015-05-13 14:32:16,986 INFO Attempting to optimize index logstash-2015.02.24.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 736, in <module>
main()
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 731, in main
arguments.func(client, **argdict)
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 585, in command_loop
skipped = op(client, index_name, **kwargs)
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 406, in _create_snapshot
client.snapshot.create(repository=repository, snapshot=snap_name, body=body, wait_for_completion=wait_for_completion)
File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/lib/python2.7/site-packages/elasticsearch/client/snapshot.py", line 22, in create
repository, snapshot), params=params, body=body)
File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 301, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 82, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 102, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(503, u'ConcurrentSnapshotExecutionException[[nlsbackup:logstash-2015.02.13] a snapshot is already running]')
Re: backup
Is this where your backup procedure stopped? The error "[nlsbackup:logstash-2015.02.13] a snapshot is already running" suggests so. If that's the case, let's try killing the above snapshot and re-running the backup procedure once more, as per my previous post:

curator delete --prefix logstash-2015.02.13 --older-than 1
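Before re-running the backup, you can also confirm that no snapshot is still executing. A sketch using the snapshot status API (assumes Elasticsearch on localhost:9200):

```shell
# An empty "snapshots" list in the response means nothing is currently
# running, so it is safe to kick off 'Backup and Maintenance' again.
curl -s -XGET 'http://localhost:9200/_snapshot/_status?pretty' \
  || echo 'cluster not reachable from this host'
```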