backup

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: backup

Post by jolson »

Okay - now we are at a point where jobs.log should output the information that we need to get this issue moving.

Please take a tail of jobs.log on all nodes - it needs to be a follow tail (tail -f) because the jobs.log files are truncated very often.

Before attempting a backup job, use the following command on every node:
tail -f /usr/local/nagioslogserver/var/jobs.log

Then, force a backup_maintenance command from the Command Subsystem. One of your nodes will take the job and run with it - it should display errors if the backup process isn't working. Please let me know what those errors are. Thanks!
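If you have shell access to all nodes, one way to watch every tail at once from a single terminal is a loop like the following (a sketch - nls1 through nls4 are placeholder hostnames for your actual nodes):

```shell
# Placeholder hostnames - substitute your real Log Server node addresses.
for node in nls1 nls2 nls3 nls4; do
    # Prefix each line with its node name so the streams stay distinguishable.
    ssh root@"$node" 'tail -f /usr/local/nagioslogserver/var/jobs.log' \
        | sed "s/^/[$node] /" &
done
wait    # Ctrl-C stops all four tails
```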
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
pccwglobalit
Posts: 105
Joined: Wed Mar 11, 2015 9:00 pm

Re: backup

Post by pccwglobalit »

For the backup, which node will perform it? Is there any script I can run in command mode?
I see that /usr/local/nagioslogserver/scripts/create_backup.sh uses the backup directory BACKUP_DIR="/store/backups/nagioslogserver", which is different from my setting in the GUI.

Can I select the day and run the backup manually in command mode?

thanks
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: backup

Post by tgriep »

Running create_backup.sh isn't what you want to do. It only backs up the configuration, not the logs.
Here is what you need to do.
Log in to each of the nodes as root in a shell and run the following.

Code: Select all

tail -f /usr/local/nagioslogserver/var/jobs.log
Then, on one of the nodes, log in to Log Server's GUI and go to "Administration" > "Command Subsystem".
Click "Edit" for the backups, change the "Next Run Time" to 5 minutes in the future, and save the settings.

On whichever node picks up the job, the running tail -f command should show the backup happening; post that output here so we can look at the errors.
Be sure to check out our Knowledgebase for helpful articles and solutions!
pccwglobalit
Posts: 105
Joined: Wed Mar 11, 2015 9:00 pm

Re: backup

Post by pccwglobalit »

Here it is:
2015-05-06 13:21:14,173 ERROR Error: TransportError(404, u'RemoteTransportException[[181841a1-d717-437c-bd36-6d4a8344abe6][inet[/192.168.78.10:9300]]
[cluster/snapshot/get]]; nested: RepositoryMissingException[[nls-backup] missing]; ')

2015-05-06 13:22:14,946 INFO Attempting to optimize index logstash-2015.04.20.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 736, in <module>
main()
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 731, in main
arguments.func(client, **argdict)
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 585, in command_loop
skipped = op(client, index_name, **kwargs)
File "/usr/lib/python2.7/site-packages/curator/curator.py", line 406, in _create_snapshot
client.snapshot.create(repository=repository, snapshot=snap_name, body=body, wait_for_completion=wait_for_completion)
File "/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 68, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/lib/python2.7/site-packages/elasticsearch/client/snapshot.py", line 22, in create
repository, snapshot), params=params, body=body)
File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 86, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 102, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'RemoteTransportException[[181841a1-d717-437c-bd36-6d4a8344abe6][inet[/192.168.78.10:9300]][
cluster/snapshot/create]]; nested: RepositoryMissingException[[nls-backup] missing]; ')


2015-05-06 13:22:02,289 INFO logstash-2015.02.05: Successfully closed.
2015-05-06 13:22:02,290 INFO logstash-2015.02.06 is within the threshold period (90 days).
2015-05-06 13:22:02,290 INFO logstash-2015.02.07 is within the threshold period (90 days).
2015-05-06 13:22:02,290 INFO logstash-2015.02.08 is within the threshold period (90 days).
2015-05-06 13:22:02,290 INFO logstash-2015.02.09 is within the threshold period (90 days).
2015-05-06 13:22:02,291 INFO logstash-2015.02.10 is within the threshold period (90 days).
2015-05-06 13:22:02,291 INFO logstash-2015.02.11 is within the threshold period (90 days).
pccwglobalit
Posts: 105
Joined: Wed Mar 11, 2015 9:00 pm

Re: backup

Post by pccwglobalit »

We have four nodes; can we force a specific one to do the backup?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: backup

Post by jolson »

We have four nodes; can we force a specific one to do the backup?
Currently backups are run as 'global jobs' - this means that one node in your cluster will randomly pick the job up and run it. This distributes the jobs, and is also the reason why all nodes need access to the backup repository.

There are a few errors that I want to bring to your attention:
2015-05-06 13:22:14,946 INFO Attempting to optimize index logstash-2015.04.20.
Traceback (most recent call last):
It's possible that this index is corrupt, since it's generating a traceback. You could remove it with the following command:
curl -XDELETE 'http://localhost:9200/logstash-2015.04.20/'
After deletion, try running the backup again.
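If you would rather confirm the index is actually unhealthy before deleting it, one way (assuming Elasticsearch is answering on the default port 9200) is to ask for that index's cluster health:

```shell
# A "red" status would support the corruption theory; green or yellow
# suggests the traceback may have another cause.
curl -s 'http://localhost:9200/_cluster/health/logstash-2015.04.20?pretty'
```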
elasticsearch.exceptions.NotFoundError: TransportError(404, u'RemoteTransportException[[181841a1-d717-437c-bd36-6d4a8344abe6][inet[/192.168.78.10:9300]][
cluster/snapshot/create]]; nested: RepositoryMissingException[[nls-backup] missing]; ')
This error indicates that your repository could be missing. Please ensure that the 'nagios' user has read and write privileges to your repository.
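If the repository really is unregistered, one possible fix is to re-register it with Elasticsearch. This is a sketch: the repository name is taken from the error message, and the location is an assumption - it must match the backup directory configured in the GUI and be reachable from every node.

```shell
# Re-register the 'nls-backup' filesystem repository.
# The location below is an assumption - use the directory from your GUI settings.
curl -XPUT 'http://localhost:9200/_snapshot/nls-backup' -d '{
    "type": "fs",
    "settings": {
        "location": "/store/backups/nagioslogserver"
    }
}'

# Confirm the cluster can now see the repository:
curl -XGET 'http://localhost:9200/_snapshot/nls-backup?pretty'
```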
pccwglobalit
Posts: 105
Joined: Wed Mar 11, 2015 9:00 pm

Re: backup

Post by pccwglobalit »

Will this remove all logs from 4-20?

Can we fix one node to do the backup?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: backup

Post by jolson »

Will this remove all logs from 4-20?
That is correct - all log information from 4-20 would be erased.
Can we fix one node to do the backup?
All nodes - not just one - will need proper access to the repository to do backups, because the backup job can be picked up by any single node.
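A quick sanity check you could run on every node (the path is assumed from create_backup.sh - substitute your GUI-configured backup directory) to prove the nagios user can write to the repository:

```shell
# Attempt a write as the nagios user, then clean up the marker file.
sudo -u nagios touch /store/backups/nagioslogserver/.write_test \
    && echo 'write OK' \
    && sudo -u nagios rm -f /store/backups/nagioslogserver/.write_test
```

If any node fails this check, that node will fail the backup job whenever it happens to pick it up.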
pccwglobalit
Posts: 105
Joined: Wed Mar 11, 2015 9:00 pm

Re: backup

Post by pccwglobalit »

Our backup repository is on an NFS server, and the nagios UID is different on each of the four nodes. How can we solve this - change all hosts to the same UID, or is there another option?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: backup

Post by jolson »

Our backup repository is on an NFS server, and the nagios UID is different on each of the four nodes. How can we solve this - change all hosts to the same UID, or is there another option?
The most manageable solution would be to use something like NIS to maintain consistent UIDs across your network. You can change the UIDs manually, but that can cause a lot of administrative pain.
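One quick way to see how far apart the nodes currently are is to compare the nagios account on each of them:

```shell
# Run on every node (or via ssh) - the uid and gid values must match
# across nodes for file ownership on the NFS share to line up.
id nagios
```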

There are many other ways to approach this problem. Below is one such way:

You can avoid the UID problem entirely by granting NFS permissions based on the IP address of the client in question. Let's assume three servers - NFS, NLS1, and NLS2. NLS1 has an IP of 10.0.0.1 and NLS2 has an IP of 10.0.0.2.

NFS:

Code: Select all

cat /etc/exports
/nlsbackup            10.0.0.1/32(rw) 10.0.0.2/32(rw)
NLS1 & NLS2:

Code: Select all

cat /etc/fstab
NFSIP:/nlsbackup /mnt/nlsback nfs defaults 0 0
This should work appropriately, but requires a little more management overhead. It also relies on source IP addresses for security - which could be undesirable. Let me know if you need any help along the way. Thanks!
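For completeness, the commands to apply those two files (a sketch - the mount point is taken from the fstab line above):

```shell
# On the NFS server, after editing /etc/exports:
exportfs -ra            # re-export all shares

# On each Log Server node, after adding the fstab entry:
mkdir -p /mnt/nlsback   # create the mount point if it doesn't exist
mount -a                # mount everything listed in /etc/fstab
df -h /mnt/nlsback      # confirm the NFS share is mounted
```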