
Nagios Log Server - Troubleshooting Backups

Overview

This article explains how to troubleshoot backups in Nagios Log Server.

There are two ways to diagnose backup issues:

  • Watching the log server job logs and forcing a backup

  • Using the command line to execute backups

 

Watch Nagios Log Server Job Log

With this method, you force the backup to run through the web interface. The backup will then be executed by one of the nodes in the cluster, which is not necessarily the node whose web interface you used. For this reason, you need to watch the job log on ALL nodes in the cluster.

Open an SSH session to each node in your cluster.

Execute the following command:

tail -f /usr/local/nagioslogserver/var/jobs.log

 

Once you have done this on all nodes in the cluster, open the Nagios Log Server GUI.

On the top menu bar, click Admin.

Navigate to System > Command Subsystem.

Click Edit for the backups job.

Click inside the Next Run Time field.

In the drop-down calendar that appears, click Now.

Click Done.

Click Update.

Now watch all of the SSH sessions to observe the backup process.

In the GUI, the job will show as running. The backup is complete when the status changes to waiting.

 

 

Command Line Backup

Unlike the previous method, this one only needs an SSH session to one of the nodes in the cluster.

Open an SSH session to a node in your cluster.

Execute the following command:

curl -XGET "localhost:9200/_snapshot?pretty"

 

The purpose of this command is to get the name of the backup snapshot repository to use in the next command. In the following output, the name we need is Common_Backups:

{
  "Common_Backups" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/mnt/nagios_log_server_common_backups"
    }
  }
}
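If you would rather not read the repository name off the JSON by eye, it can be extracted with a small filter. The following is only a sketch: the function name snapshot_repo_names is made up for this example, and it assumes the pretty-printed output indents top-level keys by exactly two spaces, as in the sample above.

```shell
#!/bin/sh
# Print the top-level repository names found in the pretty-printed JSON
# returned by the _snapshot endpoint (read from stdin).
# Assumption: top-level keys are indented by exactly two spaces.
snapshot_repo_names() {
  sed -n 's/^  "\([^"]*\)" : {.*$/\1/p'
}

# Demo on the sample output from this article (no cluster required).
snapshot_repo_names <<'EOF'
{
  "Common_Backups" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/mnt/nagios_log_server_common_backups"
    }
  }
}
EOF
```

Against a live node, the same filter could be fed from the command above: curl -s -XGET "localhost:9200/_snapshot?pretty" | snapshot_repo_names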

 

This command executes the backup:

curator snapshot --repository "Common_Backups" indices --all-indices

 

Here is an example of the output produced while this command executes:

2016-04-15 13:52:40,373 INFO      Job starting: snapshot indices
2016-04-15 13:52:40,373 WARNING   Overriding default connection timeout.  New timeout: 21600
2016-04-15 13:52:40,438 INFO      Matching all indices. Ignoring flags other than --exclude.
2016-04-15 13:52:40,439 INFO      Action snapshot will be performed on the following indices: [u'kibana-int', u'logstash-2015.03.23', u'logstash-2015.03.24', u'logstash-2015.03.25']
2016-04-15 13:52:44,829 INFO      Snapshot name: curator-20160415035244
2016-04-15 13:53:04,015 INFO      Snapshot curator-20160415035244 successfully completed.
2016-04-15 13:53:04,015 INFO      Job completed successfully.
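When the backup is run from a script, the final log line can be used as a simple success check. This is a minimal sketch, assuming the output format matches the example above; the function name curator_job_succeeded is hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) only if the curator output read from stdin
# contains the final "Job completed successfully." line.
curator_job_succeeded() {
  grep -q 'INFO  *Job completed successfully\.'
}

# Demo on two lines taken from the example output above.
if curator_job_succeeded <<'EOF'
2016-04-15 13:53:04,015 INFO      Snapshot curator-20160415035244 successfully completed.
2016-04-15 13:53:04,015 INFO      Job completed successfully.
EOF
then
  echo "backup OK"
else
  echo "backup did not report success"
fi
```

In practice the function would read the real output, for example by piping the curator command into it with 2>&1 so that messages written to standard error are included as well.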

 

Note: The duration this command will run for depends on how much data exists in your log server implementation.

 

If you want more detailed output, you can run the command with the debug log level:

curator --loglevel debug snapshot --repository "Common_Backups" indices --all-indices

 

In addition, if you want to write all of the output to a log file, use the following command:

curator --loglevel debug --logfile /tmp/test_backup.txt snapshot --repository "Common_Backups" indices --all-indices

 

Note: No output is displayed on the screen while this command runs, because it is all written to the log file. You will know the command has completed when you are returned to the shell prompt.
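A debug log can be long, so it may help to pull out only the lines that indicate a problem. The sketch below assumes each log line starts with a date, a time and a level name, as in the example output earlier in this article; the function name curator_log_problems is hypothetical, and the demo uses a temporary file rather than /tmp/test_backup.txt.

```shell
#!/bin/sh
# Print only the WARNING, ERROR and CRITICAL lines from a curator log file.
# Assumption: lines look like "YYYY-MM-DD HH:MM:SS,mmm LEVEL   message".
curator_log_problems() {
  grep -E '^[0-9-]+ [0-9:,]+ (WARNING|ERROR|CRITICAL) ' "$1"
}

# Demo on a temporary file holding lines from the example output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2016-04-15 13:52:40,373 INFO      Job starting: snapshot indices
2016-04-15 13:52:40,373 WARNING   Overriding default connection timeout.  New timeout: 21600
2016-04-15 13:53:04,015 INFO      Job completed successfully.
EOF
curator_log_problems "$sample"
rm -f "$sample"
```

For the log file created above, the call would be curator_log_problems /tmp/test_backup.txt. An exit status of 1 means that no matching lines were found.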

 

 

Cluster Master Node

It is important to understand which node is the cluster master. The cluster master is the node responsible for performing the actual backup.

Open a terminal session to a node in your cluster.

Execute the following command:

curl 'localhost:9200/_cat/master?v'

 

The output will be similar to:

id                     host                     ip         node                                 
JLpicZIOQSez77kwzJKx7g nls-c7x-x64.box293.local 10.25.5.86 4ab27926-bbb0-4a5e-bb7f-4eb9fba97643
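To use the master's host name in a script, for example to decide which node to log in to for testing, the host column can be picked out of this output. This is a sketch only: the function name master_host is made up for this example, and it assumes the header row is present (the ?v option) and contains a column named host.

```shell
#!/bin/sh
# Print the value of the "host" column from "_cat/master?v" output (stdin).
# The column is located by its header name rather than by a fixed position.
master_host() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "host") col = i }
       NR == 2 && col { print $col }'
}

# Demo on the sample output from this article (no cluster required).
master_host <<'EOF'
id                     host                     ip         node
JLpicZIOQSez77kwzJKx7g nls-c7x-x64.box293.local 10.25.5.86 4ab27926-bbb0-4a5e-bb7f-4eb9fba97643
EOF
```

Against a live node, the filter could be fed from the command above: curl -s 'localhost:9200/_cat/master?v' | master_host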

 

Any backup testing should be done from the node shown in this output. You should test that every node can perform a backup when it is the master. There is no command to move the master role from one node to another; however, restarting the elasticsearch service on the current master will force another node to become the master. The command to restart the elasticsearch service is:

 

RHEL 7 + | CentOS 7 + | Debian | Ubuntu 16/18/20

systemctl restart elasticsearch.service

 

After restarting the service, wait a minute and then run the master command again (curl 'localhost:9200/_cat/master?v'). It should now show the new master:

id                     host                     ip         node                                 
LYSbImCgT9CHl6iqss1S0g nls-r7x-x64.box293.local 10.25.5.99 4c5786bd-1382-44b6-bb67-88a9c0d3e7ea

 

 

Note About Backup Repository

As per this documentation:

Documentation - Managing Backups and Maintenance

This paragraph is important:

When you are on the Backup & Maintenance page, the table on the right labelled Repositories is where you will set the location for your Nagios Log Server backup to be stored.  This location must be a shared network path writeable by the nagios user and available to ALL instances in your cluster.
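A quick way to check the "writeable" part of this requirement is to test the repository path on each node while running as the nagios user. The following is a minimal sketch with a hypothetical function name, repo_path_ok. It only checks the path as seen by the current user on the current node, so it does not prove that every node sees the same shared storage.

```shell
#!/bin/sh
# Succeed only if the given path is an existing directory that the
# current user can write to. Run this as the nagios user on every node.
repo_path_ok() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Demo on a temporary directory; on a real node, pass the repository
# location instead (in this article: /mnt/nagios_log_server_common_backups).
demo_dir=$(mktemp -d)
if repo_path_ok "$demo_dir"; then
  echo "writable: $demo_dir"
else
  echo "NOT writable: $demo_dir"
fi
rmdir "$demo_dir"
```

If the script is saved to a file, it could be run as the nagios user with something like sudo -u nagios sh followed by the file name, assuming sudo is available on the node.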

 

 

Final Thoughts

For any support related questions please visit the Nagios Support Forums at:

http://support.nagios.com/forum/
