Backup snapshots disappeared

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
batzos
Posts: 21
Joined: Mon Oct 05, 2015 2:36 am

Backup snapshots disappeared

Post by batzos »

I have a two-instance cluster. I stopped and restarted the elasticsearch service from the CLI of the 1st server, and since then I cannot see my backup snapshots in "Backup & Maintenance". I reset all jobs from the "Command Subsystem", but that did not help. The snapshots are not visible on either instance. I do not know whether the following has an impact, but for the last few weeks, in the 1st instance I get the message from the system status that both elasticsearch and logstash are stopped (!), and in the second one only logstash is stopped.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Backup snapshots disappeared

Post by jolson »

batzos wrote: "in the 1st instance I get the message from the system status that both elasticsearch and logstash are stopped"
That could matter. I'm interested in the following information from that node:

Code:

df -h
free -m
top | head -n5
cat /etc/sysconfig/logstash
tail -n300 /var/log/elasticsearch/*.log
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.

Re: Backup snapshots disappeared

Post by batzos »

I may have found the reason for the "disappearance" of the snapshots from my first server in the cluster. This cluster consists of 2 servers in the LAN. After setting it up, I installed another Nagios Log Server in DMZi. I have assigned to the server in DMZi the same backup repository as the one for the 1st cluster, even though it is not part of the first cluster; it is a separate NLS. I guess that when I restarted elasticsearch in the 1st one, the snapshots were lost, and now I can see them all in the DMZi NLS. How can I get them back to the 1st server?
Below are the results of the commands:

Code:

[root@ ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg01-rootvol
                      252G   20G  220G   8% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sda1             248M   76M  160M  33% /boot
[root@eicillp095 ~]#



[root@ ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         15947      15684        263          0        220       4799
-/+ buffers/cache:      10663       5283
Swap:         1023         83        940
[root@eicillp095 ~]#

[root@ ~]# top | head -n5
top - 08:53:08 up 47 days, 10 min,  1 user,  load average: 0.09, 0.04, 0.01
Tasks: 199 total,   1 running, 198 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.6%us,  0.8%sy,  0.4%ni, 96.0%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16330152k total, 16060412k used,   269740k free,   225932k buffers
Swap:  1048572k total,    85872k used,   962700k free,  4915352k cached


[root@ ~]# cat /etc/sysconfig/logstash
###############################
# Default settings for logstash
###############################

# Override Java location
#JAVACMD=/usr/bin/java

# Set a home directory
APP_DIR=/usr/local/nagioslogserver
LS_HOME="$APP_DIR/logstash"

# set ES_CLUSTER
ES_CLUSTER=$(cat $APP_DIR/var/cluster_uuid)

# Arguments to pass to java
#LS_HEAP_SIZE="256m"
LS_JAVA_OPTS="-Djava.io.tmpdir=$APP_DIR/tmp"

# Logstash filter worker threads
#LS_WORKER_THREADS=1

# pidfiles aren't used for upstart; this is for sysv users.
#LS_PIDFILE=/var/run/logstash.pid

# user id to be invoked as; for upstart: edit /etc/init/logstash.conf
LS_USER=root
LS_GROUP=nagios

# logstash logging
#LS_LOG_FILE=/var/log/logstash/logstash.log
#LS_USE_GC_LOGGING="true"

# logstash configuration directory
LS_CONF_DIR="$LS_HOME/etc/conf.d"

# Open file limit; cannot be overridden in upstart
#LS_OPEN_FILES=2048

# Nice level
#LS_NICE=0

# Increase Filter workers to 4 threads
LS_OPTS=" -w 4"

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" ];then
        GET_LOGSTASH_CONFIG_MESSAGE=$( php /usr/local/nagioslogserver/scripts/get_logstash_config.php )
        GET_LOGSTASH_CONFIG_RETURN=$?
        if [ "$GET_LOGSTASH_CONFIG_RETURN" != "0" ]; then
                echo $GET_LOGSTASH_CONFIG_MESSAGE
                exit 1
        fi
fi

[root@ ~]# tail -n300 /var/log/elasticsearch/*.log
==> /var/log/elasticsearch/5a2aeff9-fe3d-4f48-bd79-118614f9436d_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/5a2aeff9-fe3d-4f48-bd79-118614f9436d_index_search_slowlog.log <==

==> /var/log/elasticsearch/5a2aeff9-fe3d-4f48-bd79-118614f9436d.log <==



Re: Backup snapshots disappeared

Post by jolson »

batzos wrote: "I have assigned to the server in DMZi the same backup repository as the one for the 1st cluster"
What backup repository are you using? I assume an NFS server or similar. Also, I'm interested in the mount point you're using on Nagios Log Server.
batzos wrote: "I guess, when I restarted elasticsearch in the 1st one, the snapshots were lost and now I can see them all in the DMZi NLS"
You mean your old snapshots appear on the second cluster and not the first? That's very strange behavior. It's worth noting that two distinct clusters must never be connected to the same share: they can overwrite each other's data.

If you disconnect the second cluster from your backup repository, I'm willing to bet that your first cluster will re-acquire the data once it runs another backup.
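
To see which repositories each cluster has registered, and which snapshots a repository currently lists, you can query the Elasticsearch snapshot API on one node of each cluster. This is only a rough sketch: it assumes Elasticsearch answers on localhost:9200 (override ES_URL if yours differs), and the repository name you pass in is whatever name the first query returns, not something known from this thread.

```shell
#!/bin/sh
# Assumption: Elasticsearch answers on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the URL that lists every registered snapshot repository.
repos_url() {
    printf '%s/_snapshot/_all?pretty' "$ES_URL"
}

# Build the URL that lists all snapshots in one repository.
# $1 = repository name (placeholder; use a name returned by the first query).
snapshots_url() {
    printf '%s/_snapshot/%s/_all?pretty' "$ES_URL" "$1"
}

# Run both queries only when a repository name is passed as an argument.
if [ -n "${1:-}" ]; then
    curl -s "$(repos_url)"
    curl -s "$(snapshots_url "$1")"
fi
```

If both clusters report a repository with the same location, they are sharing the same storage.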

Re: Backup snapshots disappeared

Post by batzos »

I am using a CIFS share with NFS access. By mount point, do you mean the "location" in the backup configuration? In that case it is: /net/logs.../.../...
The snapshots disappeared from both instances of the same cluster.
Regarding the bet, you would lose it, because the backup reappeared only when I removed the repository from the primary instance and remounted it. It did not work when I removed it from the 2nd cluster. Anyway, everything is back to normal now.
A few last questions before you close this thread.
- When we add new instances to the same cluster, is the backup automatically assigned to them as it is in the first instance, or do we have to do it manually each time? I did it manually in the 2nd instance, and in the Backup snapshots list I get "Created / Name (Click )", and instead of the name I have "N/A i"; if I click on the i icon I get the name.
- I also got, as the first entry, a "curator" for the logs of the past 10 days (I have set indexes to close after 10 days). Is that because a backup is taken every day for the current logs rather than after the period set for closing them? I have, however, lost the backup for a period of 15 days before that, but this is not a problem since they are test logs.

Re: Backup snapshots disappeared

Post by jolson »

batzos wrote: "When we add new instances to the same cluster, is the backup automatically assigned to them as it is in the first instance, or do we have to do it manually each time? I did it manually in the 2nd instance, and in the Backup snapshots list I get "Created / Name (Click )", and instead of the name I have "N/A i"; if I click on the i icon I get the name."
Currently the process of mounting your backup share must be repeated manually on each new instance added to the cluster. This is because any instance in the cluster may pick up and run the backup job, and there's no telling which instance it will be.
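
A quick way to confirm the share is usable on a newly added instance is to check the repository location from the shell on that instance. This is a minimal sketch, not part of Nagios Log Server: `check_repo_path` is a hypothetical helper name, the path you pass is the "location" from your own backup configuration, and the write test only reflects the user you run it as, so ideally run it as the user the elasticsearch service runs under.

```shell
#!/bin/sh
# Report whether a backup repository path is usable on this instance.
# $1 = repository location as entered in the backup configuration.
# Note: the write check applies to the user running this script.
check_repo_path() {
    if [ ! -d "$1" ]; then
        echo "missing"
    elif [ ! -w "$1" ]; then
        echo "read-only"
    else
        echo "ok"
    fi
}

# Run the check only when a path is passed as an argument.
if [ -n "${1:-}" ]; then
    check_repo_path "$1"
fi
```

Any instance that prints "missing" or "read-only" would fail if it happened to pick up the backup job.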

The backup process was upgraded in Nagios Log Server 1.4.0, and the 'N/A' fields you are seeing are likely from backups taken before the upgrade. I'll refrain from betting - but did you recently upgrade your cluster? All of your old backups will still function properly, but they will have missing information (indicated by N/A) that is present in newer backups.
batzos wrote: "I also got, as the first entry, a "curator" for the logs of the past 10 days (I have set indexes to close after 10 days). Is that because a backup is taken every day for the current logs rather than after the period set for closing them? I have, however, lost the backup for a period of 15 days before that, but this is not a problem since they are test logs."
Would you please send us a screenshot of what this looks like on your end? The new backups will list all of the indices backed up every time the backup process runs - the new process is incremental. Even though it looks like there are duplicate backups being taken daily, there are not. A screenshot would help clarify your question.

Thanks!

Jesse