Indices are not backing up

Posted: Wed Nov 11, 2015 11:34 am
by krobertson71
LogServer 201451.4

Noticed today that my backups have stopped. I don't normally check these very often, as I have been on other projects. It looks like the last index listed is from August.

Code: Select all

drwxr-xr-x 5 nagios nagcmd 4096 Aug 13 22:52 logstash-2015.08.13
drwxr-xr-x 5 nagios nagcmd 4096 Aug 14 22:52 logstash-2015.08.14
drwxr-xr-x 5 nagios nagcmd 4096 Aug 15 22:52 logstash-2015.08.15
drwxr-xr-x 5 nagios nagcmd 4096 Aug 16 22:52 logstash-2015.08.16
drwxr-xr-x 5 nagios nagcmd 4096 Aug 17 22:52 logstash-2015.08.17
drwxr-xr-x 3 nagios nagcmd 4096 Aug 18 22:53 logstash-2015.08.18
drwxr-xr-x 5 nagios nagcmd 4096 Aug 19 22:53 logstash-2015.08.19
drwxr-xr-x 5 nagios nagcmd 4096 Aug 20 22:53 logstash-2015.08.20
drwxr-xr-x 5 nagios nagcmd 4096 Aug 21 22:53 logstash-2015.08.21
drwxr-xr-x 5 nagios nagcmd 4096 Aug 22 22:53 logstash-2015.08.22
drwxr-xr-x 5 nagios nagcmd 4096 Aug 23 22:53 logstash-2015.08.23
drwxr-xr-x 5 nagios nagcmd 4096 Aug 24 22:53 logstash-2015.08.24
drwxr-xr-x 5 nagios nagcmd 4096 Aug 25 22:53 logstash-2015.08.25

Logs do not show any errors.

Can someone point me in the right direction on where to look to see why this has stopped?

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 11:48 am
by jdalrymple
Without spending a ton of time debugging, I would start by recommending an upgrade. There have been vast improvements in the backup system since the release you're using:

https://assets.nagios.com/downloads/nag ... HANGES.TXT

VAST - I'd bet $1 that simply upgrading will 100% solve your issues.

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 11:58 am
by krobertson71
I agree, and that is planned. However, I want to make sure this function is working properly beforehand, as we keep 7 days live. This was working fine.

I did find a log I hadn't looked at before, under /var/log/elasticsearch/, which I have attached.

Seeing this...

Code: Select all

[2015-11-09 18:02:53,080][DEBUG][action.admin.indices.optimize] [11fe29cc-9353-4cc1-a368-14a0b6977937] [logstash-2015.04.23][2], node[jcn8VnF5QayoHQkXiGZXog], [P], s[STARTED]: failed to executed [org.elasticsearch.action.admin.indices.optimize.OptimizeRequest@7f9f8c19]
org.elasticsearch.index.engine.OptimizeFailedEngineException: [logstash-2015.04.23][2] Optimize failed
        at org.elasticsearch.index.engine.internal.InternalEngine.optimize(InternalEngine.java:1021)

I see this repeating many times.

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 12:22 pm
by jdalrymple
The best log to look at isn't persistent.

Can you do a tail -f on /usr/local/nagioslogserver/var/jobs.log and then run your backup? The output there is likely going to be the most useful.
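
As a rough sketch of what to look for in that log, the snippet below pulls out the status line that follows the most recent run of a given job command. It assumes a layout where a "Running command <name> ..." line is followed by a one-word status such as SUCCESS; the helper name job_status is made up for illustration and is not part of Nagios Log Server.

```shell
# Print the status recorded for the most recent run of a job command.
# Assumed log layout: "Running command <name> with args ..." on one line,
# then a one-word status line. job_status is a hypothetical helper name.
job_status() {
  awk -v cmd="$1" '
    # Remember whether the command that just started is the one we want.
    $1 == "Running" && $2 == "command" { want = ($3 == cmd); next }
    # The first one-word line after it is taken as the status.
    want && NF == 1 { status = $1; want = 0 }
    END { if (status != "") print status; else exit 1 }
  ' "$2"
}
```

For example, `job_status do_backups /usr/local/nagioslogserver/var/jobs.log` would print the last status recorded for the backup job, or return a non-zero exit code if the job never appears. The status only reflects what the job itself reported.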

Also, can we check the health of your indices?

Code: Select all

curl 'localhost:9200/_cluster/health?level=indices&pretty'

Thanks
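
With many indices the per-index health output gets long. A small filter, sketched here under the assumption that the response is pretty-printed with one key per line (as the pretty parameter produces), can list only the indices that are not green. non_green_indices is a hypothetical helper name, not part of Elasticsearch or Nagios Log Server.

```shell
# Read pretty-printed cluster health JSON on stdin and print each index whose
# status is not "green", followed by that status.
# non_green_indices is a hypothetical helper name.
non_green_indices() {
  awk -F'"' '
    # Everything after the "indices" key is per-index data.
    /"indices"[ \t]*:[ \t]*[{]/ { in_indices = 1; next }
    # A line ending in ": {" opens the object for one index.
    in_indices && /:[ \t]*[{][ \t]*$/ { index_name = $2; next }
    in_indices && $2 == "status" && $4 != "green" { print index_name, $4 }
  '
}
```

It could be fed directly from the health request, for example `curl -s 'localhost:9200/_cluster/health?level=indices&pretty' | non_green_indices`. No output means every index reported green.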

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 12:49 pm
by krobertson71

Code: Select all

 var]$ tail -f jobs.log
Running command run_alerts with args ' ' for job id: run_all_alerts
SUCCESS
Processed 0 node jobs.
Processed 1 global jobs.
tail: jobs.log: file truncated
Running command do_backups with args ' ' for job id: backups
SUCCESS

As you can see, the job executed with status SUCCESS. However, there is no new entry in our /backups directory:

Code: Select all

drwxr-xr-x 3 nagios nagcmd 4096 Aug 18 22:53 logstash-2015.08.18
drwxr-xr-x 5 nagios nagcmd 4096 Aug 19 22:53 logstash-2015.08.19
drwxr-xr-x 5 nagios nagcmd 4096 Aug 20 22:53 logstash-2015.08.20
drwxr-xr-x 5 nagios nagcmd 4096 Aug 21 22:53 logstash-2015.08.21
drwxr-xr-x 5 nagios nagcmd 4096 Aug 22 22:53 logstash-2015.08.22
drwxr-xr-x 5 nagios nagcmd 4096 Aug 23 22:53 logstash-2015.08.23
drwxr-xr-x 5 nagios nagcmd 4096 Aug 24 22:53 logstash-2015.08.24
drwxr-xr-x 5 nagios nagcmd 4096 Aug 25 22:53 logstash-2015.08.25

Here is the screenshot of our backup configuration:

[attachment=0]nlsbackupconfig.png[/attachment]

Code: Select all

{
  "cluster_name" : "907e60a9-dc29-411e-96e8-2dfe503e0867",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 61,
  "active_shards" : 122,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "indices" : {
    "logstash-2015.04.25" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "nagioslogserver" : {
      "status" : "green",
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 2,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.04.26" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.05" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.04" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.07" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.06" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.09" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.08" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "nagioslogserver_log" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.10" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.11" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "kibana-int" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    }
  }
}

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 12:52 pm
by krobertson71
Also, I failed to mention this issue:

nlsoldindices.png

As you can see from the screenshot, we have a source that seems to have its system time off, or is sending old events. I can delete these indices, and they show up again within a few hours.

Since they fall so far outside the current index dates, could this be causing the issue?

The 7-day limit works just fine with the normal daily indexes but leaves these two behind.
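
To see at a glance which indices fall outside the retention window, one could compare the date in each index name against a cutoff. The sketch below assumes GNU date and the logstash-YYYY.MM.DD naming shown in the listings; stale_indices is a hypothetical helper, and this is not necessarily how Nagios Log Server itself applies the limit.

```shell
# Read index names on stdin and print the logstash-YYYY.MM.DD ones that are
# older than the retention window.
# Usage: stale_indices <today as YYYY-MM-DD> <days to keep>
# Assumes GNU date; stale_indices is a hypothetical helper name.
stale_indices() {
  # Oldest calendar day (as YYYYMMDD) that is still inside the window.
  cutoff=$(date -u -d "$1 $2 days ago" +%Y%m%d) || return 1
  while read -r name; do
    case "$name" in
      logstash-[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9])
        # Turn "logstash-2015.04.25" into the number 20150425.
        day=$(printf '%s' "${name#logstash-}" | tr -d '.')
        if [ "$day" -lt "$cutoff" ]; then
          printf '%s\n' "$name"
        fi
        ;;
    esac
  done
}
```

Names that do not match the daily pattern, such as kibana-int, are skipped rather than reported.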

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 1:20 pm
by jdalrymple
krobertson71 wrote: As you can see from the screenshot we have a source that seems to have its system time off, or is sending old events. I can delete these and they will show up again within a few hours.

To the best of my limited knowledge, the indexes are based on the receipt timestamp and have nothing to do with a timestamp in the event. You would have to write a filter if you want the @timestamp field replaced, and even then I'm not sure it's possible.
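
For reference, the stock Logstash Elasticsearch output names daily indices with the pattern logstash-%{+YYYY.MM.dd}, filled in from the event's @timestamp in UTC; @timestamp is the receipt time unless a filter such as date overwrites it. Assuming Nagios Log Server keeps that stock pattern (not confirmed here), the mapping from a timestamp to an index name can be sketched as below. It relies on GNU date, and index_for_timestamp is a hypothetical helper name.

```shell
# Print the daily index name for a timestamp, using the UTC calendar day.
# Assumes GNU date and the stock "logstash-YYYY.MM.dd" naming pattern.
# index_for_timestamp is a hypothetical helper name.
index_for_timestamp() {
  date -u -d "$1" '+logstash-%Y.%m.%d'
}
```

Because the day is taken in UTC, an event stamped late in the evening in a zone west of UTC lands in the next day's index: `index_for_timestamp '2015-04-25 23:30:00 -0500'` gives logstash-2015.04.26.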

I'm assuming that these indexes always have a fixed size? Or are they growing/shrinking when you delete them? If the time were really far off on a node, I think we'd have a non-green cluster status. I'm really not sure what's going on. Take a look at their last modified date on the system:

Code: Select all

ls -l /usr/local/nagioslogserver/elasticsearch/data/907e60a9-dc29-411e-96e8-2dfe503e0867/nodes/0/indices

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 1:29 pm
by krobertson71
Will do.

Can we also address the other information I provided about the current index backup issue?

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 2:06 pm
by jdalrymple
I'm not entirely convinced that the problems aren't related...

Re: Indices are not backing up

Posted: Wed Nov 11, 2015 2:12 pm
by krobertson71
Here are the results of your command.

No, the size varies along with the date. Two days from now they will read 4-27, etc.

Code: Select all

 nagioslogserver]$ ls -l /usr/local/nagioslogserver/elasticsearch/data/907e60a9-dc29-411e-96e8-2dfe503e0867/nodes/0/indices
total 56
drwxr-xr-x 8 nagios nagcmd 4096 May 11  2015 kibana-int
drwxr-xr-x 8 nagios nagcmd 4096 Nov 10 18:03 logstash-2015.04.25
drwxr-xr-x 8 nagios nagcmd 4096 Nov 11 11:50 logstash-2015.04.26
drwxr-xr-x 8 nagios nagcmd 4096 Nov  3 14:00 logstash-2015.11.04
drwxr-xr-x 8 nagios nagcmd 4096 Nov  4 14:00 logstash-2015.11.05
drwxr-xr-x 8 nagios nagcmd 4096 Nov  5 14:00 logstash-2015.11.06
drwxr-xr-x 8 nagios nagcmd 4096 Nov  6 14:00 logstash-2015.11.07
drwxr-xr-x 8 nagios nagcmd 4096 Nov  7 14:00 logstash-2015.11.08
drwxr-xr-x 8 nagios nagcmd 4096 Nov  8 14:00 logstash-2015.11.09
drwxr-xr-x 8 nagios nagcmd 4096 Nov  9 14:00 logstash-2015.11.10
drwxr-xr-x 8 nagios nagcmd 4096 Nov 10 14:00 logstash-2015.11.11
drwxr-xr-x 8 nagios nagcmd 4096 Nov 11 14:00 logstash-2015.11.12
drwxr-xr-x 4 nagios nagcmd 4096 May 11  2015 nagioslogserver
drwxr-xr-x 8 nagios nagcmd 4096 May 11  2015 nagioslogserver_log