Indices are not backing up

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Post by krobertson71 »

LogServer 201451.4

I noticed today that my backups have stopped. I don't normally check them very often, as I have been on other projects. It looks like the last index listed is from August:

Code:

drwxr-xr-x 5 nagios nagcmd 4096 Aug 13 22:52 logstash-2015.08.13
drwxr-xr-x 5 nagios nagcmd 4096 Aug 14 22:52 logstash-2015.08.14
drwxr-xr-x 5 nagios nagcmd 4096 Aug 15 22:52 logstash-2015.08.15
drwxr-xr-x 5 nagios nagcmd 4096 Aug 16 22:52 logstash-2015.08.16
drwxr-xr-x 5 nagios nagcmd 4096 Aug 17 22:52 logstash-2015.08.17
drwxr-xr-x 3 nagios nagcmd 4096 Aug 18 22:53 logstash-2015.08.18
drwxr-xr-x 5 nagios nagcmd 4096 Aug 19 22:53 logstash-2015.08.19
drwxr-xr-x 5 nagios nagcmd 4096 Aug 20 22:53 logstash-2015.08.20
drwxr-xr-x 5 nagios nagcmd 4096 Aug 21 22:53 logstash-2015.08.21
drwxr-xr-x 5 nagios nagcmd 4096 Aug 22 22:53 logstash-2015.08.22
drwxr-xr-x 5 nagios nagcmd 4096 Aug 23 22:53 logstash-2015.08.23
drwxr-xr-x 5 nagios nagcmd 4096 Aug 24 22:53 logstash-2015.08.24
drwxr-xr-x 5 nagios nagcmd 4096 Aug 25 22:53 logstash-2015.08.25
The logs do not show any errors.

Can someone point me in the right direction on where to look to see why this has stopped?
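A quick first check is to confirm when the last backup was actually written by picking the newest daily directory out of the backup repository listing. The helper below is a hypothetical sketch (`newest_backup` is not part of Nagios Log Server); it assumes the directory names follow the `logstash-YYYY.MM.DD` pattern shown above, which sorts chronologically as plain text because the dates are zero-padded.

```shell
# Hypothetical helper: read directory names on stdin (one per line) and
# print the newest daily logstash-YYYY.MM.DD entry. Zero-padded dates sort
# chronologically as plain strings, so sort + tail is enough.
newest_backup() {
  grep -E '^logstash-[0-9]{4}\.[0-9]{2}\.[0-9]{2}$' | sort | tail -n 1
}
```

For example, `ls -1 /backups | newest_backup`, where `/backups` stands in for whatever repository path is configured on your system.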
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Post by jdalrymple »

Without spending a ton of time debugging, I would recommend an upgrade. There have been vast improvements in the backup system since the release you're using:

https://assets.nagios.com/downloads/nag ... HANGES.TXT

VAST - I'd bet $1 that simply upgrading will 100% solve your issues.
Post by krobertson71 »

I agree, and that is planned. However, I want to make sure this function is working properly beforehand, as we only keep 7 days of indices live. This was working fine before.

I did find a log I hadn't looked at before, in /var/log/elasticsearch/, which I have attached.

I'm seeing this:

Code:

[2015-11-09 18:02:53,080][DEBUG][action.admin.indices.optimize] [11fe29cc-9353-4cc1-a368-14a0b6977937] [logstash-2015.04.23][2], node[jcn8VnF5QayoHQkXiGZXog], [P], s[STARTED]: failed to executed [org.elasticsearch.action.admin.indices.optimize.OptimizeRequest@7f9f8c19]
org.elasticsearch.index.engine.OptimizeFailedEngineException: [logstash-2015.04.23][2] Optimize failed
        at org.elasticsearch.index.engine.internal.InternalEngine.optimize(InternalEngine.java:1021)
I see this repeating many times.
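To see whether these optimize failures are confined to particular indices, the repeated exception lines can be tallied per index. This is a hypothetical helper that only matches the `[index][shard] Optimize failed` text quoted above; every other Elasticsearch message is ignored, and which log files you feed it is up to you.

```shell
# Hypothetical helper: count "[index][shard] Optimize failed" exception lines
# per index. Reads the files given as arguments, or stdin if none are given,
# and prints "index count" pairs.
optimize_failures_by_index() {
  sed -n 's/.*\[\(logstash-[0-9.]*\)]\[[0-9]*] Optimize failed.*/\1/p' "$@" |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `optimize_failures_by_index /var/log/elasticsearch/*.log` (the file glob is an assumption about how the logs are named). If every failure points at an out-of-range index like the April one above, that is worth mentioning alongside the backup problem.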
Post by jdalrymple »

The best log to look at isn't persistent.

Can you do a tail -f on /usr/local/nagioslogserver/var/jobs.log and then run your backup? The output there is likely going to be the most useful.
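Because jobs.log is truncated between runs, it can help to pull out just the status line printed after the backup command. A minimal sketch, assuming the `Running command do_backups ...` line is followed directly by a status line, as in the output posted in this thread; `backup_job_results` is a hypothetical name. Note that a SUCCESS here only means the job ran, not that a snapshot was actually written.

```shell
# Hypothetical helper: print the line that follows each
# "Running command do_backups" entry in jobs.log (the job status).
# Reads the files given as arguments, or stdin if none are given.
backup_job_results() {
  awk '/Running command do_backups/ { if ((getline line) > 0) print line }' "$@"
}
```

For example, run `backup_job_results /usr/local/nagioslogserver/var/jobs.log` right after the backup job finishes, before the file is truncated again.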

Also, can we check the health of your indices?

Code:

curl 'localhost:9200/_cluster/health?level=indices&pretty'
Thanks
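The `level=indices` output gets long on a cluster with many daily indices. The hypothetical helper below scans the pretty-printed response for any index whose status is not green. It relies on the indentation that `pretty` produced in the output posted in this thread (index names indented four spaces), so treat it as a quick filter rather than a real JSON parser.

```shell
# Hypothetical helper: read pretty-printed cluster health JSON
# (level=indices) on stdin or from files, and print "index: status" for
# every index under "indices" whose status is not green. The top-level
# cluster status, which appears before "indices", is skipped.
non_green_indices() {
  awk '
    /"indices" : [{]/ { in_indices = 1; next }
    in_indices && /^    "[^"]*" : [{]/ {
      name = $1; gsub(/"/, "", name); next
    }
    in_indices && name != "" && /"status" :/ {
      status = $3; gsub(/[",]/, "", status)
      if (status != "green") print name ": " status
      name = ""
    }
  ' "$@"
}
```

For example, `curl -s 'localhost:9200/_cluster/health?level=indices&pretty' | non_green_indices` prints nothing when every index is green.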
Post by krobertson71 »

Code:

 var]$ tail -f jobs.log
Running command run_alerts with args ' ' for job id: run_all_alerts
SUCCESS
Processed 0 node jobs.
Processed 1 global jobs.
tail: jobs.log: file truncated
Running command do_backups with args ' ' for job id: backups
SUCCESS

As you can see, the job executed with status SUCCESS. However, there is no new entry in our /backups directory:

Code:

drwxr-xr-x 3 nagios nagcmd 4096 Aug 18 22:53 logstash-2015.08.18
drwxr-xr-x 5 nagios nagcmd 4096 Aug 19 22:53 logstash-2015.08.19
drwxr-xr-x 5 nagios nagcmd 4096 Aug 20 22:53 logstash-2015.08.20
drwxr-xr-x 5 nagios nagcmd 4096 Aug 21 22:53 logstash-2015.08.21
drwxr-xr-x 5 nagios nagcmd 4096 Aug 22 22:53 logstash-2015.08.22
drwxr-xr-x 5 nagios nagcmd 4096 Aug 23 22:53 logstash-2015.08.23
drwxr-xr-x 5 nagios nagcmd 4096 Aug 24 22:53 logstash-2015.08.24
drwxr-xr-x 5 nagios nagcmd 4096 Aug 25 22:53 logstash-2015.08.25
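To see exactly which live indices have no counterpart in the backup repository, one option is to compare the two directory listings by name. This sketch assumes you have saved one name per line into two files; `missing_from_backup` and the file names are hypothetical.

```shell
# Hypothetical helper: print names from file $1 (live indices) that do not
# appear as an exact whole line in file $2 (backed-up indices).
# -x matches whole lines, -F treats names as fixed strings, -v inverts.
missing_from_backup() {
  grep -v -x -F -f "$2" "$1"
}
```

For example, save `ls -1` of `/usr/local/nagioslogserver/elasticsearch/data/907e60a9-dc29-411e-96e8-2dfe503e0867/nodes/0/indices` to `live.txt` and `ls -1 /backups` to `backed_up.txt`, then run `missing_from_backup live.txt backed_up.txt`.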
Here is the screenshot of our backup configuration:

[Attachment: nlsbackupconfig.png]

And here is the cluster health output:

Code:

{
  "cluster_name" : "907e60a9-dc29-411e-96e8-2dfe503e0867",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 61,
  "active_shards" : 122,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "indices" : {
    "logstash-2015.04.25" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "nagioslogserver" : {
      "status" : "green",
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "active_primary_shards" : 1,
      "active_shards" : 2,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.04.26" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.05" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.04" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.07" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.06" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.09" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.08" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "nagioslogserver_log" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.10" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "logstash-2015.11.11" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    },
    "kibana-int" : {
      "status" : "green",
      "number_of_shards" : 5,
      "number_of_replicas" : 1,
      "active_primary_shards" : 5,
      "active_shards" : 10,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0
    }
  }
}
Post by krobertson71 »

Also, I failed to mention this issue:

[Attachment: nlsoldindices.png]

As you can see from the screenshot, we have a source that seems to have its system time off, or is sending old events. I can delete these indices, and they show up again within a few hours.

Since their dates are so far outside the normal index range, could this be causing the issue?

The 7-day limit works just fine with the normal daily indices but leaves these two behind.
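To list which daily indices fall before a retention cutoff, the date embedded in each name can be compared as an 8-digit number. This is a hypothetical sketch, not how Nagios Log Server itself decides what to close or delete: it only looks at index names, and the caller supplies the cutoff as YYYYMMDD. Stray indices such as logstash-2015.04.25 are listed whenever the cutoff is later than their date.

```shell
# Hypothetical helper: read index names on stdin and print the
# logstash-YYYY.MM.DD names whose date is earlier than $1 (YYYYMMDD).
# Names that do not match the daily pattern are ignored.
indices_older_than() {
  awk -v cutoff="$1" '
    /^logstash-[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/ {
      d = substr($0, 10)
      gsub(/\./, "", d)
      if (d + 0 < cutoff + 0) print
    }'
}
```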
Post by jdalrymple »

krobertson71 wrote: As you can see from the screenshot, we have a source that seems to have its system time off, or is sending old events. I can delete these indices, and they show up again within a few hours.
To the best of my limited knowledge, the indexes are based on the receipt timestamp and have nothing to do with any timestamp in the event itself. You would have to write a filter if you wanted the @timestamp field replaced, and even then I'm not sure it's possible.
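For comparison: the default index pattern of the Logstash elasticsearch output is `logstash-%{+YYYY.MM.dd}`, which is formatted from each event's @timestamp. If a filter (for example a date filter) sets @timestamp from a timestamp inside the event, an event stamped months ago lands in a months-old index. The hypothetical helper below only illustrates that mapping for an ISO 8601 timestamp; it assumes the timestamp is already in UTC, which, as far as I know, is the zone Logstash uses when formatting the index date.

```shell
# Hypothetical helper: map an ISO 8601 UTC timestamp (YYYY-MM-DD...) to the
# daily index name it would fall into under a logstash-YYYY.MM.DD pattern.
# Prints nothing if the input does not start with a YYYY-MM-DD date.
index_for_timestamp() {
  printf '%s\n' "$1" |
    sed -n 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*/logstash-\1.\2.\3/p'
}
```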

I'm assuming that these indexes always have a fixed size? Or do they grow or shrink when you delete them? If the time were really far off on a node, I think the cluster status would not be green. I'm really not sure what's going on. Take a look at their last-modified dates on the system:

Code:

ls -l /usr/local/nagioslogserver/elasticsearch/data/907e60a9-dc29-411e-96e8-2dfe503e0867/nodes/0/indices
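A directory's modification time changes when entries directly inside it are added or removed, so a recent time on an index whose name is months old suggests Elasticsearch is still touching it. That is a heuristic, not proof that events are arriving. A sketch using `find`, assuming the data directory path above; `recently_written_indices` is hypothetical, and `-maxdepth` is a common GNU/BSD extension rather than strict POSIX.

```shell
# Hypothetical helper: list logstash-* index directories directly under $1
# whose modification time is less than $2 days old, sorted by name.
recently_written_indices() {
  find "$1" -maxdepth 1 -type d -name 'logstash-*' -mtime "-$2" \
    -exec basename {} \; | sort
}
```

For example, `recently_written_indices /usr/local/nagioslogserver/elasticsearch/data/907e60a9-dc29-411e-96e8-2dfe503e0867/nodes/0/indices 1`.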
Post by krobertson71 »

Will do.

Can we also address the other information I provided about the current index backup issue?
Post by jdalrymple »

I'm not entirely convinced that the problems aren't related...
Post by krobertson71 »

Here are the results of your command.

No, the size varies, and so does the date. Two days from now they will read 4-27, etc.

Code:

 nagioslogserver]$ ls -l /usr/local/nagioslogserver/elasticsearch/data/907e60a9-dc29-411e-96e8-2dfe503e0867/nodes/0/indices
total 56
drwxr-xr-x 8 nagios nagcmd 4096 May 11  2015 kibana-int
drwxr-xr-x 8 nagios nagcmd 4096 Nov 10 18:03 logstash-2015.04.25
drwxr-xr-x 8 nagios nagcmd 4096 Nov 11 11:50 logstash-2015.04.26
drwxr-xr-x 8 nagios nagcmd 4096 Nov  3 14:00 logstash-2015.11.04
drwxr-xr-x 8 nagios nagcmd 4096 Nov  4 14:00 logstash-2015.11.05
drwxr-xr-x 8 nagios nagcmd 4096 Nov  5 14:00 logstash-2015.11.06
drwxr-xr-x 8 nagios nagcmd 4096 Nov  6 14:00 logstash-2015.11.07
drwxr-xr-x 8 nagios nagcmd 4096 Nov  7 14:00 logstash-2015.11.08
drwxr-xr-x 8 nagios nagcmd 4096 Nov  8 14:00 logstash-2015.11.09
drwxr-xr-x 8 nagios nagcmd 4096 Nov  9 14:00 logstash-2015.11.10
drwxr-xr-x 8 nagios nagcmd 4096 Nov 10 14:00 logstash-2015.11.11
drwxr-xr-x 8 nagios nagcmd 4096 Nov 11 14:00 logstash-2015.11.12
drwxr-xr-x 4 nagios nagcmd 4096 May 11  2015 nagioslogserver
drwxr-xr-x 8 nagios nagcmd 4096 May 11  2015 nagioslogserver_log