
Re: How to stop a currently running snapshot?

Posted: Tue Oct 29, 2019 3:42 pm
by rferebee
Should I run each command you listed individually, or can I run them all together as a block?

Re: How to stop a currently running snapshot?

Posted: Tue Oct 29, 2019 4:30 pm
by rferebee
I tried to start by just deleting the curator job, but it threw an error:

Code:

root@nagioslscc2:/root> curl -XDELETE 'http://localhost:9200/_snapshot/NLSREPCC/curator-curator-20191029053112'
{"error":"RemoteTransportException[[fc00210a-5231-46f3-84f2-6e3c61c7ac0e][inet[/10.128.207.112:9300]][cluster:admin/snapshot/delete]]; nested: SnapshotMissingException[[NLSREPCC:curator-curator-20191029053112] is missing]; nested: FileNotFoundException[/nlsrepcc/snapshot-curator-curator-20191029053112 (No such file or directory)]; ","status":404}
When I look at the snapshot jobs I'm seeing this:

Code:

{
  "snapshots" : [ {
    "snapshot" : "curator-20191029053112",
    "version_id" : 1070699,
    "version" : "1.7.6",
    "indices" : [ "logstash-2019.10.08", "logstash-2019.10.09", "logstash-2019.10.10", "logstash-2019.10.11", "logstash-2019.10.12", "logstash-2019.10.13", "logstash-2019.10.14", "logstash-2019.10.15", "logstash-2019.10.16", "logstash-2019.10.17", "logstash-2019.10.18", "logstash-2019.10.19", "logstash-2019.10.20", "logstash-2019.10.21", "logstash-2019.10.22", "logstash-2019.10.23", "logstash-2019.10.24", "logstash-2019.10.25", "logstash-2019.10.26", "logstash-2019.10.27", "logstash-2019.10.28" ],
    "state" : "PARTIAL",
    "start_time" : "2019-10-29T05:31:12.493Z",
    "start_time_in_millis" : 1572327072493,
    "end_time" : "2019-10-29T21:14:22.438Z",
    "end_time_in_millis" : 1572383662438,
    "duration_in_millis" : 56589945,
    "failures" : [ {
      "node_id" : "pcqYieAcSAK7http3p3yzQ",
      "index" : "logstash-2019.10.11",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.11][3] Failed to perform snapshot (index files)]; nested: FileSystemException[/nlsrepcc/indices/logstash-2019.10.11/3/__4c: Resource temporarily unavailable]; ",
      "shard_id" : 3,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "pcqYieAcSAK7http3p3yzQ",
      "index" : "logstash-2019.10.11",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.11][4] Failed to perform snapshot (index files)]; nested: IOException[Input/output error]; ",
      "shard_id" : 4,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "pcqYieAcSAK7http3p3yzQ",
      "index" : "logstash-2019.10.13",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.13][3] Failed to perform snapshot (index files)]; nested: IOException[Input/output error]; ",
      "shard_id" : 3,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "pcqYieAcSAK7http3p3yzQ",
      "index" : "logstash-2019.10.10",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.10][3] Failed to perform snapshot (index files)]; nested: IOException[Input/output error]; ",
      "shard_id" : 3,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "pcqYieAcSAK7http3p3yzQ",
      "index" : "logstash-2019.10.22",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.22][4] Failed to perform snapshot (index files)]; nested: IOException[Input/output error]; ",
      "shard_id" : 4,
      "status" : "INTERNAL_SERVER_ERROR"
    } ],
    "shards" : {
      "total" : 105,
      "failed" : 5,
      "successful" : 100
    }
  } ]
}
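To read a listing like this quickly, it helps to reduce each entry in "failures" to the index, the shard and the innermost error. The sketch below does that with awk on the pretty-printed output. It assumes each failure lists "index", "reason" and "shard_id" on separate lines in that order, as above. The name snapshot_failures is a hypothetical helper, not something shipped with Nagios Log Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the "failures" array of a pretty-printed
# snapshot listing read from stdin, one line per failed shard:
#   <index>[<shard>] <innermost nested exception>
# Assumes each failure lists "index", then "reason", then "shard_id",
# one field per line, as in the ?pretty output shown in this thread.
snapshot_failures() {
  awk '
    /"index" : "/ {
      idx = $0; sub(/.*"index" : "/, "", idx); sub(/".*/, "", idx)
    }
    /"reason" : "/ {
      cause = $0
      # keep the text after the last "nested: " (or the whole reason if none)
      if (!sub(/.*nested: /, "", cause)) sub(/.*"reason" : "/, "", cause)
      # drop the trailing semicolon, closing quote and comma
      sub(/;? *",?$/, "", cause)
    }
    /"shard_id" : / {
      shard = $0; sub(/.*"shard_id" : /, "", shard); sub(/,.*/, "", shard)
      print idx "[" shard "] " cause
    }
  '
}
```

For example: curl -s 'http://localhost:9200/_snapshot/NLSREPCC/curator-20191029053112?pretty' | snapshot_failures. On the listing above this would print five lines, such as logstash-2019.10.11[4] IOException[Input/output error].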

Re: How to stop a currently running snapshot?

Posted: Tue Oct 29, 2019 4:35 pm
by cdienger
Do them in a block (note the semicolons between commands):

Code:

/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository "NLSREPCC" --ignore_unavailable indices --older-than 15 --time-unit days --timestring %Y.%m.%d;/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository "NLSREPCC" --ignore_unavailable indices --older-than 10 --time-unit days --timestring %Y.%m.%d;/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository "NLSREPCC" --ignore_unavailable indices --older-than 5 --time-unit days --timestring %Y.%m.%d;/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository "NLSREPCC" --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d
Run it directly on the console as well, so an SSH client timeout can't interrupt it and you can see the status messages.
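The one-liner above is four curator.sh runs joined with semicolons, so each run starts as soon as the previous one exits, whether or not it succeeded. Written as a loop, the same sequence is easier to read and edit. This is only a sketch: run_snapshot_passes is a hypothetical function name, and the script path, repository name and flags are copied unchanged from the post.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the one-liner above: run the four curator
# snapshot passes (older than 15, 10, 5, then 1 day) in order. As with the
# semicolon-joined version, a failed pass does not stop the next one.
# The script path and repository default to the values used in this thread.
run_snapshot_passes() {
  local curator="${1:-/usr/local/nagioslogserver/scripts/curator.sh}"
  local repo="${2:-NLSREPCC}"
  local days
  for days in 15 10 5 1; do
    echo "== snapshot pass: indices older than ${days} days =="
    "$curator" snapshot --repository "$repo" --ignore_unavailable \
      indices --older-than "$days" --time-unit days --timestring '%Y.%m.%d'
  done
}

# Usage, from the console so an SSH timeout cannot cut it off:
# run_snapshot_passes
```

To stop at the first failed pass instead (the equivalent of joining the commands with && rather than ;), add || return 1 after the curator call.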

Re: How to stop a currently running snapshot?

Posted: Tue Oct 29, 2019 4:36 pm
by cdienger
Small typo: the snapshot name has only one "curator-" prefix. This should work better:

Code:

curl -XDELETE 'http://localhost:9200/_snapshot/NLSREPCC/curator-20191029053112'
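The earlier 404 (SnapshotMissingException) came from the doubled "curator-curator-" prefix. One way to avoid retyping names is to list the snapshots in the repository first and copy the exact name into the DELETE call. A minimal sketch, assuming the ?pretty layout shown in this thread with one "snapshot" : "<name>" pair per line; snapshot_names is a hypothetical helper, not part of Nagios Log Server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every snapshot name found in a pretty-printed
# _snapshot listing read from stdin, one per line. Assumes each name sits on
# its own line as  "snapshot" : "<name>",  like the ?pretty output above.
# ("snapshots" : [ does not match, because of the extra "s".)
snapshot_names() {
  sed -n 's/^[[:space:]]*"snapshot" : "\([^"]*\)".*/\1/p'
}

# Usage (commented out so the sketch has no side effects when sourced):
# curl -s 'http://localhost:9200/_snapshot/NLSREPCC/_all?pretty' | snapshot_names
# curl -XDELETE 'http://localhost:9200/_snapshot/NLSREPCC/curator-20191029053112'
```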

Re: How to stop a currently running snapshot?

Posted: Tue Oct 29, 2019 5:57 pm
by rferebee
When you run a snapshot manually like this, does it not show up in the Command Subsystem?

I'm trying to tell whether it's still running and when it finishes. I guess I can just keep running the _all?pretty command from an SSH session?

Re: How to stop a currently running snapshot?

Posted: Wed Oct 30, 2019 11:23 am
by rferebee
OK, the snapshot completed. Please leave this support thread open; I would like to see a snapshot run on its own this evening.

Re: How to stop a currently running snapshot?

Posted: Wed Oct 30, 2019 11:30 am
by cdienger
I don't think it will show up in the Command Subsystem, since it wasn't initiated there, but you should see output on the command line while it runs, and the shell won't accept new commands until it completes. You can also check whether the curator script is running from another terminal window:

Code:

ps aux | grep curator
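Elasticsearch can also be asked directly. Called with no repository or snapshot name, the snapshot status endpoint _snapshot/_status lists only the snapshots currently in progress, so an empty "snapshots" array means nothing is running. This sketch assumes that response shape and the http://localhost:9200 endpoint used earlier in the thread. The names snapshot_in_progress and wait_for_snapshot are hypothetical. (If you stick with ps, grep '[c]urator' keeps grep from matching its own process.)

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a _snapshot/_status response on stdin.
# Returns 0 if a snapshot is listed as running, 1 if the "snapshots"
# array is empty, 2 if the response does not look like a status reply.
snapshot_in_progress() {
  local body
  body=$(tr -d ' \t\r\n')   # drop whitespace so pretty and compact JSON match
  case "$body" in
    *'"snapshots":[]'*) return 1 ;;
    *'"snapshots":['*)  return 0 ;;
    *) echo "unexpected response: $body" >&2; return 2 ;;
  esac
}

# Poll every INTERVAL seconds (default 60) until no snapshot is running.
# Returns 0 once the status call reports nothing in progress, 2 on a bad reply.
wait_for_snapshot() {
  local es="${1:-http://localhost:9200}" interval="${2:-60}" rc
  while :; do
    curl -s "$es/_snapshot/_status" | snapshot_in_progress
    rc=$?
    [ "$rc" -eq 0 ] || break
    echo "$(date '+%F %T') snapshot still running"
    sleep "$interval"
  done
  if [ "$rc" -eq 1 ]; then
    echo "no snapshot in progress"
    return 0
  fi
  return "$rc"
}
```

For example, wait_for_snapshot http://localhost:9200 300 checks every five minutes and returns once no running snapshot is reported.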

Re: How to stop a currently running snapshot?

Posted: Wed Oct 30, 2019 11:30 am
by cdienger
Looks like our updates passed each other. Glad to hear the commands worked. We'll wait for your update.

Re: How to stop a currently running snapshot?

Posted: Thu Oct 31, 2019 1:03 pm
by rferebee
The snapshot from last night finished this morning with state PARTIAL. Here's the output with the errors:

Code:

{
  "snapshots" : [ {
    "snapshot" : "curator-20191031053043",
    "version_id" : 1070699,
    "version" : "1.7.6",
    "indices" : [ "logstash-2019.10.10", "logstash-2019.10.11", "logstash-2019.10.12", "logstash-2019.10.13", "logstash-2019.10.14", "logstash-2019.10.15", "logstash-2019.10.16", "logstash-2019.10.17", "logstash-2019.10.18", "logstash-2019.10.19", "logstash-2019.10.20", "logstash-2019.10.21", "logstash-2019.10.22", "logstash-2019.10.23", "logstash-2019.10.24", "logstash-2019.10.25", "logstash-2019.10.26", "logstash-2019.10.27", "logstash-2019.10.28", "logstash-2019.10.29", "logstash-2019.10.30" ],
    "state" : "PARTIAL",
    "start_time" : "2019-10-31T05:30:44.111Z",
    "start_time_in_millis" : 1572499844111,
    "end_time" : "2019-10-31T15:36:03.246Z",
    "end_time_in_millis" : 1572536163246,
    "duration_in_millis" : 36319135,
    "failures" : [ {
      "node_id" : "D_ao-uEBSi62BE2caGHsHw",
      "index" : "logstash-2019.10.24",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.24][4] Failed to perform snapshot (index files)]; nested: IOException[Input/output error]; ",
      "shard_id" : 4,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "pcqYieAcSAK7http3p3yzQ",
      "index" : "logstash-2019.10.25",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.25][2] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/nlsrepcc/indices/logstash-2019.10.25/2/__5 (Stale file handle)]; ",
      "shard_id" : 2,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "D_ao-uEBSi62BE2caGHsHw",
      "index" : "logstash-2019.10.25",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.25][3] Failed to perform snapshot (index files)]; nested: IOException[Input/output error]; ",
      "shard_id" : 3,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "olHQlucoSmO-CIohLIKLhQ",
      "index" : "logstash-2019.10.18",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.18][2] Failed to perform snapshot (index files)]; nested: IOException[Input/output error]; ",
      "shard_id" : 2,
      "status" : "INTERNAL_SERVER_ERROR"
    }, {
      "node_id" : "pcqYieAcSAK7http3p3yzQ",
      "index" : "logstash-2019.10.21",
      "reason" : "IndexShardSnapshotFailedException[[logstash-2019.10.21][0] Failed to perform snapshot (index files)]; nested: FileNotFoundException[/nlsrepcc/indices/logstash-2019.10.21/0/__1 (Stale file handle)]; ",
      "shard_id" : 0,
      "status" : "INTERNAL_SERVER_ERROR"
    } ],
    "shards" : {
      "total" : 105,
      "failed" : 5,
      "successful" : 100
    }
  } ]
}
Is there anything I should be concerned about, or are these standard errors? Thank you.
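One quick check is whether the failed shards pile up on a single node. In the October 29 snapshot all five failures came from node pcqYieAcSAK7http3p3yzQ. In this one they are spread across three nodes, and two are "Stale file handle" errors under /nlsrepcc, which may point at the shared filesystem behind the repository rather than at one node. The sketch below counts failures per node_id in the pretty-printed output. failures_by_node is a hypothetical helper, and it assumes node_id appears only inside "failures", as it does here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count failed shards per node_id in a pretty-printed
# snapshot listing read from stdin, busiest node first. Only "node_id" lines
# are counted; in this output they appear only inside the "failures" array.
failures_by_node() {
  awk '
    /"node_id" : "/ {
      n = $0; sub(/.*"node_id" : "/, "", n); sub(/".*/, "", n)
      count[n]++
    }
    END { for (n in count) print count[n], n }
  ' | sort -rn
}
```

Run against the listing above, it would report two failures each for D_ao-uEBSi62BE2caGHsHw and pcqYieAcSAK7http3p3yzQ and one for olHQlucoSmO-CIohLIKLhQ.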

Re: How to stop a currently running snapshot?

Posted: Thu Oct 31, 2019 1:48 pm
by cdienger
Are there any unassigned shards, or is the cluster in a state other than green? It may have been a temporary issue that has since corrected itself. Keep an eye on tomorrow's snapshot results, or force another snapshot now. PM me a fresh profile if you keep having problems with these shards.
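Both questions can be answered from the shell. _cluster/health reports the overall status (green, yellow or red). The cat API _cat/shards lists every shard with its state, so any unassigned shards show up as UNASSIGNED. A sketch assuming the default http://localhost:9200 endpoint; health_status and check_cluster are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "status" value (green/yellow/red) from a
# _cluster/health JSON response read on stdin, pretty-printed or compact.
health_status() {
  tr -d ' \t\r\n' | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

# Print the cluster status, then any shards whose state is UNASSIGNED.
check_cluster() {
  local es="${1:-http://localhost:9200}" shards
  echo "cluster status: $(curl -s "$es/_cluster/health" | health_status)"
  # -f makes curl fail on HTTP errors instead of passing an error body along
  shards=$(curl -sf "$es/_cat/shards") || { echo "could not reach $es" >&2; return 1; }
  echo "unassigned shards:"
  printf '%s\n' "$shards" | grep UNASSIGNED || echo "  none"
}
```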