
nagioslogserver_log INITIALIZING - after logstash failure

Posted: Thu Dec 20, 2018 11:55 am
by rferebee
Hello,

I think I might be experiencing an issue after a recent logstash failure.

I ran the following command on the primary node in my Log Server cluster: curl -XGET 'http://localhost:9200/_cat/shards?v'

It appears that nagioslogserver_log is stuck INITIALIZING, which is keeping my cluster status at Yellow.
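For anyone following along, the rows of interest in the `_cat/shards?v` output can be filtered out with a few lines of Python. This is only a rough sketch: it assumes the usual column order (index, shard, prirep, state first), and the sample rows, IPs and node names below are made up for illustration, not taken from the attached export.

```python
def non_started_shards(cat_output):
    """Return (index, shard, prirep, state) for every shard that is not STARTED."""
    problems = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row printed by ?v
        if len(fields) < 4 or fields[:2] == ["index", "shard"]:
            continue
        index, shard, prirep, state = fields[:4]
        if state != "STARTED":
            problems.append((index, int(shard), prirep, state))
    return problems


# Hypothetical sample output; real output will have different docs/store/ip/node values.
SAMPLE = """\
index               shard prirep state        docs  store ip        node
nagioslogserver_log 0     p      STARTED      1200  1.1mb 10.0.0.1  node-a
nagioslogserver_log 0     r      INITIALIZING             10.0.0.2  node-b
logstash-2018.12.20 1     p      STARTED      5000  9.9mb 10.0.0.1  node-a
"""

for row in non_started_shards(SAMPLE):
    print(row)
```

Only the first four columns are read, because the docs and store columns can be empty for a shard that has not finished initializing.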

Please see attached export from Log Server.

Not sure what to do to get it back to Green. Also, I'm having trouble with snapshots again. It seems that once we try to snap 30 days, the system just can't handle it and we get logstash failures.

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Thu Dec 20, 2018 6:01 pm
by cdienger
Was the output modified to remove the node information (UUID/IP)? I would start by restarting the elasticsearch service on the node where the replicas are found. If that doesn't do the trick, restart the service on the primary (after the first server's service comes back).

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Thu Dec 20, 2018 6:06 pm
by rferebee
Yes, I omitted the IPs. I didn't think they would matter for my question.

Thank you, I'll try that.

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Thu Dec 20, 2018 7:02 pm
by rferebee
Unfortunately, that did not work. There are still two log shards that are stuck initializing.

I even took the cluster completely down and brought it back up.

Any other ideas?

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Fri Dec 21, 2018 9:48 am
by rferebee
Actually, disregard my last post. We're back. It looks like taking down Log Server completely and bringing it back up did the trick. There's a snapshot running now and the system is Green.

Now on to my second question: we seem to be having a lot of inconsistency with our snapshots. We're fine snapshotting anywhere from 14 to 29 days, but as soon as we try to snap 30 days, logstash starts failing or the system becomes unresponsive.

I used this article to resolve our issues before: https://support.nagios.com/kb/article/n ... g-576.html

Is it possible we need to increase those numbers even further?

Thank you.

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Fri Dec 21, 2018 4:04 pm
by tgriep
If you see those errors in the console, you can try increasing those variables again to see if it helps.

Also, check the log files in the following folders while the server is having the issue, and if you find any errors, post them here:
/var/log/logstash/ and /var/log/elasticsearch/
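If it helps, here is a rough Python sketch for pulling candidate error lines out of those two folders. The function name, the `.log` suffix filter and the keywords ("ERROR", "Exception") are assumptions for illustration; adjust them to whatever the logs actually contain.

```python
import os


def find_error_lines(directories, keywords=("ERROR", "Exception")):
    """Return (path, line_number, text) for each .log line containing a keyword."""
    hits = []
    for directory in directories:
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            # Directory missing or not readable; skip it
            continue
        for name in names:
            path = os.path.join(directory, name)
            if not (name.endswith(".log") and os.path.isfile(path)):
                continue
            try:
                with open(path, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if any(keyword in line for keyword in keywords):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                # File not readable (permissions, rotated away); skip it
                continue
    return hits


if __name__ == "__main__":
    for path, number, text in find_error_lines(
        ["/var/log/logstash/", "/var/log/elasticsearch/"]
    ):
        print(f"{path}:{number}: {text}")
```

Run it as root (or a user that can read both folders) while the problem is occurring, so the relevant lines are near the end of the output.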

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Fri Dec 21, 2018 4:15 pm
by rferebee
Thank you.

What command would I use to see whether a snapshot is currently running, or to check the status of a previous snapshot?

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Fri Dec 21, 2018 4:43 pm
by rferebee
What does it mean when the Command Subsystem says WAITING for the snapshot, but Snapshots and Maintenance still shows the snapshot IN PROGRESS?

This is the issue we keep seeing. The subsystem says the snapshot is complete, but when we go to check, the system locks up and says the snapshot is still running.

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Fri Dec 21, 2018 5:09 pm
by rferebee
I ran the following command and received this output. Currently, my log server system is completely locked up...

[root@nagioslscc2 xxxxxxx]# curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | grep snapshot -A 100
"snapshot" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"path" : {
"type" : "string"
},
"auto" : {
"type" : "long"
},
"filename" : {
"type" : "string"
},
"created" : {
"type" : "long"
},
"name" : {
"type" : "string"
},
"clean_filename" : {
"type" : "string"
}
}
},
"_default_" : {
"_timestamp" : {
"enabled" : true
},
"properties" : { }
},
"commands" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"last_run_status" : {
"type" : "string"
},
"run_time" : {
"type" : "long"
},
"created" : {
"type" : "string"
},
"active" : {
"type" : "long"
},
"type" : {
"type" : "string"
},
"created_by" : {
"type" : "string"
},
"command" : {
"type" : "string"
},
"last_run_output" : {
"type" : "string"
},
"last_run_time" : {
"type" : "string"
},
"frequency" : {
"type" : "string"
},
"args" : {
"properties" : {
"sh_id" : {
"type" : "string"
},
"path" : {
"type" : "string"
},
"timezone" : {
"type" : "string"
},
"sh_created" : {
"type" : "long"
},
"id" : {
"type" : "string"
}
}
},
"node" : {
"index" : "not_analyzed",
"type" : "string"
},
"modified_by" : {
"type" : "string"
},
"modified" : {
"type" : "string"
},
"status" : {
"type" : "string"
}
}
},
"node" : {
--
"snapshots" : {
"snapshots" : [ {
"repository" : "nlsrep",
"snapshot" : "curator-20181221123022",
"include_global_state" : true,
"state" : "STARTED",
"indices" : [ "logstash-2018.11.21", "logstash-2018.11.22", "logstash-2018.11.23", "logstash-2018.11.24", "logstash-2018.11.25", "logstash-2018.11.26", "logstash-2018.11.27", "logstash-2018.11.28", "logstash-2018.11.29", "logstash-2018.11.30", "logstash-2018.12.01", "logstash-2018.12.02", "logstash-2018.12.03", "logstash-2018.12.04", "logstash-2018.12.05", "logstash-2018.12.06", "logstash-2018.12.07", "logstash-2018.12.08", "logstash-2018.12.09", "logstash-2018.12.10", "logstash-2018.12.11", "logstash-2018.12.12", "logstash-2018.12.13", "logstash-2018.12.14", "logstash-2018.12.15", "logstash-2018.12.16", "logstash-2018.12.17", "logstash-2018.12.18", "logstash-2018.12.19", "logstash-2018.12.20" ],
"shards" : [ {
"index" : "logstash-2018.11.22",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.03",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.22",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.03",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.22",
"shard" : 4,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.03",
"shard" : 0,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.22",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 0,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.02",
"shard" : 4,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.23",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.02",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.14",
"shard" : 4,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.23",
"shard" : 0,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.02",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.14",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.23",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.04",
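
Output this long is hard to read by eye. A rough Python sketch like the one below can tally the per-shard states for each in-progress snapshot. It assumes you have already extracted the `snapshots` object (the part after the `--` above) as JSON; the function name is made up, and the `INIT` shard in the sample is hypothetical, added only to show a mixed result.

```python
import json
from collections import Counter


def summarize_snapshots(snapshots_section):
    """Map each snapshot name to its overall state and a count of shard states."""
    summary = {}
    for entry in snapshots_section.get("snapshots", []):
        counts = Counter(shard["state"] for shard in entry.get("shards", []))
        summary[entry["snapshot"]] = {
            "state": entry["state"],
            "shards": dict(counts),
        }
    return summary


# Trimmed sample in the same shape as the output above.
# The last shard entry (state INIT) is hypothetical.
SAMPLE = json.loads("""
{
  "snapshots": [ {
    "repository": "nlsrep",
    "snapshot": "curator-20181221123022",
    "state": "STARTED",
    "shards": [
      {"index": "logstash-2018.11.22", "shard": 2, "state": "SUCCESS"},
      {"index": "logstash-2018.12.03", "shard": 2, "state": "SUCCESS"},
      {"index": "logstash-2018.12.20", "shard": 0, "state": "INIT"}
    ]
  } ]
}
""")

print(summarize_snapshots(SAMPLE))
```

If every shard shows SUCCESS while the snapshot itself still says STARTED, that is worth mentioning when posting the results.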

Re: nagioslogserver_log INITIALIZING - after logstash failur

Posted: Fri Dec 21, 2018 6:47 pm
by rferebee
See the attached screenshots for the issue described. There are two snapshots that show IN_PROCESS simultaneously even though the Command Subsystem says WAITING.