nagioslogserver_log INITIALIZING - after logstash failure

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Post by rferebee »

Hello,

I think I might be experiencing an issue after a recent logstash failure.

I ran the following command on the primary node in my Log Server cluster: curl -XGET 'http://localhost:9200/_cat/shards?v'

It appears that the nagioslogserver_log index is stuck INITIALIZING, and it's keeping my cluster status at Yellow.
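For anyone following along, here is a minimal sketch of narrowing that output down to the shards that are not healthy. The function name non_started_shards is hypothetical, and it assumes the default _cat/shards column order (index, shard, prirep, state, ...), so the state is the fourth field.

```shell
# Hypothetical helper: keep only shards whose state is not STARTED.
# Assumes the default _cat/shards columns: index shard prirep state ...
non_started_shards() {
    # Drop the header row printed by ?v, then filter on the 4th column.
    awk '$4 != "state" && $4 != "STARTED"'
}

# Usage (not run here):
#   curl -s -XGET 'http://localhost:9200/_cat/shards?v' | non_started_shards
```

An empty result would suggest every shard is STARTED; any INITIALIZING or UNASSIGNED rows that remain are the ones holding the cluster at Yellow.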

Please see attached export from Log Server.

Not sure what to do to get it back to Green. Also, I'm having trouble with snapshots again. It seems that once we try to snap 30 days, the system just can't handle it and we get logstash failures.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Post by cdienger »

Was the output modified to remove the node information (UUID/IP)? I would start by restarting the elasticsearch service on the node where the replicas are located. If that doesn't do the trick, restart the service on the primary (after the first server's service comes back up).
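A rough sketch of that restart order, for reference. The service command is an assumption about a typical install (some systems use systemctl instead), and cluster_status is a hypothetical helper that only pulls the status field out of the _cluster/health response.

```shell
# Hypothetical helper: read _cluster/health JSON on stdin, print the status
# (green, yellow or red). Works on both compact and ?pretty output.
cluster_status() {
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1
}

# Suggested order (commands are assumptions, not run here):
#   1. On the node holding the replicas:
#        service elasticsearch restart
#   2. Wait for that node to come back before touching the primary:
#        curl -s 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=60s' | cluster_status
#   3. Only then, on the primary:
#        service elasticsearch restart
```

The wait_for_status parameter makes the health call block until the cluster reaches at least that status or the timeout expires, which avoids restarting the second node while the first is still starting up.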

Post by rferebee »

Yes, I omitted the IPs. I didn't think they would matter for my question.

Thank you, I'll try that.

Post by rferebee »

Unfortunately, that did not work. There are still two log shards that are stuck initializing.

I even took the cluster completely down and brought it back up.

Any other ideas?

Post by rferebee »

Actually, disregard my last post. We're back. It looks like taking down Log Server completely and bringing it back up did the trick. There's a snapshot running now and the system is Green.

Now on to my second question: we seem to be having a lot of inconsistency with our snapshots. We're fine snapshotting anything between 14 and 29 days, but as soon as we try to snapshot 30 days, logstash starts failing or the system becomes unresponsive.

I used this article to resolve our issues before: https://support.nagios.com/kb/article/n ... g-576.html

Is it possible we need to increase those numbers even further?

Thank you.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

If you see those errors in the console, try increasing those variables again to see if it helps.

Also, check the log files in the following folders while the server is having the issue, and if you find any errors, post them here:
/var/log/logstash/ and /var/log/elasticsearch/
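One way to pull candidate lines out of those folders, as a minimal sketch. The function name recent_log_errors and the error/exception search pattern are assumptions; the actual messages may be worded differently, so widen the pattern if nothing turns up.

```shell
# Hypothetical helper: show the last 50 lines mentioning an error or
# exception in the given log directories, prefixed with the file name.
recent_log_errors() {
    grep -rHiE 'error|exception' "$@" 2>/dev/null | tail -n 50
}

# Usage (not run here):
#   recent_log_errors /var/log/logstash /var/log/elasticsearch
```

Running it while the server is misbehaving, as suggested above, keeps the output close to the moment of failure.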

Post by rferebee »

Thank you.

What command would I use to see whether a snapshot is currently running, or to check the status of a previous snapshot?
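This question did not get a direct reply in the thread. As a hedged pointer for later readers: Elasticsearch exposes snapshot state through its _snapshot REST API, so something along these lines may help. The function names and the ES_URL variable are hypothetical, and nlsrep is the repository name that appears in the output posted later in this thread.

```shell
# Base URL of the local Elasticsearch node (hypothetical variable name).
ES_URL="${ES_URL:-http://localhost:9200}"

# Snapshots that are currently running, across all repositories.
running_snapshots() {
    curl -s --max-time 30 -XGET "$ES_URL/_snapshot/_status?pretty"
}

# Every snapshot in one repository, finished or not, with its state.
list_snapshots() {
    curl -s --max-time 30 -XGET "$ES_URL/_snapshot/$1/_all?pretty"
}

# Usage (not run here):
#   running_snapshots
#   list_snapshots nlsrep
```

The --max-time option is there so the call gives up instead of hanging if the node is unresponsive.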

Post by rferebee »

What does it mean when the Command Subsystem says WAITING for the snapshot, but Snapshots and Maintenance still shows the snapshot as IN PROGRESS?

This is the issue we keep seeing. The subsystem says the snapshot is complete, but when we go to check, the system locks up and says the snapshot is still running.

Post by rferebee »

I ran the following command and received this output. Currently, my log server system is completely locked up...

[root@nagioslscc2 xxxxxxx]# curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | grep snapshot -A 100
"snapshot" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"path" : {
"type" : "string"
},
"auto" : {
"type" : "long"
},
"filename" : {
"type" : "string"
},
"created" : {
"type" : "long"
},
"name" : {
"type" : "string"
},
"clean_filename" : {
"type" : "string"
}
}
},
"_default_" : {
"_timestamp" : {
"enabled" : true
},
"properties" : { }
},
"commands" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"last_run_status" : {
"type" : "string"
},
"run_time" : {
"type" : "long"
},
"created" : {
"type" : "string"
},
"active" : {
"type" : "long"
},
"type" : {
"type" : "string"
},
"created_by" : {
"type" : "string"
},
"command" : {
"type" : "string"
},
"last_run_output" : {
"type" : "string"
},
"last_run_time" : {
"type" : "string"
},
"frequency" : {
"type" : "string"
},
"args" : {
"properties" : {
"sh_id" : {
"type" : "string"
},
"path" : {
"type" : "string"
},
"timezone" : {
"type" : "string"
},
"sh_created" : {
"type" : "long"
},
"id" : {
"type" : "string"
}
}
},
"node" : {
"index" : "not_analyzed",
"type" : "string"
},
"modified_by" : {
"type" : "string"
},
"modified" : {
"type" : "string"
},
"status" : {
"type" : "string"
}
}
},
"node" : {
--
"snapshots" : {
"snapshots" : [ {
"repository" : "nlsrep",
"snapshot" : "curator-20181221123022",
"include_global_state" : true,
"state" : "STARTED",
"indices" : [ "logstash-2018.11.21", "logstash-2018.11.22", "logstash-2018.11.23", "logstash-2018.11.24", "logstash-2018.11.25", "logstash-2018.11.26", "logstash-2018.11.27", "logstash-2018.11.28", "logstash-2018.11.29", "logstash-2018.11.30", "logstash-2018.12.01", "logstash-2018.12.02", "logstash-2018.12.03", "logstash-2018.12.04", "logstash-2018.12.05", "logstash-2018.12.06", "logstash-2018.12.07", "logstash-2018.12.08", "logstash-2018.12.09", "logstash-2018.12.10", "logstash-2018.12.11", "logstash-2018.12.12", "logstash-2018.12.13", "logstash-2018.12.14", "logstash-2018.12.15", "logstash-2018.12.16", "logstash-2018.12.17", "logstash-2018.12.18", "logstash-2018.12.19", "logstash-2018.12.20" ],
"shards" : [ {
"index" : "logstash-2018.11.22",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.03",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.22",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.03",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.22",
"shard" : 4,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.03",
"shard" : 0,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.22",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.15",
"shard" : 0,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.02",
"shard" : 4,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.23",
"shard" : 1,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.02",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.14",
"shard" : 4,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.23",
"shard" : 0,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.02",
"shard" : 2,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.14",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.11.23",
"shard" : 3,
"state" : "SUCCESS",
"node" : "OMSbsV-8RXeTUT1LcgeotQ"
}, {
"index" : "logstash-2018.12.04",

Post by rferebee »

See attached screenshots for issue described. There are two snapshots that show IN_PROCESS simultaneously even though the Command Subsystem says WAITING.
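When the interface and the Command Subsystem disagree like this, asking Elasticsearch directly about one specific snapshot may help settle which is right. A minimal sketch, assuming the snapshot status endpoint is available in the bundled Elasticsearch version. The names snapshot_status and ES_URL are hypothetical, while nlsrep and curator-20181221123022 are the repository and snapshot from the output posted earlier.

```shell
# Base URL of the local Elasticsearch node (hypothetical variable name).
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: detailed status of one snapshot.
#   $1 = repository name, $2 = snapshot name
snapshot_status() {
    curl -s --max-time 30 -XGET "$ES_URL/_snapshot/$1/$2/_status?pretty"
}

# Usage (not run here):
#   snapshot_status nlsrep curator-20181221123022
```

This returns far less than grepping the full cluster state, which also pulls in index mappings that happen to contain the word "snapshot".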
Locked