Page 3 of 7

Re: Nagios user java command using over 200% CPU

Posted: Wed Feb 13, 2019 11:03 am
by rferebee
Here's what I got when I ran that command.

[root@nagioslscc2 rferebee]# curl -XGET http://localhost:9200/_cluster/settings
{
  "persistent" : {
    "cluster" : { "routing" : { "allocation" : { "disk" : { "watermark" : { "low" : "99%" } } } } }
  },
  "transient" : {
    "plugin" : { "knapsack" : { "export" : { "state" : "[]" } } }
  }
}

It appears my predecessor made some sort of change to the cluster settings.

Re: Nagios user java command using over 200% CPU

Posted: Wed Feb 13, 2019 1:44 pm
by cdienger
It looks like there was an attempt, anyway. The format is different from what we'd expect, and judging by the logged messages, it wasn't effective. You can run the command that was provided to overwrite it.
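For reference, a hedged sketch of what an overwrite in the flat key format Elasticsearch expects looks like. The 80% value below is only an assumption for illustration, not the value from the provided command, so substitute whatever was actually recommended:

```shell
# Hypothetical sketch: set the low disk watermark as a flat persistent
# setting, the format Elasticsearch expects. The 80% value is an
# assumption; use the value from the command support provided.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '
{
  "persistent" : {
    "cluster.routing.allocation.disk.watermark.low" : "80%"
  }
}'
```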

Re: Nagios user java command using over 200% CPU

Posted: Thu Feb 14, 2019 12:23 pm
by rferebee
OK, I made the change yesterday. A snapshot ran last night, and neither logstash nor elasticsearch failed.

However, it looks like the snapshot is still in progress, which is odd. Typically, if it hasn't failed, it's finished by now.

In the Command Subsystem it shows State = Waiting, but when I run:

curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | grep snapshot -A 100

I get the following output:

"snapshot" : {
  "_timestamp" : {
    "enabled" : true
  },
  "properties" : {
    "path" : {
      "type" : "string"
    },
    "auto" : {
      "type" : "long"
    },
    "filename" : {
      "type" : "string"
    },
    "created" : {
      "type" : "long"
    },
    "name" : {
      "type" : "string"
    },
    "clean_filename" : {
      "type" : "string"
    }
  }
},
"_default_" : {
  "_timestamp" : {
    "enabled" : true
  },
  "properties" : { }
},
"commands" : {
  "_timestamp" : {
    "enabled" : true
  },
  "properties" : {
    "last_run_status" : {
      "type" : "string"
    },
    "run_time" : {
      "type" : "long"
    },
    "created" : {
      "type" : "string"
    },
    "active" : {
      "type" : "long"
    },
    "type" : {
      "type" : "string"
    },
    "created_by" : {
      "type" : "string"
    },
    "command" : {
      "type" : "string"
    },
    "last_run_output" : {
      "type" : "string"
    },
    "last_run_time" : {
      "type" : "string"
    },
    "frequency" : {
      "type" : "string"
    },
    "args" : {
      "properties" : {
        "sh_id" : {
          "type" : "string"
        },
        "path" : {
          "type" : "string"
        },
        "timezone" : {
          "type" : "string"
        },
        "sh_created" : {
          "type" : "long"
        },
        "id" : {
          "type" : "string"
        }
      }
    },
    "node" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "modified_by" : {
      "type" : "string"
    },
    "modified" : {
      "type" : "string"
    },
    "status" : {
      "type" : "string"
    }
  }
},
"node" : {
--
"snapshots" : {
  "snapshots" : [ {
    "repository" : "NLSREPCC",
    "snapshot" : "curator-20190214064611",
    "include_global_state" : true,
    "state" : "STARTED",
    "indices" : [ "logstash-2019.01.15", "logstash-2019.01.16", "logstash-2019.01.17", "logstash-2019.01.18", "logstash-2019.01.19", "logstash-2019.01.20", "logstash-2019.01.21", "logstash-2019.01.22", "logstash-2019.01.23", "logstash-2019.01.24", "logstash-2019.01.25", "logstash-2019.01.26", "logstash-2019.01.27", "logstash-2019.01.28", "logstash-2019.01.29", "logstash-2019.01.30", "logstash-2019.01.31", "logstash-2019.02.01", "logstash-2019.02.02", "logstash-2019.02.03", "logstash-2019.02.04", "logstash-2019.02.05", "logstash-2019.02.06", "logstash-2019.02.07", "logstash-2019.02.08", "logstash-2019.02.09", "logstash-2019.02.10", "logstash-2019.02.11", "logstash-2019.02.12", "logstash-2019.02.13" ],
    "shards" : [ {
      "index" : "logstash-2019.02.02",
      "shard" : 4,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.02.02",
      "shard" : 3,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.02.02",
      "shard" : 2,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.18",
      "shard" : 4,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.22",
      "shard" : 0,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.18",
      "shard" : 1,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.18",
      "shard" : 0,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.18",
      "shard" : 3,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.18",
      "shard" : 2,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.21",
      "shard" : 4,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.21",
      "shard" : 1,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.21",
      "shard" : 0,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.21",
      "shard" : 3,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.21",
      "shard" : 2,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.02.03",
      "shard" : 2,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.02.03",
      "shard" : 1,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.02.03",
      "shard" : 0,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.02.03",
      "shard" : 4,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.02.03",
      "shard" : 3,
      "state" : "SUCCESS",
      "node" : "XiP7eUflQ9uppcZnHxZP0A"
    }, {
      "index" : "logstash-2019.01.17",
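The paste cuts off there, but a rough tally of the shard states makes the remaining work visible at a glance. A sketch, assuming the state has been saved to a file (state.json is a hypothetical name, and the snapshot's own top-level state line gets counted too):

```shell
# Tally snapshot shard states from a saved copy of the cluster state, e.g.
#   curl -s 'http://localhost:9200/_cluster/state?pretty' > state.json
# state.json is an assumed filename.
grep '"state" :' state.json | sort | uniq -c
```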

Re: Nagios user java command using over 200% CPU

Posted: Thu Feb 14, 2019 3:00 pm
by cdienger
Can you post a screenshot of the snapshot & maintenance settings?

Re: Nagios user java command using over 200% CPU

Posted: Thu Feb 14, 2019 4:21 pm
by rferebee
I cannot access the Snapshot & Maintenance settings while a snapshot is in progress. This is one of the issues I've been trying to get resolved with your team. It causes the GUI to lock up.

Re: Nagios user java command using over 200% CPU

Posted: Thu Feb 14, 2019 4:57 pm
by cdienger
Let's grab it once it becomes available and in the meantime please PM me a profile. This can be generated under Admin > System > Command Subsystem. If it is too large to PM, please open a ticket at https://support.nagios.com/tickets/ and attach it there.

Re: Nagios user java command using over 200% CPU

Posted: Thu Feb 14, 2019 6:29 pm
by rferebee
It finally came up! See attached.

Re: Nagios user java command using over 200% CPU

Posted: Fri Feb 15, 2019 12:21 pm
by cdienger
I typically recommend setting the optimization option to 0 to disable it, since optimization consumes resources (disk space, CPU, memory) and tends to cause more problems like this one than the benefit it provides. The main benefit is quicker restart times for the elasticsearch service.

Re: Nagios user java command using over 200% CPU

Posted: Fri Feb 15, 2019 1:28 pm
by rferebee
Are there any storage ramifications we need to worry about? I think we tried this about a month ago and it appeared that our snapshot size grew quite a bit.

Re: Nagios user java command using over 200% CPU

Posted: Fri Feb 15, 2019 3:15 pm
by cdienger
Optimization does merge segments so that there are fewer of them, which can be more efficient for storage. Looking at the data again, though, it may not actually be the optimization step that is causing the hang. The next time you see it hang, please run "ps aux | grep curator" and gather another profile.
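One way to see whether optimization is actually reducing segment counts is to summarize the _cat/segments listing before and after a run. A sketch, assuming the listing has been saved to a file (segments.txt is a hypothetical name):

```shell
# Count segments per index from a saved _cat listing, e.g.
#   curl -s 'http://localhost:9200/_cat/segments?v' > segments.txt
# segments.txt is an assumed filename; column 1 is the index name.
awk 'NR > 1 { n[$1]++ } END { for (i in n) print i, n[i] }' segments.txt
```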