Here's what I got when I ran that command.
[root@nagioslscc2 rferebee]# curl -XGET http://localhost:9200/_cluster/settings
{"persistent":{"cluster":{"routing":{"allocation":{"disk":{"watermark":{"low":"99%"}}}}}},"transient":{"plugin":{"knapsack":{"export":{"state":"[]"}}}}}
It appears my predecessor made some sort of change to the cluster settings.
Re: Nagios user java command using over 200% CPU
It looks like there was an attempt, anyway. The format is different from what we'd expect and, judging by the logged messages, not effective. You can run the command that was provided to overwrite it.
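In case it saves a lookup, the override would look something like this; this is a sketch assuming you want the stock Elasticsearch default of 85% for the low disk watermark (use the values from the command in the earlier post if they differ):

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent" : {
    "cluster.routing.allocation.disk.watermark.low" : "85%"
  }
}'

On Elasticsearch 5.0 and later you can instead set the value to null to drop the persistent override entirely, and newer releases also require -H 'Content-Type: application/json' on the request.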
Re: Nagios user java command using over 200% CPU
Ok, I made the change yesterday. A snapshot ran last night, and neither logstash nor elasticsearch failed.
However, it looks like the snapshot is still in progress, which is odd. Typically, if it hasn't failed, it's done by now.
In the Command Subsystem it shows State = Waiting, but when I run

curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | grep snapshot -A 100

I get the following output:
"snapshot" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"path" : {
"type" : "string"
},
"auto" : {
"type" : "long"
},
"filename" : {
"type" : "string"
},
"created" : {
"type" : "long"
},
"name" : {
"type" : "string"
},
"clean_filename" : {
"type" : "string"
}
}
},
"_default_" : {
"_timestamp" : {
"enabled" : true
},
"properties" : { }
},
"commands" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"last_run_status" : {
"type" : "string"
},
"run_time" : {
"type" : "long"
},
"created" : {
"type" : "string"
},
"active" : {
"type" : "long"
},
"type" : {
"type" : "string"
},
"created_by" : {
"type" : "string"
},
"command" : {
"type" : "string"
},
"last_run_output" : {
"type" : "string"
},
"last_run_time" : {
"type" : "string"
},
"frequency" : {
"type" : "string"
},
"args" : {
"properties" : {
"sh_id" : {
"type" : "string"
},
"path" : {
"type" : "string"
},
"timezone" : {
"type" : "string"
},
"sh_created" : {
"type" : "long"
},
"id" : {
"type" : "string"
}
}
},
"node" : {
"index" : "not_analyzed",
"type" : "string"
},
"modified_by" : {
"type" : "string"
},
"modified" : {
"type" : "string"
},
"status" : {
"type" : "string"
}
}
},
"node" : {
--
"snapshots" : {
"snapshots" : [ {
"repository" : "NLSREPCC",
"snapshot" : "curator-20190214064611",
"include_global_state" : true,
"state" : "STARTED",
"indices" : [ "logstash-2019.01.15", "logstash-2019.01.16", "logstash-2019.01.17", "logstash-2019.01.18", "logstash-2019.01.19", "logstash-2019.01.20", "logstash-2019.01.21", "logstash-2019.01.22", "logstash-2019.01.23", "logstash-2019.01.24", "logstash-2019.01.25", "logstash-2019.01.26", "logstash-2019.01.27", "logstash-2019.01.28", "logstash-2019.01.29", "logstash-2019.01.30", "logstash-2019.01.31", "logstash-2019.02.01", "logstash-2019.02.02", "logstash-2019.02.03", "logstash-2019.02.04", "logstash-2019.02.05", "logstash-2019.02.06", "logstash-2019.02.07", "logstash-2019.02.08", "logstash-2019.02.09", "logstash-2019.02.10", "logstash-2019.02.11", "logstash-2019.02.12", "logstash-2019.02.13" ],
"shards" : [ {
"index" : "logstash-2019.02.02",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.02",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.02",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.22",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 1,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 1,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 1,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.17",
Re: Nagios user java command using over 200% CPU
Can you post a screenshot of the snapshot & maintenance settings?
Re: Nagios user java command using over 200% CPU
I cannot access the Snapshot & Maintenance settings while a snapshot is in progress. This is one of the issues I've been trying to get resolved with your team. It causes the GUI to lock up.
Re: Nagios user java command using over 200% CPU
Let's grab it once it becomes available, and in the meantime please PM me a profile. This can be generated under Admin > System > Command Subsystem. If it is too large to PM, please open a ticket at https://support.nagios.com/tickets/ and attach it there.
Re: Nagios user java command using over 200% CPU
It finally came up! See attached.
Re: Nagios user java command using over 200% CPU
I typically recommend setting the optimization option to 0 to disable it, since it consumes resources (disk space, CPU, memory) and tends to cause more problems like this one than the benefit it provides. The main benefit is quicker restart times for the elasticsearch service.
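Since the underlying complaint here is the java process pegging the CPU, the hot threads API is a quick way to see what elasticsearch is actually busy with (merging, snapshotting, etc.); this is standard elasticsearch, nothing Log Server specific:

curl -s -XGET 'http://localhost:9200/_nodes/hot_threads'

If merge threads dominate that output while the hang is happening, that points back at the optimize step.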
Re: Nagios user java command using over 200% CPU
Are there any storage ramifications we need to worry about? I think we tried this about a month ago and it appeared that our snapshot size grew quite a bit.
Re: Nagios user java command using over 200% CPU
Optimization does merge segments so that there are fewer of them, which can be more efficient for storage. Looking at the data again, though, it may not actually be the optimization step that is causing the hang. The next time you see it hang, please run "ps aux | grep curator" and gather another profile.
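If you want to quantify what the merging buys you, segment counts per index can be compared before and after an optimize; a quick check, using one of the index names from the snapshot output above:

curl -s 'http://localhost:9200/_cat/segments/logstash-2019.02.13?v'

Each row is one segment in one shard, so fewer rows after an optimize means the merge did its job.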