Nagios user java command using over 200% CPU

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

Here's what I got when I ran that command.

[root@nagioslscc2 rferebee]# curl -XGET http://localhost:9200/_cluster/settings
{"persistent":{"cluster":{"routing":{"allocation":{"disk":{"watermark":{"low":"99%"}}}}}},"transient":{"plugin":{"knapsack":{"export":{"state":"[]"}}}}}

It appears my predecessor made some sort of change to the cluster settings.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

It looks like there was an attempt, anyway. The format differs from what we'd expect and, judging by the logged messages, the setting is not taking effect. You can run the command that was provided to overwrite it.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

OK, I made the change yesterday. A snapshot ran last night, and neither Logstash nor Elasticsearch failed.

However, it looks like the snapshot is still in progress, which is odd. Typically, if it hasn't failed, it's done by now.

In the Command Subsystem it shows State = Waiting, but when I run: curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | grep snapshot -A 100

I get the following output:

"snapshot" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"path" : {
"type" : "string"
},
"auto" : {
"type" : "long"
},
"filename" : {
"type" : "string"
},
"created" : {
"type" : "long"
},
"name" : {
"type" : "string"
},
"clean_filename" : {
"type" : "string"
}
}
},
"_default_" : {
"_timestamp" : {
"enabled" : true
},
"properties" : { }
},
"commands" : {
"_timestamp" : {
"enabled" : true
},
"properties" : {
"last_run_status" : {
"type" : "string"
},
"run_time" : {
"type" : "long"
},
"created" : {
"type" : "string"
},
"active" : {
"type" : "long"
},
"type" : {
"type" : "string"
},
"created_by" : {
"type" : "string"
},
"command" : {
"type" : "string"
},
"last_run_output" : {
"type" : "string"
},
"last_run_time" : {
"type" : "string"
},
"frequency" : {
"type" : "string"
},
"args" : {
"properties" : {
"sh_id" : {
"type" : "string"
},
"path" : {
"type" : "string"
},
"timezone" : {
"type" : "string"
},
"sh_created" : {
"type" : "long"
},
"id" : {
"type" : "string"
}
}
},
"node" : {
"index" : "not_analyzed",
"type" : "string"
},
"modified_by" : {
"type" : "string"
},
"modified" : {
"type" : "string"
},
"status" : {
"type" : "string"
}
}
},
"node" : {
--
"snapshots" : {
"snapshots" : [ {
"repository" : "NLSREPCC",
"snapshot" : "curator-20190214064611",
"include_global_state" : true,
"state" : "STARTED",
"indices" : [ "logstash-2019.01.15", "logstash-2019.01.16", "logstash-2019.01.17", "logstash-2019.01.18", "logstash-2019.01.19", "logstash-2019.01.20", "logstash-2019.01.21", "logstash-2019.01.22", "logstash-2019.01.23", "logstash-2019.01.24", "logstash-2019.01.25", "logstash-2019.01.26", "logstash-2019.01.27", "logstash-2019.01.28", "logstash-2019.01.29", "logstash-2019.01.30", "logstash-2019.01.31", "logstash-2019.02.01", "logstash-2019.02.02", "logstash-2019.02.03", "logstash-2019.02.04", "logstash-2019.02.05", "logstash-2019.02.06", "logstash-2019.02.07", "logstash-2019.02.08", "logstash-2019.02.09", "logstash-2019.02.10", "logstash-2019.02.11", "logstash-2019.02.12", "logstash-2019.02.13" ],
"shards" : [ {
"index" : "logstash-2019.02.02",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.02",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.02",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.22",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 1,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.18",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 1,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.21",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 2,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 1,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 0,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 4,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.02.03",
"shard" : 3,
"state" : "SUCCESS",
"node" : "XiP7eUflQ9uppcZnHxZP0A"
}, {
"index" : "logstash-2019.01.17",
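
Note that the grep above also matches the "snapshot" type mapping, which is why most of the 100 lines are mapping fields rather than snapshot progress. A minimal sketch for tallying only the per-shard snapshot states is shown below. The function name summarize_shard_states is hypothetical, and the sketch assumes the pretty-printed layout seen in the output above, where each snapshot shard entry has a "state" line directly after its "shard" line.

```shell
# Hypothetical helper: tally snapshot shard states from pretty-printed
# cluster-state JSON read on stdin.
# Assumption: in snapshot shard entries the "state" line immediately
# follows the "shard" line, as in the output pasted above. The
# snapshot-level "state" (e.g. STARTED) is not preceded by a "shard"
# line, so it is not counted.
summarize_shard_states() {
  awk '
    # Count a state only when the previous line was a "shard" line.
    /"state" *:/ && prev_was_shard {
      gsub(/[",]/, "", $3)   # strip quotes and trailing comma from the value
      count[$3]++
    }
    # Remember whether this line was a "shard" line ("shards" does not match).
    { prev_was_shard = ($0 ~ /"shard" *:/) }
    END { for (s in count) print s, count[s] }
  ' | sort
}

# Usage (same request as in the post above):
#   curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' | summarize_shard_states
```

Each output line is a state name followed by the number of shards in that state, so any shards that have not reached SUCCESS show up as their own line instead of being buried in the listing.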
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

Can you post a screenshot of the snapshot & maintenance settings?
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

I cannot access the Snapshot & Maintenance settings while a snapshot is in progress. This is one of the issues I've been trying to get resolved with your team. It causes the GUI to lock up.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

Let's grab it once it becomes available. In the meantime, please PM me a profile, which can be generated under Admin > System > Command Subsystem. If it is too large to PM, please open a ticket at https://support.nagios.com/tickets/ and attach it there.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

It finally came up! See attached.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

I typically recommend setting the optimization option to 0 to disable it. Optimization consumes resources (disk space, CPU, memory) and tends to cause more problems like this one than it provides benefit. Its main benefit is quicker restart times for the Elasticsearch service.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

Are there any storage ramifications we need to worry about? I think we tried this about a month ago and it appeared that our snapshot size grew quite a bit.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

Optimization does merge segments so that there are fewer of them, which can be more efficient for storage. Looking at the data again, though, it may not actually be the optimization step that is causing the hang. The next time you see it hang, please run "ps aux | grep curator" and gather another profile.
Locked