Nagios user java command using over 200% CPU

Post by **rferebee** » Thu Feb 07, 2019 4:15 pm

Yes, we experience intermittent slowness and unresponsiveness throughout the day.

Is NCPA a free-to-use product?

Post by **cdienger** » Fri Feb 08, 2019 12:43 pm

Yes, it is a free product.

Please gather a profile the next time you experience slowness and PM it to me, along with a description of where you were seeing the slowness.

Post by **rferebee** » Tue Feb 12, 2019 10:41 am

Ok. I installed NCPA per the instructions provided, but I cannot hit the landing page post install. All I'm getting is "The webpage cannot be found".

I can telnet on port 5693 to the server I installed it on, so I know it's not a permit issue. The NCPA_listener service is running.

Any ideas?

Post by **rferebee** » Tue Feb 12, 2019 10:46 am

Also, top is still showing over 400% CPU usage for logstash. There's no way that's normal.

Something is causing major issues with my Log Server cluster. Two nights in a row now logstash has failed after my snapshot started.

top is built into Linux, like Task Manager on Windows. How can the data it's showing me be wrong or inaccurate?

Post by **rferebee** » Tue Feb 12, 2019 11:14 am

Here are the last two log files from logstash and elasticsearch... I'm not certain, but I think our problem might be with elasticsearch.

Your help is greatly appreciated!

Post by **cdienger** » Tue Feb 12, 2019 2:15 pm

Yes, there appears to be a disk space issue that is impacting Elasticsearch, which in turn can cause issues with Logstash:

[2019-02-10 15:26:06,457][WARN ][cluster.routing.allocation.decider] [38c1d226-cee5-4f13-aa24-49e3ebcfc201] After allocating, node [zvr-xFzcSzesBXORYOELcQ] would have more than the allowed 10% free disk threshold (6.4% free), preventing allocation
[2019-02-10 15:26:06,457][WARN ][cluster.routing.allocation.decider] [38c1d226-cee5-4f13-aa24-49e3ebcfc201] After allocating, node [9yb1dZPPTn2_L10AxVGhYQ] would have more than the allowed 10% free disk threshold (5.5% free), preventing allocation

What does disk space look like if you run "df -h"? How large is the primary size shown under Admin > System > Cluster Status? A possible solution is to move the Elasticsearch database to a larger partition, see: https://assets.nagios.com/downloads/nag ... Server.pdf#
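To compare the df numbers directly against the 10% free threshold in the log lines above, the output can be reduced to a free percentage per mount point. This is only a rough sketch: `free_pct` is a made-up helper name, it assumes the POSIX `df -P` column layout, and it derives "free" as 100 minus the Capacity column, which may differ slightly from what Elasticsearch calculates.

```shell
# free_pct: read `df -P` output on stdin and print "<mount point> <free %>"
# for each filesystem. Hypothetical helper, not part of Nagios Log Server.
# Assumes mount points contain no spaces.
free_pct() {
    awk 'NR > 1 { sub(/%/, "", $5); printf "%s %d\n", $6, 100 - $5 }'
}

# Usage (run on each node in the cluster):
#   df -P | free_pct
```

Any mount point holding the Elasticsearch data that shows less than 10 here would be consistent with the warnings in the log.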

Note that it's not uncommon to see percentages that exceed 100% on systems with multiple CPUs/cores.
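To put a figure like 400% in context, it can be divided by the number of cores on the box. A small sketch, assuming `nproc` from GNU coreutils is available to report the core count; `cpu_share` is a hypothetical helper name used only for illustration.

```shell
# cpu_share PCT CORES: express a top-style %CPU figure as a share of the
# machine's total CPU capacity. Hypothetical helper for illustration.
cpu_share() {
    awk -v pct="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", pct / cores }'
}

# Usage, with the core count of the current machine:
#   cpu_share 400 "$(nproc)"
```

For instance, `cpu_share 400 8` prints 50.0, meaning the process is using about half of an 8-core system.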

And for the NCPA agent, make sure you're trying to connect using https and not http.

Post by **rferebee** » Tue Feb 12, 2019 2:25 pm

See attached screenshot for 'df -h' output. Looks like there's 800+ GB free.

Primary size is listed as 5TB under Admin > System > Cluster Status.

If drive space is an issue, since this is a virtual server, could we just expand the partition rather than having to move the Elasticsearch DB?

Post by **cdienger** » Tue Feb 12, 2019 3:00 pm

Resizing is an option and we have a guide if you're using the VMs supplied by us:

https://support.nagios.com/kb/article/n ... e-486.html

Another option would be to change the high and low watermarks, since there does seem to be a lot of wiggle room:

https://www.elastic.co/guide/en/elastic ... cator.html

For example, to set the low and high watermarks to 70gb and 50gb (absolute values refer to free space remaining, so the low watermark is the larger number):

curl -s -XPUT http://localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "70gb","cluster.routing.allocation.disk.watermark.high" : "50gb" } }'

Post by **rferebee** » Tue Feb 12, 2019 4:33 pm

What command would I use to view the current settings? Just in case I need to rollback the change.

Also, are 70gb and 50gb your recommendations based on our environment?

Post by **cdienger** » Wed Feb 13, 2019 10:26 am

You can get the current settings with:

curl -XGET http://localhost:9200/_cluster/settings

Which will likely return:

{"persistent":{},"transient":{}}

which is normal and means Elasticsearch is using the defaults of 85% and 90%.

I would go with 70gb (low) and 50gb (high) as a start. They can be adjusted again if needed.
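For the rollback case, one option is to write the 85% and 90% defaults back explicitly with the same call. The sketch below only builds the JSON body so the values are easy to swap; `watermark_payload` is a made-up helper name, and the endpoint is assumed to be the same http://localhost:9200 used in the earlier command.

```shell
# watermark_payload LOW HIGH: build the JSON body for the _cluster/settings
# call. Hypothetical helper, shown for illustration only.
watermark_payload() {
    printf '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "%s", "cluster.routing.allocation.disk.watermark.high" : "%s" } }\n' "$1" "$2"
}

# Roll back to the 85% / 90% defaults (uncomment to run against the cluster):
#   curl -s -XPUT http://localhost:9200/_cluster/settings -d "$(watermark_payload 85% 90%)"
```

Running the `curl -XGET` command again afterwards should show the values now stored under "persistent".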
