Re: Nagios user java command using over 200% CPU

Posted: Thu Feb 07, 2019 4:15 pm
by rferebee
Yes, we experience intermittent slowness and unresponsiveness throughout the day.

Is NCPA a free-to-use product?

Re: Nagios user java command using over 200% CPU

Posted: Fri Feb 08, 2019 12:43 pm
by cdienger
Yes, it is a free product.

Please gather a profile the next time you experience slowness and PM it to me, along with a description of where you were seeing the slowness.

Re: Nagios user java command using over 200% CPU

Posted: Tue Feb 12, 2019 10:41 am
by rferebee
Ok. I installed NCPA per the instructions provided, but I cannot hit the landing page post install. All I'm getting is "The webpage cannot be found".

I can telnet to port 5693 on the server I installed it on, so I know it's not a permit issue. The NCPA_listener service is running.

Any ideas?

Re: Nagios user java command using over 200% CPU

Posted: Tue Feb 12, 2019 10:46 am
by rferebee
Also, top is still showing over 400% CPU usage for logstash. There's no way that's normal.

Something is causing major issues with my Log Server cluster. Two nights in a row now logstash has failed after my snapshot started.

top is built into Linux, like Task Manager in Windows. How can the data it's showing me be wrong or inaccurate?

Re: Nagios user java command using over 200% CPU

Posted: Tue Feb 12, 2019 11:14 am
by rferebee
Here are the last two log files from logstash and elasticsearch... I'm not certain, but I think our problem might be with elasticsearch.

Your help is greatly appreciated!

Re: Nagios user java command using over 200% CPU

Posted: Tue Feb 12, 2019 2:15 pm
by cdienger
Yes, there appears to be a disk space issue affecting Elasticsearch, which can in turn cause problems for Logstash:
[2019-02-10 15:26:06,457][WARN ][cluster.routing.allocation.decider] [38c1d226-cee5-4f13-aa24-49e3ebcfc201] After allocating, node [zvr-xFzcSzesBXORYOELcQ] would have more than the allowed 10% free disk threshold (6.4% free), preventing allocation
[2019-02-10 15:26:06,457][WARN ][cluster.routing.allocation.decider] [38c1d226-cee5-4f13-aa24-49e3ebcfc201] After allocating, node [9yb1dZPPTn2_L10AxVGhYQ] would have more than the allowed 10% free disk threshold (5.5% free), preventing allocation
What does disk space look like if you run "df -h"? How large is the primary size shown under Admin > System > Cluster Status? One possible solution is to move the Elasticsearch database to a larger partition; see: https://assets.nagios.com/downloads/nag ... Server.pdf#
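A minimal sketch for comparing df output with Elasticsearch's default percentage watermarks (85% low, 90% high). The path argument is an assumption: pass the directory that actually holds the Elasticsearch data, which is not shown in this thread. df rounds its percentage, so the comparison is only approximate.

```shell
#!/usr/bin/env bash
# Sketch: how full is the partition holding a path, compared with the
# default Elasticsearch disk watermarks (85% low, 90% high)?
# The default path "/" is a placeholder; pass the real ES data directory.

# Print the used percentage (integer, no % sign) for the filesystem holding $1.
# df -P keeps each filesystem on one line; column 5 is "Capacity", e.g. "94%".
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Classify a used percentage against the default watermarks (approximate).
watermark_state() {
  local used="$1"
  if [ "$used" -gt 90 ]; then
    echo "above high watermark (90%): shards may be relocated off this node"
  elif [ "$used" -gt 85 ]; then
    echo "above low watermark (85%): no new shards allocated here"
  else
    echo "below low watermark"
  fi
}

DATA_PATH="${1:-/}"   # hypothetical default; substitute the ES data directory
used=$(disk_used_pct "$DATA_PATH")
echo "$DATA_PATH is ${used}% used: $(watermark_state "$used")"
```

Run it once per node, since the warnings above name two different nodes and each is judged on its own disk.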

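Note that it's not uncommon to see percentages above 100% on systems with multiple CPUs/cores: by default top reports a process's CPU usage relative to a single core, so a process keeping four cores busy shows about 400%.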
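A small sketch of that arithmetic, converting top's per-core %CPU into a share of the whole host. The 8-vCPU figure in the example is hypothetical; if the core count is omitted, the function falls back to nproc.

```shell
#!/usr/bin/env bash
# Sketch: top (default Irix mode) reports %CPU relative to ONE logical CPU,
# so a multi-threaded JVM such as Logstash or Elasticsearch can show 200% or
# 400%. Convert that figure into a percentage of total host CPU capacity.

# $1 = %CPU as shown by top (e.g. 400); $2 = logical CPU count (default: nproc)
cpu_share_of_host() {
  local pct="$1"
  local cores="${2:-$(nproc)}"
  awk -v p="$pct" -v c="$cores" 'BEGIN { printf "%.1f\n", p / c }'
}

# Example with a hypothetical 8-vCPU VM: 400% is half the machine's capacity.
cpu_share_of_host 400 8
```

So a sustained 400% is only alarming relative to how many cores the VM has; it still says nothing about why Logstash is that busy.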

And for the NCPA agent, make sure you're connecting over https, not http.
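A minimal sketch for checking the NCPA listener over https on port 5693. It assumes the listener uses a self-signed certificate (hence curl's -k); the host name in the usage comment is a placeholder, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the NCPA listener URL (https, default port 5693) and report
# the HTTP status code it returns. -k skips certificate verification, on the
# assumption that the listener uses a self-signed certificate.

ncpa_url() {
  local host="$1" port="${2:-5693}"
  printf 'https://%s:%s/\n' "$host" "$port"
}

# Print only the HTTP status code; curl reports "000" if nothing answered.
ncpa_status() {
  curl -k -s -o /dev/null -w '%{http_code}' "$(ncpa_url "$1" "${2:-}")"
}

# Usage (hypothetical host):
#   ncpa_status ncpa-host.example.com
```

A working telnet to 5693 combined with "page cannot be found" in a browser is consistent with requesting http:// against a listener that only speaks https.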

Re: Nagios user java command using over 200% CPU

Posted: Tue Feb 12, 2019 2:25 pm
by rferebee
See the attached screenshot for the 'df -h' output. It looks like there's 800+ GB free.

Primary size is listed as 5TB under Admin > System > Cluster Status.

If drive space is an issue, since this is a virtual server, could we just expand the partition rather than having to move the Elasticsearch DB?

Re: Nagios user java command using over 200% CPU

Posted: Tue Feb 12, 2019 3:00 pm
by cdienger
Resizing is an option and we have a guide if you're using the VMs supplied by us:

https://support.nagios.com/kb/article/n ... e-486.html

Another option would be to change the high and low watermarks, since there does seem to be a lot of wiggle room:

https://www.elastic.co/guide/en/elastic ... cator.html

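For example, to set the low watermark to 70gb and the high watermark to 50gb (absolute byte values mean minimum free space, so new shards stop being allocated to a node once its free space drops below 70gb, and shards are moved off it below 50gb):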

curl -s -XPUT http://localhost:9200/_cluster/settings \
  -H 'Content-Type: application/json' \
  -d '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "70gb", "cluster.routing.allocation.disk.watermark.high" : "50gb" } }'
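One way to make this change easy to reverse is to generate the request body from a helper, then re-apply the documented defaults (85% and 90%) explicitly if needed. This is a sketch: the function names are hypothetical, and it assumes Elasticsearch listens on localhost:9200 as in the command above.

```shell
#!/usr/bin/env bash
# Sketch: build the persistent cluster-settings body for the disk watermarks
# so one helper can apply new values or restore the defaults.
# With byte values, low (e.g. 70gb) must be the LARGER free-space amount.

watermark_body() {
  local low="$1" high="$2"
  printf '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "%s", "cluster.routing.allocation.disk.watermark.high" : "%s" } }\n' "$low" "$high"
}

# Hypothetical wrapper around the PUT request; assumes ES on localhost:9200.
set_watermarks() {
  curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
    -H 'Content-Type: application/json' \
    -d "$(watermark_body "$1" "$2")"
}

# Usage:
#   set_watermarks 70gb 50gb   # values suggested in this thread
#   set_watermarks 85% 90%     # put the default percentages back explicitly
```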

Re: Nagios user java command using over 200% CPU

Posted: Tue Feb 12, 2019 4:33 pm
by rferebee
What command would I use to view the current settings? Just in case I need to roll back the change.

Also, are 70gb and 50gb your recommendations based on our environment?

Re: Nagios user java command using over 200% CPU

Posted: Wed Feb 13, 2019 10:26 am
by cdienger
You can get the current settings with:

curl -XGET http://localhost:9200/_cluster/settings

Which will likely return:

{"persistent":{},"transient":{}}

which is normal and means Elasticsearch uses the defaults of 85% (low) and 90% (high).
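Percentage watermarks are measured as disk used, while byte watermarks are measured as space free, so comparing them needs a quick conversion. The sketch below does that arithmetic; the 1000 GB partition size is purely illustrative, not a figure from this thread.

```shell
#!/usr/bin/env bash
# Sketch: translate a percentage watermark (percent USED) into the free space
# it leaves on a partition, for comparison with byte values like 70gb/50gb.

# $1 = partition size in GB; $2 = watermark as percent used (e.g. 85)
free_gb_at_watermark() {
  awk -v size="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", size * (100 - pct) / 100 }'
}

size_gb=1000   # hypothetical partition size
echo "low  (85%): stop allocating below $(free_gb_at_watermark "$size_gb" 85) GB free"
echo "high (90%): relocate shards below $(free_gb_at_watermark "$size_gb" 90) GB free"
```

On that hypothetical 1000 GB partition the defaults require 150 GB and 100 GB free, so 70gb/50gb lets the disk fill considerably further; the larger the partition, the bigger that gap.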

I would start with 70gb for the low watermark and 50gb for the high watermark. They can be adjusted again if needed.