Nagios user java command using over 200% CPU

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

Yes, we experience intermittent slowness and unresponsiveness throughout the day.

Is NCPA a free-to-use product?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

Yes, it is a free product.

Please gather a profile the next time you experience slowness and PM it to me, along with a description of where you were seeing the slowness.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

Ok. I installed NCPA per the instructions provided, but I cannot hit the landing page post install. All I'm getting is "The webpage cannot be found".

I can telnet on port 5693 to the server I installed it on, so I know it's not a permit issue. The NCPA_listener service is running.

Any ideas?
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

Also, top is still showing over 400% CPU usage for logstash. There's no way that's normal.

Something is causing major issues with my Log Server cluster. Two nights in a row now logstash has failed after my snapshot started.

top is built into Linux, like Task Manager on Windows. How can the data it's showing me be wrong or inaccurate?
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

Here are the last two log files from logstash and elasticsearch... I'm not certain, but I think our problem might be with elasticsearch.

Your help is greatly appreciated!
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

Yes, there appears to be a disk space issue that is impacting Elasticsearch, which can in turn cause issues with Logstash:
[2019-02-10 15:26:06,457][WARN ][cluster.routing.allocation.decider] [38c1d226-cee5-4f13-aa24-49e3ebcfc201] After allocating, node [zvr-xFzcSzesBXORYOELcQ] would have more than the allowed 10% free disk threshold (6.4% free), preventing allocation
[2019-02-10 15:26:06,457][WARN ][cluster.routing.allocation.decider] [38c1d226-cee5-4f13-aa24-49e3ebcfc201] After allocating, node [9yb1dZPPTn2_L10AxVGhYQ] would have more than the allowed 10% free disk threshold (5.5% free), preventing allocation
What does disk space look like if you run "df -h"? How large is the primary size shown under Admin > System > Cluster Status? A possible solution is to move the Elasticsearch database to a larger partition; see: https://assets.nagios.com/downloads/nag ... Server.pdf#
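As a rough illustration of what those warnings are measuring, the sketch below computes the percentage of free space on a filesystem and checks it against a minimum. The function names are hypothetical helpers, not part of Elasticsearch or Nagios Log Server, and the path in the usage comment is a placeholder for wherever your Elasticsearch data actually lives. Elasticsearch does its own calculation internally, so its figures may differ slightly from what df reports.

```shell
# Hypothetical helper: percent of free space on the filesystem that holds
# the given path, based on POSIX-format df output (1024-byte blocks).
free_percent() {
    # Line 2 of `df -P`: $2 = total blocks, $4 = available blocks.
    df -P "$1" | awk 'NR == 2 { printf "%.1f\n", $4 * 100 / $2 }'
}

# Hypothetical helper: succeeds (exit 0) when the free percentage ($1)
# is below the required minimum ($2), i.e. the node would be refused shards.
below_threshold() {
    awk -v free="$1" -v min="$2" 'BEGIN { exit !(free + 0 < min + 0) }'
}

# Usage (placeholder path, replace with your Elasticsearch data directory):
# pct=$(free_percent /path/to/elasticsearch/data)
# if below_threshold "$pct" 10; then echo "only ${pct}% free"; fi
```

With the values from the log lines above, both 6.4 and 5.5 fall below the 10% minimum, which is why allocation was prevented on those nodes.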

Note that it's not uncommon to see percentages that exceed 100% on systems with multiple CPUs/cores.
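To put a top reading in context, here is a minimal sketch that converts a per-process %CPU value (where 100% equals one fully busy core, which is top's default behavior) into a share of the whole machine. The function name is a hypothetical helper, and the core count is something you would supply yourself, for example from nproc.

```shell
# Hypothetical helper: convert a top-style %CPU figure (100% = one core)
# into a percentage of the whole machine.
cpu_share_of_machine() {
    # $1 = %CPU as shown by top, $2 = number of cores on the host
    awk -v pct="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", pct / cores }'
}

# Usage: a process at 400% on the current host
# cpu_share_of_machine 400 "$(nproc)"
```

For instance, 400% on an 8-core server works out to half of the machine's total CPU capacity, so whether that figure is alarming depends on how many cores the host has.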

And for the ncpa agent make sure you're trying to connect using https and not http.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

See the attached screenshot for the 'df -h' output. It looks like there's 800+ GB free.

Primary size is listed as 5TB under Admin > System > Cluster Status.

If drive space is an issue, since this is a virtual server, could we just expand the partition rather than having to move the Elasticsearch DB?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

Resizing is an option and we have a guide if you're using the VMs supplied by us:

https://support.nagios.com/kb/article/n ... e-486.html

Another option would be to change the high and low watermarks, since there does seem to be a lot of wiggle room:

https://www.elastic.co/guide/en/elastic ... cator.html

For example, to set the low and high watermarks to 70gb and 50gb of free space, respectively:

curl -s -XPUT http://localhost:9200/_cluster/settings -d '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "70gb","cluster.routing.allocation.disk.watermark.high" : "50gb" } }'
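When the watermarks are given as absolute sizes, they refer to free space remaining, which is why the low watermark (70gb) is the larger number. The sketch below is a hypothetical helper, not an Elasticsearch command, that shows which side of each watermark a given amount of free space falls on. All values are plain numbers in GB.

```shell
# Hypothetical helper: classify free space against absolute watermarks.
# $1 = free GB on the node, $2 = low watermark GB, $3 = high watermark GB
watermark_state() {
    awk -v free="$1" -v low="$2" -v high="$3" 'BEGIN {
        if (free + 0 < high + 0)
            print "high"   # past the high watermark: shards get moved off the node
        else if (free + 0 < low + 0)
            print "low"    # past the low watermark: no new shards allocated here
        else
            print "ok"     # enough free space for normal allocation
    }'
}

# Usage: a node with roughly 800 GB free and the 70gb/50gb settings
# watermark_state 800 70 50
```

With roughly 800 GB free, a node would sit comfortably on the "ok" side of both values.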
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Nagios user java command using over 200% CPU

Post by rferebee »

What command would I use to view the current settings? Just in case I need to rollback the change.

Also, are 70gb and 50gb your recommendations based on our environment?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Nagios user java command using over 200% CPU

Post by cdienger »

You can get the current settings with:

curl -XGET http://localhost:9200/_cluster/settings

Which will likely return:

{"persistent":{},"transient":{}}

which is normal and means Elasticsearch is using the defaults of 85% and 90%.

I would go with 70gb (low) and 50gb (high) as a start. It can be adjusted again if needed.
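On the rollback question, one approach is to write the default percentages back explicitly. This is only a sketch: the function name is a hypothetical helper that builds the request body, and it assumes the cluster accepts the same settings call shown earlier in this thread. As far as I know, older Elasticsearch releases cannot clear a persistent setting by sending null, which is why the defaults are set by value here.

```shell
# Hypothetical helper: build the JSON body for the cluster settings call.
# $1 = low watermark, $2 = high watermark (e.g. "85%" "90%" or "70gb" "50gb")
watermark_settings_json() {
    printf '{ "persistent" : { "cluster.routing.allocation.disk.watermark.low" : "%s", "cluster.routing.allocation.disk.watermark.high" : "%s" } }\n' "$1" "$2"
}

# Usage: put the defaults mentioned above (85% and 90%) back in place
# curl -s -XPUT http://localhost:9200/_cluster/settings -d "$(watermark_settings_json '85%' '90%')"
```

Running the GET command again afterwards should show the values under "persistent" rather than an empty object.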