Java running at 100%

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
vmwareguy
Posts: 69
Joined: Wed Mar 16, 2016 9:41 am

Java running at 100%

Post by vmwareguy »

I have a fresh install of NLS with 45 hosts pushing logs to the server. Within 10 minutes of receiving logs, the java process (running as user nagios) is at 100% CPU. I'm running CentOS 6.7 with a source install of NLS. Any thoughts?
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Java running at 100%

Post by hsmith »

Some information that will help us:

How much memory do you have?

Code:

free -m
How's your CPU doing?

Code:

top -b -n 1 | head -n 5
How much log data per day are you receiving? You can find this under the Administration tab by following the 'Index Status' link.

Anything in the logstash log that might be a hint?

Code:

tail /var/log/logstash/logstash.log
How about the elasticsearch log?

Code:

tail /var/log/elasticsearch/*.log
Former Nagios Employee.
me.

Re: Java running at 100%

Post by vmwareguy »

Memory (free -m):
Mem: Total 3832 Used 3694 Free 138

Logstash has been throwing this error a lot:
:timestamp=>"2016-05-12T07:16:50.680000-0400", :message=>"retrying failed action with response code: 503", :level=>:warn}

Elasticsearch log (tail):

==> /var/log/elasticsearch/de670d53-cf3c-4c88-876a-d74e20244397.log <==
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
at org.elasticsearch.common.io.Channels.writeToChannel(Channels.java:193)
at org.elasticsearch.index.translog.fs.BufferingFsTranslogFile.flushBuffer(BufferingFsTranslogFile.java:116)
at org.elasticsearch.index.translog.fs.BufferingFsTranslogFile.add(BufferingFsTranslogFile.java:101)
at org.elasticsearch.index.translog.fs.FsTranslog.add(FsTranslog.java:379)
... 9 more
[2016-05-12 07:18:10,562][WARN ][cluster.action.shard ] [ad929a48-ccb5-4e5b-a5b7-8853e7aa30db] [logstash-2016.05.12][0] received shard failed for [logstash-2016.05.12][0], node[LGTW1UYZS-2XS6Xjc-Qnpg], [P], s[INITIALIZING], indexUUID [wJtFiyg3TfCYFb8g89XpPQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2016.05.12][0] failed to recover shard]; nested: TranslogException[[logstash-2016.05.12][0] Failed to write operation [org.elasticsearch.index.translog.Translog$Create@3f5e0b6d]]; nested: IOException[No space left on device]; ]]


At the moment Java is fine, but it's early in the morning. I will post CPU status once Java starts acting up again. Thanks for the help.

Re: Java running at 100%

Post by vmwareguy »

I've also noticed that the Logstash service keeps dying: "Logstash Daemon dead but pid file exists"
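For anyone hitting the same message: on a SysV init system like CentOS 6, "dead but pid file exists" is the standard status (LSB status code 1) meaning the service's pid file is still present but no process with that PID is running, i.e. the daemon exited without cleaning up after itself. The minimal sketch below reproduces that check, assuming bash on Linux; `pid_status` is a hypothetical helper, not part of NLS or the init scripts, and the pid file path you pass it depends on your install.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the three init-script states for a pid file.
# pid_status is a hypothetical helper; it relies on Linux /proc.

# pid_status PIDFILE
# Prints a state and returns the matching LSB status code:
#   0 "running", 1 "dead but pid file exists", 3 "stopped"
pid_status() {
  local pidfile=$1 pid
  if [[ ! -f $pidfile ]]; then
    echo "stopped"
    return 3
  fi
  # First line of the pid file; $(...) strips the trailing newline.
  pid=$(head -n 1 "$pidfile")
  # A live process has a /proc/<pid> directory (this avoids kill -0,
  # which fails with EPERM for processes owned by another user).
  if [[ $pid =~ ^[0-9]+$ && -d /proc/$pid ]]; then
    echo "running"
    return 0
  fi
  # The file exists but names no live process: the daemon died
  # without removing its pid file.
  echo "dead but pid file exists"
  return 1
}
```

Usage: `pid_status /path/to/logstash.pid`, substituting whatever pid file your Logstash init script actually writes.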
eloyd
Cool Title Here
Posts: 2190
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

Re: Java running at 100%

Post by eloyd »

If you can, increase the physical memory (I do not know whether this is a virtual or a physical machine), as that will help immensely.

By default, NLS allocates 50% of total system RAM to Elasticsearch. You can verify this in /etc/sysconfig/elasticsearch. It MAY be worthwhile to increase ES_HEAP_SIZE to something bigger than 50% of RAM. You can test this by changing the last line in this code:

Code:

# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
to

Code:

ES_HEAP_SIZE=5120m
This will allocate 5GB of RAM to Elasticsearch's Java heap instead of ~4GB (50% of 8GB). This might work. You might be good to go as far as 6144m, which would be 6GB, but I wouldn't push it any farther than that on a system with 8GB of RAM. Increasing swap will likely not gain you anything on this system and, in fact, will degrade performance. Whatever memory remains can be given to Logstash by changing /etc/sysconfig/logstash similarly:

Code:

# Arguments to pass to java
#LS_HEAP_SIZE="256m"
LS_JAVA_OPTS="-Djava.io.tmpdir=$APP_DIR/tmp"
Change that middle line to be something like:

Code:

LS_HEAP_SIZE="1024m"
(Note that you have to uncomment it.) Restart Logstash and Elasticsearch. You may need to play with the split between Elasticsearch and Logstash, but don't go below 50% of memory for Elasticsearch. A good article on Elasticsearch JVM tuning can be found at https://www.elastic.co/guide/en/elastic ... izing.html
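The arithmetic behind these heap numbers is plain integer division on total RAM in MB, which is what the default `expr` line in /etc/sysconfig/elasticsearch does. A minimal sketch, assuming bash; `heap_size` and `total_ram_mb` are hypothetical helpers, not part of NLS. Note that a VM sized at "8GB" may report slightly less than 8192 in `free -m`, and that on the original 3832 MB VM the 50% default works out to only 1916m.

```shell
#!/usr/bin/env bash
# Sketch: compute a Java heap setting as a fraction of total RAM, the same
# way the NLS default line does with expr (integer division, result in MB).
# heap_size and total_ram_mb are hypothetical helpers, not part of NLS.

# heap_size TOTAL_MB NUMERATOR DENOMINATOR -> prints e.g. "4096m"
heap_size() {
  local total_mb=$1 num=$2 den=$3
  echo "$(( total_mb * num / den ))m"
}

# Total physical memory in MB, read the same way as the default line.
total_ram_mb() {
  free -m | awk '/^Mem:/{print $2}'
}

# Worked numbers for an 8GB (8192 MB) system:
echo "default 1/2: $(heap_size 8192 1 2)"   # 4096m
echo "5/8:         $(heap_size 8192 5 8)"   # 5120m
echo "3/4:         $(heap_size 8192 3 4)"   # 6144m
```

On the server itself you would feed it the live total, e.g. `heap_size "$(total_ram_mb)" 1 2`, and put the result in ES_HEAP_SIZE.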

Note that, no matter what you do, 100% CPU is fine. There's nothing wrong with using all of your processor if it's doing actual work.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!

Re: Java running at 100%

Post by vmwareguy »

eloyd

Thanks for the reply. I'm running as a VM and have upped memory to 8GB and added a second CPU. I made both of your recommended changes. I'll keep an eye on NLS and post again later.

Thanks for the advice.

Re: Java running at 100%

Post by eloyd »

No problem. Again, it's a tuning issue so what works for one person may not work for another based on the overall workload that your server is under. Also, make sure not to allocate 100% of memory to ES and LS. You need some leftover for the OS. :-)
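One way to sanity-check a split before restarting is to subtract both heaps from total RAM and see what is left for the OS. A minimal sketch, assuming bash; `check_mem_budget` is hypothetical, and its default 1024 MB minimum OS reserve is an assumed threshold rather than an NLS or Elasticsearch rule, so adjust it for your workload.

```shell
#!/usr/bin/env bash
# Sketch: check how much RAM is left for the OS after both Java heaps.
# check_mem_budget is hypothetical; the default 1024 MB OS reserve is an
# assumed threshold, not an NLS or Elasticsearch requirement.

# check_mem_budget TOTAL_MB ES_HEAP_MB LS_HEAP_MB [MIN_OS_MB]
# Prints the MB left for the OS; returns 1 if that is below MIN_OS_MB.
check_mem_budget() {
  local total=$1 es_heap=$2 ls_heap=$3 min_os=${4:-1024}
  local left=$(( total - es_heap - ls_heap ))
  echo "$left"
  if (( left < min_os )); then
    echo "warning: only ${left}MB left for the OS (wanted >= ${min_os}MB)" >&2
    return 1
  fi
}

# ES_HEAP_SIZE=5120m plus LS_HEAP_SIZE=1024m on an 8192 MB VM:
check_mem_budget 8192 5120 1024                        # prints 2048
# 6144m plus 2048m on the same VM leaves nothing for the OS:
check_mem_budget 8192 6144 2048 || echo "over budget"  # prints 0, then "over budget"
```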

Re: Java running at 100%

Post by vmwareguy »

eloyd wrote:No problem. Again, it's a tuning issue so what works for one person may not work for another based on the overall workload that your server is under. Also, make sure not to allocate 100% of memory to ES and LS. You need some leftover for the OS. :-)

So far she is cooking along nicely, I'll let her run for the day and report back tomorrow.

Re: Java running at 100%

Post by hsmith »

Sounds good - let us know!

Re: Java running at 100%

Post by vmwareguy »

After working with the memory settings, here is what I've found. I have 30 servers and workstations pushing logs to NLS without any problems. Problems start whenever I start pushing logs from 14 ESXi hosts. I know that ESXi spits out a lot of logs, but I'm surprised they send enough to overwhelm NLS. As soon as I open port 1514, CPU usage on NLS hits the high 90% range within 5 minutes.