Java running at 100%
I have a fresh install of NLS with 45 hosts pushing logs to the server. Within 10 minutes of receiving logs, the java process owned by the nagios user hits 100% CPU. I'm running CentOS 6.7 with a source install of NLS. Any thoughts?
Re: Java running at 100%
Some information that will help us:
How much memory do you have?
How's your CPU doing?
How much data per day worth of logs are you receiving? You can find this information under Administration tab, by following the 'Index Status' link.
Anything in the logstash log that might be a hint?
How about the elasticsearch log?
How much memory do you have?
Code: Select all
free -m

How's your CPU doing?
Code: Select all
top | head -n5

Anything in the logstash log that might be a hint?
Code: Select all
tail /var/log/logstash/logstash.log

How about the elasticsearch log?
Code: Select all
tail /var/log/elasticsearch/*.log

Former Nagios Employee.
Re: Java running at 100%
Free mem
Mem: Total 3832 Used 3694 Free 138
logstash has been throwing this error a lot:
:timestamp=>"2016-05-12T07:16:50.680000-0400", :message=>"retrying failed action with response code: 503", :level=>:warn}
elasticsearch tail log
==> /var/log/elasticsearch/de670d53-cf3c-4c88-876a-d74e20244397.log <==
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205)
at org.elasticsearch.common.io.Channels.writeToChannel(Channels.java:193)
at org.elasticsearch.index.translog.fs.BufferingFsTranslogFile.flushBuffer(BufferingFsTranslogFile.java:116)
at org.elasticsearch.index.translog.fs.BufferingFsTranslogFile.add(BufferingFsTranslogFile.java:101)
at org.elasticsearch.index.translog.fs.FsTranslog.add(FsTranslog.java:379)
... 9 more
[2016-05-12 07:18:10,562][WARN ][cluster.action.shard ] [ad929a48-ccb5-4e5b-a5b7-8853e7aa30db] [logstash-2016.05.12][0] received shard failed for [logstash-2016.05.12][0], node[LGTW1UYZS-2XS6Xjc-Qnpg], [P], s[INITIALIZING], indexUUID [wJtFiyg3TfCYFb8g89XpPQ], reason [shard failure [failed recovery][IndexShardGatewayRecoveryException[[logstash-2016.05.12][0] failed to recover shard]; nested: TranslogException[[logstash-2016.05.12][0] Failed to write operation [org.elasticsearch.index.translog.Translog$Create@3f5e0b6d]]; nested: IOException[No space left on device]; ]]
At the moment java is fine, but it's early in the morning. I will post CPU status once java starts acting up again. Thanks for the help.
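The last line of that elasticsearch trace is the real clue: IOException[No space left on device]. Before tuning anything else, it may be worth checking the disk. A minimal check might look like this (the data-directory path below is an assumption about a common default; verify path.data in your elasticsearch.yml):

```shell
# Show filesystem usage; a volume at 100% will reproduce the
# translog write failures seen in the trace above.
df -h

# Size of the elasticsearch data directory (path is a guess at a
# common default; adjust to your install).
du -sh /var/lib/elasticsearch 2>/dev/null || echo "data dir not at assumed path"
```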
Re: Java running at 100%
I've also noticed that the logstash service keeps dying: "Logstash Daemon dead but pid file exists"
Re: Java running at 100%
If you have the ability to increase physical memory (I do not know if this is a virtual machine or a physical machine), do so, as that will help immensely.
By default, NLS allocates 50% of available system RAM to elasticsearch. You can verify this in /etc/sysconfig/elasticsearch. It MAY be worthwhile to increase ES_HEAP_SIZE to something bigger than 50% of RAM. You can test this out by changing the last line in this block:

Code: Select all
# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m

to

Code: Select all
ES_HEAP_SIZE=5120m

which will allocate 5GB of RAM instead of ~4GB to elasticsearch's Java heap. This might work. You might be able to go as far as 6144m, which would be 6GB, but I wouldn't push it any farther than that on a system with 8GB of RAM. Increasing swap will likely not gain you anything on this system and will, in fact, degrade performance. Whatever memory is left over can be given to Logstash by changing /etc/sysconfig/logstash similarly:

Code: Select all
# Arguments to pass to java
#LS_HEAP_SIZE="256m"
LS_JAVA_OPTS="-Djava.io.tmpdir=$APP_DIR/tmp"

Change that middle line to something like:

Code: Select all
LS_HEAP_SIZE="1024m"

(Note that you have to uncomment it.) Restart logstash and elasticsearch. You may need to play with the numbers between elasticsearch and logstash, but don't go below 50% of memory for elasticsearch. A good article on elasticsearch JVM tuning can be found at https://www.elastic.co/guide/en/elastic ... izing.html

Note that, no matter what you do, 100% CPU is fine. There's nothing wrong with using all of your processor if it's doing actual work.
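To double-check the arithmetic, you can reproduce the startup script's default computation and then ask the running node what heap it actually received (a sketch; the localhost:9200 endpoint assumes elasticsearch's default bind address on this box):

```shell
# Reproduce the default from /etc/sysconfig/elasticsearch:
# half of total physical memory, in megabytes.
total_mb=$(free -m | awk '/^Mem:/{print $2}')
echo "default ES_HEAP_SIZE would be $(expr $total_mb / 2)m"

# What the JVM actually got (only works while elasticsearch is up).
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | grep heap_max \
  || echo "elasticsearch not reachable"
```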
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
Re: Java running at 100%
eloyd,
Thanks for the reply. I'm running as a VM and have upped the memory to 8GB and added a second CPU. I made both of your recommended changes. I'll keep an eye on NLS and post again later.
Thank you for the advice.
Re: Java running at 100%
No problem. Again, it's a tuning issue so what works for one person may not work for another based on the overall workload that your server is under. Also, make sure not to allocate 100% of memory to ES and LS. You need some leftover for the OS. 
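A rough budget for the 8GB VM discussed above makes the "leave some for the OS" point concrete (all numbers are illustrative, taken from the heap values suggested earlier, not NLS defaults):

```shell
# Illustrative heap budget for an 8192 MB machine.
total=8192
es_heap=5120   # the ES_HEAP_SIZE value suggested earlier
ls_heap=1024   # the LS_HEAP_SIZE value suggested earlier
os_left=$(expr $total - $es_heap - $ls_heap)
echo "left for the OS and page cache: ${os_left}m"   # 2048m
```

Keeping that remainder comfortably above zero is the point: the OS needs memory of its own, and elasticsearch benefits from filesystem cache.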
Re: Java running at 100%
eloyd wrote: No problem. Again, it's a tuning issue so what works for one person may not work for another based on the overall workload that your server is under. Also, make sure not to allocate 100% of memory to ES and LS. You need some leftover for the OS.
So far she is cooking along nicely, I'll let her run for the day and report back tomorrow.
Re: Java running at 100%
So after working with the memory settings, here is what I've found. I have a mix of 30 servers and workstations pushing logs to NLS without any problems. Problems start whenever I start pushing logs from 14 ESXi hosts to NLS. I know that ESXi spits out a lot of logs, but I'm surprised they send enough to overwhelm NLS. As soon as I open up port 1514, the NLS CPU hits the high 90s within 5 minutes.
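One way to quantify how much the ESXi hosts add would be to compare the daily index sizes with and without port 1514 open. A minimal check, assuming elasticsearch is listening on its default localhost:9200:

```shell
# List daily logstash indices with document counts and on-disk size;
# a sudden jump after enabling the ESXi feed will show up here.
curl -s 'http://localhost:9200/_cat/indices/logstash-*?v' \
  || echo "elasticsearch not reachable"
```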