
Large volume of new logs dumped into NLS, now CPU pegged

Posted: Wed Feb 04, 2015 12:14 pm
by vAJ
I had a developer drop a ton of SOLR and MongoDB logs into NLS via syslog and NLS isn't liking it.

I've got java processes consuming 100% CPU and logstash.log shows a ton of these:

Code:

{:timestamp=>"2015-02-04T15:59:10.848000-0500", :message=>"Failed parsing date from field", :field=>"timestamp", :value=>"Feb  4 17:05:59", :exception=>java.lang.IllegalArgumentException: Invalid format: "Feb  4 17:05:59", :level=>:warn}
Restarting elasticsearch and logstash did nothing; the CPU spikes right away. Is there any way to clear these out so I can get proper filters in place for these logs?
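For what it's worth, the warning itself makes sense: that value is a syslog-style timestamp with a padded day and no year, so it cannot be parsed as ISO8601. A quick illustration in Python (not from the thread; `value` is just the failing string from the log above):

```python
from datetime import datetime

# The failing value from the logstash.log warning above: a syslog-style
# timestamp with a space-padded day and no year, which is not ISO8601.
value = "Feb  4 17:05:59"

# strptime's "%b %d" tolerates the padded day ("Feb  4"); the year
# defaults to 1900 because the string simply doesn't contain one.
parsed = datetime.strptime(value, "%b %d %H:%M:%S")
print(parsed)  # 1900-02-04 17:05:59
```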

Re: Large volume of new logs dumped into NLS, now CPU pegged

Posted: Wed Feb 04, 2015 1:37 pm
by scottwilkerson
If they are still streaming in I would stop the logstash service, modify the filter and apply it, and then restart the logstash service.

Re: Large volume of new logs dumped into NLS, now CPU pegged

Posted: Wed Feb 04, 2015 3:05 pm
by vAJ
Not still streaming. Apparently, it's the elasticsearch java processes that are running wild.

Can't shut them down, because I lose the web UI then, and I can't use the web UI to get the filter in correctly.

Assuming this is what I need:

Code:

match => [ "logdate", "MMM dd YYYY HH:mm:ss",
          "MMM  d YYYY HH:mm:ss", "ISO8601" ]
To be able to handle this:

Code:

2015-02-04T16:06:52.629+0000 [initandlisten] connection accepted from 10.89.6.205:35322 #10867 (2 connections now open)
along with standard ISO datetime stamps
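A fuller sketch of how that match would sit inside a date filter, assuming the parsed field is named "logdate" as above (a sketch, not the exact Nagios Log Server config). One caveat: if the raw syslog lines carry no year at all, as in the warning earlier in the thread, patterns without the year token may be needed as well:

```
filter {
  date {
    # "MMM  d ..." (two spaces) covers the space-padded day in syslog dates;
    # year-less variants like "MMM  d HH:mm:ss" may be needed for raw syslog
    match => [ "logdate", "MMM dd YYYY HH:mm:ss",
               "MMM  d YYYY HH:mm:ss", "ISO8601" ]
  }
}
```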

Re: Large volume of new logs dumped into NLS, now CPU pegged

Posted: Wed Feb 04, 2015 5:06 pm
by scottwilkerson
vAJ wrote: Not still streaming. Apparently, it's the elasticsearch java processes that are running wild.
They could have been really backlogged. How many messages did you send?
How much memory is on the instances?
How many instances in your cluster?
vAJ wrote: Assuming this is what I need:

Code:

    match => [ "logdate", "MMM dd YYYY HH:mm:ss",
              "MMM  d YYYY HH:mm:ss", "ISO8601" ]
This looks correct, especially because the earlier entries seemed to have the extra space, which you have covered with "MMM  d YYYY HH:mm:ss".

Re: Large volume of new logs dumped into NLS, now CPU pegged

Posted: Wed Feb 04, 2015 5:15 pm
by vAJ
Right now I'm still in the POC stage.

1 instance: your OVA with 1 CPU, 8 GB memory, and a 100 GB partition on a fast-I/O datastore.

Symptoms were a pegged CPU, with 2 java processes taking up all of it. An hour or so ago I realized those were elasticsearch and found some threads about clearing the caches. Did that, and CPU came down to reasonable levels, but it still spikes to 100% on occasion.
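For reference, the cache clearing mentioned above is most likely the Elasticsearch clear-cache API. A minimal sketch, assuming elasticsearch is listening on its default HTTP port on the local node (paths and port are assumptions, not pulled from the thread):

```
# assumes the stock Elasticsearch HTTP port (9200) on localhost;
# clears the node-level caches (field data, filter cache, etc.)
curl -XPOST 'http://localhost:9200/_cache/clear'
```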

I'm having the VM admins give me more CPU for this instance.

Eventually, in production, the layout will be an instance at each datacenter. We have 10Gb wave between them, so replication latency will not be an issue. I still need to ramp up the data load on POC to determine what resources to dedicate in production.

Re: Large volume of new logs dumped into NLS, now CPU pegged

Posted: Thu Feb 05, 2015 8:11 am
by scottwilkerson
Glad to hear the CPU came down to an acceptable level. I would guess 2-4 CPUs would be good to keep the processes from fighting for time.

If you are also feeding in massive amounts of logs, elasticsearch loves memory; you can see this post on increasing the heap size if you add a bunch of RAM:
http://support.nagios.com/forum/viewtop ... 32#p120532
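As a rough sketch of the kind of change that post likely covers: in elasticsearch 1.x the heap is usually set through the ES_HEAP_SIZE environment variable. The file path below is an assumption (it varies by distribution), as is the 4g value:

```
# /etc/sysconfig/elasticsearch  (path is an assumption; varies by distro)
# Common guidance: roughly half of system RAM, kept well under 32g
ES_HEAP_SIZE=4g
```

After changing it, elasticsearch needs a restart to pick up the new heap size.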

Re: Large volume of new logs dumped into NLS, now CPU pegged

Posted: Thu Feb 05, 2015 4:51 pm
by vAJ
Wow. That made a HUGE difference. The UI is much faster now.

Thanks Scott! We can close this now as it all appears to be running well.