Three instance failure

myriad · Post by **myriad** » Mon Dec 22, 2014 11:17 am

Hi, I've got three instances. Initially they were working great, but they have slowly started crumbling. I noticed a couple of bugs, and they've been fixed, but now I have a major issue.

1. After a period of uptime, performance on the log servers drops and they stall to the point of failing to receive logs.
2. Only after a reset do they begin to receive logs again, and only for a short time.
3. The servers are VMware virtual machines with two processors, 8 GB RAM, and 600 GB of drive space on tiered storage.

Hosts: logcorp, logDC1, logDC2. Cluster name: logsvr.

tmcdonald · Post by **tmcdonald** » Mon Dec 22, 2014 11:40 am

Some things to check:

Are elasticsearch and logstash running on all three machines?
Are your disks full on any of the three machines?
Do you have a SAN/NAS in use instead of local disks?
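
For the disk question, a minimal Python sketch like the one below could be run on each of the three machines. It only uses the standard library's `shutil.disk_usage`. The path `/` and the 90% warning threshold are assumptions for illustration; point it at whichever mount actually holds the Elasticsearch data on your install.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total, rounded to one decimal."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return round(100.0 * used_bytes / total_bytes, 1)


def check_disk(path, warn_at=90.0):
    """Return (percent_used, is_over_threshold) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= warn_at


if __name__ == "__main__":
    # "/" is a placeholder; replace it with the mount that holds your data.
    for path in ["/"]:
        pct, over = check_disk(path)
        status = "WARNING" if over else "ok"
        print("{}: {}% used ({})".format(path, pct, status))
```

This only answers the "are your disks full" question. Whether elasticsearch and logstash are running still needs to be checked with your usual service tooling.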

myriad · Post by **myriad** » Mon Dec 22, 2014 4:33 pm

This was the Java VM running out of heap memory.
"increase this by setting ES_HEAP_SIZE in /etc/sysconfig/elasticsearch. The general recommendation is to give it half of the available RAM:
ES_HEAP_SIZE=8g
MAX_LOCKED_MEMORY=unlimited
That last line keeps the ES heap from being swapped if possible (system limits may need to be adjusted to allow this)."
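
To apply the "half of the available RAM" rule from the quote to a specific machine, here is a rough Python sketch that derives a value from the `MemTotal` line of `/proc/meminfo`. The function names are made up for illustration. Note that on the 8 GB servers described at the top of the thread, half works out to about 4 GB, so the `8g` in the quoted example presumably fits a host with 16 GB.

```python
def meminfo_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def recommended_heap_mb(total_kb):
    """Half of total RAM, in whole megabytes."""
    return (total_kb // 1024) // 2


def heap_setting(total_kb):
    """Format the result as a line for /etc/sysconfig/elasticsearch."""
    return "ES_HEAP_SIZE={}m".format(recommended_heap_mb(total_kb))


if __name__ == "__main__":
    # Linux only: read the real memory total of this machine.
    with open("/proc/meminfo") as f:
        print(heap_setting(meminfo_total_kb(f.read())))
```

The sketch prints the value in megabytes (the `m` suffix) so that machines reporting slightly less than a round number of gigabytes do not get rounded down to a much smaller heap.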

scottwilkerson · Post by **scottwilkerson** » Mon Dec 22, 2014 5:35 pm

@myriad - Has the system stabilized after increasing the heap?
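
One way to keep an eye on this is to watch heap usage per node over time. The sketch below assumes Elasticsearch is reachable at `http://localhost:9200` (an assumption, adjust for your setup) and reads the `heap_used_percent` field from the `_nodes/stats/jvm` response. The 85% threshold is an arbitrary illustration, not an official recommendation.

```python
import json
from urllib.request import urlopen


def nodes_over_heap_threshold(stats, threshold=85):
    """Return {node_name: heap_used_percent} for nodes at or above threshold.

    `stats` is the parsed JSON body of a GET /_nodes/stats/jvm request.
    """
    hot = {}
    for node_id, node in stats.get("nodes", {}).items():
        pct = node["jvm"]["mem"]["heap_used_percent"]
        if pct >= threshold:
            # Fall back to the node id if the response carries no name.
            hot[node.get("name", node_id)] = pct
    return hot


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch JVM stats from one cluster member (the URL is an assumption)."""
    with urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(nodes_over_heap_threshold(fetch_jvm_stats()))
```

If the same nodes keep showing up near the top of their heap after the change, the stall is likely to come back.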

Nagios Support Forum
