
Three instance failure

Posted: Mon Dec 22, 2014 11:17 am
by myriad
Hi, I've got three instances. Initially everything was working great, but it has slowly started crumbling. I noticed a couple of bugs, and they've been fixed, but now I have a major issue.

1. After a period of uptime, performance on the log servers drops and they stall to the point of failing to receive logs.
2. They only begin receiving logs again after a reset, and then only for a short time.
3. The servers are VMware virtual machines with two processors, 8 GB RAM, and 600 GB of drive space on tiered storage.

Instances: logcorp, logDC1, logDC2. Cluster name: logsvr.

Re: Three instance failure

Posted: Mon Dec 22, 2014 11:40 am
by tmcdonald
Some things to check:
  • Are Elasticsearch and Logstash running on all three machines?
  • Are your disks full on any of the three machines?
  • Do you have a SAN/NAS in use instead of local disks?
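
For the disk question, here is a minimal Python sketch you could run on each machine. The function names, the "/" path, and the 90% threshold are assumptions for illustration, not values from this thread; point it at whichever mount holds your data.

```python
import shutil


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Return the used share of a filesystem as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_is_nearly_full(path: str = "/", threshold_pct: float = 90.0) -> bool:
    """True when the filesystem holding `path` is at or above the threshold.

    The path and threshold are hypothetical defaults; adjust them to
    wherever the log data actually lives.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.total, usage.used) >= threshold_pct


if __name__ == "__main__":
    print("nearly full" if disk_is_nearly_full("/") else "disk space looks OK")
```

Running it on all three instances gives a quick yes/no per machine before digging further.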

Re: Three instance failure

Posted: Mon Dec 22, 2014 4:33 pm
by myriad
This was the Java VM running out of heap memory.
"increase this by setting ES_HEAP_SIZE in /etc/sysconfig/elasticsearch. The general recommendation is to give it half of the available RAM:
ES_HEAP_SIZE=8g
MAX_LOCKED_MEMORY=unlimited
That last line keeps the ES heap from being swapped if possible (system limits may need to be adjusted to allow this)."
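
To work out the "half of the available RAM" figure for a given machine, a rough Python sketch could look like the following. It assumes a Linux host where /proc/meminfo reports MemTotal in kB; the function names are made up for illustration. For a machine with 8 GiB, like the VMs described in the first post, half comes out to 4096m.

```python
def parse_mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:        8388608 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def half_ram_heap_setting(mem_total_kib: int) -> str:
    """Build an ES_HEAP_SIZE line set to half of total RAM, in megabytes."""
    half_mib = mem_total_kib // 1024 // 2
    return f"ES_HEAP_SIZE={half_mib}m"


if __name__ == "__main__":
    # Assumes a Linux host; /proc/meminfo does not exist elsewhere.
    with open("/proc/meminfo") as f:
        print(half_ram_heap_setting(parse_mem_total_kib(f.read())))
```

The printed line can then be compared against what is currently set in /etc/sysconfig/elasticsearch.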

Re: Three instance failure

Posted: Mon Dec 22, 2014 5:35 pm
by scottwilkerson
@myriad - Has the system stabilized after increasing the heap?