Hi, I've got three instances, initially it was working great, then it has slowly started crumbling. I noticed a couple of bugs, and they've been fixed. but now I have a major issue.
1. After a period of up-time (the logsvrs) performance drops and the servers stall out to the point of failing to receive logs.
2. Only on a reset will they begin to receive logs again and only for a short amount of time.
3. Servers are vmware machines and have two processors, 8GB RAM, and 600 GB Drive space on tiered storage.
logcorp, logDC1, logDC2 clustername logsvr
Three instance failure
Re: Three instance failure
Some things to check:
- Are elasticsearch and logstash running on all three machines?
- Are your disks full on any of the three machines?
- Do you have a SAN/NAS in use instead of local disks?
Former Nagios employee
Re: Three instance failure
This was the Java VM running out of heap memory.
"increase this by setting ES_HEAP_SIZE in /etc/sysconfig/elasticsearch. The general recommendation is to give it half of the available RAM:
CODE: SELECT ALL
ES_HEAP_SIZE=8g
MAX_LOCKED_MEMORY=unlimited
That last line keeps the ES heap from being swapped if possible (system limits may need to be adjusted to allow this)."
"increase this by setting ES_HEAP_SIZE in /etc/sysconfig/elasticsearch. The general recommendation is to give it half of the available RAM:
CODE: SELECT ALL
ES_HEAP_SIZE=8g
MAX_LOCKED_MEMORY=unlimited
That last line keeps the ES heap from being swapped if possible (system limits may need to be adjusted to allow this)."
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Three instance failure
@myriad - Has the system stabilized after increasing the heap?