Three instance failure

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
myriad
Posts: 26
Joined: Tue Dec 02, 2014 1:29 pm

Post by myriad »

Hi, I've got three instances. Initially they were working great, but things have slowly started crumbling. I noticed a couple of bugs, and they've been fixed, but now I have a major issue.

1. After a period of uptime, performance on the log servers drops and they stall to the point of failing to receive logs.
2. Only after a reset do they begin receiving logs again, and then only for a short time.
3. The servers are VMware virtual machines, each with two processors, 8 GB RAM, and 600 GB of drive space on tiered storage.

Instances: logcorp, logDC1, logDC2. Cluster name: logsvr.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Three instance failure

Post by tmcdonald »

Some things to check:
  • Are elasticsearch and logstash running on all three machines?
  • Are your disks full on any of the three machines?
  • Do you have a SAN/NAS in use instead of local disks?
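The first two checks above can be scripted roughly as follows and run on each instance. This is only a sketch: `full_filesystems` and `check_services` are hypothetical helper names, the 90% threshold is arbitrary, and the SysV-style `service` command and the service names `elasticsearch` and `logstash` are assumptions about how the machines are set up.

```shell
#!/bin/sh
# Sketch only: helper names and the service names are assumptions.

# Print mount points whose usage is at or above a percentage threshold.
# Expects `df -P` output on stdin; the first line (the header) is skipped.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the trailing percent sign
        if (use + 0 >= limit) print $6 # column 6 is the mount point
    }'
}

# Ask the init system whether each service is running.
check_services() {
    for svc in elasticsearch logstash; do
        service "$svc" status
    done
}

# Example usage on one instance:
#   check_services
#   df -P | full_filesystems 90
```

Anything printed by `df -P | full_filesystems 90` is a filesystem that is at least 90% full and worth a closer look.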
Former Nagios employee
myriad
Posts: 26
Joined: Tue Dec 02, 2014 1:29 pm

Re: Three instance failure

Post by myriad »

This was the Java VM running out of heap memory.
"increase this by setting ES_HEAP_SIZE in /etc/sysconfig/elasticsearch. The general recommendation is to give it half of the available RAM:
ES_HEAP_SIZE=8g
MAX_LOCKED_MEMORY=unlimited
That last line keeps the ES heap from being swapped if possible (system limits may need to be adjusted to allow this)."
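As a rough illustration of the "half of the available RAM" rule, the sketch below reads total memory from /proc/meminfo and prints a matching ES_HEAP_SIZE line. `heap_mb_from_kb` is a hypothetical helper, not part of Nagios Log Server, and the script assumes a Linux host. For the 8 GB servers described earlier in the thread, half the RAM works out to roughly 4g rather than the 8g shown in the quoted example.

```shell
#!/bin/sh
# Sketch only: suggests a heap size equal to half of physical RAM.
# heap_mb_from_kb is a hypothetical helper name.

# Convert a memory total in kB to half of that amount in MB.
heap_mb_from_kb() {
    echo $(( $1 / 1024 / 2 ))
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    # Print a line suitable for /etc/sysconfig/elasticsearch.
    echo "ES_HEAP_SIZE=$(heap_mb_from_kb "$total_kb")m"
fi
```

The script only prints a suggestion. The value still has to be placed in /etc/sysconfig/elasticsearch by hand, and Elasticsearch restarted, before it takes effect.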
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Three instance failure

Post by scottwilkerson »

@myriad - Has the system stabilized after increasing the heap?
Former Nagios employee