Issues encountered on Nagios Log Server 2.0.0

esmie
Posts: 15
Joined: Sun Apr 23, 2017 7:44 pm

Issues encountered on Nagios Log Server 2.0.0

Post by esmie »

Hi, we are trying to fix some issues on our NLS test environment, which suddenly stopped working the way it did before: we are unable to log in using the LDAP account, nothing shows on the dashboard, and the NLS web interface responds very slowly. I checked the CPU usage, and a java process running as user nagios is consuming too much of it; please refer to the attached screenshot.

We have raised memory_limit in /etc/php.ini, first to 510M and then to 1024M, because of the "NLS Page Fails To Display" problem, and have rebooted the server twice.

Excerpts from elasticsearch log:
[2018-01-03 08:21:52,863][DEBUG][action.bulk ] [test] observer: timeout notification from cluster service. timeout setting [1m], time since start [1.3m]
[2018-01-03 08:21:52,865][WARN ][monitor.jvm ] [test] [gc][old][916][147] duration [17.9s], collections [1]/[17.9s], total [17.9s]/[43.6m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [61.8mb]->[63.1mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
[2018-01-03 08:22:16,879][WARN ][monitor.jvm ] [test] [gc][old][917][148] duration [23.9s], collections [1]/[24s], total [23.9s]/[44m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [63.1mb]->[62.9mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
[2018-01-03 08:23:30,443][WARN ][monitor.jvm ] [test] [gc][old][918][151] duration [1.2m], collections [3]/[1.2m], total [1.2m]/[45.2m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [62.9mb]->[64.2mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}

Excerpts from logstash log:
{:timestamp=>"2018-01-03T07:51:23.040000+0100", :message=>"Pipeline main started"}

Additional actions taken:
Updated LS_HEAP_SIZE="1024m"
Updated ES_HEAP_SIZE=8g
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Issues encountered on Nagios Log Server 2.0.0

Post by mcapra »

I'm going to assume the ElasticSearch service was restarted after changing the ES_HEAP_SIZE variable.

I'm guessing the top process is ElasticSearch, which is currently using 56% of your available memory. That, coupled with the aggressive garbage collection in your ElasticSearch logs (roughly once every minute), leads me to believe ElasticSearch is exhausting its available memory.
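
To make the pattern in those log lines concrete, here is a minimal Python sketch (not part of Nagios Log Server or ElasticSearch) that pulls the collection duration and the heap figures out of a monitor.jvm warning and reports how full the heap still is after the collection. The regular expression and the function names are assumptions based only on the three log lines quoted in the first post; other ElasticSearch versions may format these lines differently.

```python
import re

# Assumed layout, taken from the quoted log lines:
#   ... duration [17.9s], ... memory [7.9gb]->[7.9gb]/[7.9gb], ...
GC_MEMORY = re.compile(
    r"duration \[(?P<duration>[^\]]+)\].*?"
    r"memory \[(?P<before>[\d.]+)(?P<unit_b>[kmg]b)\]->"
    r"\[(?P<after>[\d.]+)(?P<unit_a>[kmg]b)\]/"
    r"\[(?P<total>[\d.]+)(?P<unit_t>[kmg]b)\]"
)

UNIT = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(value, unit):
    """Convert a value such as ('7.9', 'gb') to a number of bytes."""
    return float(value) * UNIT[unit]


def heap_after_gc(line):
    """Return (duration text, fraction of heap still in use after the GC).

    Returns None when the line does not look like a monitor.jvm GC warning.
    """
    m = GC_MEMORY.search(line)
    if m is None:
        return None
    after = to_bytes(m["after"], m["unit_a"])
    total = to_bytes(m["total"], m["unit_t"])
    return m["duration"], after / total


if __name__ == "__main__":
    sample = (
        "[2018-01-03 08:21:52,865][WARN ][monitor.jvm ] [test] "
        "[gc][old][916][147] duration [17.9s], collections [1]/[17.9s], "
        "total [17.9s]/[43.6m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {}"
    )
    print(heap_after_gc(sample))
```

A fraction that stays at or near 1.0 after long old-generation collections means the collector is running without reclaiming anything, which is the symptom described above.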

If adding more memory to the machine(s) is an option, I'd suggest that. If it's not, you could adjust the Snapshots & Maintenance settings, specifically the Close indexes older than setting, to be less generous. "Closed" indexes remain on-disk, but are not readily searchable until they are "opened".

It's generally recommended that ElasticSearch not hold more than 50% of the physical memory for its heap. I'm not sure where your 8GB puts you, but I thought this was worth mentioning.
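
As a quick illustration of that 50% guideline, the following Python sketch checks a heap size against a host's physical memory. The function names and the example figures are hypothetical; it only does the arithmetic and does not read anything from a running system.

```python
def max_heap_mib(physical_mib, fraction=0.5):
    """Largest heap, in MiB, that stays within `fraction` of physical memory."""
    return int(physical_mib * fraction)


def heap_within_guideline(heap_mib, physical_mib, fraction=0.5):
    """True when the heap does not exceed the given fraction of physical memory."""
    return heap_mib <= max_heap_mib(physical_mib, fraction)


if __name__ == "__main__":
    # Illustrative figures only: an 8 GiB heap on a 16 GiB host.
    print(max_heap_mib(16 * 1024))
    print(heap_within_guideline(8 * 1024, 16 * 1024))
```

Staying within the guideline only caps the heap; it does not by itself show that the heap is large enough for the amount of open index data.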
Former Nagios employee
https://www.mcapra.com/
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Issues encountered on Nagios Log Server 2.0.0

Post by dwhitfield »

Thanks @mcapra!

OP, did you have any additional questions?
esmie
Posts: 15
Joined: Sun Apr 23, 2017 7:44 pm

Re: Issues encountered on Nagios Log Server 2.0.0

Post by esmie »

Thanks mcapra. We have 16g of memory. I had read that article too and hoped it would help resolve the issue, which is why I increased ES_HEAP_SIZE. For now, we have deleted old indices (curator --host 127.0.0.1 delete indices --older-than 30 --time-unit days --timestring '%Y.%m.%d') and configured some settings under Snapshots & Maintenance.
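
For anyone wanting to preview which indices an age filter like the one above would pick, here is a rough Python sketch of the selection logic. It does not call curator or ElasticSearch. The index names in the example (logstash-YYYY.MM.DD) are an assumption, and curator's exact handling of the cutoff day may differ from the strict "older than" comparison used here.

```python
import re
from datetime import date, datetime, timedelta

# Finds a date written in the '%Y.%m.%d' layout inside an index name.
DATE_IN_NAME = re.compile(r"\d{4}\.\d{2}\.\d{2}")


def indices_older_than(names, days, today):
    """Return the names whose embedded date is more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        m = DATE_IN_NAME.search(name)
        if m is None:
            continue  # no date in the name, so it is never selected
        index_day = datetime.strptime(m.group(), "%Y.%m.%d").date()
        if index_day < cutoff:
            old.append(name)
    return old


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    names = ["logstash-2017.11.30", "logstash-2017.12.04", "logstash-2018.01.03"]
    print(indices_older_than(names, 30, date(2018, 1, 3)))
```

Passing `today` explicitly keeps the function deterministic, so the selection can be checked before running a real delete.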

@dwhitfield, no questions for now: I can log in using the LDAP account again, logs are coming in, and the NLS web interface is responsive. I'll monitor the new settings for now. Thank you.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Issues encountered on Nagios Log Server 2.0.0

Post by dwhitfield »

Sounds good! If you have unrelated issues, please start another thread. I'll leave this open for now.