Hi, we are trying to fix some issues in our NLS test environment; it suddenly stopped working as before: we are unable to log in using the LDAP account, nothing shows on the dashboard, and response times in the NLS web interface are very slow. I've checked CPU resources, and a java process running as the nagios user is consuming too much; please refer to the attached screenshot.
We have updated memory_limit in /etc/php.ini, first to 510M and then to 1024M, because the NLS page failed to display, and have rebooted the server twice.
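For reference, the change above amounts to editing this directive in /etc/php.ini (the value shown is the one we ended up with; the web server has to be restarted for it to take effect):

```ini
; /etc/php.ini -- raise the per-script memory ceiling so NLS pages can render
; (we went to 510M first, then 1024M)
memory_limit = 1024M
```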
Excerpts from elasticsearch log:
[2018-01-03 08:21:52,863][DEBUG][action.bulk ] [test] observer: timeout notification from cluster service. timeout setting [1m], time since start [1.3m]
[2018-01-03 08:21:52,865][WARN ][monitor.jvm ] [test] [gc][old][916][147] duration [17.9s], collections [1]/[17.9s], total [17.9s]/[43.6m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [61.8mb]->[63.1mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
[2018-01-03 08:22:16,879][WARN ][monitor.jvm ] [test] [gc][old][917][148] duration [23.9s], collections [1]/[24s], total [23.9s]/[44m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [63.1mb]->[62.9mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
[2018-01-03 08:23:30,443][WARN ][monitor.jvm ] [test] [gc][old][918][151] duration [1.2m], collections [3]/[1.2m], total [1.2m]/[45.2m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [62.9mb]->[64.2mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
Excerpts from logstash log:
{:timestamp=>"2018-01-03T07:51:23.040000+0100", :message=>"Pipeline main started"}
Additional Action done:
Updated LS_HEAP_SIZE="1024m"
Updated ES_HEAP_SIZE=8g
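For anyone following along, these are the heap settings we changed. The file locations are an assumption based on a typical Nagios Log Server install (sysconfig files on CentOS/RHEL); yours may differ:

```ini
# /etc/sysconfig/logstash -- Logstash JVM heap
LS_HEAP_SIZE="1024m"

# /etc/sysconfig/elasticsearch -- Elasticsearch JVM heap
# (restart the elasticsearch service after changing this)
ES_HEAP_SIZE=8g
```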
Issues encountered on Nagios Log Server 2.0.0
Re: Issues encountered on Nagios Log Server 2.0.0
I'm going to assume the ElasticSearch service was restarted after changing the ES_HEAP_SIZE variable.
I'm guessing the top process is ElasticSearch, which is currently using 56% of your available memory. That, coupled with the aggressive garbage collection in your ElasticSearch logs (roughly once every minute), leads me to believe ElasticSearch is exhausting its available memory.
If adding more memory to the machine(s) is an option, I'd suggest that. If it's not, you could adjust the Snapshots & Maintenance settings, specifically the "Close indexes older than" setting, to be less generous. Closed indexes remain on disk, but are not readily searchable until they are opened again.
It's generally recommended that ElasticSearch not hold more than 50% of the physical memory for its heap. I'm not sure where your 8GB puts you, but I thought this was worth mentioning.
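As a sketch, closing older indices can also be done from the command line with curator (this assumes the curator 3.x syntax; the 14-day cutoff is just an example, adjust to taste):

```shell
# Close (not delete) indices older than 14 days. Closed indices stay
# on disk and can be re-opened later, but use no heap while closed.
curator --host 127.0.0.1 close indices --older-than 14 \
    --time-unit days --timestring '%Y.%m.%d'
```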
Former Nagios employee
https://www.mcapra.com/
Re: Issues encountered on Nagios Log Server 2.0.0
Thanks mcapra, we have 16 GB of memory. I've read that article too and hoped it would help resolve the issue, which is why I increased ES_HEAP_SIZE. For now, we have deleted old indices (curator --host 127.0.0.1 delete indices --older-than 30 --time-unit days --timestring '%Y.%m.%d') and configured some settings under Snapshots & Maintenance.
@dwhitfield, no questions for now, as I can now log in using the LDAP account, logs are coming in, and the NLS web interface is responsive again. I'll monitor the new settings. Thank you.
Re: Issues encountered on Nagios Log Server 2.0.0
Sounds good! If you have unrelated issues, please start another thread. I'll leave this open for now.