Issues encountered on Nagios Log Server 2.0.0
Posted: Wed Jan 03, 2018 2:52 am
Hi, we are trying to fix some issues on our NLS test environment. It suddenly stopped working the way it did before: we are unable to log in with the LDAP account, nothing shows on the dashboard, and response times in the NLS web interface are very slow. I checked CPU usage, and the java process running as user nagios is consuming far too much; please see the attached screenshot.
We updated memory_limit in /etc/php.ini, first to 510M and then to 1024M, because the NLS page failed to display, and we have rebooted the server twice.
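For reference, the php.ini change was essentially the following (the path and the httpd service name are assumptions for a stock CentOS install; adjust for your distro):

```shell
# Assumed layout: /etc/php.ini and Apache running as httpd (stock CentOS).
# Back up php.ini, raise memory_limit, then restart the web server.
sudo sed -i.bak 's/^memory_limit *=.*/memory_limit = 1024M/' /etc/php.ini
grep '^memory_limit' /etc/php.ini    # verify the new value took effect
sudo systemctl restart httpd
```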
Excerpts from elasticsearch log:
[2018-01-03 08:21:52,863][DEBUG][action.bulk ] [test] observer: timeout notification from cluster service. timeout setting [1m], time since start [1.3m]
[2018-01-03 08:21:52,865][WARN ][monitor.jvm ] [test] [gc][old][916][147] duration [17.9s], collections [1]/[17.9s], total [17.9s]/[43.6m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [61.8mb]->[63.1mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
[2018-01-03 08:22:16,879][WARN ][monitor.jvm ] [test] [gc][old][917][148] duration [23.9s], collections [1]/[24s], total [23.9s]/[44m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [63.1mb]->[62.9mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
[2018-01-03 08:23:30,443][WARN ][monitor.jvm ] [test] [gc][old][918][151] duration [1.2m], collections [3]/[1.2m], total [1.2m]/[45.2m], memory [7.9gb]->[7.9gb]/[7.9gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [62.9mb]->[64.2mb]/[66.5mb]}{[old] [7.3gb]->[7.3gb]/[7.3gb]}
Excerpts from logstash log:
{:timestamp=>"2018-01-03T07:51:23.040000+0100", :message=>"Pipeline main started"}
Additional Action done:
Updated LS_HEAP_SIZE="1024m"
Updated ES_HEAP_SIZE=8g
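The GC log above shows the old generation pinned at 7.3gb/7.3gb, i.e. the 8g heap is fully exhausted. When picking ES_HEAP_SIZE we followed the usual Elasticsearch sizing rule: half of physical RAM, capped below 32 GB so the JVM keeps compressed object pointers. A minimal sketch (the function name is ours, not part of NLS):

```shell
# Hedged sketch of the standard Elasticsearch heap-sizing guideline:
# half of physical RAM in GB, capped at 31 GB to preserve compressed oops.
heap_for_ram_gb() {
    local half=$(( $1 / 2 ))
    if [ "$half" -gt 31 ]; then
        echo 31
    else
        echo "$half"
    fi
}

# A 16 GB box would get the 8g heap we configured above.
echo "ES_HEAP_SIZE=$(heap_for_ram_gb 16)g"
```

If the node's data volume has outgrown what half its RAM can serve, the fix is more RAM (or more nodes), not a heap set above that cap.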