WARN monitor.jvm

Posted: Thu Aug 31, 2017 4:19 pm
by ssoliveira
Good evening,

Today we had a problem with the NLS infrastructure.

We run a cluster of 4 servers and keep 30 days of logs open.

We needed to open last month's logs for analysis. After we opened them, the environment became unavailable, and the Elasticsearch log showed [WARN] [monitor.jvm] messages.

Elasticsearch is configured with 31 GB of heap memory, and the "indices.fielddata.cache.size" parameter is set to "50%".

Is there any configuration change that would let us keep more logs open? We often need to open old logs for analysis.

[2017-08-31 17:11:09,036][WARN ][monitor.jvm              ] [765cc658-3e5f-4923-804e-5eb57735f761] [gc][old][1711364][3089] duration [13.8s], collections [1]/[14.7s], total [13.8s]/[24.8m], memory [29.2gb]->[26.4gb]/[30.8gb], all_pools {[young] [922.1mb]->[3.6mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [28.1gb]->[26.4gb]/[29.5gb]}

# free -g
              total        used        free      shared  buff/cache   available
Mem:             70          35           0           3          34          31
Swap:             0           0           0

# curl -XGET 'localhost:9200/_nodes/stats/jvm?pretty'
{
  "cluster_name" : "a5726a09-769e-4f2b-be91-d786c8165c6f",
  "nodes" : {
    "ntTAsuj2TNSV3FOQJBfiCA" : {
      "timestamp" : 1504213760252,
      "name" : "8d4f2dfb-f10c-4655-a4b7-8b5eaa9f6a3c",
      "transport_address" : "inet[/10.0.0.22:9300]",
      "host" : "datalog-utb-log2",
      "ip" : [ "inet[/10.0.0.22:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "timestamp" : 1504213760252,
        "uptime_in_millis" : 2082572,
        "mem" : {
          "heap_used_in_bytes" : 22268511608,
          "heap_used_percent" : 67,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 62824272,
          "non_heap_committed_in_bytes" : 94212096,
          "pools" : {
            "young" : {
              "used_in_bytes" : 2686672,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 0,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 22265827016,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 31622600896,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 302,
          "peak_count" : 382
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 878,
              "collection_time_in_millis" : 122703
            },
            "old" : {
              "collection_count" : 50,
              "collection_time_in_millis" : 236842
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 5851,
            "used_in_bytes" : 169727612,
            "total_capacity_in_bytes" : 169727612
          },
          "mapped" : {
            "count" : 533,
            "used_in_bytes" : 139664711705,
            "total_capacity_in_bytes" : 139664711705
          }
        }
      }
    },
    "-ZCSuPZZSUG5riiuyqsD2w" : {
      "timestamp" : 1504213749739,
      "name" : "5c998cfb-0460-4e56-8697-83b65c086a13",
      "transport_address" : "inet[/10.0.0.12:9300]",
      "host" : "datalog-ugt-log2",
      "ip" : [ "inet[/10.0.0.12:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "timestamp" : 1504213749739,
        "uptime_in_millis" : 1973949,
        "mem" : {
          "heap_used_in_bytes" : 22483162280,
          "heap_used_percent" : 67,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 61457232,
          "non_heap_committed_in_bytes" : 92794880,
          "pools" : {
            "young" : {
              "used_in_bytes" : 946812288,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 157024256,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 21379391288,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 31540103000,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 320,
          "peak_count" : 409
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 526,
              "collection_time_in_millis" : 97401
            },
            "old" : {
              "collection_count" : 49,
              "collection_time_in_millis" : 170133
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 13990,
            "used_in_bytes" : 303305004,
            "total_capacity_in_bytes" : 303305004
          },
          "mapped" : {
            "count" : 428,
            "used_in_bytes" : 108184394759,
            "total_capacity_in_bytes" : 108184394759
          }
        }
      }
    },
    "vmHk6GKaQJClCVmR8oX_Fg" : {
      "timestamp" : 1504213749739,
      "name" : "8471b9e1-1a82-4c3d-98bc-03f2ce871369",
      "transport_address" : "inet[/10.0.0.11:9300]",
      "host" : "datalog-ugt-log1",
      "ip" : [ "inet[/10.0.0.11:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "timestamp" : 1504213749739,
        "uptime_in_millis" : 2108370,
        "mem" : {
          "heap_used_in_bytes" : 23976032112,
          "heap_used_percent" : 72,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 52179192,
          "non_heap_committed_in_bytes" : 52756480,
          "pools" : {
            "young" : {
              "used_in_bytes" : 216668312,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 157024256,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 23602339544,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 23602339544,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 221,
          "peak_count" : 256
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 230,
              "collection_time_in_millis" : 17153
            },
            "old" : {
              "collection_count" : 0,
              "collection_time_in_millis" : 0
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 9434,
            "used_in_bytes" : 228473204,
            "total_capacity_in_bytes" : 228473204
          },
          "mapped" : {
            "count" : 44,
            "used_in_bytes" : 11007416023,
            "total_capacity_in_bytes" : 11007416023
          }
        }
      }
    },
    "XlBovMxySK-lLpiG3qJ63w" : {
      "timestamp" : 1504213749740,
      "name" : "765cc658-3e5f-4923-804e-5eb57735f761",
      "transport_address" : "inet[/10.0.0.21:9300]",
      "host" : "datalog-utb-log1",
      "ip" : [ "inet[/10.0.0.21:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "timestamp" : 1504213749740,
        "uptime_in_millis" : 1318299,
        "mem" : {
          "heap_used_in_bytes" : 19578995856,
          "heap_used_percent" : 59,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 53480728,
          "non_heap_committed_in_bytes" : 83423232,
          "pools" : {
            "young" : {
              "used_in_bytes" : 1044909736,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 63216832,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 18470869288,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 27207926648,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 260,
          "peak_count" : 302
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 227,
              "collection_time_in_millis" : 20359
            },
            "old" : {
              "collection_count" : 1,
              "collection_time_in_millis" : 29
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 3047,
            "used_in_bytes" : 119291193,
            "total_capacity_in_bytes" : 119291193
          },
          "mapped" : {
            "count" : 374,
            "used_in_bytes" : 91305805720,
            "total_capacity_in_bytes" : 91305805720
          }
        }
      }
    }
  }
}

# curl -XGET 'localhost:9200/_nodes/jvm?pretty'
{
  "cluster_name" : "a5726a09-769e-4f2b-be91-d786c8165c6f",
  "nodes" : {
    "ntTAsuj2TNSV3FOQJBfiCA" : {
      "name" : "8d4f2dfb-f10c-4655-a4b7-8b5eaa9f6a3c",
      "transport_address" : "inet[/10.0.0.22:9300]",
      "host" : "datalog-utb-log2",
      "ip" : "10.0.0.22",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "pid" : 2602,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504211677680,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "-ZCSuPZZSUG5riiuyqsD2w" : {
      "name" : "5c998cfb-0460-4e56-8697-83b65c086a13",
      "transport_address" : "inet[/10.0.0.12:9300]",
      "host" : "datalog-ugt-log2",
      "ip" : "10.0.0.12",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "pid" : 18039,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504211775790,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "vmHk6GKaQJClCVmR8oX_Fg" : {
      "name" : "8471b9e1-1a82-4c3d-98bc-03f2ce871369",
      "transport_address" : "inet[/10.0.0.11:9300]",
      "host" : "datalog-ugt-log1",
      "ip" : "10.0.0.11",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "pid" : 23569,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504211641369,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "XlBovMxySK-lLpiG3qJ63w" : {
      "name" : "765cc658-3e5f-4923-804e-5eb57735f761",
      "transport_address" : "inet[/10.0.0.21:9300]",
      "host" : "datalog-utb-log1",
      "ip" : "10.0.0.21",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "pid" : 27655,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504212431441,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    }
  }
}
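
As a rough way to read the `_nodes/stats/jvm` output above, the sketch below computes what share of each node's JVM uptime went to old-generation collections. The numbers are copied from the `collection_time_in_millis` and `uptime_in_millis` fields in that output; the function name `old_gc_share` is made up for this illustration.

```python
# Old-generation GC time as a percentage of JVM uptime, using the
# values from the _nodes/stats/jvm output posted in this thread.

def old_gc_share(old_gc_millis, uptime_millis):
    """Return the percentage of uptime spent in old-gen collections."""
    return 100.0 * old_gc_millis / uptime_millis

# host -> (old collection_time_in_millis, uptime_in_millis)
nodes = {
    "datalog-utb-log2": (236842, 2082572),
    "datalog-ugt-log2": (170133, 1973949),
    "datalog-ugt-log1": (0, 2108370),
    "datalog-utb-log1": (29, 1318299),
}

for host, (gc_ms, up_ms) in sorted(nodes.items()):
    share = old_gc_share(gc_ms, up_ms)
    print("%s: %.1f%% of uptime in old-gen GC" % (host, share))
```

On these figures, two of the four nodes spent roughly 11% and 9% of their first 35 minutes or so of uptime in old-generation collections, while the other two spent almost none.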

Re: WARN monitor.jvm

Posted: Thu Aug 31, 2017 4:21 pm
by ssoliveira

[2017-08-31 18:08:26,179][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][1747][47] duration [11.1s], collections [1]/[11.4s], total [11.1s]/[2.8m], memory [29.7gb]->[20.5gb]/[30.8gb], all_pools {[young] [801.2mb]->[141.8mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [28.7gb]->[20.4gb]/[29.5gb]}
[2017-08-31 18:14:16,494][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2085][55] duration [11.4s], collections [1]/[1s], total [11.4s]/[3m], memory [29.2gb]->[30.3gb]/[30.8gb], all_pools {[young] [739.4mb]->[1.1gb]/[1.1gb]}{[survivor] [149.7mb]->[149.7mb]/[149.7mb]}{[old] [28.4gb]->[29gb]/[29.5gb]}
[2017-08-31 18:14:36,811][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2095][57] duration [10.4s], collections [1]/[10.9s], total [10.4s]/[3.2m], memory [29.7gb]->[21.6gb]/[30.8gb], all_pools {[young] [335.9mb]->[177mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [29.3gb]->[21.4gb]/[29.5gb]}
[2017-08-31 18:15:44,544][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2151][62] duration [11.1s], collections [1]/[12s], total [11.1s]/[3.4m], memory [29.9gb]->[22.7gb]/[30.8gb], all_pools {[young] [1.1gb]->[5mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [28.7gb]->[22.7gb]/[29.5gb]}

Re: WARN monitor.jvm

Posted: Thu Aug 31, 2017 4:29 pm
by ssoliveira
Memory Heap

Re: WARN monitor.jvm

Posted: Fri Sep 01, 2017 8:11 am
by mcapra

[2017-08-31 18:14:16,494][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc]
[2017-08-31 18:14:36,811][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc]
[2017-08-31 18:15:44,544][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc]
That's some relatively aggressive garbage collection. If you're hitting garbage collection every minute or so during this heavy analysis, and it's junking the heap, I don't think there are optimizations that can solve that. You simply require more heap space, which you can get by adding memory to the machine(s) or by closing indices that you are not currently using. Without knowing more about your data size, it's hard to say which route is best. Adding memory is certainly a much better catch-all, though.
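
To check how often these collections fire, here is a minimal sketch that parses `monitor.jvm` WARN lines and measures the gap between consecutive old-generation collections. The regular expression is written against the log format shown in this thread only and assumes pauses reported in seconds and heap sizes in gb; `parse_gc_line` is a hypothetical helper, not part of Elasticsearch. The sample lines are the three quoted above, cut off after the memory field.

```python
import re
from datetime import datetime

# Timestamp, pause duration, and heap before/after of a monitor.jvm
# WARN line. Assumes durations in seconds and heap sizes in gb.
GC_LINE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:,]+)\]\[WARN \]\[monitor\.jvm\s*\].*?"
    r"duration \[(?P<dur>[\d.]+)s\].*?"
    r"memory \[(?P<before>[\d.]+)gb\]->\[(?P<after>[\d.]+)gb\]"
)

def parse_gc_line(line):
    """Return a dict of GC facts, or None if the line does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"),
        "pause_s": float(m.group("dur")),
        "heap_before_gb": float(m.group("before")),
        "heap_after_gb": float(m.group("after")),
    }

sample = [
    "[2017-08-31 18:14:16,494][WARN ][monitor.jvm              ] "
    "[5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2085][55] "
    "duration [11.4s], collections [1]/[1s], total [11.4s]/[3m], "
    "memory [29.2gb]->[30.3gb]/[30.8gb]",
    "[2017-08-31 18:14:36,811][WARN ][monitor.jvm              ] "
    "[5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2095][57] "
    "duration [10.4s], collections [1]/[10.9s], total [10.4s]/[3.2m], "
    "memory [29.7gb]->[21.6gb]/[30.8gb]",
    "[2017-08-31 18:15:44,544][WARN ][monitor.jvm              ] "
    "[5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2151][62] "
    "duration [11.1s], collections [1]/[12s], total [11.1s]/[3.4m], "
    "memory [29.9gb]->[22.7gb]/[30.8gb]",
]

events = [parse_gc_line(line) for line in sample]

# Seconds between the start of one logged collection and the next.
gaps = [
    (later["time"] - earlier["time"]).total_seconds()
    for earlier, later in zip(events, events[1:])
]

for event in events:
    print(event["time"], "pause", event["pause_s"], "s,",
          event["heap_before_gb"], "->", event["heap_after_gb"], "gb")
print("gaps between collections (s):", gaps)
```

For these three lines the gaps come out at about 20 and 68 seconds, each with a pause of more than 10 seconds, and the first collection ended with more heap in use than it started with.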

As a note, closing the current day's index is always a bad idea. Having said that, if you regularly keep, say, the most recent 2 weeks of indices open, and you want to load the month of July for analysis, you could close everything but the most recent day's index and then begin opening July's indices.
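
The close-then-open sequence described above could be planned along these lines. This is only a sketch: it assumes one index per day named `logstash-YYYY.MM.DD`, which may not match your index names, and `daily_indices` and `reopen_plan` are hypothetical helpers. It builds the list of Elasticsearch `_close` and `_open` requests without sending anything, so the plan can be reviewed first.

```python
from datetime import date, timedelta

def daily_indices(first, last, prefix="logstash-"):
    """Index names for every day from first to last, inclusive.

    The prefix and date format are assumptions about the naming scheme.
    """
    days = (last - first).days + 1
    return [
        prefix + (first + timedelta(days=n)).strftime("%Y.%m.%d")
        for n in range(days)
    ]

def reopen_plan(today, open_days, first_old, last_old):
    """Requests that close the recent window (never today's index)
    and then open an older date range."""
    oldest_open = today - timedelta(days=open_days - 1)
    yesterday = today - timedelta(days=1)
    plan = [
        ("POST", "/%s/_close" % name)
        for name in daily_indices(oldest_open, yesterday)
    ]
    plan += [
        ("POST", "/%s/_open" % name)
        for name in daily_indices(first_old, last_old)
    ]
    return plan

# Two weeks currently open as of 1 Sep 2017; load July for analysis.
plan = reopen_plan(date(2017, 9, 1), 14, date(2017, 7, 1), date(2017, 7, 31))

for method, path in plan:
    print(method, path)
```

Each entry maps to a request against the cluster, for example with `curl -XPOST 'localhost:9200/<index>/_close'`. Closing first and opening second keeps the number of open indices from peaking while July is loaded.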

Re: WARN monitor.jvm

Posted: Fri Sep 01, 2017 9:23 am
by cdienger
Thanks for the assist, Matt. Ssoliveira, let us know if you have any questions.

Re: WARN monitor.jvm

Posted: Fri Sep 01, 2017 11:22 am
by ssoliveira
Hi all,

Currently the server has 31 GB of memory free.
However, if I set Elasticsearch to use more than 32 GB, I'll probably have problems.

What do you suggest?

Would adding more nodes to the cluster be a good way to solve the problem?
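
On the 32 GB concern: the usual reason is that HotSpot JVMs can only use compressed object pointers while the heap stays below roughly 32 GB, so a larger heap can end up holding less. The sketch below converts the `heap_max_in_bytes` value reported earlier in this thread into GiB and shows how little room is left under that ceiling. The 32 GiB figure is an approximation (the exact cutoff depends on the JVM), and the function names are made up for illustration.

```python
GIB = 1024 ** 3

# Approximate ceiling for compressed object pointers; the exact
# cutoff depends on the JVM version and flags.
COMPRESSED_OOPS_CEILING_GIB = 32

def heap_gib(heap_bytes):
    """Convert a heap size in bytes to GiB."""
    return heap_bytes / GIB

def headroom_gib(heap_bytes):
    """GiB left between the current heap and the ~32 GiB ceiling."""
    return COMPRESSED_OOPS_CEILING_GIB - heap_gib(heap_bytes)

# heap_max_in_bytes as reported by _nodes/jvm for every node above.
heap_max = 33128972288

print("heap max: %.2f GiB" % heap_gib(heap_max))
print("room below ~32 GiB: %.2f GiB" % headroom_gib(heap_max))
```

With barely more than 1 GiB of room left, raising the heap on the existing nodes gains very little, even though the servers have memory free.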

Re: WARN monitor.jvm

Posted: Fri Sep 01, 2017 11:57 am
by cdienger
More nodes will help divide up the shards and lessen the load per machine. If you have the means to do so, I would recommend this option.
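
A back-of-the-envelope sketch of that effect follows. It assumes 30 open daily indices, each with 5 primary shards and 1 replica (the Elasticsearch 1.x defaults, which may not match this cluster), spread evenly across the nodes; `shards_per_node` is a hypothetical helper.

```python
def shards_per_node(open_indices, primaries, replicas, nodes):
    """Worst-case number of shard copies a node holds when shards
    are spread as evenly as possible."""
    total_copies = open_indices * primaries * (1 + replicas)
    # Ceiling division: some nodes carry one extra copy if uneven.
    return -(-total_copies // nodes)

# Assumed layout: 30 daily indices, 5 primaries, 1 replica each.
for node_count in (4, 6, 8):
    per_node = shards_per_node(30, 5, 1, node_count)
    print("%d nodes -> up to %d shard copies per node" % (node_count, per_node))
```

Under those assumptions, going from 4 to 6 nodes drops each node from 75 shard copies to 50, and the per-node heap pressure falls roughly in step.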

Re: WARN monitor.jvm

Posted: Fri Sep 01, 2017 1:09 pm
by ssoliveira
Okay, thank you very much, cdienger.

Re: WARN monitor.jvm

Posted: Fri Sep 01, 2017 1:49 pm
by cdienger
Is this something you can spin up quickly? We can keep the thread open and wait for an update if you'd like. Otherwise we can lock it and you can create a new thread if needed.

Re: WARN monitor.jvm

Posted: Fri Sep 01, 2017 6:02 pm
by ssoliveira
Hi, you can close this ticket.

I'll add more servers.

Thank you