WARN monitor.jvm

Locked
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

WARN monitor.jvm

Post by ssoliveira »

Good evening,

Today we had a problem with the NLS infrastructure.

We run a four-server cluster and keep 30 days of logs open.

We needed to open last month's logs for analysis. After we opened them, the environment became unavailable, and the Elasticsearch log started showing [WARN][monitor.jvm] entries.

Elasticsearch is configured with a 31 GB heap, and the "indices.fielddata.cache.size" parameter is set to "50%".
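
For a sense of scale: the fielddata cache limit is a share of the JVM heap, so a rough budget can be worked out from the heap_max_in_bytes value that the node stats output further down in this post reports. This is only a minimal Python sketch, and the helper name is made up for illustration.

```python
# Rough fielddata budget: indices.fielddata.cache.size is a share of the JVM heap.
HEAP_MAX_BYTES = 33128972288  # heap_max_in_bytes reported by _nodes/stats/jvm in this thread
GIB = 1024 ** 3


def fielddata_budget_bytes(heap_max_bytes, cache_size_percent):
    """Hypothetical helper: bytes the fielddata cache may hold before it starts evicting."""
    return heap_max_bytes * cache_size_percent // 100


budget = fielddata_budget_bytes(HEAP_MAX_BYTES, 50)
print(f"heap max:         {HEAP_MAX_BYTES / GIB:.2f} GiB")
print(f"fielddata budget: {budget / GIB:.2f} GiB")
```

Whatever the cache does not use has to cover everything else that lives on the heap, which is why opening many extra indices can push the old generation close to its limit.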

Is there any configuration change that would let us keep more logs open? We often need to open old logs for analysis.


[2017-08-31 17:11:09,036][WARN ][monitor.jvm              ] [765cc658-3e5f-4923-804e-5eb57735f761] [gc][old][1711364][3089] duration [13.8s], collections [1]/[14.7s], total [13.8s]/[24.8m], memory [29.2gb]->[26.4gb]/[30.8gb], all_pools {[young] [922.1mb]->[3.6mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [28.1gb]->[26.4gb]/[29.5gb]}


# free -g
              total        used        free      shared  buff/cache   available
Mem:             70          35           0           3          34          31
Swap:             0           0           0


# curl -XGET 'localhost:9200/_nodes/stats/jvm?pretty'
{
  "cluster_name" : "a5726a09-769e-4f2b-be91-d786c8165c6f",
  "nodes" : {
    "ntTAsuj2TNSV3FOQJBfiCA" : {
      "timestamp" : 1504213760252,
      "name" : "8d4f2dfb-f10c-4655-a4b7-8b5eaa9f6a3c",
      "transport_address" : "inet[/10.0.0.22:9300]",
      "host" : "datalog-utb-log2",
      "ip" : [ "inet[/10.0.0.22:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "timestamp" : 1504213760252,
        "uptime_in_millis" : 2082572,
        "mem" : {
          "heap_used_in_bytes" : 22268511608,
          "heap_used_percent" : 67,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 62824272,
          "non_heap_committed_in_bytes" : 94212096,
          "pools" : {
            "young" : {
              "used_in_bytes" : 2686672,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 0,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 22265827016,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 31622600896,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 302,
          "peak_count" : 382
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 878,
              "collection_time_in_millis" : 122703
            },
            "old" : {
              "collection_count" : 50,
              "collection_time_in_millis" : 236842
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 5851,
            "used_in_bytes" : 169727612,
            "total_capacity_in_bytes" : 169727612
          },
          "mapped" : {
            "count" : 533,
            "used_in_bytes" : 139664711705,
            "total_capacity_in_bytes" : 139664711705
          }
        }
      }
    },
    "-ZCSuPZZSUG5riiuyqsD2w" : {
      "timestamp" : 1504213749739,
      "name" : "5c998cfb-0460-4e56-8697-83b65c086a13",
      "transport_address" : "inet[/10.0.0.12:9300]",
      "host" : "datalog-ugt-log2",
      "ip" : [ "inet[/10.0.0.12:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "timestamp" : 1504213749739,
        "uptime_in_millis" : 1973949,
        "mem" : {
          "heap_used_in_bytes" : 22483162280,
          "heap_used_percent" : 67,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 61457232,
          "non_heap_committed_in_bytes" : 92794880,
          "pools" : {
            "young" : {
              "used_in_bytes" : 946812288,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 157024256,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 21379391288,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 31540103000,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 320,
          "peak_count" : 409
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 526,
              "collection_time_in_millis" : 97401
            },
            "old" : {
              "collection_count" : 49,
              "collection_time_in_millis" : 170133
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 13990,
            "used_in_bytes" : 303305004,
            "total_capacity_in_bytes" : 303305004
          },
          "mapped" : {
            "count" : 428,
            "used_in_bytes" : 108184394759,
            "total_capacity_in_bytes" : 108184394759
          }
        }
      }
    },
    "vmHk6GKaQJClCVmR8oX_Fg" : {
      "timestamp" : 1504213749739,
      "name" : "8471b9e1-1a82-4c3d-98bc-03f2ce871369",
      "transport_address" : "inet[/10.0.0.11:9300]",
      "host" : "datalog-ugt-log1",
      "ip" : [ "inet[/10.0.0.11:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "timestamp" : 1504213749739,
        "uptime_in_millis" : 2108370,
        "mem" : {
          "heap_used_in_bytes" : 23976032112,
          "heap_used_percent" : 72,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 52179192,
          "non_heap_committed_in_bytes" : 52756480,
          "pools" : {
            "young" : {
              "used_in_bytes" : 216668312,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 157024256,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 23602339544,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 23602339544,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 221,
          "peak_count" : 256
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 230,
              "collection_time_in_millis" : 17153
            },
            "old" : {
              "collection_count" : 0,
              "collection_time_in_millis" : 0
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 9434,
            "used_in_bytes" : 228473204,
            "total_capacity_in_bytes" : 228473204
          },
          "mapped" : {
            "count" : 44,
            "used_in_bytes" : 11007416023,
            "total_capacity_in_bytes" : 11007416023
          }
        }
      }
    },
    "XlBovMxySK-lLpiG3qJ63w" : {
      "timestamp" : 1504213749740,
      "name" : "765cc658-3e5f-4923-804e-5eb57735f761",
      "transport_address" : "inet[/10.0.0.21:9300]",
      "host" : "datalog-utb-log1",
      "ip" : [ "inet[/10.0.0.21:9300]", "NONE" ],
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "timestamp" : 1504213749740,
        "uptime_in_millis" : 1318299,
        "mem" : {
          "heap_used_in_bytes" : 19578995856,
          "heap_used_percent" : 59,
          "heap_committed_in_bytes" : 33128972288,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_used_in_bytes" : 53480728,
          "non_heap_committed_in_bytes" : 83423232,
          "pools" : {
            "young" : {
              "used_in_bytes" : 1044909736,
              "max_in_bytes" : 1256259584,
              "peak_used_in_bytes" : 1256259584,
              "peak_max_in_bytes" : 1256259584
            },
            "survivor" : {
              "used_in_bytes" : 63216832,
              "max_in_bytes" : 157024256,
              "peak_used_in_bytes" : 157024256,
              "peak_max_in_bytes" : 157024256
            },
            "old" : {
              "used_in_bytes" : 18470869288,
              "max_in_bytes" : 31715688448,
              "peak_used_in_bytes" : 27207926648,
              "peak_max_in_bytes" : 31715688448
            }
          }
        },
        "threads" : {
          "count" : 260,
          "peak_count" : 302
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 227,
              "collection_time_in_millis" : 20359
            },
            "old" : {
              "collection_count" : 1,
              "collection_time_in_millis" : 29
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 3047,
            "used_in_bytes" : 119291193,
            "total_capacity_in_bytes" : 119291193
          },
          "mapped" : {
            "count" : 374,
            "used_in_bytes" : 91305805720,
            "total_capacity_in_bytes" : 91305805720
          }
        }
      }
    }
  }
}
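
A response this long is easier to compare node by node once it is reduced to a few numbers. The sketch below assumes the JSON has the shape shown above and uses a trimmed sample holding the values reported for two of the four nodes; the `summarize` helper is hypothetical, not part of Elasticsearch or Nagios Log Server.

```python
import json

# Trimmed sample in the shape of the _nodes/stats/jvm response above,
# with the values reported for two of the four nodes.
SAMPLE = json.loads("""
{
  "nodes": {
    "ntTAsuj2TNSV3FOQJBfiCA": {
      "host": "datalog-utb-log2",
      "jvm": {
        "mem": {"heap_used_percent": 67},
        "gc": {"collectors": {"old": {"collection_count": 50,
                                      "collection_time_in_millis": 236842}}}
      }
    },
    "vmHk6GKaQJClCVmR8oX_Fg": {
      "host": "datalog-ugt-log1",
      "jvm": {
        "mem": {"heap_used_percent": 72},
        "gc": {"collectors": {"old": {"collection_count": 0,
                                      "collection_time_in_millis": 0}}}
      }
    }
  }
}
""")


def summarize(stats):
    """Hypothetical helper: one row per node with heap use and mean old-gen GC pause."""
    rows = []
    for node in stats["nodes"].values():
        old = node["jvm"]["gc"]["collectors"]["old"]
        count = old["collection_count"]
        # Avoid dividing by zero on nodes that have not run an old-gen collection yet.
        mean_ms = old["collection_time_in_millis"] / count if count else 0.0
        rows.append((node["host"], node["jvm"]["mem"]["heap_used_percent"], count, mean_ms))
    return rows


for host, heap_pct, count, mean_ms in summarize(SAMPLE):
    print(f"{host}: heap {heap_pct}%, old GCs {count}, mean pause {mean_ms:.0f} ms")
```

To run it against a live cluster, pass the parsed output of the curl command above to `summarize` in place of `SAMPLE`.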


# curl -XGET 'localhost:9200/_nodes/jvm?pretty'
{
  "cluster_name" : "a5726a09-769e-4f2b-be91-d786c8165c6f",
  "nodes" : {
    "ntTAsuj2TNSV3FOQJBfiCA" : {
      "name" : "8d4f2dfb-f10c-4655-a4b7-8b5eaa9f6a3c",
      "transport_address" : "inet[/10.0.0.22:9300]",
      "host" : "datalog-utb-log2",
      "ip" : "10.0.0.22",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "pid" : 2602,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504211677680,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "-ZCSuPZZSUG5riiuyqsD2w" : {
      "name" : "5c998cfb-0460-4e56-8697-83b65c086a13",
      "transport_address" : "inet[/10.0.0.12:9300]",
      "host" : "datalog-ugt-log2",
      "ip" : "10.0.0.12",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "pid" : 18039,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504211775790,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "vmHk6GKaQJClCVmR8oX_Fg" : {
      "name" : "8471b9e1-1a82-4c3d-98bc-03f2ce871369",
      "transport_address" : "inet[/10.0.0.11:9300]",
      "host" : "datalog-ugt-log1",
      "ip" : "10.0.0.11",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "false"
      },
      "jvm" : {
        "pid" : 23569,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504211641369,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    },
    "XlBovMxySK-lLpiG3qJ63w" : {
      "name" : "765cc658-3e5f-4923-804e-5eb57735f761",
      "transport_address" : "inet[/10.0.0.21:9300]",
      "host" : "datalog-utb-log1",
      "ip" : "10.0.0.21",
      "version" : "1.6.0",
      "build" : "cdd3ac4",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "attributes" : {
        "max_local_storage_nodes" : "1"
      },
      "jvm" : {
        "pid" : 27655,
        "version" : "1.7.0_141",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "24.141-b02",
        "vm_vendor" : "Oracle Corporation",
        "start_time_in_millis" : 1504212431441,
        "mem" : {
          "heap_init_in_bytes" : 33285996544,
          "heap_max_in_bytes" : 33128972288,
          "non_heap_init_in_bytes" : 24313856,
          "non_heap_max_in_bytes" : 224395264,
          "direct_max_in_bytes" : 33128972288
        },
        "gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
        "memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
      }
    }
  }
}
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: WARN monitor.jvm

Post by ssoliveira »


[2017-08-31 18:08:26,179][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][1747][47] duration [11.1s], collections [1]/[11.4s], total [11.1s]/[2.8m], memory [29.7gb]->[20.5gb]/[30.8gb], all_pools {[young] [801.2mb]->[141.8mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [28.7gb]->[20.4gb]/[29.5gb]}
[2017-08-31 18:14:16,494][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2085][55] duration [11.4s], collections [1]/[1s], total [11.4s]/[3m], memory [29.2gb]->[30.3gb]/[30.8gb], all_pools {[young] [739.4mb]->[1.1gb]/[1.1gb]}{[survivor] [149.7mb]->[149.7mb]/[149.7mb]}{[old] [28.4gb]->[29gb]/[29.5gb]}
[2017-08-31 18:14:36,811][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2095][57] duration [10.4s], collections [1]/[10.9s], total [10.4s]/[3.2m], memory [29.7gb]->[21.6gb]/[30.8gb], all_pools {[young] [335.9mb]->[177mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [29.3gb]->[21.4gb]/[29.5gb]}
[2017-08-31 18:15:44,544][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2151][62] duration [11.1s], collections [1]/[12s], total [11.1s]/[3.4m], memory [29.9gb]->[22.7gb]/[30.8gb], all_pools {[young] [1.1gb]->[5mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [28.7gb]->[22.7gb]/[29.5gb]}
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: WARN monitor.jvm

Post by ssoliveira »

Memory Heap
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: WARN monitor.jvm

Post by mcapra »


[2017-08-31 18:14:16,494][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc]
[2017-08-31 18:14:36,811][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc]
[2017-08-31 18:15:44,544][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc]
That's some relatively aggressive garbage collection. If you're hitting garbage collection every minute or so during this heavy analysis, and it's junking the heap, I don't think there are optimizations that can solve that. You simply need more heap space, which you can get by adding memory to the machine(s) or by closing indices that you are not currently using. Without knowing more about your data size, it's hard to say which route is best. Adding memory is certainly a much better catch-all, though.
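
To put numbers on how often the old generation is being collected, the [gc][old] lines can be parsed for their timestamps, pause durations, and heap sizes before and after each collection. This is a rough Python sketch over the four lines posted earlier in the thread, trimmed after the memory field. The regular expression assumes durations in seconds and sizes in gb, as in those lines, and the function name is made up.

```python
import re
from datetime import datetime

# The four [gc][old] warnings posted earlier in the thread, trimmed after the memory field.
LOG_LINES = [
    "[2017-08-31 18:08:26,179][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][1747][47] duration [11.1s], collections [1]/[11.4s], total [11.1s]/[2.8m], memory [29.7gb]->[20.5gb]/[30.8gb]",
    "[2017-08-31 18:14:16,494][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2085][55] duration [11.4s], collections [1]/[1s], total [11.4s]/[3m], memory [29.2gb]->[30.3gb]/[30.8gb]",
    "[2017-08-31 18:14:36,811][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2095][57] duration [10.4s], collections [1]/[10.9s], total [10.4s]/[3.2m], memory [29.7gb]->[21.6gb]/[30.8gb]",
    "[2017-08-31 18:15:44,544][WARN ][monitor.jvm              ] [5c998cfb-0460-4e56-8697-83b65c086a13] [gc][old][2151][62] duration [11.1s], collections [1]/[12s], total [11.1s]/[3.4m], memory [29.9gb]->[22.7gb]/[30.8gb]",
]

# Assumes seconds for the duration and gb for the heap sizes, as in the lines above.
PATTERN = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]"
    r".*?\[gc\]\[old\].*?duration \[(?P<dur>[\d.]+)s\]"
    r".*?memory \[(?P<before>[\d.]+)gb\]->\[(?P<after>[\d.]+)gb\]"
)


def parse_old_gc(lines):
    """Hypothetical helper: extract time, pause, and heap before/after from each warning."""
    events = []
    for line in lines:
        m = PATTERN.search(line)
        if m is None:
            continue  # not an old-gen GC warning in the expected format
        events.append({
            "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
            "dur": float(m.group("dur")),
            "before": float(m.group("before")),
            "after": float(m.group("after")),
        })
    return events


events = parse_old_gc(LOG_LINES)
# Seconds between consecutive logged old-gen collections.
gaps = [(b["time"] - a["time"]).total_seconds() for a, b in zip(events, events[1:])]

for e in events:
    print(f'{e["time"]:%H:%M:%S} pause {e["dur"]}s heap {e["before"]}gb -> {e["after"]}gb')
print("seconds between logged old-gen collections:", gaps)
```

A collection that pauses the node for ten seconds or more and still leaves the heap above 20 GB points to live data filling the heap rather than garbage, which fits the advice above.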

As a note, closing the current day's index is always a bad idea. Having said that, if you regularly keep, say, the most recent two weeks of indices open, and you want to load the month of July in for analysis, you could close everything but the most recent day's index and then begin the process of opening up July.
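
As a rough illustration of that sequence, the Python sketch below builds the list of close and open requests for the example dates in this thread. It assumes daily indices named logstash-YYYY.MM.DD and the localhost:9200 endpoint used in the curl commands earlier; check the real index names on your cluster first. The helper names are made up, and `run` is defined but deliberately never called here.

```python
from datetime import date, timedelta
import urllib.request

ES_URL = "http://localhost:9200"  # same endpoint as the curl commands in this thread


def index_names(first_day, last_day, prefix="logstash-"):
    """Daily index names for an inclusive date range (the naming scheme is an assumption)."""
    days = (last_day - first_day).days + 1
    return [f"{prefix}{first_day + timedelta(days=i):%Y.%m.%d}" for i in range(days)]


def plan(to_close, to_open):
    """Return (method, url) pairs: closes first to free heap, then opens."""
    steps = [("POST", f"{ES_URL}/{name}/_close") for name in to_close]
    steps += [("POST", f"{ES_URL}/{name}/_open") for name in to_open]
    return steps


def run(steps):
    """Send each request in order; only call this against a cluster you mean to change."""
    for method, url in steps:
        req = urllib.request.Request(url, method=method)
        with urllib.request.urlopen(req) as resp:
            print(method, url, resp.status)


# Example: keep only Aug 31 open, close the two weeks before it, then open July.
to_close = index_names(date(2017, 8, 17), date(2017, 8, 30))
to_open = index_names(date(2017, 7, 1), date(2017, 7, 31))
steps = plan(to_close, to_open)
print(len(steps), "requests planned; first:", steps[0], "last:", steps[-1])
```

Opening the July indices a few at a time while watching heap usage is probably safer than opening all 31 at once.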
Former Nagios employee
https://www.mcapra.com/
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: WARN monitor.jvm

Post by cdienger »

Thanks for the assist, Matt. Ssoliveira, let us know if you have any questions.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: WARN monitor.jvm

Post by ssoliveira »

Hi all,

Currently the server has 31 GB of free memory.
However, if I set Elasticsearch to use more than 32 GB, I'll probably run into problems.

What do you suggest?

Would adding more nodes to the cluster be a good way to solve the problem?
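
The 32 GB concern is usually explained by compressed object pointers: a HotSpot JVM generally stops using them once the heap reaches roughly 32 GB, so a slightly larger heap can end up holding less data. The exact cutoff varies by JVM, so the limit below is an assumption. The sketch only shows how little headroom is left under it.

```python
GIB = 1024 ** 3
HEAP_MAX_BYTES = 33128972288       # heap_max_in_bytes from the node info output in this thread
COMPRESSED_OOPS_LIMIT = 32 * GIB   # approximate threshold, treated here as an assumption

headroom = COMPRESSED_OOPS_LIMIT - HEAP_MAX_BYTES
print(f"current heap:          {HEAP_MAX_BYTES / GIB:.2f} GiB")
print(f"headroom below 32 GiB: {headroom / GIB:.2f} GiB")
```

With barely more than 1 GiB left below the threshold, raising the heap on the existing nodes gains very little.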
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: WARN monitor.jvm

Post by cdienger »

More nodes will help divide up the shards and lessen the load per machine. If you have the means to do so, I would recommend this as a valid option.
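
A back-of-the-envelope view of how the per-node load changes, as a Python sketch. The 30 open daily indices come from this thread; 5 primary shards and 1 replica per index are assumed values, so substitute the real settings from your cluster. The function is hypothetical and assumes shards spread evenly across nodes.

```python
def shards_per_node(open_indices, primaries_per_index, replicas, nodes):
    """Average shard copies per node, assuming an even spread across the cluster."""
    total = open_indices * primaries_per_index * (1 + replicas)
    return total / nodes


# 30 open daily indices is from the thread; 5 primaries and 1 replica are assumptions.
for nodes in (4, 6, 8):
    print(nodes, "nodes ->", shards_per_node(30, 5, 1, nodes), "shards per node")
```

Opening an extra month of indices roughly doubles the total, so the same calculation with 60 indices shows how many nodes it would take to stay near the current per-node load.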
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: WARN monitor.jvm

Post by ssoliveira »

Okay, thank you very much, cdienger.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: WARN monitor.jvm

Post by cdienger »

Is this something you can spin up quickly? We can keep the thread open and wait for an update if you'd like. Otherwise we can lock it and you can create a new thread if needed.
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: WARN monitor.jvm

Post by ssoliveira »

Hi, you can close this ticket.

I'll add more servers.

Thank you