Elasticsearch tuning
Hello,
I was wondering if there is any tuning for Elasticsearch like there is for Logstash?
Specifically I'm referring to this article: https://support.nagios.com/kb/article/n ... g-576.html
Whenever my Log Server environment crashes and I have to restart the services, I always get a Java error when I attempt to restart Elasticsearch, but never for Logstash.
The error implies there isn't enough RAM available to support the Java process.
Thank you.
Re: Elasticsearch tuning
You're seeing memory errors when you try to restart? It sounds like it may need a moment to free the memory. When you restart it, run "service elasticsearch stop; ps aux | grep elasticsearch" to make sure Elasticsearch has stopped, and then start it back up.
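That stop-and-verify sequence can be sketched as a small script. This is just a sketch built around the `service` wrapper used elsewhere in this thread; the loop count and sleep interval are arbitrary, and `pgrep -f org.elasticsearch` is one common way to spot the Elasticsearch JVM, not the only one:

```shell
#!/bin/sh
# Stop Elasticsearch, then wait until the JVM has actually exited before
# starting it again, so its memory is freed first.
if command -v service >/dev/null 2>&1; then
    service elasticsearch stop
fi

stopped=no
for i in 1 2 3 4 5 6; do
    # pgrep -f matches against the full java command line
    if ! pgrep -f org.elasticsearch >/dev/null 2>&1; then
        stopped=yes
        break
    fi
    sleep 5
done
echo "elasticsearch stopped: $stopped"

if [ "$stopped" = yes ] && command -v service >/dev/null 2>&1; then
    service elasticsearch start
fi
```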
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Elasticsearch tuning
Ok, I'll try to stop it first next time.
Re: Elasticsearch tuning
Sounds good. Keep us posted.
Re: Elasticsearch tuning
My Log Server environment was hung this morning. I attempted to stop Elasticsearch instead of restarting it, but it failed to stop:
Stopping elasticsearch: [FAILED]
Then when I try to restart it, I get this:
Starting elasticsearch: [ OK ]
[root@nagioslscc2 ~]# OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f3dd1b30000, 33324597248, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 33324597248 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /tmp/hs_err_pid13033.log
This is what I've been getting every time on this particular box. It has the same amount of RAM as the other 2 nodes, but seems to run as the "primary" all the time.
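For context on the numbers in that error: 33324597248 bytes is roughly a 31 GB heap, which a box with ~63 GB of RAM and almost no free memory cannot commit. Common Elasticsearch guidance is to give the heap about half of physical RAM, staying under ~32 GB so the JVM keeps compressed object pointers. A minimal sketch that computes such a value (the `/etc/sysconfig/elasticsearch` path and `ES_HEAP_SIZE` variable mentioned in the comment are assumptions; they vary by Elasticsearch version and install method):

```shell
#!/bin/sh
# Sketch: suggest a heap of about half of physical RAM, capped at 31g
# so the JVM keeps compressed object pointers.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
half_gb=$(( total_kb / 1024 / 1024 / 2 ))
[ "$half_gb" -gt 31 ] && half_gb=31
[ "$half_gb" -lt 1 ] && half_gb=1
echo "Suggested heap: ${half_gb}g"
# On older init-script installs the heap is typically set in
# /etc/sysconfig/elasticsearch, e.g.:  ES_HEAP_SIZE=31g
```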
Re: Elasticsearch tuning
Here's the log the error is referencing.
Re: Elasticsearch tuning
Thanks for posting that log file!
The key lines are "Native memory allocation (mmap) failed to map 33324597248 bytes for committing reserved memory" and:
Memory: 4k page, physical 66109536k(3027960k free), swap 262140k(230244k free)
It looks like Java is trying to commit more memory than is available on the system. Do the other servers in your environment have more swap space? I wonder if the ~30GB of memory used at this point is Elasticsearch still in memory.
You mentioned the box seems to run as the "primary" all the time. Can you expand on this a little? What do you mean by it running as the primary? What are you looking at to determine this?
When you get the "Stopping elasticsearch: [FAILED]" message, can you run
Code: Select all
journalctl -xe
and send the output to us? It may give us another hint as to why the service is hung.
Also, can you PM me a profile of the system the next time Elasticsearch is in a hung state?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Elasticsearch tuning
When I say "primary," what I mean is that one server always seems to be working harder than the others in our cluster.
For example, the server we're talking about now has 34 GB of memory active whereas the other two only have 17 GB, and its CPUs seem to be running at higher usage than the other servers'. I am basing all of this on the statistics in VMware/vCenter.
Here's the swap information (you can see the swap space is less on the server we're talking about):
NAGIOSLSCC2
total used free shared buffers cached
Mem: 66109536 65742956 366580 96 30604 29592048
-/+ buffers/cache: 36120304 29989232
Swap: 262140 217192 44948
NAGIOSLSCC1
total used free shared buff/cache available
Mem: 65789932 35339688 658528 20592 29791716 29719240
Swap: 4190204 26212 4163992
NAGIOSLSCC3
total used free shared buff/cache available
Mem: 66821096 35768048 355428 20700 30697620 30319020
Swap: 4190204 0 4190204
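One concrete difference in the figures above is swap: LSCC2 has only 262140 kB (~256 MB) while LSCC1 and LSCC3 each have ~4 GB. A small sketch that reads SwapTotal on a node and flags it when it falls below a threshold (the 1 GB cutoff is arbitrary, chosen only to separate LSCC2 from its peers):

```shell
#!/bin/sh
# Sketch: report this node's SwapTotal and flag it if it is far below
# what the peer nodes carry.
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
echo "SwapTotal: ${swap_kb} kB"
if [ "$swap_kb" -lt 1048576 ]; then  # arbitrary 1 GB threshold
    echo "Swap is small here; the other nodes carry ~4 GB."
fi
```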
Re: Elasticsearch tuning
Excellent, thank you! Next set of questions. When you configured your devices, were they all configured to point at LSCC2, or another device, or maybe you spread them out across all 3 VMs?
When LSCC2 displays high CPU usage, could you run the top command and get us the output?
And finally, has LSCC2 been rebooted to try and give it a fresh start since the last hang? LSCC2 seems to be running double what the other Log Server VMs are running.
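For capturing that, running top once in batch mode avoids the interactive display and produces something easy to attach; the output path and the 40-line cutoff are just examples:

```shell
#!/bin/sh
# Capture one snapshot of top (-b batch mode, -n 1 one iteration)
# so it can be attached to a post.
top -b -n 1 | head -40 > /tmp/top-lscc2.txt
echo "Saved $(wc -l < /tmp/top-lscc2.txt) lines to /tmp/top-lscc2.txt"
```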
Re: Elasticsearch tuning
Honestly, I have no idea whether the devices were all pointed at LSCC2 or spread out across all 3 VMs. These were set up years ago and I wouldn't even know how to check.
I've actually brought up the high CPU usage before; you can look at this thread: https://support.nagios.com/forum/viewto ... 38&t=52386
The screenshot on the first page is basically what the top output looks like all the time on LSCC2.
As for a reboot: I rebooted all three servers this morning after an SSH session kept disconnecting me on LSCC2 and I couldn't figure out what was going on. It looks like LSCC2 was hung up trying to complete a snapshot from Saturday night. When I finally got back in, it showed the snapshot still in progress.
But, to go back to the issue, LSCC2 is always working harder than the other two servers.
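If it would help to confirm whether LSCC2 really is the elected master (one possible reading of "primary"), the Elasticsearch cat API can show it. This is a sketch that assumes Elasticsearch answers on localhost:9200; the exact `_cat/nodes` column names vary across Elasticsearch versions, so adjust as needed:

```shell
#!/bin/sh
# Sketch: ask the cluster which node is the elected master and how loaded
# each node is (assumes Elasticsearch answers on localhost:9200).
status=$(curl -s --max-time 5 localhost:9200 >/dev/null 2>&1 && echo yes || echo no)
if [ "$status" = "yes" ]; then
    curl -s 'localhost:9200/_cat/master?v'
    curl -s 'localhost:9200/_cat/nodes?v'
else
    echo "Elasticsearch is not reachable on localhost:9200"
fi
```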