The "elasticsearch" service does not start correctly when restart the server.
Every time I restart the server, I need to manually restart the "elasticsearch" service.
Considerations:
This problem does not occur in CentOS 6
This problem only occurs when I install the NLS infrastructure in CentOS 7.
# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
Nagios Log Server: 1.4.4
Elasticsearch: 1.6.0
Logstash: 1.5.1
Kibana: 3.1.1-nagios3
##### After a reboot the elasticsearch service does not start automatically.
##### Displaying the following error.
[2017-05-05 18:20:30,234][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] version[1.6.0], pid[2671], build[cdd3ac4/2015-06-09T13:36:34Z]
[2017-05-05 18:20:30,234][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initializing ...
[2017-05-05 18:20:30,254][INFO ][plugins ] [5c998cfb-0460-4e56-8697-83b65c086a13] loaded [knapsack-1.5.2.0-f340ad1], sites []
[2017-05-05 18:20:30,341][INFO ][env ] [5c998cfb-0460-4e56-8697-83b65c086a13] using [1] data paths, mounts [[/usr/local/datalog (/dev/mapper/vg_datalog-lv_datalog)]], net usable_space [15tb], net total_space [15.8tb], types [ext4]
[2017-05-05 18:20:34,574][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initialized
[2017-05-05 18:20:34,574][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] starting ...
[2017-05-05 18:20:34,736][INFO ][transport ] [5c998cfb-0460-4e56-8697-83b65c086a13] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.154.3.100:9300]}
[2017-05-05 18:20:34,750][INFO ][discovery ] [5c998cfb-0460-4e56-8697-83b65c086a13] a5726a09-769e-4f2b-be91-d786c8165c6f/OPCqK_rHRAmtCmGtOMb7mA
[2017-05-05 18:20:37,782][WARN ][transport.netty ] [5c998cfb-0460-4e56-8697-83b65c086a13] exception caught on transport layer [[id: 0x447eaaab]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
##### After the problem occurs.
##### If I perform a restarts of elasticsearch with
##### the "systemctl restart elasticsearch" command, the service initializes correctly.
##### Apparently, there is some bug, or problem with the dependencies of the service.
[2017-05-05 18:30:22,255][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] stopping ...
[2017-05-05 18:30:22,606][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] stopped
[2017-05-05 18:30:22,606][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] closing ...
[2017-05-05 18:30:22,622][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] closed
[2017-05-05 18:30:29,938][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] version[1.6.0], pid[6112], build[cdd3ac4/2015-06-09T13:36:34Z]
[2017-05-05 18:30:29,938][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initializing ...
[2017-05-05 18:30:29,950][INFO ][plugins ] [5c998cfb-0460-4e56-8697-83b65c086a13] loaded [knapsack-1.5.2.0-f340ad1], sites []
[2017-05-05 18:30:29,992][INFO ][env ] [5c998cfb-0460-4e56-8697-83b65c086a13] using [1] data paths, mounts [[/usr/local/datalog (/dev/mapper/vg_datalog-lv_datalog)]], net usable_space [15tb], net total_space [15.8tb], types [ext4]
[2017-05-05 18:30:32,566][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initialized
[2017-05-05 18:30:32,566][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] starting ...
[2017-05-05 18:30:32,726][INFO ][transport ] [5c998cfb-0460-4e56-8697-83b65c086a13] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.154.3.100:9300]}
[2017-05-05 18:30:32,737][INFO ][discovery ] [5c998cfb-0460-4e56-8697-83b65c086a13] a5726a09-769e-4f2b-be91-d786c8165c6f/bdD-q2t8Qpac9a1BLyxEGQ
[2017-05-05 18:30:34,430][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:34,430][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:34,478][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:34,479][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:35,812][INFO ][cluster.service ] [5c998cfb-0460-4e56-8697-83b65c086a13] detected_master [8471b9e1-1a82-4c3d-98bc-03f2ce871369][2KczjkYuStS0z83RbYH4mw][datalog-ugt-log1.gtservicos][inet[/10.154.3.99:9300]]{max_local_storage_nodes=1}, added {[8471b9e1-1a82-4c3d-98bc-03f2ce871369][2KczjkYuStS0z83RbYH4mw][datalog-ugt-log1.gtservicos][inet[/10.154.3.99:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[8471b9e1-1a82-4c3d-98bc-03f2ce871369][2KczjkYuStS0z83RbYH4mw][datalog-ugt-log1.gtservicos][inet[/10.154.3.99:9300]]{max_local_storage_nodes=1}])
[2017-05-05 18:30:35,934][INFO ][http ] [5c998cfb-0460-4e56-8697-83b65c086a13] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2017-05-05 18:30:35,934][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] started
##### That may be trying to start while the network is not yet available (it's my suspicion).
##### I believe the version used is not natively compatible with systemd, which still uses chkconfig internally.
# Hypotheses
## Bug
## Problems with dependencies
## Bug related to installation in CentOS 7 (current), with all packages updated
Problem: Elasticsearch does not start correctly on boot
-
ssoliveira
- Posts: 91
- Joined: Wed Dec 07, 2016 6:02 pm
Re: Problem: Elasticsearch does not start correctly on boot
Is this a multi-instance environment?
This message implies that, when Elasticsearch first starts, it's unable to find a path to it's cluster members:
What I think may be causing this is a delayed network connection. Basically, Elasticsearch is trying to join the cluster ahead of the machine actually being on the network.
This message implies that, when Elasticsearch first starts, it's unable to find a path to it's cluster members:
Code: Select all
[2017-05-05 18:20:37,782][WARN ][transport.netty ] [5c998cfb-0460-4e56-8697-83b65c086a13] exception caught on transport layer [[id: 0x447eaaab]], closing connection
java.net.NoRouteToHostException: No route to hostFormer Nagios employee
https://www.mcapra.com/
https://www.mcapra.com/
-
ssoliveira
- Posts: 91
- Joined: Wed Dec 07, 2016 6:02 pm
Re: Problem: Elasticsearch does not start correctly on boot
Yes, it is a group of 4 servers, in the same L2.
I rebuild the environment; in the laboratory; with 4 servers in CentOS 7, and I can replicate the problem.
After running reboot, I need to wait for the server to restart and then restart the elasticsearch.
Rebuild the environment; in the laboratory; with CentOS 6, and the problem does not occur.
All servers are 100% upgraded via yum.
The environment does not have customizations, following strictly as documentation of the nagios.
I rebuild the environment; in the laboratory; with 4 servers in CentOS 7, and I can replicate the problem.
After running reboot, I need to wait for the server to restart and then restart the elasticsearch.
Rebuild the environment; in the laboratory; with CentOS 6, and the problem does not occur.
All servers are 100% upgraded via yum.
The environment does not have customizations, following strictly as documentation of the nagios.
- tacolover101
- Posts: 432
- Joined: Mon Apr 10, 2017 11:55 am
Re: Problem: Elasticsearch does not start correctly on boot
is ipv6 enabled? a thought that comes to mind, is it may be trying to connect to peers through this method.
the other problem could be port 9300 blocked somewhere between the nodes.
the other problem could be port 9300 blocked somewhere between the nodes.
Re: Problem: Elasticsearch does not start correctly on boot
I've walked through this in my lab environment with 4 nodes and am still unable to replicate. I do a hard-reset on each CentOS 7 VM, and all members rejoin the cluster without further issues:
Can we get all of the Elasticsearch logs from each instance? They're typically found here:
Code: Select all
[root@nls1 ~]# curl -XGET localhost:9200/_nodes/jvm?pretty
{
"cluster_name" : "a4e1a590-f811-4545-b7a9-c5b160afebc2",
"nodes" : {
"u1wZXAG7TpeZFKjg2-C_7w" : {
"name" : "ac1679a4-b4f0-46ff-9203-b1277072c5c6",
"transport_address" : "inet[/192.168.67.5:9300]",
"host" : "nls2",
"ip" : "192.168.67.5",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 1751,
"version" : "1.7.0_121",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.121-b00",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494527541932,
"mem" : {
"heap_init_in_bytes" : 1987051520,
"heap_max_in_bytes" : 1953234944,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 1953234944
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
},
"MiASaXZWQIiHibXz0NyhrQ" : {
"name" : "4c832c80-f512-4cae-9c65-5159efe1a336",
"transport_address" : "inet[/192.168.67.6:9300]",
"host" : "nls3",
"ip" : "192.168.67.6",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 1999,
"version" : "1.7.0_121",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.121-b00",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494527544857,
"mem" : {
"heap_init_in_bytes" : 1987051520,
"heap_max_in_bytes" : 1953234944,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 1953234944
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
},
"19aL6Jf-SkiEqapDsRVx1A" : {
"name" : "d7bfa7b4-a76a-4871-a120-e3399e7f2492",
"transport_address" : "inet[/192.168.67.4:9300]",
"host" : "nls1",
"ip" : "192.168.67.4",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[/192.168.67.4:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 1469,
"version" : "1.7.0_121",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.121-b00",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494527542150,
"mem" : {
"heap_init_in_bytes" : 1987051520,
"heap_max_in_bytes" : 1953234944,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 1953234944
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
},
"BFC4Rd_GTOm_tUmiCMgYpQ" : {
"name" : "34127c70-8136-4ef6-896a-2010ea231c2a",
"transport_address" : "inet[/192.168.67.7:9300]",
"host" : "nls4",
"ip" : "192.168.67.7",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 1577,
"version" : "1.7.0_141",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.141-b02",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494527542557,
"mem" : {
"heap_init_in_bytes" : 1987051520,
"heap_max_in_bytes" : 1953234944,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 1953234944
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
}
}
}
Code: Select all
/var/log/elasticsearch/Former Nagios employee
https://www.mcapra.com/
https://www.mcapra.com/
-
ssoliveira
- Posts: 91
- Joined: Wed Dec 07, 2016 6:02 pm
Re: Problem: Elasticsearch does not start correctly on boot
I'm going to restart the cluster again; And send the logs.
Follow the JVM settings
curl -XGET localhost:9200/_nodes/jvm?pretty
Follow the JVM settings
curl -XGET localhost:9200/_nodes/jvm?pretty
Code: Select all
[root@datalog-ugt-log1 ~]# curl -XGET localhost:9200/_nodes/jvm?pretty
{
"cluster_name" : "a5726a09-769e-4f2b-be91-d786c8165c6f",
"nodes" : {
"jAGKtfkcTCG1qGDBedVgsA" : {
"name" : "8d4f2dfb-f10c-4655-a4b7-8b5eaa9f6a3c",
"transport_address" : "inet[/10.154.9.94:9300]",
"host" : "datalog-utb-log2.servicos",
"ip" : "10.154.9.94",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 18890,
"version" : "1.7.0_131",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.131-b00",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494865667666,
"mem" : {
"heap_init_in_bytes" : 37913362432,
"heap_max_in_bytes" : 37757386752,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 37757386752
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
},
"ToOWGPogRZqbohZCnI3f8w" : {
"name" : "765cc658-3e5f-4923-804e-5eb57735f761",
"transport_address" : "inet[/10.154.9.93:9300]",
"host" : "datalog-utb-log1.servicos",
"ip" : "10.154.9.93",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 4504,
"version" : "1.7.0_131",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.131-b00",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494865549411,
"mem" : {
"heap_init_in_bytes" : 37913362432,
"heap_max_in_bytes" : 37757386752,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 37757386752
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
},
"h8ilRIIsTsyUb7BQaZm9uw" : {
"name" : "8471b9e1-1a82-4c3d-98bc-03f2ce871369",
"transport_address" : "inet[/10.154.3.99:9300]",
"host" : "datalog-ugt-log1.gtservicos",
"ip" : "10.154.3.99",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 24531,
"version" : "1.7.0_131",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.131-b00",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494865394077,
"mem" : {
"heap_init_in_bytes" : 37913362432,
"heap_max_in_bytes" : 37757386752,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 37757386752
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
},
"Z9Ei_i-8SFSGSzYY6F68Kw" : {
"name" : "5c998cfb-0460-4e56-8697-83b65c086a13",
"transport_address" : "inet[/10.154.3.100:9300]",
"host" : "datalog-ugt-log2.gtservicos",
"ip" : "10.154.3.100",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[localhost/127.0.0.1:9200]",
"attributes" : {
"max_local_storage_nodes" : "1"
},
"jvm" : {
"pid" : 27785,
"version" : "1.7.0_131",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "24.131-b00",
"vm_vendor" : "Oracle Corporation",
"start_time_in_millis" : 1494865490822,
"mem" : {
"heap_init_in_bytes" : 37913362432,
"heap_max_in_bytes" : 37757386752,
"non_heap_init_in_bytes" : 24313856,
"non_heap_max_in_bytes" : 224395264,
"direct_max_in_bytes" : 37757386752
},
"gc_collectors" : [ "ParNew", "ConcurrentMarkSweep" ],
"memory_pools" : [ "Code Cache", "Par Eden Space", "Par Survivor Space", "CMS Old Gen", "CMS Perm Gen" ]
}
}
}
}
Re: Problem: Elasticsearch does not start correctly on boot
Thanks for the update and please keep us posted with the results/logs of the reboot.
The output provided would suggest the service was manually started on the localhost before the command was run, is this a correct assessment?
The output provided would suggest the service was manually started on the localhost before the command was run, is this a correct assessment?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.