Problem: Elasticsearch does not start correctly on boot
Posted: Fri May 05, 2017 3:17 pm
The "elasticsearch" service does not start correctly when I restart the server.
Every time I restart the server, I need to manually restart the "elasticsearch" service.
Considerations:
This problem does not occur on CentOS 6.
It only occurs when I install the NLS infrastructure on CentOS 7.
# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
Nagios Log Server: 1.4.4
Elasticsearch: 1.6.0
Logstash: 1.5.1
Kibana: 3.1.1-nagios3
##### After a reboot, the elasticsearch service does not start automatically.
##### It logs the following error.
[2017-05-05 18:20:30,234][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] version[1.6.0], pid[2671], build[cdd3ac4/2015-06-09T13:36:34Z]
[2017-05-05 18:20:30,234][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initializing ...
[2017-05-05 18:20:30,254][INFO ][plugins ] [5c998cfb-0460-4e56-8697-83b65c086a13] loaded [knapsack-1.5.2.0-f340ad1], sites []
[2017-05-05 18:20:30,341][INFO ][env ] [5c998cfb-0460-4e56-8697-83b65c086a13] using [1] data paths, mounts [[/usr/local/datalog (/dev/mapper/vg_datalog-lv_datalog)]], net usable_space [15tb], net total_space [15.8tb], types [ext4]
[2017-05-05 18:20:34,574][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initialized
[2017-05-05 18:20:34,574][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] starting ...
[2017-05-05 18:20:34,736][INFO ][transport ] [5c998cfb-0460-4e56-8697-83b65c086a13] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.154.3.100:9300]}
[2017-05-05 18:20:34,750][INFO ][discovery ] [5c998cfb-0460-4e56-8697-83b65c086a13] a5726a09-769e-4f2b-be91-d786c8165c6f/OPCqK_rHRAmtCmGtOMb7mA
[2017-05-05 18:20:37,782][WARN ][transport.netty ] [5c998cfb-0460-4e56-8697-83b65c086a13] exception caught on transport layer [[id: 0x447eaaab]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
##### After the problem occurs, if I restart elasticsearch with
##### the "systemctl restart elasticsearch" command, the service initializes correctly.
##### Apparently there is some bug, or a problem with the dependencies of the service.
[2017-05-05 18:30:22,255][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] stopping ...
[2017-05-05 18:30:22,606][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] stopped
[2017-05-05 18:30:22,606][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] closing ...
[2017-05-05 18:30:22,622][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] closed
[2017-05-05 18:30:29,938][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] version[1.6.0], pid[6112], build[cdd3ac4/2015-06-09T13:36:34Z]
[2017-05-05 18:30:29,938][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initializing ...
[2017-05-05 18:30:29,950][INFO ][plugins ] [5c998cfb-0460-4e56-8697-83b65c086a13] loaded [knapsack-1.5.2.0-f340ad1], sites []
[2017-05-05 18:30:29,992][INFO ][env ] [5c998cfb-0460-4e56-8697-83b65c086a13] using [1] data paths, mounts [[/usr/local/datalog (/dev/mapper/vg_datalog-lv_datalog)]], net usable_space [15tb], net total_space [15.8tb], types [ext4]
[2017-05-05 18:30:32,566][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] initialized
[2017-05-05 18:30:32,566][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] starting ...
[2017-05-05 18:30:32,726][INFO ][transport ] [5c998cfb-0460-4e56-8697-83b65c086a13] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.154.3.100:9300]}
[2017-05-05 18:30:32,737][INFO ][discovery ] [5c998cfb-0460-4e56-8697-83b65c086a13] a5726a09-769e-4f2b-be91-d786c8165c6f/bdD-q2t8Qpac9a1BLyxEGQ
[2017-05-05 18:30:34,430][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:34,430][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:34,478][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:34,479][DEBUG][action.admin.indices.create] [5c998cfb-0460-4e56-8697-83b65c086a13] no known master node, scheduling a retry
[2017-05-05 18:30:35,812][INFO ][cluster.service ] [5c998cfb-0460-4e56-8697-83b65c086a13] detected_master [8471b9e1-1a82-4c3d-98bc-03f2ce871369][2KczjkYuStS0z83RbYH4mw][datalog-ugt-log1.gtservicos][inet[/10.154.3.99:9300]]{max_local_storage_nodes=1}, added {[8471b9e1-1a82-4c3d-98bc-03f2ce871369][2KczjkYuStS0z83RbYH4mw][datalog-ugt-log1.gtservicos][inet[/10.154.3.99:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[8471b9e1-1a82-4c3d-98bc-03f2ce871369][2KczjkYuStS0z83RbYH4mw][datalog-ugt-log1.gtservicos][inet[/10.154.3.99:9300]]{max_local_storage_nodes=1}])
[2017-05-05 18:30:35,934][INFO ][http ] [5c998cfb-0460-4e56-8697-83b65c086a13] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2017-05-05 18:30:35,934][INFO ][node ] [5c998cfb-0460-4e56-8697-83b65c086a13] started
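##### My suspicion is that the service is trying to start while the network is not yet available.
##### The NoRouteToHostException logged about three seconds after "starting ..." on boot would be consistent with that.
##### I also believe the version used is not natively compatible with systemd and still relies on chkconfig internally.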
# Hypotheses
## Bug
## Problems with dependencies
## Bug related to installation in CentOS 7 (current), with all packages updated
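One way to test the network-timing hypothesis might be a systemd drop-in that orders the service after network-online.target. This is only a sketch. The file name wait-for-network.conf is my own choice, and I am assuming that the unit is called elasticsearch.service (as the "systemctl restart elasticsearch" command suggests) and that systemd applies drop-ins to it even if the unit is generated from an init script. I have not verified this on NLS.

```ini
# /etc/systemd/system/elasticsearch.service.d/wait-for-network.conf
# (hypothetical file name; systemd reads any *.conf file in this directory)
[Unit]
# Pull network-online.target into the boot transaction ...
Wants=network-online.target
# ... and do not start elasticsearch until that target has been reached.
After=network-online.target
```

After creating the file, "systemctl daemon-reload" should make systemd re-read its units, and "systemctl cat elasticsearch" should show whether the drop-in was picked up. As far as I understand, network-online.target only delays startup if a wait-online service such as NetworkManager-wait-online.service is enabled, so that may need checking as well.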