Master Not Discovered ERROR
Posted: Fri May 29, 2020 11:40 am
I guess I spoke too soon. My Log Server environment has become unresponsive yet again, and I'm unable to run many of the troubleshooting commands successfully because they time out. The one that concerns me the most is this:
Code: Select all
root@nagioslscc1:/root>curl -XGET 'http://localhost:9200/_cat/nodes?v'
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}
As stated in my recently closed thread, I had just changed the elasticsearch.yml file on each node to give a single node the dedicated master role, and the change appeared to take successfully. Now everything has crashed again.
I would really appreciate it if someone could contact me and connect to my environment for more direct troubleshooting as soon as possible. I'm going on over a week of various issues, and my managers are starting to really press me. Thank you.
Here's a subset of entries from the elasticsearch log file:
Code: Select all
tail -f -n 100 e4f9550c-f37c-417f-9cdc-283429a2a0a1.log
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [SerialNumber]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [6000c29f51df3b01199f833635493dff], tried both date format [dateOptionalTime], and timestamp number with locale []
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:617)
at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:535)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "6000c29f51df3b01199f833635493dff" is malformed at "c29f51df3b01199f833635493dff"
at org.elasticsearch.common.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:780)
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:612)
... 16 more
[2020-05-29 09:02:59,492][DEBUG][action.bulk              ] [4deb767e-abbd-43f0-8839-049760687e98] [logstash-2020.05.29][3] failed to execute bulk item (index) index {[logstash-2020.05.29][eventlog][AXJhK8u96Ti2puEbs5jR], source[{"EventTime":"2020-05-29 08:48:04","Hostname":"DHHSISD033.dhhs-ad.state.nv.us","Keywords":576460753377165312,"EventType":"ERROR","SeverityValue":4,"Severity":"ERROR","EventID":504,"SourceName":"Microsoft-Windows-StorDiag","ProviderGuid":"{F5D05B38-80A6-4653-825D-C414E4AB3C68}","Version":1,"Task":200,"OpcodeValue":101,"RecordNumber":277997,"ProcessID":5056,"ThreadID":26472,"Channel":"Microsoft-Windows-Storage-ClassPnP/Operational","Domain":"NT AUTHORITY","AccountName":"SYSTEM","UserID":"S-1-5-18","AccountType":"User","Category":"Class","Opcode":"Completion of request.","DeviceGUID":"{2372cd43-5683-5ef0-21fc-b8e58c35b9ac}","DeviceNumber":"0","Vendor":"VMware  ","Model":"Virtual disk    ","FirmwareVersion":"2.0     ","SerialNumber":"6000c29f51df3b01199f833635493dff","IrpStatus":"0xc0000185","IoctlControlCode":"0x74080","EventReceivedTime":"2020-05-29 08:48:06","SourceModuleName":"eventlog","SourceModuleType":"im_msvistalog","message":"Completing a failed IOCTL request.","@version":"1","@timestamp":"2020-05-29T16:02:16.867Z","host":"10.150.20.33","port":54982,"type":"eventlog"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [SerialNumber]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [6000c29f51df3b01199f833635493dff], tried both date format [dateOptionalTime], and timestamp number with locale []
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:617)
at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:535)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "6000c29f51df3b01199f833635493dff" is malformed at "c29f51df3b01199f833635493dff"
at org.elasticsearch.common.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:780)
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:612)
... 16 more
[2020-05-29 09:02:59,493][DEBUG][action.bulk              ] [4deb767e-abbd-43f0-8839-049760687e98] [logstash-2020.05.29][4] failed to execute bulk item (index) index {[logstash-2020.05.29][eventlog][AXJhK8ut6Ti2puEbs5hz], source[{"EventTime":"2020-05-29 08:48:04","Hostname":"DHHSISD033.dhhs-ad.state.nv.us","Keywords":576460753377165312,"EventType":"ERROR","SeverityValue":4,"Severity":"ERROR","EventID":504,"SourceName":"Microsoft-Windows-StorDiag","ProviderGuid":"{F5D05B38-80A6-4653-825D-C414E4AB3C68}","Version":1,"Task":200,"OpcodeValue":101,"RecordNumber":277994,"ProcessID":5056,"ThreadID":26472,"Channel":"Microsoft-Windows-Storage-ClassPnP/Operational","Domain":"NT AUTHORITY","AccountName":"SYSTEM","UserID":"S-1-5-18","AccountType":"User","Category":"Class","Opcode":"Completion of request.","DeviceGUID":"{2372cd43-5683-5ef0-21fc-b8e58c35b9ac}","DeviceNumber":"0","Vendor":"VMware  ","Model":"Virtual disk    ","FirmwareVersion":"2.0     ","SerialNumber":"6000c29f51df3b01199f833635493dff","IrpStatus":"0xc0000185","IoctlControlCode":"0x740d4","EventReceivedTime":"2020-05-29 08:48:06","SourceModuleName":"eventlog","SourceModuleType":"im_msvistalog","message":"Completing a failed IOCTL request.","@version":"1","@timestamp":"2020-05-29T16:02:16.615Z","host":"10.150.20.33","port":54982,"type":"eventlog"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [SerialNumber]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field [6000c29f51df3b01199f833635493dff], tried both date format [dateOptionalTime], and timestamp number with locale []
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:617)
at org.elasticsearch.index.mapper.core.DateFieldMapper.innerParseCreateField(DateFieldMapper.java:535)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 13 more
Caused by: java.lang.IllegalArgumentException: Invalid format: "6000c29f51df3b01199f833635493dff" is malformed at "c29f51df3b01199f833635493dff"
at org.elasticsearch.common.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)
at org.elasticsearch.common.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:780)
at org.elasticsearch.index.mapper.core.DateFieldMapper.parseStringValue(DateFieldMapper.java:612)
... 16 more
[2020-05-29 09:02:59,492][DEBUG][action.admin.cluster.node.stats] [4deb767e-abbd-43f0-8839-049760687e98] failed to execute on node [FoBNxUegTtaU2cFzAh7Qdw]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [29dbb5cc-f936-4f0e-8a41-26b2277c7083][inet[/10.128.207.114:9300]][cluster:monitor/nodes/stats[n]] request_id [434447] timed out after [19840ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:529)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2020-05-29 09:02:59,504][WARN ][cluster.service          ] [4deb767e-abbd-43f0-8839-049760687e98] cluster state update task [shard-started ([logstash-2020.05.26][1], node[B6gwR3GbRdWm9Oz3PL6IXw], [R], s[INITIALIZING], unassigned_info[[reason=NODE_LEFT], at[2020-05-29T15:12:10.590Z], details[node_left[B6gwR3GbRdWm9Oz3PL6IXw]]]), reason [after recovery (replica) from node [[29dbb5cc-f936-4f0e-8a41-26b2277c7083][FoBNxUegTtaU2cFzAh7Qdw][nagioslscc3][inet[/10.128.207.114:9300]]{max_local_storage_nodes=1, master=false}]]] took 1m above the warn threshold of 30s
[2020-05-29 09:03:22,616][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][4010][125] duration [22.9s], collections [1]/[23.1s], total [22.9s]/[3.8m], memory [29.1gb]->[29.2gb]/[29.3gb], all_pools {[young] [578.3mb]->[617mb]/[665.6mb]}{[survivor] [0b]->[0b]/[83.1mb]}{[old] [28.6gb]->[28.6gb]/[28.6gb]}
[2020-05-29 09:03:42,390][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][4011][126] duration [19.6s], collections [1]/[19.7s], total [19.6s]/[4.1m], memory [29.2gb]->[29.2gb]/[29.3gb], all_pools {[young] [617mb]->[656.9mb]/[665.6mb]}{[survivor] [0b]->[0b]/[83.1mb]}{[old] [28.6gb]->[28.6gb]/[28.6gb]}
[2020-05-29 09:04:25,293][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][4012][128] duration [42.7s], collections [2]/[42.9s], total [42.7s]/[4.8m], memory [29.2gb]->[29.2gb]/[29.3gb], all_pools {[young] [656.9mb]->[665.4mb]/[665.6mb]}{[survivor] [0b]->[39.6mb]/[83.1mb]}{[old] [28.6gb]->[28.6gb]/[28.6gb]}
[2020-05-29 09:04:48,394][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][4013][129] duration [23s], collections [1]/[23.1s], total [23s]/[5.2m], memory [29.2gb]->[29.3gb]/[29.3gb], all_pools {[young] [665.4mb]->[664.5mb]/[665.6mb]}{[survivor] [39.6mb]->[56mb]/[83.1mb]}{[old] [28.6gb]->[28.6gb]/[28.6gb]}
[2020-05-29 09:05:08,098][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][4014][130] duration [19.6s], collections [1]/[19.7s], total [19.6s]/[5.5m], memory [29.3gb]->[29.3gb]/[29.3gb], all_pools {[young] [664.5mb]->[664.7mb]/[665.6mb]}{[survivor] [56mb]->[49.7mb]/[83.1mb]}{[old] [28.6gb]->[28.6gb]/[28.6gb]}
[2020-05-29 09:05:31,165][WARN ][monitor.jvm              ] [4deb767e-abbd-43f0-8839-049760687e98] [gc][old][4015][131] duration [22.9s], collections [1]/[23s], total [22.9s]/[5.9m], memory [29.3gb]->[29.3gb]/[29.3gb], all_pools {[young] [664.7mb]->[665.6mb]/[665.6mb]}{[survivor] [49.7mb]->[73.9mb]/[83.1mb]}{[old] [28.6gb]->[28.6gb]/[28.6gb]}
[2020-05-29 09:09:24,513][WARN ][transport                ] [4deb767e-abbd-43f0-8839-049760687e98] Received response for a request that has timed out, sent [1059432ms] ago, timed out [1044432ms] ago, action [cluster:monitor/nodes/stats[n]], node [[29dbb5cc-f936-4f0e-8a41-26b2277c7083][FoBNxUegTtaU2cFzAh7Qdw][nagioslscc3][inet[/10.128.207.114:9300]]{max_local_storage_nodes=1, master=false}], id [428131]
[2020-05-29 09:12:56,757][WARN ][discovery.zen.publish    ] [4deb767e-abbd-43f0-8839-049760687e98] timed out waiting for all nodes to process published state [318] (timeout [30s], pending nodes: [[fc00210a-5231-46f3-84f2-6e3c61c7ac0e][B6gwR3GbRdWm9Oz3PL6IXw][nagioslscc1][inet[/10.128.207.112:9300]]{max_local_storage_nodes=1, master=false}, [29dbb5cc-f936-4f0e-8a41-26b2277c7083][FoBNxUegTtaU2cFzAh7Qdw][nagioslscc3][inet[/10.128.207.114:9300]]{max_local_storage_nodes=1, master=false}])
[2020-05-29 09:13:19,197][WARN ][cluster.service          ] [4deb767e-abbd-43f0-8839-049760687e98] cluster state update task [zen-disco-receive(join from node[[fc00210a-5231-46f3-84f2-6e3c61c7ac0e][B6gwR3GbRdWm9Oz3PL6IXw][nagioslscc1][inet[/10.128.207.112:9300]]{max_local_storage_nodes=1, master=false}])] took 1m above the warn threshold of 30s
[2020-05-29 09:15:01,213][WARN ][discovery.zen.publish    ] [4deb767e-abbd-43f0-8839-049760687e98] timed out waiting for all nodes to process published state [319] (timeout [30s], pending nodes: [[29dbb5cc-f936-4f0e-8a41-26b2277c7083][FoBNxUegTtaU2cFzAh7Qdw][nagioslscc3][inet[/10.128.207.114:9300]]{max_local_storage_nodes=1, master=false}])
[2020-05-29 09:15:50,427][WARN ][cluster.service          ] [4deb767e-abbd-43f0-8839-049760687e98] cluster state update task [zen-disco-receive(join from node[[fc00210a-5231-46f3-84f2-6e3c61c7ac0e][B6gwR3GbRdWm9Oz3PL6IXw][nagioslscc1][inet[/10.128.207.112:9300]]{max_local_storage_nodes=1, master=false}])] took 2.5m above the warn threshold of 30s
[2020-05-29 09:17:29,174][WARN ][discovery.zen.publish    ] [4deb767e-abbd-43f0-8839-049760687e98] timed out waiting for all nodes to process published state [320] (timeout [30s], pending nodes: [[29dbb5cc-f936-4f0e-8a41-26b2277c7083][FoBNxUegTtaU2cFzAh7Qdw][nagioslscc3][inet[/10.128.207.114:9300]]{max_local_storage_nodes=1, master=false}])
[2020-05-29 09:20:13,940][WARN ][cluster.service          ] [4deb767e-abbd-43f0-8839-049760687e98] cluster state update task [zen-disco-receive(join from node[[fc00210a-5231-46f3-84f2-6e3c61c7ac0e][B6gwR3GbRdWm9Oz3PL6IXw][nagioslscc1][inet[/10.128.207.112:9300]]{max_local_storage_nodes=1, master=false}])] took 4m above the warn threshold of 30s
[2020-05-29 09:30:29,876][WARN ][cluster.action.shard ] [4deb767e-abbd-43f0-8839-049760687e98] [logstash-2020.05.29][4] received shard failed for [logstash-2020.05.29][4], node[FoBNxUegTtaU2cFzAh7Qdw], [R], s[INITIALIZING], unassigned_info[[reason=NODE_LEFT], at[2020-05-29T15:12:10.590Z], details[node_left[B6gwR3GbRdWm9Oz3PL6IXw]]], indexUUID [zezml5pKSf-uEs4Xo40OMw], reason [shard failure [failed recovery][RecoveryFailedException[[logstash-2020.05.29][4]: Recovery failed from [4deb767e-abbd-43f0-8839-049760687e98][lflTyPKHT-qkM8ZEycK2Ag][nagioslscc2][inet[/10.128.207.111:9300]]{max_local_storage_nodes=1} into [29dbb5cc-f936-4f0e-8a41-26b2277c7083][FoBNxUegTtaU2cFzAh7Qdw][nagioslscc3][inet[/10.128.207.114:9300]]{max_local_storage_nodes=1, master=false} (no activity after [30m])]; nested: ElasticsearchTimeoutException[no activity after [30m]]; ]]
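Side note: the repeated MapperParsingException looks to me like Elasticsearch's dynamic date detection treating the SerialNumber value as a date when the daily logstash index is created. If that's really the cause, my guess (and it's only a guess, based on the old Elasticsearch 1.x template syntax; the template name is just something I made up) is that a template like this would stop it for new logstash-* indices:

```shell
# Hypothetical index template disabling dynamic date detection for new
# logstash-* indices; the _default_ mapping applies to every document type.
# Existing indices keep their current mappings, so this only helps going forward.
curl -XPUT 'http://localhost:9200/_template/logstash_no_date_detection' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "date_detection": false
    }
  }
}'
```

I have not applied this yet, so please tell me if it would conflict with how Log Server manages its own templates.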