
Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 11:56 am
by mike4vr
We have two NLS nodes, and I believe I'm having an issue with elasticsearch on one of them. The System Status page showed both nodes online; however, the Instance Status page showed node00 with elasticsearch and logstash offline. I proceeded to restart the services from the System Status page: I clicked stop on logstash and it stopped, then clicked stop on elasticsearch and got an error.

[screenshot of the error]

The only way to recover from this is to bounce the server. Even then, I'm back to the same problem with the services showing down on this node in the Instance Status page.

Here are some errors in the elasticsearch log:

Code:

[2015-05-08 09:18:00,482][WARN ][cluster.action.shard     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] [logstash-2015.04.25][2] sending failed shard for [logstash-2015.04.25][2], node[B9U8CoVRRiCRRD7xNfty-g], [R], s[INITIALIZING], indexUUID [q4htz5ElSSyN937VFB030Q], reason [Failed to start shard, message [CorruptIndexException[[logstash-2015.04.25][2] Corrupted index [corrupted_imC91jWVR76Sr4UMRiGl7g] caused by: CorruptIndexException[codec footer mismatch: actual footer=541675078 vs expected footer=-1071082520 (resource: MMapIndexInput(path="/usr/local/nagioslogserver/elasticsearch/data/ae793673-21b1-46f1-ac7f-04fdd6440a60/nodes/0/indices/logstash-2015.04.25/2/index/_c2b_es090_0.tim"))]]]]
[2015-05-08 09:18:00,482][WARN ][cluster.action.shard     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] [logstash-2015.04.25][2] received shard failed for [logstash-2015.04.25][2], node[B9U8CoVRRiCRRD7xNfty-g], [R], s[INITIALIZING], indexUUID [q4htz5ElSSyN937VFB030Q], reason [Failed to start shard, message [CorruptIndexException[[logstash-2015.04.25][2] Corrupted index [corrupted_imC91jWVR76Sr4UMRiGl7g] caused by: CorruptIndexException[codec footer mismatch: actual footer=541675078 vs expected footer=-1071082520 (resource: MMapIndexInput(path="/usr/local/nagioslogserver/elasticsearch/data/ae793673-21b1-46f1-ac7f-04fdd6440a60/nodes/0/indices/logstash-2015.04.25/2/index/_c2b_es090_0.tim"))]]]]
[2015-05-08 09:18:00,482][WARN ][cluster.action.shard     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] [logstash-2015.04.25][2] sending failed shard for [logstash-2015.04.25][2], node[B9U8CoVRRiCRRD7xNfty-g], [R], s[INITIALIZING], indexUUID [q4htz5ElSSyN937VFB030Q], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[logstash-2015.04.25][2] Corrupted index [corrupted_imC91jWVR76Sr4UMRiGl7g] caused by: CorruptIndexException[codec footer mismatch: actual footer=541675078 vs expected footer=-1071082520 (resource: MMapIndexInput(path="/usr/local/nagioslogserver/elasticsearch/data/ae793673-21b1-46f1-ac7f-04fdd6440a60/nodes/0/indices/logstash-2015.04.25/2/index/_c2b_es090_0.tim"))]]]]
[2015-05-08 09:18:00,483][WARN ][cluster.action.shard     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] [logstash-2015.04.25][2] received shard failed for [logstash-2015.04.25][2], node[B9U8CoVRRiCRRD7xNfty-g], [R], s[INITIALIZING], indexUUID [q4htz5ElSSyN937VFB030Q], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[logstash-2015.04.25][2] Corrupted index [corrupted_imC91jWVR76Sr4UMRiGl7g] caused by: CorruptIndexException[codec footer mismatch: actual footer=541675078 vs expected footer=-1071082520 (resource: MMapIndexInput(path="/usr/local/nagioslogserver/elasticsearch/data/ae793673-21b1-46f1-ac7f-04fdd6440a60/nodes/0/indices/logstash-2015.04.25/2/index/_c2b_es090_0.tim"))]]]]
[2015-05-08 09:32:20,432][INFO ][monitor.jvm              ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] [gc][young][1033][223] duration [982ms], collections [1]/[2s], total [982ms]/[6s], memory [439.2mb]->[319.2mb]/[921.3mb], all_pools {[young] [118.6mb]->[834.3kb]/[133.1mb]}{[survivor] [5.6mb]->[3.3mb]/[16.6mb]}{[old] [314.9mb]->[315mb]/[771.6mb]}
[2015-05-08 09:41:56,534][INFO ][cluster.metadata         ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] [logstash-2015.05.08] update_mapping [apache_error] (dynamic)
[2015-05-08 09:44:46,912][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] stopping ...
[2015-05-08 09:44:50,163][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] stopped
[2015-05-08 09:44:50,164][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] closing ...
[2015-05-08 09:44:50,324][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] closed
And logstash:

Code:

log4j, [2015-05-08T09:37:15.109]  WARN: org.elasticsearch.transport: [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] Transport response handler not found of id [158]
{:timestamp=>"2015-05-08T09:37:20.144000-0700", :message=>"Failed parsing date from field", :field=>"timestamp", :value=>"May  8 08:33:11", :exception=>java.lang.IllegalArgumentException: Invalid format: "May  8 08:33:11", :level=>:warn}
{:timestamp=>"2015-05-08T09:37:20.143000-0700", :message=>"Failed parsing date from field", :field=>"timestamp", :value=>"May  8 09:37:05", :exception=>java.lang.IllegalArgumentException: Invalid format: "May  8 09:37:05", :level=>:warn}
{:timestamp=>"2015-05-08T09:37:20.842000-0700", :message=>"Failed parsing date from field", :field=>"timestamp", :value=>"May  8 09:35:56", :exception=>java.lang.IllegalArgumentException: Invalid format: "May  8 09:35:56", :level=>:warn}
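For reference, "Failed parsing date from field" warnings like these usually mean the logstash date filter's match pattern doesn't cover the syslog form with a single-digit day (note the double space in "May  8"). A hypothetical date filter covering both forms (this is illustrative, not the actual NLS shipped filter):

```
filter {
  date {
    # "MMM  d" (two spaces) matches single-digit days such as "May  8"
    match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
}
```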
Any help is appreciated. I'm ready to provide more info, if needed.

Re: Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 12:09 pm
by jolson
Looks like a corrupt index to me. Would it be alright if we attempted to delete the index on the problem server? You will need to start elasticsearch on the bad node and run the following command:

Code:

curl -XDELETE 'http://localhost:9200/logstash-2015.04.25/'
Keep in mind that using the above command will remove all log information for that day. You will be able to restore the information from a backup if you have it set up.
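Before running the DELETE above, it may help to confirm the index is the one marked red. A sketch, assuming elasticsearch answers on localhost:9200 as the log suggests:

```shell
# On the node you would run (ES 1.x):
#   curl -s 'http://localhost:9200/_cluster/health?pretty'
#   curl -s 'http://localhost:9200/_cat/indices?v'
# A "red" status means unassigned shards, e.g. from the corrupt index.
# Demo of pulling the status field out of a sample health response:
health_json='{"cluster_name":"ae793673","status":"red","timed_out":false,"unassigned_shards":2}'
status=$(printf '%s' "$health_json" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
echo "cluster status: $status"
```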

After deletion, try restarting elasticsearch - let's see if that helps us resolve this problem.

Re: Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 12:20 pm
by mike4vr
I had the same result after clicking "Stop" on elasticsearch. Here's the latest elasticsearch log:

Code:

[2015-05-08 10:15:39,024][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] version[1.3.2], pid[4816], build[dee175d/2014-08-13T14:29:30Z]
[2015-05-08 10:15:39,024][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] initializing ...
[2015-05-08 10:15:39,156][INFO ][plugins                  ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-05-08 10:15:59,736][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] initialized
[2015-05-08 10:15:59,736][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] starting ...
[2015-05-08 10:15:59,961][INFO ][transport                ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.115.67:9300]}
[2015-05-08 10:16:00,001][INFO ][discovery                ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] ae793673-21b1-46f1-ac7f-04fdd6440a60/8pZ-dsZpRHGJ1iy89KX37g
[2015-05-08 10:16:00,906][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] stopping ...
[2015-05-08 10:16:00,929][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] stopped
[2015-05-08 10:16:00,930][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] closing ...
[2015-05-08 10:16:00,936][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] closed
[2015-05-08 10:16:29,067][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] version[1.3.2], pid[1553], build[dee175d/2014-08-13T14:29:30Z]
[2015-05-08 10:16:29,086][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] initializing ...
[2015-05-08 10:16:29,118][INFO ][plugins                  ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-05-08 10:16:35,891][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] initialized
[2015-05-08 10:16:35,892][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] starting ...
[2015-05-08 10:16:36,011][INFO ][transport                ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.115.67:9300]}
[2015-05-08 10:16:36,017][INFO ][discovery                ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] ae793673-21b1-46f1-ac7f-04fdd6440a60/m-lEe1lqQ2264DHt2KBcMQ
[2015-05-08 10:16:39,257][INFO ][cluster.service          ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] detected_master [4197e29a-3751-434e-921b-f8f3ee07edd4][BzXofoK-R9-F8PdDZ6pyyw][lsp-001.la3.4over.com][inet[/192.168.115.163:9300]]{max_local_storage_nodes=1}, added {[4197e29a-3751-434e-921b-f8f3ee07edd4][BzXofoK-R9-F8PdDZ6pyyw][lsp-001.la3.4over.com][inet[/192.168.115.163:9300]]{max_local_storage_nodes=1},}, reason: zen-disco-receive(from master [[4197e29a-3751-434e-921b-f8f3ee07edd4][BzXofoK-R9-F8PdDZ6pyyw][lsp-001.la3.4over.com][inet[/192.168.115.163:9300]]{max_local_storage_nodes=1}])
[2015-05-08 10:16:39,477][INFO ][http                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-05-08 10:16:39,478][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] started
[2015-05-08 10:17:05,724][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] stopping ...
[2015-05-08 10:17:05,934][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] stopped
[2015-05-08 10:17:05,935][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] closing ...
[2015-05-08 10:17:06,681][WARN ][netty.channel.DefaultChannelPipeline] An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
	at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
	at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
	at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
	at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
	at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
	at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:577)
	at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:704)
	at org.elasticsearch.common.netty.channel.Channels.write(Channels.java:671)
	at org.elasticsearch.common.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
	at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:97)
	at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:68)
	at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:277)
	at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:268)
	at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
[2015-05-08 10:17:06,686][INFO ][node                     ] [c26bcd1f-4fb8-4e5d-a947-ecae0509fff0] closed

Re: Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 12:47 pm
by jolson
Please try pressing "Reset All Jobs" from the command subsystem page. After resetting your jobs, attempt to start elasticsearch from the web GUI. SSH into the node and run:

Code:

service elasticsearch status
This will show whether it's online. If it does not start, try starting it manually and checking the elasticsearch log:

Code:

cat /var/log/elasticsearch/*.log
Any good information from the elasticsearch log? How long does elasticsearch run before it dies - or does it not even start?

I think we're getting the error message because elasticsearch is having difficulty running in the first place. Can you please check the above and get back to us with your results?

I haven't seen an issue like this one before, so any additional information that you think may be useful is much appreciated.

Re: Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 1:22 pm
by mike4vr
Thank you. The suggestions, unfortunately, yield the same results. When I initially went to the Command Subsystem page, elasticsearch was down. I started the service from the command-line and was able to reset the jobs. Unfortunately, I still get the error "Could not authenticate. Invalid token given." I captured the output from *.log in /var/log/elasticsearch into a file for review. It is attached.

Re: Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 1:26 pm
by jolson
Thanks for the great information.

How much RAM is on the problem node? Any chance you could double the amount to see if that helps out?

I'd like you to check /var/log/messages for any instance of the kernel killing elasticsearch due to memory limitations:

Code:

cat /var/log/messages
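A quicker check than reading the whole file is to grep for OOM-killer lines. A sketch; the kernel message below is a typical example, not taken from this system:

```shell
# On the node you would run:
#   grep -Ei 'out of memory|oom-killer|killed process' /var/log/messages
# Demo against a sample kernel line:
sample='May  8 09:18:00 node00 kernel: Out of memory: Kill process 4816 (java) score 900'
printf '%s\n' "$sample" | grep -Eic 'out of memory|oom-killer|killed process'
```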

Re: Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 2:10 pm
by mike4vr
Initially, both nodes had 2GB each. I just upgraded them to 8GB and still see the same problem. /var/log/messages doesn't show much in terms of elasticsearch / java being killed. Here's the memory utilization:

Code:

# free -m
             total       used       free     shared    buffers     cached
Mem:          7872       5632       2240          0         46        627
-/+ buffers/cache:       4958       2914
Swap:         4031          0       4031
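One caveat worth noting: extra physical RAM only helps elasticsearch if the JVM heap is raised as well - the GC line in the earlier log shows a heap of only ~921mb, roughly the 1GB default. On many ES 1.x installs the heap is set via a sysconfig variable (the path and value here are general assumptions, not confirmed for NLS):

```
# /etc/sysconfig/elasticsearch (typical location on RHEL/CentOS)
ES_HEAP_SIZE=4g   # common rule of thumb: about half of physical RAM
```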

Re: Could not authenticate. Invalid token given.

Posted: Fri May 08, 2015 4:19 pm
by mike4vr
Okay, here's an update.

While tailing the elasticsearch logs on both nodes, I stopped the service on each. Then I started node00 (the faulty one) and it declared itself the new master. Great. I started node01 and it detected node00 as the master and joined the cluster. Everything seems to be working until I click "Stop" on elasticsearch in the web UI, which puts me right back at the "Could not authenticate. Invalid token given." error.

One thing to note: this problem started as soon as I upgraded to 2015R1.4. Is there any way I can roll back to 1.3 without things going haywire?

Re: Could not authenticate. Invalid token given.

Posted: Mon May 11, 2015 9:29 am
by jolson
mike4vr wrote: One thing to note, this problem started as soon as I upgraded to 2015R1.4. Is there any way I can roll-back to 1.3 without it going haywire?
A rollback might work, but I would recommend sticking with your current version if at all possible. The updates in R1.4 should not have affected this component of NLS, which leads me to believe your issue could persist after rolling back.

Could I have you check your http logs please?

Code:

cat /var/log/httpd/error_log
cat /var/log/httpd/access_log
Is your disk space usage fine?

Code:

df -h
How about the age of the 'nagios' account?

Code:

chage -l nagios
Let us know - if you'd like a remote session soon, we could easily move this to a ticket as well. Thanks!

Re: Could not authenticate. Invalid token given.

Posted: Mon May 11, 2015 11:03 am
by mike4vr
Yes, remote session sounds great. Please let me know how you'd like to proceed.