New NLS node errors out when trying to add to cluster

2evanowen
Posts: 35
Joined: Fri Jan 16, 2015 1:24 pm

Re: New NLS node errors out when trying to add to cluster

Post by 2evanowen »

Code:

[owen@barium ~]$ nmap -p 1-9800 10.30.216.88

Starting Nmap 6.40 ( http://nmap.org ) at 2015-03-24 12:05 MDT
Nmap scan report for radium.colo.seagate.com (10.30.216.88)
Host is up (0.00071s latency).
Not shown: 9785 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
37/tcp   open  time
80/tcp   open  http
111/tcp  open  rpcbind
443/tcp  open  https
2056/tcp open  unknown
2057/tcp open  unknown
2301/tcp open  compaqdiag
2381/tcp open  compaq-https
3515/tcp open  must-backplane
5544/tcp open  unknown
5666/tcp open  nrpe
9300/tcp open  vrace
9390/tcp open  unknown
9391/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.39 seconds
-----------------------------------------------------------------------------------------

[owen@radium ~]$ nmap -p 1-9800 10.30.216.56

Starting Nmap 6.47 ( http://nmap.org ) at 2015-03-24 12:04 MDT
Nmap scan report for barium.colo.seagate.com (10.30.216.56)
Host is up (0.00017s latency).
Not shown: 9793 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
37/tcp   open  time
80/tcp   open  http
111/tcp  open  rpcbind
2301/tcp open  compaqdiag
2381/tcp open  compaq-https
5666/tcp open  nrpe

Nmap done: 1 IP address (1 host up) scanned in 0.13 seconds
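
To see at a glance which ports differ between the two scans, a minimal bash sketch along these lines may help. It assumes each scan was saved to a file; the function name `open_ports` and the file names `radium.txt` and `barium.txt` are made up for illustration.

```shell
# open_ports: read nmap's normal output on stdin and print the open TCP
# port numbers, one per line, sorted as text so `comm` can compare lists.
open_ports() {
    awk '$2 == "open" && $1 ~ /\/tcp$/ { sub(/\/tcp$/, "", $1); print $1 }' | sort
}

# Usage (bash): ports open in the first scan but not in the second.
#   comm -23 <(open_ports < radium.txt) <(open_ports < barium.txt)
```

The sort is textual rather than numeric because `comm` expects both inputs in the same collation order.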
Last edited by 2evanowen on Tue Mar 24, 2015 2:17 pm, edited 2 times in total.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: New NLS node errors out when trying to add to cluster

Post by jolson »

On your 10.30.216.56 host, ports 9200 and 9300 should be open. Elasticsearch listens on these ports, so please check that elasticsearch is running on that server:

Code:

service elasticsearch status

If it is not running, it will need to be started.

I see that you have NRPE on that server as well. Did you install NRPE first, or Nagios Log Server?
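
For the port check, a minimal bash sketch that looks at the listening sockets on the server itself instead of scanning from the other node. The function name `has_listener` is made up for illustration; it assumes `ss -ltn` (or `netstat -ltn`) is available, both of which print the local address in the fourth column.

```shell
# has_listener PORT: read `ss -ltn` (or `netstat -ltn`) output on stdin and
# succeed if any local address in column 4 ends in :PORT.
has_listener() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Usage, run on the node being checked:
#   ss -ltn | has_listener 9200 && echo "9200 is listening"
#   ss -ltn | has_listener 9300 && echo "9300 is listening"
```

Checking locally matters if Elasticsearch binds its HTTP port to localhost only. In that case a remote nmap would likely not show 9200 even on a healthy node, and 9300 is the port to look for from the other server.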
2evanowen
Posts: 35
Joined: Fri Jan 16, 2015 1:24 pm

Re: New NLS node errors out when trying to add to cluster

Post by 2evanowen »

Okay, on my 10.30.216.56 host I restarted the elasticsearch service, then ran nmap again from the other server to check which ports were open. It looks like restarting the service didn't help.

Code:

[owen@radium ~]$ nmap -p 1-9800 10.30.216.56

Starting Nmap 6.47 ( http://nmap.org ) at 2015-03-24 12:42 MDT
Nmap scan report for barium.colo.seagate.com (10.30.216.56)
Host is up (0.00013s latency).
Not shown: 9793 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
37/tcp   open  time
80/tcp   open  http
111/tcp  open  rpcbind
2301/tcp open  compaqdiag
2381/tcp open  compaq-https
5666/tcp open  nrpe

Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds

NRPE was definitely first. We were monitoring both of these servers with Nagios long before we considered using Nagios Log Server.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: New NLS node errors out when trying to add to cluster

Post by tgriep »

I just talked to the developers, and they said that SELinux needs to be disabled on the new server.
Run this on the new server to see whether SELinux is disabled.

Code:

getenforce

Also, can you run this on both servers and post the output?

Code:

cat /usr/local/nagioslogserver/var/cluster_uuid
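
If comparing the two IDs by eye is error-prone, a minimal bash sketch like the following can do it. It assumes the file from the other server has been copied over first; the function name `same_uuid` and the path `/tmp/other_cluster_uuid` are made up for illustration.

```shell
# same_uuid FILE_A FILE_B: succeed if both files hold the same non-empty ID,
# ignoring surrounding whitespace and trailing newlines.
# Returns 2 if either file cannot be read.
same_uuid() {
    local a b
    a=$(tr -d '[:space:]' < "$1") || return 2
    b=$(tr -d '[:space:]' < "$2") || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   same_uuid /usr/local/nagioslogserver/var/cluster_uuid /tmp/other_cluster_uuid \
#       && echo "cluster IDs match" || echo "cluster IDs differ"
```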
2evanowen
Posts: 35
Joined: Fri Jan 16, 2015 1:24 pm

Re: New NLS node errors out when trying to add to cluster

Post by 2evanowen »

Alright, the new server was in permissive mode before, but I changed it to disabled. The old one was already disabled, so now they are both disabled.

Code:

[owen@barium ~]$ getenforce
Disabled

[owen@radium ~]$ getenforce
Disabled

I tried entering the Cluster ID and Hostname/IP again. It failed with the same error message both times.

Code:

[owen@barium ~]$ cat /usr/local/nagioslogserver/var/cluster_uuid
76900ee2-f769-413c-9948-850204a96b32

[owen@radium ~]$ cat /usr/local/nagioslogserver/var/cluster_uuid
76900ee2-f769-413c-9948-850204a96b32
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: New NLS node errors out when trying to add to cluster

Post by jolson »

I believe that the quickest way to resolve this issue may be to reformat the client box with CentOS and re-install NLS if that is a possibility. It is very likely that it's the client causing the problems - it may also save you headaches in the future.

If that's not a possibility, we should take a look at your elasticsearch configuration files, and your elasticsearch logs:

Code:

cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml

Code:

cat /etc/sysconfig/elasticsearch

Code:

tail -n30 /var/log/elasticsearch/*.log
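
The stock elasticsearch.yml is mostly comments, and the logs are mostly INFO lines, so it can help to filter both before posting. A minimal bash sketch, with the function names `active_settings` and `error_context` made up for illustration:

```shell
# active_settings: print only the lines on stdin that are neither blank
# nor comments, i.e. the settings that are actually in effect.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)'
}

# error_context: print each ERROR log line plus the two lines after it,
# where Elasticsearch usually writes the exception detail.
error_context() {
    grep -F -A2 '][ERROR]'
}

# Usage:
#   active_settings < /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
#   cat /var/log/elasticsearch/*.log | error_context
```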
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: New NLS node errors out when trying to add to cluster

Post by tgriep »

Also, can you run this on both servers and post the output?

Code:

cat /usr/local/nagioslogserver/var/node_uuid
2evanowen
Posts: 35
Joined: Fri Jan 16, 2015 1:24 pm

Re: New NLS node errors out when trying to add to cluster

Post by 2evanowen »

We definitely can't reformat the current NLS box (Radium). There are other applications running on it that I am not in charge of.
We could reformat the new NLS box (Barium, the one I am trying to connect to Radium), but only as a last resort.

With that said, here are the config files and logs from Barium.

Code:

[owen@barium ~]$ cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
##################### Elasticsearch Configuration Example #####################

# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at <http://elasticsearch.org/guide>.
#
# The installation procedure is covered at
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html>.
#
# Elasticsearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.
#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [http://elasticsearch.org/community].

# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
# node.rack: ${RACK_ENV_VAR}

# For information on supported formats and syntax for the config file, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html>


################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: nagios_elasticsearch


#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
# node.name: "Franz Kafka"

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
# node.master: true
#
# Allow this node to store data (enabled by default):
#
# node.data: true

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#    This will be the "workhorse" of your cluster.
#
# node.master: false
# node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
# node.master: false
# node.data: false

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_nodes] or GUI tools
# such as <http://www.elasticsearch.org/overview/marvel/>,
# <http://github.com/karmi/elasticsearch-paramedic>,
# <http://github.com/lukas-vlcek/bigdesk> and
# <http://mobz.github.com/elasticsearch-head> to inspect the cluster state.

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
# node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
node.max_local_storage_nodes: 1


#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See <http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html> and
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html>
# for more information.

# Set the number of shards (splits) of an index (5 by default):
#
# index.number_of_shards: 5

# Set the number of replicas (additional copies) of an index (1 by default):
#
# index.number_of_replicas: 1

# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
# index.number_of_shards: 1
# index.number_of_replicas: 0

# These settings directly affect the performance of index and search operations
# in your cluster. Assuming you have enough machines to hold shards and
# replicas, the rule of thumb is:
#
# 1. Having more *shards* enhances the _indexing_ performance and allows to
#    _distribute_ a big index across machines.
# 2. Having more *replicas* enhances the _search_ performance and improves the
#    cluster _availability_.
#
# The "number_of_shards" is a one-time setting for an index.
#
# The "number_of_replicas" can be increased or decreased anytime,
# by using the Index Update Settings API.
#
# Elasticsearch takes care about load balancing, relocating, gathering the
# results from nodes, etc. Experiment with different settings to fine-tune
# your setup.

# Use the Index Status API (<http://localhost:9200/A/_status>) to inspect
# the index status.


#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):
#
# path.conf: /path/to/conf

# Path to directory where to store index data allocated for this node.
#
# path.data: /path/to/data
#
# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
#
# path.data: /path/to/data1,/path/to/data2

# Path to temporary files:
#
# path.work: /path/to/work

# Path to log files:
#
# path.logs: /path/to/logs

# Path to where plugins are installed:
#
# path.plugins: /path/to/plugins


#################################### Plugin ###################################

# If a plugin listed here is not installed for current node, the node will not start.
#
# plugin.mandatory: mapper-attachments,lang-groovy


################################### Memory ####################################

# Elasticsearch performs poorly when JVM starts swapping: you should ensure that
# it _never_ swaps.
#
# Set this property to true to lock the memory:
#
bootstrap.mlockall: true

# Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set
# to the same value, and that the machine has enough memory to allocate
# for Elasticsearch, leaving enough memory for the operating system itself.
#
# You should also make sure that the Elasticsearch process is allowed to lock
# the memory, eg. by using `ulimit -l unlimited`.


############################## Network And HTTP ###############################

# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

# Set the bind address specifically (IPv4 or IPv6):
#
# network.bind_host: 192.168.0.1

# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
# network.publish_host: 192.168.0.1

# Set both 'bind_host' and 'publish_host':
#
# network.host: 192.168.0.1

# Set a custom port for the node to node communication (9300 by default):
#
# transport.tcp.port: 9300

# Enable compression for all communication between nodes (disabled by default):
#
transport.tcp.compress: true

# Set a custom port to listen for HTTP traffic:
#
# http.port: 9200

# Set a custom allowed content length:
#
# http.max_content_length: 100mb

# Disable HTTP completely:
#
# http.enabled: false

# Set the HTTP host to listen to
#
http.host: "localhost"

################################### Gateway ###################################

# The gateway allows for persisting the cluster state between full cluster
# restarts. Every change to the state (such as adding an index) will be stored
# in the gateway, and when the cluster starts up for the first time,
# it will read its state from the gateway.

# There are several types of gateway implementations. For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html>.

# The default gateway type is the "local" gateway (recommended):
#
# gateway.type: local

# Settings below control how and when to start the initial recovery process on
# a full cluster restart (to reuse as much local data as possible when using shared
# gateway).

# Allow recovery process after N nodes in a cluster are up:
#
# gateway.recover_after_nodes: 1

# Set the timeout to initiate the recovery process, once the N nodes
# from previous setting are up (accepts time value):
#
# gateway.recover_after_time: 5m

# Set how many nodes are expected in this cluster. Once these N nodes
# are up (and recover_after_nodes is met), begin recovery process immediately
# (without waiting for recover_after_time to expire):
#
# gateway.expected_nodes: 2


############################# Recovery Throttling #############################

# These settings allow to control the process of shards allocation between
# nodes during initial recovery, replica allocation, rebalancing,
# or when adding and removing nodes.

# Set the number of concurrent recoveries happening on a node:
#
# 1. During the initial recovery
#
# cluster.routing.allocation.node_initial_primaries_recoveries: 4
#
# 2. During adding/removing nodes, rebalancing, etc
#
# cluster.routing.allocation.node_concurrent_recoveries: 2

# Set to throttle throughput when recovering (eg. 100mb, by default 20mb):
#
# indices.recovery.max_bytes_per_sec: 20mb

# Set to limit the number of open concurrent streams when
# recovering a shard from a peer:
#
# indices.recovery.concurrent_streams: 5


################################## Discovery ##################################

# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.

# Set to ensure a node sees N other master eligible nodes to be considered
# operational within the cluster. Its recommended to set it to a higher value
# than 1 when running more than 2 nodes in the cluster.
#
# discovery.zen.minimum_master_nodes: 1

# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
# discovery.zen.ping.timeout: 3s

# For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html>

# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["localhost"]

# EC2 discovery allows to use AWS EC2 API in order to perform discovery.
#
# You have to install the cloud-aws plugin for enabling the EC2 discovery.
#
# For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html>
#
# See <http://elasticsearch.org/tutorials/elasticsearch-on-ec2/>
# for a step-by-step tutorial.

# GCE discovery allows to use Google Compute Engine API in order to perform discovery.
#
# You have to install the cloud-gce plugin for enabling the GCE discovery.
#
# For more information, see <https://github.com/elasticsearch/elasticsearch-cloud-gce>.

# Azure discovery allows to use Azure API in order to perform discovery.
#
# You have to install the cloud-azure plugin for enabling the Azure discovery.
#
# For more information, see <https://github.com/elasticsearch/elasticsearch-cloud-azure>.

################################## Slow Log ##################################

# Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms

################################## GC Logging ################################

#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms

#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s

Code:

[owen@barium ~]$ cat /etc/sysconfig/elasticsearch
# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"

# Heap Size (defaults to 256m min, 1g max)
#ES_HEAP_SIZE=2g

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
#MAX_LOCKED_MEMORY=

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
DATA_DIR="$ES_HOME/data"

# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"

# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
	GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
	GET_ES_CONFIG_RETURN=$?

	if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
		echo $GET_ES_CONFIG_MESSAGE
		exit 1
	else
		ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
	fi
fi

Code:

[owen@barium ~]$ tail -n30 /var/log/elasticsearch/*.log
==> /var/log/elasticsearch/76900ee2-f769-413c-9948-850204a96b32_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/76900ee2-f769-413c-9948-850204a96b32_index_search_slowlog.log <==

==> /var/log/elasticsearch/76900ee2-f769-413c-9948-850204a96b32.log <==
[2015-03-24 15:18:42,056][INFO ][node                     ] [8d1dbf6a-3a17-4e87-8f17-899bdaf40237] initializing ...
[2015-03-24 15:18:42,105][INFO ][plugins                  ] [8d1dbf6a-3a17-4e87-8f17-899bdaf40237] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-03-24 15:18:45,221][WARN ][transport                ] [8d1dbf6a-3a17-4e87-8f17-899bdaf40237] Registered two transport handlers for action index/shard/exists, handlers: org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@6913b29e, org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@c33d8f7
[2015-03-24 15:18:45,887][ERROR][bootstrap                ] {1.3.2}: Initialization Failed ...
1) ElasticsearchIllegalArgumentException[Failed to resolve address for [https://radium.colo.seagate.com]]
	NumberFormatException[For input string: "//radium.colo.seagate.com"]2) IllegalStateException[This is a proxy used to support circular references involving constructors. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.]
[2015-03-24 15:20:01,586][WARN ][common.jna               ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).
[2015-03-24 15:20:01,662][INFO ][node                     ] [00e914ae-5c34-4504-a80f-f18b8ef8e796] version[1.3.2], pid[3376], build[dee175d/2014-08-13T14:29:30Z]
[2015-03-24 15:20:01,663][INFO ][node                     ] [00e914ae-5c34-4504-a80f-f18b8ef8e796] initializing ...
[2015-03-24 15:20:01,675][INFO ][plugins                  ] [00e914ae-5c34-4504-a80f-f18b8ef8e796] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-03-24 15:20:03,763][WARN ][transport                ] [00e914ae-5c34-4504-a80f-f18b8ef8e796] Registered two transport handlers for action index/shard/exists, handlers: org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@25c73030, org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@382cb2b0
[2015-03-24 15:20:04,405][ERROR][bootstrap                ] {1.3.2}: Initialization Failed ...
1) ElasticsearchIllegalArgumentException[Failed to resolve address for [https://radium.colo.seagate.com]]
	NumberFormatException[For input string: "//radium.colo.seagate.com"]2) IllegalStateException[This is a proxy used to support circular references involving constructors. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.]
[2015-03-24 15:22:10,428][WARN ][common.jna               ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).
[2015-03-24 15:22:10,506][INFO ][node                     ] [d9fc1952-0870-4043-a997-3179b81c6a16] version[1.3.2], pid[3459], build[dee175d/2014-08-13T14:29:30Z]
[2015-03-24 15:22:10,506][INFO ][node                     ] [d9fc1952-0870-4043-a997-3179b81c6a16] initializing ...
[2015-03-24 15:22:10,518][INFO ][plugins                  ] [d9fc1952-0870-4043-a997-3179b81c6a16] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-03-24 15:22:12,614][WARN ][transport                ] [d9fc1952-0870-4043-a997-3179b81c6a16] Registered two transport handlers for action index/shard/exists, handlers: org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@4b2ef3e5, org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@778e65f2
[2015-03-24 15:22:13,257][ERROR][bootstrap                ] {1.3.2}: Initialization Failed ...
1) ElasticsearchIllegalArgumentException[Failed to resolve address for [https://radium.colo.seagate.com]]
	NumberFormatException[For input string: "//radium.colo.seagate.com"]2) IllegalStateException[This is a proxy used to support circular references involving constructors. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.]
[2015-03-24 15:52:57,089][WARN ][common.jna               ] Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).
[2015-03-24 15:52:57,167][INFO ][node                     ] [d9fc1952-0870-4043-a997-3179b81c6a16] version[1.3.2], pid[7584], build[dee175d/2014-08-13T14:29:30Z]
[2015-03-24 15:52:57,167][INFO ][node                     ] [d9fc1952-0870-4043-a997-3179b81c6a16] initializing ...
[2015-03-24 15:52:57,179][INFO ][plugins                  ] [d9fc1952-0870-4043-a997-3179b81c6a16] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-03-24 15:52:59,301][WARN ][transport                ] [d9fc1952-0870-4043-a997-3179b81c6a16] Registered two transport handlers for action index/shard/exists, handlers: org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@778e65f2, org.elasticsearch.indices.store.IndicesStore$ShardActiveRequestHandler@25c73030
[2015-03-24 15:52:59,957][ERROR][bootstrap                ] {1.3.2}: Initialization Failed ...
1) ElasticsearchIllegalArgumentException[Failed to resolve address for [https://radium.colo.seagate.com]]
	NumberFormatException[For input string: "//radium.colo.seagate.com"]2) IllegalStateException[This is a proxy used to support circular references involving constructors. The object we're proxying is not constructed yet. Please wait until after injection has completed to use this object.]

==> /var/log/elasticsearch/91614dae-5637-43b7-8ef9-caafa2163f77_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/91614dae-5637-43b7-8ef9-caafa2163f77_index_search_slowlog.log <==

==> /var/log/elasticsearch/91614dae-5637-43b7-8ef9-caafa2163f77.log <==

==> /var/log/elasticsearch/9de70213-13e3-4951-bb73-00ea40c5fda2_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/9de70213-13e3-4951-bb73-00ea40c5fda2_index_search_slowlog.log <==

==> /var/log/elasticsearch/9de70213-13e3-4951-bb73-00ea40c5fda2.log <==
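
The ERROR lines above suggest Elasticsearch was handed `https://radium.colo.seagate.com` where it expects a bare host or host:port. The NumberFormatException is for `//radium.colo.seagate.com`, the text after the first colon, which looks as if it was read as a port number. If that reading is right, the address given for the other node should not carry a URL scheme. A minimal bash sketch of stripping one, with the function name `bare_host` made up for illustration:

```shell
# bare_host VALUE: strip a leading URL scheme such as "https://" and any
# trailing path, leaving "host" or "host:port".
bare_host() {
    local v=$1
    v=${v#*://}   # drop everything up to and including "://", if present
    v=${v%%/*}    # drop the first "/" and everything after it
    printf '%s\n' "$v"
}

# Usage:
#   bare_host 'https://radium.colo.seagate.com'
```

Where the `https://` value came from is not visible in these logs. The Hostname/IP entered when adding the node is one plausible source, but that is an assumption.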

And Radium.

Code:

[owen@radium ~]$ cat /usr/local/nagioslogserver/elasticsearch/config/elasticsearch.yml
##################### Elasticsearch Configuration Example #####################

# This file contains an overview of various configuration settings,
# targeted at operations staff. Application developers should
# consult the guide at <http://elasticsearch.org/guide>.
#
# The installation procedure is covered at
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup.html>.
#
# Elasticsearch comes with reasonable defaults for most settings,
# so you can try it out without bothering with configuration.
#
# Most of the time, these defaults are just fine for running a production
# cluster. If you're fine-tuning your cluster, or wondering about the
# effect of certain configuration option, please _do ask_ on the
# mailing list or IRC channel [http://elasticsearch.org/community].

# Any element in the configuration can be replaced with environment variables
# by placing them in ${...} notation. For example:
#
# node.rack: ${RACK_ENV_VAR}

# For information on supported formats and syntax for the config file, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html>


################################### Cluster ###################################

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: nagios_elasticsearch


#################################### Node #####################################

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
# node.name: "Franz Kafka"

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
# node.master: true
#
# Allow this node to store data (enabled by default):
#
# node.data: true

# You can exploit these settings to design advanced cluster topologies.
#
# 1. You want this node to never become a master node, only to hold data.
#    This will be the "workhorse" of your cluster.
#
# node.master: false
# node.data: true
#
# 2. You want this node to only serve as a master: to not store any data and
#    to have free resources. This will be the "coordinator" of your cluster.
#
# node.master: true
# node.data: false
#
# 3. You want this node to be neither master nor data node, but
#    to act as a "search load balancer" (fetching data from nodes,
#    aggregating results, etc.)
#
# node.master: false
# node.data: false

# Use the Cluster Health API [http://localhost:9200/_cluster/health], the
# Node Info API [http://localhost:9200/_nodes] or GUI tools
# such as <http://www.elasticsearch.org/overview/marvel/>,
# <http://github.com/karmi/elasticsearch-paramedic>,
# <http://github.com/lukas-vlcek/bigdesk> and
# <http://mobz.github.com/elasticsearch-head> to inspect the cluster state.

# A node can have generic attributes associated with it, which can later be used
# for customized shard allocation filtering, or allocation awareness. An attribute
# is a simple key value pair, similar to node.key: value, here is an example:
#
# node.rack: rack314

# By default, multiple nodes are allowed to start from the same installation location
# to disable it, set the following:
node.max_local_storage_nodes: 1


#################################### Index ####################################

# You can set a number of options (such as shard/replica options, mapping
# or analyzer definitions, translog settings, ...) for indices globally,
# in this file.
#
# Note, that it makes more sense to configure index settings specifically for
# a certain index, either when creating it or by using the index templates API.
#
# See <http://elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules.html> and
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html>
# for more information.

# Set the number of shards (splits) of an index (5 by default):
#
# index.number_of_shards: 5

# Set the number of replicas (additional copies) of an index (1 by default):
#
# index.number_of_replicas: 1

# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
# index.number_of_shards: 1
# index.number_of_replicas: 0

# These settings directly affect the performance of index and search operations
# in your cluster. Assuming you have enough machines to hold shards and
# replicas, the rule of thumb is:
#
# 1. Having more *shards* enhances the _indexing_ performance and allows to
#    _distribute_ a big index across machines.
# 2. Having more *replicas* enhances the _search_ performance and improves the
#    cluster _availability_.
#
# The "number_of_shards" is a one-time setting for an index.
#
# The "number_of_replicas" can be increased or decreased anytime,
# by using the Index Update Settings API.
#
# Elasticsearch takes care about load balancing, relocating, gathering the
# results from nodes, etc. Experiment with different settings to fine-tune
# your setup.

# Use the Index Status API (<http://localhost:9200/A/_status>) to inspect
# the index status.


#################################### Paths ####################################

# Path to directory containing configuration (this file and logging.yml):
#
# path.conf: /path/to/conf

# Path to directory where to store index data allocated for this node.
#
# path.data: /path/to/data
#
# Can optionally include more than one location, causing data to be striped across
# the locations (a la RAID 0) on a file level, favouring locations with most free
# space on creation. For example:
#
# path.data: /path/to/data1,/path/to/data2

# Path to temporary files:
#
# path.work: /path/to/work

# Path to log files:
#
# path.logs: /path/to/logs

# Path to where plugins are installed:
#
# path.plugins: /path/to/plugins


#################################### Plugin ###################################

# If a plugin listed here is not installed for current node, the node will not start.
#
# plugin.mandatory: mapper-attachments,lang-groovy


################################### Memory ####################################

# Elasticsearch performs poorly when JVM starts swapping: you should ensure that
# it _never_ swaps.
#
# Set this property to true to lock the memory:
#
bootstrap.mlockall: true

# Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set
# to the same value, and that the machine has enough memory to allocate
# for Elasticsearch, leaving enough memory for the operating system itself.
#
# You should also make sure that the Elasticsearch process is allowed to lock
# the memory, eg. by using `ulimit -l unlimited`.


############################## Network And HTTP ###############################

# Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens
# on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node
# communication. (the range means that if the port is busy, it will automatically
# try the next port).

# Set the bind address specifically (IPv4 or IPv6):
#
# network.bind_host: 192.168.0.1

# Set the address other nodes will use to communicate with this node. If not
# set, it is automatically derived. It must point to an actual IP address.
#
# network.publish_host: 192.168.0.1

# Set both 'bind_host' and 'publish_host':
#
# network.host: 192.168.0.1

# Set a custom port for the node to node communication (9300 by default):
#
# transport.tcp.port: 9300

# Enable compression for all communication between nodes (disabled by default):
#
transport.tcp.compress: true

# Set a custom port to listen for HTTP traffic:
#
# http.port: 9200

# Set a custom allowed content length:
#
# http.max_content_length: 100mb

# Disable HTTP completely:
#
# http.enabled: false

# Set the HTTP host to listen to
#
http.host: "localhost"

################################### Gateway ###################################

# The gateway allows for persisting the cluster state between full cluster
# restarts. Every change to the state (such as adding an index) will be stored
# in the gateway, and when the cluster starts up for the first time,
# it will read its state from the gateway.

# There are several types of gateway implementations. For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-gateway.html>.

# The default gateway type is the "local" gateway (recommended):
#
# gateway.type: local

# Settings below control how and when to start the initial recovery process on
# a full cluster restart (to reuse as much local data as possible when using shared
# gateway).

# Allow recovery process after N nodes in a cluster are up:
#
# gateway.recover_after_nodes: 1

# Set the timeout to initiate the recovery process, once the N nodes
# from previous setting are up (accepts time value):
#
# gateway.recover_after_time: 5m

# Set how many nodes are expected in this cluster. Once these N nodes
# are up (and recover_after_nodes is met), begin recovery process immediately
# (without waiting for recover_after_time to expire):
#
# gateway.expected_nodes: 2


############################# Recovery Throttling #############################

# These settings allow to control the process of shards allocation between
# nodes during initial recovery, replica allocation, rebalancing,
# or when adding and removing nodes.

# Set the number of concurrent recoveries happening on a node:
#
# 1. During the initial recovery
#
# cluster.routing.allocation.node_initial_primaries_recoveries: 4
#
# 2. During adding/removing nodes, rebalancing, etc
#
# cluster.routing.allocation.node_concurrent_recoveries: 2

# Set to throttle throughput when recovering (eg. 100mb, by default 20mb):
#
# indices.recovery.max_bytes_per_sec: 20mb

# Set to limit the number of open concurrent streams when
# recovering a shard from a peer:
#
# indices.recovery.concurrent_streams: 5


################################## Discovery ##################################

# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.

# Set to ensure a node sees N other master eligible nodes to be considered
# operational within the cluster. Its recommended to set it to a higher value
# than 1 when running more than 2 nodes in the cluster.
#
# discovery.zen.minimum_master_nodes: 1

# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
# discovery.zen.ping.timeout: 3s

# For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html>

# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["localhost"]

# EC2 discovery allows to use AWS EC2 API in order to perform discovery.
#
# You have to install the cloud-aws plugin for enabling the EC2 discovery.
#
# For more information, see
# <http://elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-ec2.html>
#
# See <http://elasticsearch.org/tutorials/elasticsearch-on-ec2/>
# for a step-by-step tutorial.

# GCE discovery allows to use Google Compute Engine API in order to perform discovery.
#
# You have to install the cloud-gce plugin for enabling the GCE discovery.
#
# For more information, see <https://github.com/elasticsearch/elasticsearch-cloud-gce>.

# Azure discovery allows to use Azure API in order to perform discovery.
#
# You have to install the cloud-azure plugin for enabling the Azure discovery.
#
# For more information, see <https://github.com/elasticsearch/elasticsearch-cloud-azure>.

################################## Slow Log ##################################

# Shard level query and fetch threshold logging.

#index.search.slowlog.threshold.query.warn: 10s
#index.search.slowlog.threshold.query.info: 5s
#index.search.slowlog.threshold.query.debug: 2s
#index.search.slowlog.threshold.query.trace: 500ms

#index.search.slowlog.threshold.fetch.warn: 1s
#index.search.slowlog.threshold.fetch.info: 800ms
#index.search.slowlog.threshold.fetch.debug: 500ms
#index.search.slowlog.threshold.fetch.trace: 200ms

#index.indexing.slowlog.threshold.index.warn: 10s
#index.indexing.slowlog.threshold.index.info: 5s
#index.indexing.slowlog.threshold.index.debug: 2s
#index.indexing.slowlog.threshold.index.trace: 500ms

################################## GC Logging ################################

#monitor.jvm.gc.young.warn: 1000ms
#monitor.jvm.gc.young.info: 700ms
#monitor.jvm.gc.young.debug: 400ms

#monitor.jvm.gc.old.warn: 10s
#monitor.jvm.gc.old.info: 5s
#monitor.jvm.gc.old.debug: 2s

Code: Select all

[owen@radium ~]$ cat /etc/sysconfig/elasticsearch
# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"

# Heap Size (defaults to 256m min, 1g max)
#ES_HEAP_SIZE=2g

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
#MAX_LOCKED_MEMORY=

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
DATA_DIR="$ES_HOME/data"

# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"

# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
	GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
	GET_ES_CONFIG_RETURN=$?

	if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
		echo $GET_ES_CONFIG_MESSAGE
		exit 1
	else
		ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
	fi
fi

Code: Select all

[owen@radium ~]$ tail -n30 /var/log/elasticsearch/*.log
==> /var/log/elasticsearch/24af0eef-97fc-4e53-b001-a349e49e0f78_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/24af0eef-97fc-4e53-b001-a349e49e0f78_index_search_slowlog.log <==

==> /var/log/elasticsearch/24af0eef-97fc-4e53-b001-a349e49e0f78.log <==

==> /var/log/elasticsearch/76900ee2-f769-413c-9948-850204a96b32_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/76900ee2-f769-413c-9948-850204a96b32_index_search_slowlog.log <==

==> /var/log/elasticsearch/76900ee2-f769-413c-9948-850204a96b32.log <==
[2015-03-24 11:00:02,159][INFO ][cluster.metadata         ] [91614dae-5637-43b7-8ef9-caafa2163f77] [logstash-2015.03.25] creating index, cause [auto(bulk api)], shards [5]/[1], mappings [_default_]
[2015-03-24 11:00:02,208][INFO ][cluster.metadata         ] [91614dae-5637-43b7-8ef9-caafa2163f77] [logstash-2015.03.25] update_mapping [syslog] (dynamic)
[2015-03-24 11:03:22,988][INFO ][index.engine.internal    ] [91614dae-5637-43b7-8ef9-caafa2163f77] [logstash-2015.03.23][0] updating index.codec.bloom.load from [true] to [false]

==> /var/log/elasticsearch/abe16b19-381d-43e8-a874-20566198a504_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/abe16b19-381d-43e8-a874-20566198a504_index_search_slowlog.log <==

==> /var/log/elasticsearch/abe16b19-381d-43e8-a874-20566198a504.log <==

==> /var/log/elasticsearch/elasticsearch_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/elasticsearch_index_search_slowlog.log <==

==> /var/log/elasticsearch/elasticsearch.log <==

==> /var/log/elasticsearch/f33034b3-7c06-4561-9fb1-f16b8d6c354d_index_indexing_slowlog.log <==

==> /var/log/elasticsearch/f33034b3-7c06-4561-9fb1-f16b8d6c354d_index_search_slowlog.log <==

==> /var/log/elasticsearch/f33034b3-7c06-4561-9fb1-f16b8d6c354d.log <==
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: New NLS node errors out when trying to add to cluster

Post by jolson »

There is no difference between your configuration and mine. I tested this in my lab by spinning up two log servers, and installing NRPE on them before attempting to connect them together - everything worked fine in the lab.

The only notable error that I see in your log is as follows:

Code: Select all

[2015-03-24 15:20:04,405][ERROR][bootstrap                ] {1.3.2}: Initialization Failed ...
1) ElasticsearchIllegalArgumentException[Failed to resolve address for [https://radium.colo.seagate.com]]
This is coming from your Barium machine - it looks like elasticsearch is trying to resolve the address "https://radium.colo.seagate.com" (scheme included), and it cannot. I am not sure what this means or what the implications are, but it may be something you want to look into on your end. Can radium.colo.seagate.com be resolved properly from Barium?

From Barium:

Code: Select all

ping radium.colo.seagate.com

Code: Select all

cat /etc/hosts
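One more check that might help here, as a rough sketch only: the address in the error above still has "https://" on the front, and a resolver expects a bare host name. Whether that prefix is really the cause is an assumption on my part, not something the log confirms. The helper names strip_scheme and check_resolves below are made up for illustration and are not part of Nagios Log Server.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not shipped with Nagios Log Server.

# strip_scheme ADDRESS: print ADDRESS reduced to a bare host name.
# Note: the ":port" step would mangle a raw IPv6 address.
strip_scheme() {
    host="${1#*://}"    # drop "http://" or "https://" if present
    host="${host%%/*}"  # drop any trailing path
    host="${host%%:*}"  # drop any ":port" suffix
    printf '%s\n' "$host"
}

# check_resolves NAME: report whether the system resolver can look NAME up.
# getent exits non-zero when the lookup fails.
check_resolves() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "resolves: $1"
    else
        echo "does NOT resolve: $1"
    fi
}
```

From Barium, something like check_resolves "https://radium.colo.seagate.com" followed by check_resolves "$(strip_scheme https://radium.colo.seagate.com)" would show whether only the bare name resolves.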
I would like you to turn your logging level up on Barium:

Code: Select all

vi /usr/local/nagioslogserver/elasticsearch/config/logging.yml
Change 'es.logger.level' to DEBUG and restart elasticsearch:

Code: Select all

service elasticsearch restart
Please reproduce your issue by trying to join the node to your existing cluster.
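
If you would rather not edit logging.yml by hand, the log level change could be scripted roughly like this. It is a sketch that assumes the file has a line starting with "es.logger.level:"; the function name set_es_log_level is hypothetical. It keeps a .bak copy so the original level is easy to restore afterwards.

```shell
#!/bin/sh
# set_es_log_level FILE LEVEL
# Hypothetical helper: rewrite the "es.logger.level:" line in FILE to LEVEL,
# keeping the previous contents in FILE.bak.
# Assumes FILE contains a line that begins with "es.logger.level:".
set_es_log_level() {
    file="$1"
    level="$2"
    # Back up first, then write the edited copy over the original.
    cp -p "$file" "$file.bak" &&
        sed "s/^es\.logger\.level:.*/es.logger.level: $level/" "$file.bak" > "$file"
}
```

For example: set_es_log_level /usr/local/nagioslogserver/elasticsearch/config/logging.yml DEBUG, followed by the service restart shown above.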

Since we don't have a definite direction, I'd like you to zip up all of your logs and PM them to me, as they may contain sensitive information:

Code: Select all

tar cfz logs.tgz /var/log/ /usr/local/nagioslogserver/var/
Please also run nmap again, this time from the Barium machine itself:

Code: Select all

nmap -p 1-9800 localhost
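
To make the scan result easier to read, here is a small sketch that filters nmap output down to the two ports I care about (9200 for HTTP and 9300 for node-to-node traffic). The function name missing_es_ports is made up, and it assumes nmap's usual "PORT/tcp open SERVICE" line layout as seen in your earlier scans.

```shell
#!/bin/sh
# missing_es_ports: read nmap output on stdin and print each expected
# Elasticsearch port (9200 HTTP, 9300 transport) that is NOT listed as open.
# Hypothetical helper; assumes lines shaped like "9300/tcp open  vrace".
missing_es_ports() {
    scan="$(cat)"
    for port in 9200 9300; do
        if ! printf '%s\n' "$scan" | grep -Eq "^${port}/tcp[[:space:]]+open"; then
            echo "$port"
        fi
    done
}
```

Usage would be: nmap -p 1-9800 localhost | missing_es_ports. No output means both ports are open. Since the posted elasticsearch.yml sets http.host to "localhost", port 9200 would likely only show up in a scan run on the node itself, which is why the local scan matters.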
Hopefully we can dig out some good logs from this procedure. Thanks!
2evanowen
Posts: 35
Joined: Fri Jan 16, 2015 1:24 pm

Re: New NLS node errors out when trying to add to cluster

Post by 2evanowen »

Okay.

Yeah, I can ping Radium from Barium just fine.

Code: Select all

[owen@barium ~]$ ping radium.colo.seagate.com
PING radium.colo.seagate.com (10.30.216.88) 56(84) bytes of data.
64 bytes from radium.colo.seagate.com (10.30.216.88): icmp_seq=1 ttl=64 time=0.246 ms
64 bytes from radium.colo.seagate.com (10.30.216.88): icmp_seq=2 ttl=64 time=0.213 ms
64 bytes from radium.colo.seagate.com (10.30.216.88): icmp_seq=3 ttl=64 time=0.216 ms
^C
--- radium.colo.seagate.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.213/0.225/0.246/0.014 ms
We use a DNS server so Radium is not in /etc/hosts.

Code: Select all

[owen@barium ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
Logging is now set to DEBUG and the elasticsearch service has been restarted.
I reproduced the problem using Radium's fully qualified domain name and also its IP, so both attempts should be in the log.

I am PM-ing the log right now.

Code: Select all

[owen@barium ~]$ nmap -p 1-9800 localhost

Starting Nmap 6.40 ( http://nmap.org ) at 2015-03-25 15:47 MDT
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00042s latency).
Other addresses for localhost (not scanned): 127.0.0.1
Not shown: 9791 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
25/tcp   open  smtp
37/tcp   open  time
80/tcp   open  http
111/tcp  open  rpcbind
199/tcp  open  smux
2301/tcp open  compaqdiag
2381/tcp open  compaq-https
5666/tcp open  nrpe

Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds