Recently downloaded trial Nagios LogServer crashes

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
DC6171
Posts: 6
Joined: Thu Jun 25, 2015 9:38 am

Recently downloaded trial Nagios LogServer crashes

Post by DC6171 »

Hello,

We recently started a Nagios Log Server trial using the VMware OVF. Seven devices push about 2.8GB of data per day to the log server, and all data is deleted after 7 days. Unfortunately, the appliance periodically dies and has to be powered off and back on again. Judging by the VMware performance graphs for the appliance, it doesn't seem to be too busy, and it isn't out of disk space either. I've attached the tail end of the console output from when this happens. Any suggestions on what to try would be appreciated.

Environment Info:
4 CPU cores
100GB disk, thick provisioned (eager zeroed), ~25% full
Physical Host: HP Proliant DL380 G9
Hypervisor: VMware ESXi, 6.0.0, 2494585
VMware Tools: Running, version: 9536
Storage: VMFS5
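For scale, the figures above give a rough idea of how much raw log data is held at any one time. This is only a back-of-the-envelope estimate; the on-disk index size will differ from the raw volume:

$$2.8\,\tfrac{\text{GB}}{\text{day}} \times 7\,\text{days} = 19.6\,\text{GB}$$

That is broadly consistent with the ~25GB (about 25% of 100GB) reported as in use on disk, which also includes the OS and application files.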
Last edited by DC6171 on Thu Jun 25, 2015 10:14 am, edited 2 times in total.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Recently downloaded trial Nagios LogServer crashes

Post by jolson »

First, I'd like to know what resources the virtual machine has:
-Memory
-Number of CPUs
-Storage
-Which version of NLS? (I assume the latest, R1.4)

Please send us the Elasticsearch log generated at the time of the failure:

Code:

cat /var/log/elasticsearch/*.log
Some additional debug information:

Code:

cat /etc/sysconfig/elasticsearch
grep -i 'out of memory' /var/log/messages
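The `grep` above matches the literal phrase `out of memory`, which covers the kernel OOM killer but not a Java heap exhaustion inside Elasticsearch; that would normally be logged as `java.lang.OutOfMemoryError` in the Elasticsearch log rather than in `/var/log/messages`. A minimal sketch that checks both places at once, assuming a POSIX shell and GNU or BSD `grep`; the `count_memory_errors` function name is hypothetical:

```shell
#!/bin/sh
# count_memory_errors is a hypothetical helper name used for illustration.
# It counts lines in one log file that match common memory-exhaustion markers:
#   - kernel OOM killer messages in syslog files such as /var/log/messages
#     (lines containing "Out of memory" or "invoked oom-killer")
#   - JVM heap exhaustion in the Elasticsearch logs ("java.lang.OutOfMemoryError")
count_memory_errors() {
    if [ -r "$1" ]; then
        # -c: print the number of matching lines, -i: ignore case, -E: extended regex.
        # grep -c still prints 0 when nothing matches.
        grep -ciE 'out of memory|oom-killer|OutOfMemoryError' "$1"
    else
        # Missing or unreadable file: report zero instead of failing.
        echo 0
    fi
}

# Check the system log plus every current Elasticsearch log (run as root).
for f in /var/log/messages /var/log/elasticsearch/*.log; do
    echo "$f: $(count_memory_errors "$f")"
done
```

An empty result is not conclusive on its own: if the guest hangs outright, the relevant lines may never be written to disk.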
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
DC6171
Posts: 6
Joined: Thu Jun 25, 2015 9:38 am

Re: Recently downloaded trial Nagios LogServer crashes

Post by DC6171 »

[root@logserver01 elasticsearch]# cat /var/log/elasticsearch/*.log
[2015-06-25 09:17:52,342][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] version[1.3.2], pid[1389], build[dee175d/2014-08-13T14:29:30Z]
[2015-06-25 09:17:52,343][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] initializing ...
[2015-06-25 09:17:52,427][INFO ][plugins ] [581ddc65-44cc-48af-88ce-290f486c5695] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-06-25 09:18:00,281][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] initialized
[2015-06-25 09:18:00,282][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] starting ...
[2015-06-25 09:18:00,531][INFO ][transport ] [581ddc65-44cc-48af-88ce-290f486c5695] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.22.12.38:9300]}
[2015-06-25 09:18:00,541][INFO ][discovery ] [581ddc65-44cc-48af-88ce-290f486c5695] fb3b397f-4380-4031-a93b-fcbe65d50872/ed6RA74XQA-Y_IaEZkGaSw
[2015-06-25 09:18:03,651][WARN ][transport.netty ] [581ddc65-44cc-48af-88ce-290f486c5695] exception caught on transport layer [[id: 0x4f3046df]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-06-25 09:18:05,119][INFO ][cluster.service ] [581ddc65-44cc-48af-88ce-290f486c5695] new_master [581ddc65-44cc-48af-88ce-290f486c5695][ed6RA74XQA-Y_IaEZkGaSw][logserver01.udp.com][inet[/172.22.12.38:9300]]{max_local_storage_nodes=1}, reason: zen-disco-join (elected_as_master)
[2015-06-25 09:18:05,201][INFO ][http ] [581ddc65-44cc-48af-88ce-290f486c5695] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-06-25 09:18:05,202][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] started
[2015-06-25 09:18:06,650][WARN ][transport.netty ] [581ddc65-44cc-48af-88ce-290f486c5695] exception caught on transport layer [[id: 0x69318183]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-06-25 09:18:07,164][INFO ][gateway ] [581ddc65-44cc-48af-88ce-290f486c5695] recovered [11] indices into cluster_state
[2015-06-25 09:18:07,955][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:08,000][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:13,015][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:13,031][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:18,079][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:18,083][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[root@logserver01 elasticsearch]#

[root@logserver01 elasticsearch]# grep -i 'out of memory' /var/log/messages
[root@logserver01 elasticsearch]#

Thank you for any info.
DC6171
Posts: 6
Joined: Thu Jun 25, 2015 9:38 am

Re: Recently downloaded trial Nagios LogServer crashes

Post by DC6171 »

[root@logserver01 elasticsearch]# cat /etc/sysconfig/elasticsearch
# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"

# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
DATA_DIR="$ES_HOME/data"

# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"

# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
    GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
    GET_ES_CONFIG_RETURN=$?

    if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
        echo $GET_ES_CONFIG_MESSAGE
        exit 1
    else
        ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
    fi
fi
[root@logserver01 elasticsearch]#
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Recently downloaded trial Nagios LogServer crashes

Post by jolson »

How much memory is in your server? I'd expect you'll need between 8 and 16GB of memory to handle a load of 2.8GB of logs per day.
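For reference, the `ES_HEAP_SIZE` line in the `/etc/sysconfig/elasticsearch` file posted above gives Elasticsearch half of the physical memory reported by `free -m`, leaving the other half for the OS (including the filesystem cache) and everything else on the appliance, such as Logstash. A minimal sketch of that arithmetic, assuming a POSIX shell; the `es_heap_mb` function name is hypothetical:

```shell
#!/bin/sh
# Reproduces the heap arithmetic from /etc/sysconfig/elasticsearch:
#   ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
# es_heap_mb is a hypothetical helper name used only for this illustration.
es_heap_mb() {
    # Integer division, exactly as expr does it (remainders are dropped).
    expr "$1" / 2
}

# On the appliance itself: read total memory the same way the sysconfig file does.
if command -v free >/dev/null 2>&1; then
    total_mb=$(free -m | awk '/^Mem:/{print $2}')
    if [ -n "$total_mb" ]; then
        echo "this host: total=${total_mb}MB -> ES_HEAP_SIZE=$(es_heap_mb "$total_mb")m"
    fi
fi

# Nominal guest sizes from this thread (free -m will report slightly less).
for gb in 6 16; do
    echo "${gb}GB guest -> ES_HEAP_SIZE ~$(es_heap_mb $((gb * 1024)))m"
done
```

The loop prints roughly 3072m for a 6GB guest and 8192m for a 16GB guest, so moving to 16GB more than doubles both the heap and the memory left over for everything else.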

When your server 'crashes', what are the symptoms?

Your Elasticsearch log looks quite normal (or at least I don't see any obvious indication of a crash). Let's also see your Logstash log:

Code:

cat /var/log/logstash/logstash.log
DC6171
Posts: 6
Joined: Thu Jun 25, 2015 9:38 am

Re: Recently downloaded trial Nagios LogServer crashes

Post by DC6171 »

The logserver guest is assigned 6GB of RAM. We can assign more if needed; we just haven't seen any indication that it's necessary.

As for symptoms, the server console shows the screen I attached earlier, and the logserver guest is otherwise unresponsive.

The Logstash log came back empty:
[root@logserver01 elasticsearch]# cat /var/log/logstash/logstash.log
[root@logserver01 elasticsearch]#
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Recently downloaded trial Nagios LogServer crashes

Post by jolson »

As a test, I think it's a good idea to increase the amount of RAM allocated to the box. Can you give it 16GB of RAM? If so, please do.

Once the RAM has been allocated and the box has been restarted, wait to see whether it crashes again. If it does, immediately collect these logs:

Code:

cat /var/log/logstash/logstash.log
cat /var/log/elasticsearch/*.log
In the meantime, I'd like to take a look at some of your rotated logs. Please create .tar.gz archives of them and send me the resulting files:

Code:

tar zcf /tmp/elasticsearch.tar.gz /var/log/elasticsearch/*
tar zcf /tmp/logstash.tar.gz /var/log/logstash/*
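If it's easier to send a single file, both log directories can go into one timestamped archive. A minimal sketch, assuming a POSIX shell with `tar` and `date`; the `bundle_logs` function and the `nls-logs-<timestamp>.tar.gz` naming are hypothetical, not part of Nagios Log Server:

```shell
#!/bin/sh
# bundle_logs is a hypothetical helper: archive one or more log directories
# into a single timestamped .tar.gz under OUTDIR and print the archive path.
# Usage: bundle_logs OUTDIR DIR [DIR ...]
bundle_logs() {
    outdir=$1
    shift
    archive="$outdir/nls-logs-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -c create, -z gzip, -f output file. tar typically strips the leading "/"
    # from absolute member names and may mention this on stderr.
    tar -czf "$archive" "$@" && echo "$archive"
}
```

On the appliance this would be run as root, for example `bundle_logs /tmp /var/log/elasticsearch /var/log/logstash`; running `tar -tzf` on the printed path lists what was captured before it is sent.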
DC6171
Posts: 6
Joined: Thu Jun 25, 2015 9:38 am

Re: Recently downloaded trial Nagios LogServer crashes

Post by DC6171 »

I've bumped the memory from 6GB to 16GB to see if it makes a difference, and attached the requested logs. I'll report back after a couple of days of running or after the next event. Thank you.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Recently downloaded trial Nagios LogServer crashes

Post by jolson »

Sounds good - let us know what you find out. Thanks!
DC6171
Posts: 6
Joined: Thu Jun 25, 2015 9:38 am

Re: Recently downloaded trial Nagios LogServer crashes

Post by DC6171 »

It looks like insufficient memory was to blame. It was previously failing nightly, and since we allocated more memory it hasn't happened again. Thank you for the help!