Recently downloaded trial Nagios LogServer crashes
Hello,
We recently started a Nagios Logserver trial using the VMware OVF and have seven devices pushing about 2.8GB of data per day to the log server, with all data deleted after 7 days. Unfortunately, the appliance periodically dies for some reason and has to be powered off and back on again. Judging by the VMware performance graphs for the appliance, it doesn't seem to be too busy, and it is not out of space. I've attached the tail end of the console messages from when this occurs. Any suggestions on what to try would be appreciated.
Environment Info:
4 CPU cores
100GB eager thick @~25% full
Physical Host: HP Proliant DL380 G9
Hypervisor: VMware ESXi, 6.0.0, 2494585
VMware Tools: Running, version: 9536
Storage: VMFS5
Last edited by DC6171 on Thu Jun 25, 2015 10:14 am, edited 2 times in total.
Re: Recently downloaded trial Nagios LogServer crashes
First, I'd like to know about what's inside of the virtual machine:
-Memory
-Number of CPUs
-Storage
-What version of NLS? (I assume the latest - R1.4)
I'd like you to send us the elasticsearch log generated during the time of failure:
Code: Select all
cat /var/log/elasticsearch/*.log
Some additional debug information:
Code: Select all
cat /etc/sysconfig/elasticsearch
grep -i 'out of memory' /var/log/messages
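The grep above looks for a single phrase. As a minimal sketch, a slightly wider search could also catch other wording the kernel OOM killer commonly uses. The function name `find_oom_events` is hypothetical, and the extra patterns are assumptions about typical kernel messages rather than anything specific to Nagios Log Server. If the guest hangs hard, the relevant messages may never reach disk, so an empty result does not rule out memory pressure.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server): print lines that
# look like kernel OOM-killer activity in the given log file.
# The patterns are assumptions about typical kernel wording.
# Exit status is non-zero when nothing matches, as with plain grep.
find_oom_events() {
    grep -iE 'out of memory|oom-killer|killed process' "$1"
}

# Usage on the appliance (path taken from the command above):
#   find_oom_events /var/log/messages
```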
Re: Recently downloaded trial Nagios LogServer crashes
[root@logserver01 elasticsearch]# cat /var/log/elasticsearch/*.log
[2015-06-25 09:17:52,342][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] version[1.3.2], pid[1389], build[dee175d/2014-08-13T14:29:30Z]
[2015-06-25 09:17:52,343][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] initializing ...
[2015-06-25 09:17:52,427][INFO ][plugins ] [581ddc65-44cc-48af-88ce-290f486c5695] loaded [knapsack-1.3.2.0-d5501ef], sites []
[2015-06-25 09:18:00,281][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] initialized
[2015-06-25 09:18:00,282][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] starting ...
[2015-06-25 09:18:00,531][INFO ][transport ] [581ddc65-44cc-48af-88ce-290f486c5695] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.22.12.38:9300]}
[2015-06-25 09:18:00,541][INFO ][discovery ] [581ddc65-44cc-48af-88ce-290f486c5695] fb3b397f-4380-4031-a93b-fcbe65d50872/ed6RA74XQA-Y_IaEZkGaSw
[2015-06-25 09:18:03,651][WARN ][transport.netty ] [581ddc65-44cc-48af-88ce-290f486c5695] exception caught on transport layer [[id: 0x4f3046df]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-06-25 09:18:05,119][INFO ][cluster.service ] [581ddc65-44cc-48af-88ce-290f486c5695] new_master [581ddc65-44cc-48af-88ce-290f486c5695][ed6RA74XQA-Y_IaEZkGaSw][logserver01.udp.com][inet[/172.22.12.38:9300]]{max_local_storage_nodes=1}, reason: zen-disco-join (elected_as_master)
[2015-06-25 09:18:05,201][INFO ][http ] [581ddc65-44cc-48af-88ce-290f486c5695] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[localhost/127.0.0.1:9200]}
[2015-06-25 09:18:05,202][INFO ][node ] [581ddc65-44cc-48af-88ce-290f486c5695] started
[2015-06-25 09:18:06,650][WARN ][transport.netty ] [581ddc65-44cc-48af-88ce-290f486c5695] exception caught on transport layer [[id: 0x69318183]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-06-25 09:18:07,164][INFO ][gateway ] [581ddc65-44cc-48af-88ce-290f486c5695] recovered [11] indices into cluster_state
[2015-06-25 09:18:07,955][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:08,000][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:13,015][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:13,031][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:18,079][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[2015-06-25 09:18:18,083][DEBUG][action.search.type ] [581ddc65-44cc-48af-88ce-290f486c5695] All shards failed for phase: [query_fetch]
[root@logserver01 elasticsearch]#
[root@logserver01 elasticsearch]# grep -i 'out of memory' /var/log/messages
[root@logserver01 elasticsearch]#
Thank you for any info.
Re: Recently downloaded trial Nagios LogServer crashes
[root@logserver01 elasticsearch]# cat /etc/sysconfig/elasticsearch
# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"
# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
# Heap new generation
#ES_HEAP_NEWSIZE=
# max direct memory
#ES_DIRECT_SIZE=
# Additional Java OPTS
#ES_JAVA_OPTS=
# Maximum number of open files
MAX_OPEN_FILES=65535
# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited
# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144
# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch
# Elasticsearch data directory
DATA_DIR="$ES_HOME/data"
# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"
# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"
# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"
# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios
# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true
if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
    GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
    GET_ES_CONFIG_RETURN=$?
    if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
        echo $GET_ES_CONFIG_MESSAGE
        exit 1
    else
        ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
    fi
fi
[root@logserver01 elasticsearch]#
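The `ES_HEAP_SIZE` line in this file sets the Elasticsearch heap to half of the physical memory reported by `free -m`. The sketch below reproduces that arithmetic for a given memory size, to show roughly what heap a guest of a given size would get. The function name `heap_size_for_mb` is hypothetical, and the inputs are nominal sizes; `free -m` on a real guest typically reports somewhat less than the nominal allocation, so the actual value would be slightly lower.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server): mirrors the
# "half of physical memory" rule from /etc/sysconfig/elasticsearch.
# Takes total memory in MB and prints the heap size that rule yields.
heap_size_for_mb() {
    total_mb=$1
    echo "$(expr "$total_mb" / 2)m"
}

# The config file itself reads the total from free(1):
#   free -m | awk '/^Mem:/{print $2}'
heap_size_for_mb 6144     # nominal 6GB guest, prints 3072m
heap_size_for_mb 16384    # nominal 16GB guest, prints 8192m
```

Under this rule the rest of the memory is left for the operating system, the filesystem cache, and the other services on the appliance.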
Re: Recently downloaded trial Nagios LogServer crashes
How much memory is in your server? I assume you'll need between 8 and 16 GB of memory to handle a load of about 2.8GB of logs per day.
When your server 'crashes', what are the symptoms?
Your elasticsearch log looks quite normal (or at least I don't see any obvious indication of a crash). Let's also see your logstash log:
Code: Select all
cat /var/log/logstash/logstash.log
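When judging whether a log "looks normal", a rough tally of entries per log level can complement reading it by eye. This is a minimal sketch, assuming the `[timestamp][LEVEL ][component ]` line layout seen in the elasticsearch log posted earlier in this thread; the function name `tally_log_levels` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count Elasticsearch log entries per level.
# Assumes lines of the form "[timestamp][LEVEL ][component ] ...".
# Stack-trace lines, which do not start with "[" and a digit, are skipped.
tally_log_levels() {
    awk -F'[][]' '/^\[[0-9]/ { gsub(/ /, "", $4); n[$4]++ }
                  END { for (l in n) print l, n[l] }' "$@" | sort
}

# Usage on the appliance:
#   tally_log_levels /var/log/elasticsearch/*.log
```

A sudden jump in WARN or ERROR counts shortly before a failure would be a reason to read those entries more closely.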
Re: Recently downloaded trial Nagios LogServer crashes
The logserver guest is assigned 6GB of RAM. We can assign more if needed; we just haven't seen an indication that it is needed.
As far as symptoms, the server console is at the screen as previously attached and the logserver guest is otherwise non-responsive.
Log result is:
[root@logserver01 elasticsearch]# cat /var/log/logstash/logstash.log
[root@logserver01 elasticsearch]#
Re: Recently downloaded trial Nagios LogServer crashes
I think it's a good idea to increase the amount of RAM allocated to the box as a test. Are you capable of giving the box 16GB of RAM? If so, please do so.
Once the RAM has been allocated and the box has been restarted, wait to see if it crashes once more. If so, immediately collect the logs mentioned below:
Code: Select all
cat /var/log/logstash/logstash.log
cat /var/log/elasticsearch/*.log
In the meantime, I'd like to take a look at some of your rotated logs. Please create some .tar.gz archives and send the resulting files to me:
Code: Select all
tar zcf /tmp/elasticsearch.tar.gz /var/log/elasticsearch/*
tar zcf /tmp/logstash.tar.gz /var/log/logstash/*
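If logs may need to be collected more than once (for example after each crash), a timestamped archive name avoids overwriting an earlier collection. The following is a minimal sketch under that assumption; the function name `collect_logs` and the archive naming scheme are hypothetical, and only the log directories come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: archive each log directory into a timestamped
# .tar.gz so successive collections do not overwrite each other.
# Usage: collect_logs OUT_DIR LOG_DIR [LOG_DIR...]
# Prints the path of each archive it creates.
collect_logs() {
    out_dir=$1
    shift
    stamp=$(date +%Y%m%d-%H%M%S)
    for dir in "$@"; do
        name=$(basename "$dir")
        # -C stores paths relative to the parent directory of LOG_DIR
        tar zcf "$out_dir/$name-$stamp.tar.gz" -C "$(dirname "$dir")" "$name" || return 1
        echo "$out_dir/$name-$stamp.tar.gz"
    done
}

# Usage on the appliance (directories taken from the commands above):
#   collect_logs /tmp /var/log/elasticsearch /var/log/logstash
```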
Re: Recently downloaded trial Nagios LogServer crashes
Bumped memory from 6GB to 16GB to see if it makes a difference. Requested logs updated. Will respond back after a couple days of running or next event. Thank you.
Re: Recently downloaded trial Nagios LogServer crashes
Sounds good - let us know what you find out. Thanks!
Re: Recently downloaded trial Nagios LogServer crashes
Looks like insufficient memory was to blame. After allocating more memory, we have not had a recurrence, whereas previously it was failing nightly. Thank you for the help!