Re: Log Server does not create backups
Posted: Tue Jan 03, 2017 11:26 am
by mcapra
If you're open to submitting an email ticket to [email protected], we can schedule a remote session to investigate this more in-depth.
If that's not an option, can you share the output of the following commands executed from the CLI of one of your NLS instances:
Code:
df -h
free -m
curl -s 'localhost:9200/_snapshot/<my_repository_name>'
Replace <my_repository_name> with the name of your backup repository.
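If it is easier, the three commands can be wrapped in a small script that writes everything to one file you can attach to a reply. This is only a sketch: the REPO and OUT values and the collect_diag name are placeholders made up for this example, not part of Nagios Log Server.

```shell
#!/bin/sh
# Placeholder values; set REPO to the name of your backup repository.
REPO="${REPO:-my_repository_name}"
OUT="${OUT:-/tmp/nls_backup_diag.txt}"

# Run the three requested commands and collect their output in $OUT.
# Each command is allowed to fail so that the rest is still captured.
collect_diag() {
    {
        echo "### df -h"
        df -h 2>&1 || true
        echo "### free -m"
        free -m 2>&1 || true
        echo "### repository: $REPO"
        curl -s --max-time 10 "localhost:9200/_snapshot/$REPO" 2>&1 || true
        echo
    } > "$OUT"
}

collect_diag
echo "Wrote $OUT"
```

Setting REPO and OUT in the environment before running the script overrides the placeholder defaults.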
Re: Log Server does not create backups
Posted: Thu Jan 05, 2017 12:45 pm
by vmesquita
Scheduling a remote session is a bit complicated with our current infrastructure, so it would be better if we could solve this without one. Here's the output:
Code:
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
912G 515G 351G 60% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sda1 477M 64M 388M 15% /boot
//<share server name>/logserver 1.0T 52G 973G 6% /var/backup
# free -m
total used free shared buffers cached
Mem: 15933 15637 296 0 184 5353
-/+ buffers/cache: 10098 5835
Swap: 4095 33 4062
# curl -s 'localhost:9200/_snapshot/Backup'
{"Backup":{"type":"fs","settings":{"compress":"true","location":"/var/backup"}}}
Re: Log Server does not create backups
Posted: Thu Jan 05, 2017 5:55 pm
by mcapra
Can I see the output of the following command:
Code:
curl -s 'localhost:9200/_snapshot/Backup/_all'
And the contents of the following files:
Code:
/etc/init.d/logstash
/etc/sysconfig/elasticsearch
Re: Log Server does not create backups
Posted: Fri Jan 06, 2017 12:09 pm
by vmesquita
Here it is
Code:
# curl -s 'localhost:9200/_snapshot/Backup/_all'
{"error":"ElasticsearchParseException[Failed to derive xcontent from (offset=0, length=0): []]","status":400}
# cat /etc/init.d/logstash
#! /bin/sh
#
# /etc/rc.d/init.d/logstash
#
# Starts Logstash as a daemon
#
# chkconfig: 2345 90 10
# description: Starts Logstash as a daemon.
### BEGIN INIT INFO
# Provides: logstash
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: Logstash
# Description: Starts Logstash as a daemon.
### END INIT INFO
. /etc/rc.d/init.d/functions
NAME=logstash
DESC="Logstash Daemon"
DEFAULT=/etc/sysconfig/$NAME
if [ `id -u` -ne 0 ]; then
echo "You need root privileges to run this script"
exit 1
fi
# The following variables can be overwritten in $DEFAULT
PATH=/bin:/usr/bin:/sbin:/usr/sbin
# See contents of file named in $DEFAULT for comments
LS_USER=logstash
LS_GROUP=logstash
LS_HOME=/usr/local/nagioslogserver
LS_HEAP_SIZE="500m"
LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME}/tmp"
LS_LOG_FILE=/var/log/logstash/$NAME.log
LS_CONF_DIR=/etc/logstash/conf.d
LS_OPEN_FILES=16384
LS_NICE=19
LS_OPTS=""
LS_PIDFILE=/var/run/$NAME/$NAME.pid
LS_PIDDIR=/var/run/$NAME
# End of variables that can be overwritten in $DEFAULT
if [ -f "$DEFAULT" ]; then
. "$DEFAULT"
fi
# Define other required variables
PID_FILE=${LS_PIDFILE}
# Make sure the pid directory actually exists (CentOS/RHEL 7)
if [ ! -d $LS_PIDDIR ]; then
install -d -m 0755 -o $LS_USER -g $LS_USER $LS_PIDDIR
fi
DAEMON="$LS_HOME/bin/logstash"
DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS}"
#
# Function that starts the daemon/service
#
do_start()
{
if [ -z "$DAEMON" ]; then
echo "not found - $DAEMON"
exit 1
fi
if pidofproc -p "$PID_FILE" >/dev/null; then
failure
exit 99
fi
# Prepare environment
HOME="${HOME:-$LS_HOME}"
JAVA_OPTS="${LS_JAVA_OPTS}"
ulimit -n ${LS_OPEN_FILES}
cd "${LS_HOME}"
export PATH HOME JAVA_OPTS LS_HEAP_SIZE LS_JAVA_OPTS LS_USE_GC_LOGGING
test -n "${JAVACMD}" && export JAVACMD
nice -n ${LS_NICE} runuser -s /bin/sh -c "exec $DAEMON $DAEMON_OPTS" ${LS_USER} > /dev/null 1>&1 < /dev/null &
RETVAL=$?
local PID=$!
# runuser forks rather than execing our process.
usleep 500000
JAVA_PID=$(ps axo ppid,pid | awk -v "ppid=$PID" '$1==ppid {print $2}')
PID=${JAVA_PID:-$PID}
echo $PID > $PID_FILE
chown $LS_USER:$LS_GROUP $PID_FILE
[ $PID = $JAVA_PID ] && success
}
#
# Function that stops the daemon/service
#
do_stop()
{
killproc -p $PID_FILE $DAEMON
RETVAL=$?
echo
[ $RETVAL = 0 ] && rm -f ${PID_FILE}
}
case "$1" in
start)
echo -n "Starting $DESC: "
do_start
touch /var/run/logstash/$NAME
;;
stop)
echo -n "Stopping $DESC: "
do_stop
rm /var/run/logstash/$NAME
;;
restart|reload)
echo -n "Restarting $DESC: "
do_stop
do_start
;;
status)
echo -n "$DESC"
status -p $PID_FILE
exit $?
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|status|restart}" >&2
exit 3
;;
esac
echo
exit 0
# cat /etc/sysconfig/elasticsearch
# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"
# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
# Heap new generation
#ES_HEAP_NEWSIZE=
# max direct memory
#ES_DIRECT_SIZE=
# Additional Java OPTS
#ES_JAVA_OPTS=
# Maximum number of open files
MAX_OPEN_FILES=65535
# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited
# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144
# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch
# Elasticsearch data directory
DATA_DIR="$ES_HOME/data"
# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"
# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"
# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"
# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios
# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true
if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
GET_ES_CONFIG_RETURN=$?
if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
echo $GET_ES_CONFIG_MESSAGE
exit 1
else
ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
fi
fi
Re: Log Server does not create backups
Posted: Fri Jan 06, 2017 1:37 pm
by mcapra
Strange, is Backup the name of the repository you are currently sending backups to? It looks as if there's some corrupted data in there.
I also notice the memory usage on this machine is incredibly high based on the previous output of free -m. Can you share the output of ps -aef?
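For reference, here is a rough breakdown of those free -m numbers. It is a sketch with the values copied by hand from the output posted above, and the variable names are just for illustration. The heap line mirrors the ES_HEAP_SIZE calculation in the /etc/sysconfig/elasticsearch file shown earlier, which gives Elasticsearch half of physical memory:

```shell
total_mb=15933      # "total" on the Mem: row
used_mb=10098       # "used" on the -/+ buffers/cache row (page cache excluded)

# Same integer division that ES_HEAP_SIZE uses
heap_mb=$(expr "$total_mb" / 2)
# Share of physical memory held by processes rather than by cache
used_pct=$(expr "$used_mb" \* 100 / "$total_mb")

echo "Elasticsearch heap: ${heap_mb}m"
echo "Memory used by processes: ${used_pct}%"
```

That works out to a 7966m heap and about 63% of memory in use by processes once buffers and cache are excluded, so a large part of the "used" figure on the Mem: row is page cache.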
Re: Log Server does not create backups
Posted: Fri Jan 06, 2017 2:30 pm
by vmesquita
"Backup" is indeed the name of the repository. I tried deleting all the files there and running a manual backup. That works fine, but when the time for the scheduled backup comes up, it doesn't work... Is there a better way of cleaning up the repository other than deleting all files?
The output of ps -aef:
Code:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 2016 ? 00:00:00 /sbin/init
root 2 0 0 2016 ? 00:00:00 [kthreadd]
root 3 2 0 2016 ? 00:00:01 [migration/0]
root 4 2 0 2016 ? 00:00:10 [ksoftirqd/0]
root 5 2 0 2016 ? 00:00:00 [stopper/0]
root 6 2 0 2016 ? 00:00:01 [watchdog/0]
root 7 2 0 2016 ? 00:00:02 [migration/1]
root 8 2 0 2016 ? 00:00:00 [stopper/1]
root 9 2 0 2016 ? 00:00:11 [ksoftirqd/1]
root 10 2 0 2016 ? 00:00:00 [watchdog/1]
root 11 2 0 2016 ? 00:00:02 [migration/2]
root 12 2 0 2016 ? 00:00:00 [stopper/2]
root 13 2 0 2016 ? 00:00:11 [ksoftirqd/2]
root 14 2 0 2016 ? 00:00:00 [watchdog/2]
root 15 2 0 2016 ? 00:00:01 [migration/3]
root 16 2 0 2016 ? 00:00:00 [stopper/3]
root 17 2 0 2016 ? 00:00:07 [ksoftirqd/3]
root 18 2 0 2016 ? 00:00:00 [watchdog/3]
root 19 2 0 2016 ? 00:00:38 [events/0]
root 20 2 0 2016 ? 00:00:30 [events/1]
root 21 2 0 2016 ? 00:00:31 [events/2]
root 22 2 0 2016 ? 00:00:45 [events/3]
root 23 2 0 2016 ? 00:00:00 [events/0]
root 24 2 0 2016 ? 00:00:00 [events/1]
root 25 2 0 2016 ? 00:00:00 [events/2]
root 26 2 0 2016 ? 00:00:00 [events/3]
root 27 2 0 2016 ? 00:00:00 [events_long/0]
root 28 2 0 2016 ? 00:00:00 [events_long/1]
root 29 2 0 2016 ? 00:00:00 [events_long/2]
root 30 2 0 2016 ? 00:00:00 [events_long/3]
root 31 2 0 2016 ? 00:00:00 [events_power_ef]
root 32 2 0 2016 ? 00:00:00 [events_power_ef]
root 33 2 0 2016 ? 00:00:00 [events_power_ef]
root 34 2 0 2016 ? 00:00:00 [events_power_ef]
root 35 2 0 2016 ? 00:00:00 [cgroup]
root 36 2 0 2016 ? 00:00:00 [khelper]
root 37 2 0 2016 ? 00:00:00 [netns]
root 38 2 0 2016 ? 00:00:00 [async/mgr]
root 39 2 0 2016 ? 00:00:00 [pm]
root 40 2 0 2016 ? 00:00:02 [sync_supers]
root 41 2 0 2016 ? 00:00:02 [bdi-default]
root 42 2 0 2016 ? 00:00:00 [kintegrityd/0]
root 43 2 0 2016 ? 00:00:00 [kintegrityd/1]
root 44 2 0 2016 ? 00:00:00 [kintegrityd/2]
root 45 2 0 2016 ? 00:00:00 [kintegrityd/3]
root 46 2 0 2016 ? 00:00:04 [kblockd/0]
root 47 2 0 2016 ? 00:00:03 [kblockd/1]
root 48 2 0 2016 ? 00:00:01 [kblockd/2]
root 49 2 0 2016 ? 00:00:39 [kblockd/3]
root 50 2 0 2016 ? 00:00:00 [kacpid]
root 51 2 0 2016 ? 00:00:00 [kacpi_notify]
root 52 2 0 2016 ? 00:00:00 [kacpi_hotplug]
root 53 2 0 2016 ? 00:00:00 [ata_aux]
root 54 2 0 2016 ? 00:00:00 [ata_sff/0]
root 55 2 0 2016 ? 00:00:00 [ata_sff/1]
root 56 2 0 2016 ? 00:00:00 [ata_sff/2]
root 57 2 0 2016 ? 00:00:00 [ata_sff/3]
root 58 2 0 2016 ? 00:00:00 [ksuspend_usbd]
root 59 2 0 2016 ? 00:00:00 [khubd]
root 60 2 0 2016 ? 00:00:00 [kseriod]
root 61 2 0 2016 ? 00:00:00 [md/0]
root 62 2 0 2016 ? 00:00:00 [md/1]
root 63 2 0 2016 ? 00:00:00 [md/2]
root 64 2 0 2016 ? 00:00:00 [md/3]
root 65 2 0 2016 ? 00:00:00 [md_misc/0]
root 66 2 0 2016 ? 00:00:00 [md_misc/1]
root 67 2 0 2016 ? 00:00:00 [md_misc/2]
root 68 2 0 2016 ? 00:00:00 [md_misc/3]
root 69 2 0 2016 ? 00:00:00 [linkwatch]
root 70 2 0 2016 ? 00:00:00 [khungtaskd]
root 71 2 0 2016 ? 00:00:12 [kswapd0]
root 72 2 0 2016 ? 00:00:00 [ksmd]
root 73 2 0 2016 ? 00:00:44 [khugepaged]
root 74 2 0 2016 ? 00:00:00 [aio/0]
root 75 2 0 2016 ? 00:00:00 [aio/1]
root 76 2 0 2016 ? 00:00:00 [aio/2]
root 77 2 0 2016 ? 00:00:00 [aio/3]
root 78 2 0 2016 ? 00:00:00 [crypto/0]
root 79 2 0 2016 ? 00:00:00 [crypto/1]
root 80 2 0 2016 ? 00:00:00 [crypto/2]
root 81 2 0 2016 ? 00:00:00 [crypto/3]
root 88 2 0 2016 ? 00:00:00 [kthrotld/0]
root 89 2 0 2016 ? 00:00:00 [kthrotld/1]
root 90 2 0 2016 ? 00:00:00 [kthrotld/2]
root 91 2 0 2016 ? 00:00:00 [kthrotld/3]
root 93 2 0 2016 ? 00:00:00 [kpsmoused]
root 94 2 0 2016 ? 00:00:00 [usbhid_resumer]
root 95 2 0 2016 ? 00:00:00 [deferwq]
root 128 2 0 2016 ? 00:00:00 [kdmremove]
root 129 2 0 2016 ? 00:00:00 [kstriped]
root 335 2 0 2016 ? 00:00:00 [scsi_eh_0]
root 336 2 0 2016 ? 00:00:00 [scsi_eh_1]
root 338 2 0 2016 ? 00:00:00 [scsi_eh_2]
root 339 2 0 2016 ? 00:00:00 [scsi_eh_3]
root 364 2 0 2016 ? 00:00:00 [scsi_eh_4]
root 390 2 0 2016 ? 00:00:00 [scsi_eh_5]
root 391 2 0 2016 ? 00:00:00 [usb-storage]
root 456 2 0 2016 ? 00:00:00 [kdmflush]
root 458 2 0 2016 ? 00:00:00 [kdmflush]
root 476 2 0 2016 ? 00:02:16 [jbd2/dm-0-8]
root 477 2 0 2016 ? 00:00:00 [ext4-dio-unwrit]
root 577 1 0 2016 ? 00:00:00 /sbin/udevd -d
root 727 2 0 2016 ? 00:00:00 [edac-poller]
root 958 2 0 2016 ? 00:00:00 [kipmi0]
root 1074 2 0 2016 ? 00:03:03 [flush-253:0]
root 1077 2 0 2016 ? 00:00:00 [jbd2/sda1-8]
root 1078 2 0 2016 ? 00:00:00 [ext4-dio-unwrit]
root 1123 2 0 2016 ? 00:00:28 [kauditd]
root 1470 1 0 2016 ? 00:01:15 auditd
root 1504 1 0 2016 ? 00:00:49 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
root 1540 1 0 2016 ? 00:01:01 irqbalance --pid=/var/run/irqbalance.pid
rpc 1558 1 0 2016 ? 00:00:00 rpcbind
root 1576 1 0 2016 ? 00:00:44 /usr/sbin/sssd -f -D
root 1577 1576 0 2016 ? 00:01:09 /usr/libexec/sssd/sssd_be --domain selic.bc --uid 0 --gid 0 --debug-to-files
root 1578 1576 0 2016 ? 00:01:39 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root 1579 1576 0 2016 ? 00:00:39 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
rpcuser 1601 1 0 2016 ? 00:00:00 rpc.statd
root 1636 2 0 2016 ? 00:00:00 [rpciod/0]
root 1637 2 0 2016 ? 00:00:00 [rpciod/1]
root 1638 2 0 2016 ? 00:00:00 [rpciod/2]
root 1639 2 0 2016 ? 00:00:00 [rpciod/3]
root 1644 1 0 2016 ? 00:00:00 rpc.idmapd
dbus 1671 1 0 2016 ? 00:00:00 dbus-daemon --system
root 1692 1 0 2016 ? 00:00:00 cupsd -C /etc/cups/cupsd.conf
root 1718 2 0 2016 ? 00:00:00 [cifsiod]
root 1719 2 0 2016 ? 00:00:00 [kslowd000]
root 1720 2 0 2016 ? 00:00:00 [kslowd001]
root 1753 1 0 2016 ? 00:00:00 /usr/sbin/acpid
68 1765 1 0 2016 ? 00:00:04 hald
root 1766 1765 0 2016 ? 00:00:00 hald-runner
root 1798 1766 0 2016 ? 00:00:00 hald-addon-input: Listening on /dev/input/event3 /dev/input/event2 /dev/input/event0
68 1812 1766 0 2016 ? 00:00:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root 1834 1 0 2016 ? 00:00:15 automount --pid-file /var/run/autofs.pid
root 1944 1 0 2016 ? 00:00:00 /usr/sbin/mcelog --daemon
root 1961 1 0 2016 ? 00:00:00 /usr/sbin/sshd
ntp 1972 1 0 2016 ? 00:00:02 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root 2024 1 0 2016 ? 00:00:18 sendmail: accepting connections
smmsp 2033 1 0 2016 ? 00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root 2047 1 0 2016 ? 00:00:00 /usr/sbin/abrtd
root 2066 1 0 2016 ? 00:00:23 /usr/sbin/httpd
root 2078 1 0 2016 ? 00:00:09 crond
apache 2122 2066 0 2016 ? 00:00:15 /usr/sbin/httpd
apache 2123 2066 0 2016 ? 00:00:15 /usr/sbin/httpd
apache 2124 2066 0 2016 ? 00:00:15 /usr/sbin/httpd
apache 2125 2066 0 2016 ? 00:00:15 /usr/sbin/httpd
apache 2126 2066 0 2016 ? 00:00:16 /usr/sbin/httpd
apache 2127 2066 0 2016 ? 00:00:15 /usr/sbin/httpd
apache 2128 2066 0 2016 ? 00:00:15 /usr/sbin/httpd
apache 2129 2066 0 2016 ? 00:00:15 /usr/sbin/httpd
root 2156 1 0 2016 ? 00:00:00 /usr/sbin/atd
root 2169 1 0 2016 ? 00:00:00 rhnsd
root 2180 1 0 2016 ? 00:00:00 /usr/bin/rhsmcertd
root 2192 1 0 2016 ? 00:00:00 /usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300
root 2206 1 0 2016 tty1 00:00:00 /sbin/mingetty /dev/tty1
root 2208 1 0 2016 tty2 00:00:00 /sbin/mingetty /dev/tty2
root 2210 1 0 2016 tty3 00:00:00 /sbin/mingetty /dev/tty3
root 2212 1 0 2016 tty4 00:00:00 /sbin/mingetty /dev/tty4
root 2214 1 0 2016 tty5 00:00:00 /sbin/mingetty /dev/tty5
root 2216 1 0 2016 tty6 00:00:00 /sbin/mingetty /dev/tty6
root 2218 577 0 2016 ? 00:00:00 /sbin/udevd -d
root 2219 577 0 2016 ? 00:00:00 /sbin/udevd -d
apache 9816 2066 0 2016 ? 00:00:08 /usr/sbin/httpd
apache 9817 2066 0 2016 ? 00:00:07 /usr/sbin/httpd
apache 9818 2066 0 2016 ? 00:00:08 /usr/sbin/httpd
nagios 10252 1 10 2016 ? 17:30:52 /usr/bin/java -Xms7966m -Xmx7966m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiating
root 11847 1961 0 Jan05 ? 00:00:01 sshd: root@pts/1
root 11850 11847 0 Jan05 pts/1 00:00:00 -bash
apache 12635 2066 0 2016 ? 00:00:12 /usr/sbin/httpd
root 20248 2078 0 17:29 ? 00:00:00 CROND
root 20249 2078 0 17:29 ? 00:00:00 CROND
nagios 20250 20248 0 17:29 ? 00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log
nagios 20251 20249 0 17:29 ? 00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller
nagios 20252 20250 0 17:29 ? 00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios 20253 20251 0 17:29 ? 00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller
root 20302 11850 0 17:29 pts/1 00:00:00 ps -aef
root 22426 1 0 Jan02 ? 00:00:00 runuser -s /bin/sh -c exec /usr/local/nagioslogserver/logstash/bin/logstash agent -f /usr/local/nagioslogserver/logst
root 22428 22426 12 Jan02 ? 11:43:42 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500
root 29054 2 0 2016 ? 00:00:04 [cifsd]
Re: Log Server does not create backups
Posted: Fri Jan 06, 2017 3:00 pm
by mcapra
vmesquita wrote:
when the time for the scheduled backup comes up, it doesn't work
When you say it doesn't work, does it create the same GUI problems with things loading slowly? Does the snapshot appear but show as failed? Does the job simply never run?
I would suggest enabling trace logging for the entire snapshot process. This should allow the Elasticsearch logs (/var/log/elasticsearch) to give us richer information about what's happening with snapshots. You can do this with the following command:
Code:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{"transient":{"logger.snapshots":"TRACE"}}'
If this is a multi-node environment, please PM me the entire contents of /var/log/elasticsearch from each node after a scheduled backup has failed.
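As a sketch of how the log level could be switched on and off around the test, the helper below wraps the same _cluster/settings call and reads the settings back afterwards. The ES_URL variable and both function names are made up for this example, and switching back to INFO assumes that INFO is the default level on your install.

```shell
#!/bin/sh
# Placeholder: adjust if Elasticsearch listens somewhere else.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the transient-settings body for a given snapshot logger level.
snapshot_logger_body() {
    printf '{"transient":{"logger.snapshots":"%s"}}' "$1"
}

# Apply the level, then print the cluster settings to confirm it was accepted.
set_snapshot_log_level() {
    curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(snapshot_logger_body "$1")"
    echo
    curl -s "$ES_URL/_cluster/settings"
    echo
}
```

Call set_snapshot_log_level TRACE before the scheduled backup runs and set_snapshot_log_level INFO once the logs have been collected. Transient settings are also dropped on a full cluster restart.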
Can you also re-send the full ps -aef output? It looks like that output has been truncated:
Code:
/usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500
vmesquita wrote:
Is there a better way of cleaning up the repository other than deleting all files?
Unfortunately, no. We would need to first identify why the data is being damaged.
Re: Log Server does not create backups
Posted: Mon Jan 09, 2017 10:20 am
by vmesquita
mcapra wrote:
When you say it doesn't work, does it create the same GUI problems with things loading slowly? Does the snapshot appear but show as failed? Does the job simply never run?
The GUI loads really slowly, and when it finally shows up, the manual snapshot is gone and there's nothing there, as if no backup was ever made.
I would suggest enabling trace logging for the entire snapshot process. This should allow the Elasticsearch logs (/var/log/elasticsearch) to give us richer information about what's happening with snapshots. You can do this with the following command:
Code:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{"transient":{"logger.snapshots":"TRACE"}}'
If this is a multi-node environment, please PM me the entire contents of /var/log/elasticsearch from each node after a scheduled backup has failed.
So to be clear, I should:
1) Delete the contents of the backup folder
2) Run a manual backup
3) Wait until a scheduled backup happens and fails
4) PM you with the logs.
Right?
Can you also re-send the full ps -aef output? It looks like that output has been truncated:
Code:
/usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500
Apparently ps truncates command lines that are wider than the screen. Would you know how to change this behavior?
Re: Log Server does not create backups
Posted: Mon Jan 09, 2017 1:56 pm
by mcapra
vmesquita wrote:
So to be clear, I should:
1) Delete the contents of the backup folder
2) Run a manual backup
3) Wait until a scheduled backup happens and fails
4) PM you with the logs.
Yup! It looks as if the scheduled backups are hanging for some reason and that output should tell us specifically where within Elasticsearch things are getting hung up.
vmesquita wrote:
Apparently ps truncates command lines that are wider than the screen. Would you know how to change this behavior?
You can send the command's output to a file and share the resulting file:
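A minimal sketch of one way to do that. The output path /tmp/ps_output.txt is only a placeholder, and the ww flags are added here to ask ps for unlimited line width so long command lines are not cut off:

```shell
# Placeholder output path; any writable location works.
out=/tmp/ps_output.txt

# Same listing as before, written to a file instead of the terminal.
ps -aefww > "$out"

# Quick check that the file was written: print its line count.
wc -l < "$out"
```

The resulting file can then be attached to a reply or sent by PM.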
Re: Log Server does not create backups
Posted: Wed Jan 11, 2017 3:38 pm
by vmesquita
This time the interface didn't hang when I went to the Backup area; however, the previous manual backup still disappeared, and no backup is shown there. I have attached the ps output from both servers plus /var/log/elasticsearch from both servers.