Re: Log Server does not create backups

Posted: Tue Jan 03, 2017 11:26 am
by mcapra
If you're open to submitting an email ticket to [email protected], we can schedule a remote session to investigate this more in-depth.

If that's not an option, can you share the output of the following commands executed from the CLI of one of your NLS instances:

Code:

df -h
free -m
curl -s 'localhost:9200/_snapshot/<my_repository_name>'
Replace <my_repository_name> with the name of your backup repository.
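A minimal sketch that gathers the same three outputs in one pass, assuming Elasticsearch answers on localhost:9200 as in the commands above. `repo_url` and `collect_diagnostics` are hypothetical helper names, not part of Nagios Log Server; `?pretty` only asks Elasticsearch to indent its JSON reply.

```shell
#!/bin/sh
# Hypothetical helpers for collecting the diagnostics requested above.
# Assumes Elasticsearch listens on localhost:9200.

# Build the snapshot-repository URL for a given repository name.
repo_url() {
    printf 'http://localhost:9200/_snapshot/%s\n' "$1"
}

# Print disk usage, memory usage and the repository definition, each under a header.
collect_diagnostics() {
    repo="$1"
    echo "### df -h"
    df -h
    echo "### free -m"
    free -m
    echo "### _snapshot/$repo"
    curl -s "$(repo_url "$repo")?pretty"
    echo
}
```

For example, `collect_diagnostics my_repository_name > /tmp/nls_diag.txt` writes everything to one file that can be attached to a post.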

Posted: Thu Jan 05, 2017 12:45 pm
by vmesquita
Scheduling a remote session is a bit complicated with our current infrastructure, so we would prefer to solve this without one if possible. Here's the output:

Code:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      912G  515G  351G  60% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sda1             477M   64M  388M  15% /boot
//<share server name>/logserver     1.0T   52G  973G   6% /var/backup
# free -m
             total       used       free     shared    buffers     cached
Mem:         15933      15637        296          0        184       5353
-/+ buffers/cache:      10098       5835
Swap:         4095         33       4062
# curl -s 'localhost:9200/_snapshot/Backup'
{"Backup":{"type":"fs","settings":{"compress":"true","location":"/var/backup"}}}

Posted: Thu Jan 05, 2017 5:55 pm
by mcapra
Can I see the output of the following command:

Code:

curl -s 'localhost:9200/_snapshot/Backup/_all'
And the contents of the following files:

Code:

/etc/init.d/logstash
/etc/sysconfig/elasticsearch
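If a single attachment is easier, a small sketch like the following prints each requested file under its own header. `dump_files` is a hypothetical helper, and the combined command in the usage line assumes the repository is named Backup.

```shell
#!/bin/sh
# Hypothetical helper: print each file named on the command line under a
# header so several configuration files can be shared as one attachment.
dump_files() {
    for f in "$@"; do
        echo "===== $f ====="
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(cannot read $f)"
        fi
    done
}
```

For example: `{ curl -s 'localhost:9200/_snapshot/Backup/_all?pretty'; echo; dump_files /etc/init.d/logstash /etc/sysconfig/elasticsearch; } > /tmp/nls_files.txt`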

Posted: Fri Jan 06, 2017 12:09 pm
by vmesquita
Here it is

Code:

# curl -s 'localhost:9200/_snapshot/Backup/_all'
{"error":"ElasticsearchParseException[Failed to derive xcontent from (offset=0, length=0): []]","status":400}

# cat /etc/init.d/logstash
#! /bin/sh
#
#       /etc/rc.d/init.d/logstash
#
#       Starts Logstash as a daemon
#
# chkconfig: 2345 90 10
# description: Starts Logstash as a daemon.

### BEGIN INIT INFO
# Provides: logstash
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Default-Start: 2 3 4 5
# Default-Stop: S 0 1 6
# Short-Description: Logstash
# Description: Starts Logstash as a daemon.
### END INIT INFO

. /etc/rc.d/init.d/functions

NAME=logstash
DESC="Logstash Daemon"
DEFAULT=/etc/sysconfig/$NAME

if [ `id -u` -ne 0 ]; then
   echo "You need root privileges to run this script"
   exit 1
fi

# The following variables can be overwritten in $DEFAULT
PATH=/bin:/usr/bin:/sbin:/usr/sbin

# See contents of file named in $DEFAULT for comments
LS_USER=logstash
LS_GROUP=logstash
LS_HOME=/usr/local/nagioslogserver
LS_HEAP_SIZE="500m"
LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME}/tmp"
LS_LOG_FILE=/var/log/logstash/$NAME.log
LS_CONF_DIR=/etc/logstash/conf.d
LS_OPEN_FILES=16384
LS_NICE=19
LS_OPTS=""
LS_PIDFILE=/var/run/$NAME/$NAME.pid
LS_PIDDIR=/var/run/$NAME

# End of variables that can be overwritten in $DEFAULT

if [ -f "$DEFAULT" ]; then
  . "$DEFAULT"
fi

# Define other required variables
PID_FILE=${LS_PIDFILE}

# Make sure the pid directory actually exists (CentOS/RHEL 7)
if [ ! -d $LS_PIDDIR ]; then
   install -d -m 0755 -o $LS_USER -g $LS_USER $LS_PIDDIR
fi

DAEMON="$LS_HOME/bin/logstash"
DAEMON_OPTS="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS}"

#
# Function that starts the daemon/service
#
do_start()
{

  if [ -z "$DAEMON" ]; then
    echo "not found - $DAEMON"
    exit 1
  fi

  if pidofproc -p "$PID_FILE" >/dev/null; then
    failure
    exit 99
  fi

  # Prepare environment
  HOME="${HOME:-$LS_HOME}"
  JAVA_OPTS="${LS_JAVA_OPTS}"
  ulimit -n ${LS_OPEN_FILES}
  cd "${LS_HOME}"
  export PATH HOME JAVA_OPTS LS_HEAP_SIZE LS_JAVA_OPTS LS_USE_GC_LOGGING
  test -n "${JAVACMD}" && export JAVACMD

  nice -n ${LS_NICE} runuser -s /bin/sh -c "exec $DAEMON $DAEMON_OPTS" ${LS_USER} > /dev/null 1>&1 < /dev/null &

  RETVAL=$?
  local PID=$!
  # runuser forks rather than execing our process.
  usleep 500000
  JAVA_PID=$(ps axo ppid,pid | awk -v "ppid=$PID" '$1==ppid {print $2}')
  PID=${JAVA_PID:-$PID}
  echo $PID > $PID_FILE
  chown $LS_USER:$LS_GROUP $PID_FILE
  [ $PID = $JAVA_PID ] && success
}

#
# Function that stops the daemon/service
#
do_stop()
{
    killproc -p $PID_FILE $DAEMON
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f ${PID_FILE}
}

case "$1" in
  start)
    echo -n "Starting $DESC: "
    do_start
    touch /var/run/logstash/$NAME
    ;;
  stop)
    echo -n "Stopping $DESC: "
    do_stop
    rm /var/run/logstash/$NAME
    ;;
  restart|reload)
    echo -n "Restarting $DESC: "
    do_stop
    do_start
    ;;
  status)
    echo -n "$DESC"
    status -p $PID_FILE
    exit $?
    ;;
  *)
    echo "Usage: $SCRIPTNAME {start|stop|status|restart}" >&2
    exit 3
    ;;
esac

echo
exit 0


# cat /etc/sysconfig/elasticsearch
# Directory where the Elasticsearch binary distribution resides
APP_DIR="/usr/local/nagioslogserver"
ES_HOME="$APP_DIR/elasticsearch"

# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m

# Heap new generation
#ES_HEAP_NEWSIZE=

# max direct memory
#ES_DIRECT_SIZE=

# Additional Java OPTS
#ES_JAVA_OPTS=

# Maximum number of open files
MAX_OPEN_FILES=65535

# Maximum amount of locked memory
MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
MAX_MAP_COUNT=262144

# Elasticsearch log directory
LOG_DIR=/var/log/elasticsearch

# Elasticsearch data directory
DATA_DIR="$ES_HOME/data"

# Elasticsearch work directory
WORK_DIR="$APP_DIR/tmp/elasticsearch"

# Elasticsearch conf directory
CONF_DIR="$ES_HOME/config"

# Elasticsearch configuration file (elasticsearch.yml)
CONF_FILE="$ES_HOME/config/elasticsearch.yml"

# User to run as, change this to a specific elasticsearch user if possible
# Also make sure, this user can write into the log directories in case you change them
# This setting only works for the init script, but has to be configured separately for systemd startup
ES_USER=nagios
ES_GROUP=nagios

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

if [ "x$1" == "xstart" -o "x$1" == "xrestart" -o "x$1" == "xreload" -o "x$1" == "xforce-reload" ];then
        GET_ES_CONFIG_MESSAGE="$( php $APP_DIR/scripts/get_es_config.php )"
        GET_ES_CONFIG_RETURN=$?

        if [ "$GET_ES_CONFIG_RETURN" != "0" ]; then
                echo $GET_ES_CONFIG_MESSAGE
                exit 1
        else
                ES_JAVA_OPTS="$GET_ES_CONFIG_MESSAGE"
        fi
fi

Posted: Fri Jan 06, 2017 1:37 pm
by mcapra
Strange. Is Backup the name of the repository you are currently sending backups to? It looks as if there's some corrupted data in there.
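One way to look for that kind of damage, as a sketch: the error above says Elasticsearch tried to parse zero bytes (offset=0, length=0), which would be consistent with an empty metadata file somewhere in the repository. That is an assumption, not a confirmed diagnosis. The hypothetical helper below lists zero-length files anywhere under a filesystem repository path such as /var/backup.

```shell
#!/bin/sh
# Hypothetical helper: list zero-length regular files under a filesystem
# ("fs") snapshot repository directory. An empty metadata file is only one
# possible explanation for the parse error above.
find_empty_repo_files() {
    find "$1" -type f -size 0 -print
}
```

For example, `find_empty_repo_files /var/backup` prints nothing if no empty files exist.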

I also notice the memory usage on this machine is incredibly high based on the previous output of free -m. Can you share the output of ps -aef?
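For reference, the free -m figures posted earlier can be broken down with a small sketch. It assumes the older procps layout shown there (Mem: total used free shared buffers cached). Subtracting buffers and page cache from "used" gives about 10100 MB held by processes; free's own "-/+ buffers/cache" line reports 10098, presumably because each column is rounded to whole megabytes separately. /etc/sysconfig/elasticsearch sizes the heap at half of total RAM, which for 15933 MB is 7966 MB and should match the -Xms/-Xmx value on the Elasticsearch java process. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that read `free -m` output on stdin.
# Assumes the older procps layout: Mem: total used free shared buffers cached

# Memory used by processes, i.e. "used" minus buffers and page cache.
app_used_mb() {
    awk '/^Mem:/ { print $3 - $6 - $7 }'
}

# Heap size chosen by /etc/sysconfig/elasticsearch: half of total RAM,
# rounded down the way `expr` integer division does.
es_heap_mb() {
    awk '/^Mem:/ { print int($2 / 2) }'
}
```

Usage: `free -m | app_used_mb` and `free -m | es_heap_mb`; with the figures posted above these give 10100 and 7966.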

Posted: Fri Jan 06, 2017 2:30 pm
by vmesquita
"Backup" is indeed the name of the repository. I tried deleting all the files there and running a manual backup. That works fine, but when the time for the scheduled backup comes up, it doesn't work... Is there a better way of cleaning up the repository other than deleting all files?

Here is the output:

Code:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0  2016 ?        00:00:00 /sbin/init
root         2     0  0  2016 ?        00:00:00 [kthreadd]
root         3     2  0  2016 ?        00:00:01 [migration/0]
root         4     2  0  2016 ?        00:00:10 [ksoftirqd/0]
root         5     2  0  2016 ?        00:00:00 [stopper/0]
root         6     2  0  2016 ?        00:00:01 [watchdog/0]
root         7     2  0  2016 ?        00:00:02 [migration/1]
root         8     2  0  2016 ?        00:00:00 [stopper/1]
root         9     2  0  2016 ?        00:00:11 [ksoftirqd/1]
root        10     2  0  2016 ?        00:00:00 [watchdog/1]
root        11     2  0  2016 ?        00:00:02 [migration/2]
root        12     2  0  2016 ?        00:00:00 [stopper/2]
root        13     2  0  2016 ?        00:00:11 [ksoftirqd/2]
root        14     2  0  2016 ?        00:00:00 [watchdog/2]
root        15     2  0  2016 ?        00:00:01 [migration/3]
root        16     2  0  2016 ?        00:00:00 [stopper/3]
root        17     2  0  2016 ?        00:00:07 [ksoftirqd/3]
root        18     2  0  2016 ?        00:00:00 [watchdog/3]
root        19     2  0  2016 ?        00:00:38 [events/0]
root        20     2  0  2016 ?        00:00:30 [events/1]
root        21     2  0  2016 ?        00:00:31 [events/2]
root        22     2  0  2016 ?        00:00:45 [events/3]
root        23     2  0  2016 ?        00:00:00 [events/0]
root        24     2  0  2016 ?        00:00:00 [events/1]
root        25     2  0  2016 ?        00:00:00 [events/2]
root        26     2  0  2016 ?        00:00:00 [events/3]
root        27     2  0  2016 ?        00:00:00 [events_long/0]
root        28     2  0  2016 ?        00:00:00 [events_long/1]
root        29     2  0  2016 ?        00:00:00 [events_long/2]
root        30     2  0  2016 ?        00:00:00 [events_long/3]
root        31     2  0  2016 ?        00:00:00 [events_power_ef]
root        32     2  0  2016 ?        00:00:00 [events_power_ef]
root        33     2  0  2016 ?        00:00:00 [events_power_ef]
root        34     2  0  2016 ?        00:00:00 [events_power_ef]
root        35     2  0  2016 ?        00:00:00 [cgroup]
root        36     2  0  2016 ?        00:00:00 [khelper]
root        37     2  0  2016 ?        00:00:00 [netns]
root        38     2  0  2016 ?        00:00:00 [async/mgr]
root        39     2  0  2016 ?        00:00:00 [pm]
root        40     2  0  2016 ?        00:00:02 [sync_supers]
root        41     2  0  2016 ?        00:00:02 [bdi-default]
root        42     2  0  2016 ?        00:00:00 [kintegrityd/0]
root        43     2  0  2016 ?        00:00:00 [kintegrityd/1]
root        44     2  0  2016 ?        00:00:00 [kintegrityd/2]
root        45     2  0  2016 ?        00:00:00 [kintegrityd/3]
root        46     2  0  2016 ?        00:00:04 [kblockd/0]
root        47     2  0  2016 ?        00:00:03 [kblockd/1]
root        48     2  0  2016 ?        00:00:01 [kblockd/2]
root        49     2  0  2016 ?        00:00:39 [kblockd/3]
root        50     2  0  2016 ?        00:00:00 [kacpid]
root        51     2  0  2016 ?        00:00:00 [kacpi_notify]
root        52     2  0  2016 ?        00:00:00 [kacpi_hotplug]
root        53     2  0  2016 ?        00:00:00 [ata_aux]
root        54     2  0  2016 ?        00:00:00 [ata_sff/0]
root        55     2  0  2016 ?        00:00:00 [ata_sff/1]
root        56     2  0  2016 ?        00:00:00 [ata_sff/2]
root        57     2  0  2016 ?        00:00:00 [ata_sff/3]
root        58     2  0  2016 ?        00:00:00 [ksuspend_usbd]
root        59     2  0  2016 ?        00:00:00 [khubd]
root        60     2  0  2016 ?        00:00:00 [kseriod]
root        61     2  0  2016 ?        00:00:00 [md/0]
root        62     2  0  2016 ?        00:00:00 [md/1]
root        63     2  0  2016 ?        00:00:00 [md/2]
root        64     2  0  2016 ?        00:00:00 [md/3]
root        65     2  0  2016 ?        00:00:00 [md_misc/0]
root        66     2  0  2016 ?        00:00:00 [md_misc/1]
root        67     2  0  2016 ?        00:00:00 [md_misc/2]
root        68     2  0  2016 ?        00:00:00 [md_misc/3]
root        69     2  0  2016 ?        00:00:00 [linkwatch]
root        70     2  0  2016 ?        00:00:00 [khungtaskd]
root        71     2  0  2016 ?        00:00:12 [kswapd0]
root        72     2  0  2016 ?        00:00:00 [ksmd]
root        73     2  0  2016 ?        00:00:44 [khugepaged]
root        74     2  0  2016 ?        00:00:00 [aio/0]
root        75     2  0  2016 ?        00:00:00 [aio/1]
root        76     2  0  2016 ?        00:00:00 [aio/2]
root        77     2  0  2016 ?        00:00:00 [aio/3]
root        78     2  0  2016 ?        00:00:00 [crypto/0]
root        79     2  0  2016 ?        00:00:00 [crypto/1]
root        80     2  0  2016 ?        00:00:00 [crypto/2]
root        81     2  0  2016 ?        00:00:00 [crypto/3]
root        88     2  0  2016 ?        00:00:00 [kthrotld/0]
root        89     2  0  2016 ?        00:00:00 [kthrotld/1]
root        90     2  0  2016 ?        00:00:00 [kthrotld/2]
root        91     2  0  2016 ?        00:00:00 [kthrotld/3]
root        93     2  0  2016 ?        00:00:00 [kpsmoused]
root        94     2  0  2016 ?        00:00:00 [usbhid_resumer]
root        95     2  0  2016 ?        00:00:00 [deferwq]
root       128     2  0  2016 ?        00:00:00 [kdmremove]
root       129     2  0  2016 ?        00:00:00 [kstriped]
root       335     2  0  2016 ?        00:00:00 [scsi_eh_0]
root       336     2  0  2016 ?        00:00:00 [scsi_eh_1]
root       338     2  0  2016 ?        00:00:00 [scsi_eh_2]
root       339     2  0  2016 ?        00:00:00 [scsi_eh_3]
root       364     2  0  2016 ?        00:00:00 [scsi_eh_4]
root       390     2  0  2016 ?        00:00:00 [scsi_eh_5]
root       391     2  0  2016 ?        00:00:00 [usb-storage]
root       456     2  0  2016 ?        00:00:00 [kdmflush]
root       458     2  0  2016 ?        00:00:00 [kdmflush]
root       476     2  0  2016 ?        00:02:16 [jbd2/dm-0-8]
root       477     2  0  2016 ?        00:00:00 [ext4-dio-unwrit]
root       577     1  0  2016 ?        00:00:00 /sbin/udevd -d
root       727     2  0  2016 ?        00:00:00 [edac-poller]
root       958     2  0  2016 ?        00:00:00 [kipmi0]
root      1074     2  0  2016 ?        00:03:03 [flush-253:0]
root      1077     2  0  2016 ?        00:00:00 [jbd2/sda1-8]
root      1078     2  0  2016 ?        00:00:00 [ext4-dio-unwrit]
root      1123     2  0  2016 ?        00:00:28 [kauditd]
root      1470     1  0  2016 ?        00:01:15 auditd
root      1504     1  0  2016 ?        00:00:49 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
root      1540     1  0  2016 ?        00:01:01 irqbalance --pid=/var/run/irqbalance.pid
rpc       1558     1  0  2016 ?        00:00:00 rpcbind
root      1576     1  0  2016 ?        00:00:44 /usr/sbin/sssd -f -D
root      1577  1576  0  2016 ?        00:01:09 /usr/libexec/sssd/sssd_be --domain selic.bc --uid 0 --gid 0 --debug-to-files
root      1578  1576  0  2016 ?        00:01:39 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      1579  1576  0  2016 ?        00:00:39 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
rpcuser   1601     1  0  2016 ?        00:00:00 rpc.statd
root      1636     2  0  2016 ?        00:00:00 [rpciod/0]
root      1637     2  0  2016 ?        00:00:00 [rpciod/1]
root      1638     2  0  2016 ?        00:00:00 [rpciod/2]
root      1639     2  0  2016 ?        00:00:00 [rpciod/3]
root      1644     1  0  2016 ?        00:00:00 rpc.idmapd
dbus      1671     1  0  2016 ?        00:00:00 dbus-daemon --system
root      1692     1  0  2016 ?        00:00:00 cupsd -C /etc/cups/cupsd.conf
root      1718     2  0  2016 ?        00:00:00 [cifsiod]
root      1719     2  0  2016 ?        00:00:00 [kslowd000]
root      1720     2  0  2016 ?        00:00:00 [kslowd001]
root      1753     1  0  2016 ?        00:00:00 /usr/sbin/acpid
68        1765     1  0  2016 ?        00:00:04 hald
root      1766  1765  0  2016 ?        00:00:00 hald-runner
root      1798  1766  0  2016 ?        00:00:00 hald-addon-input: Listening on /dev/input/event3 /dev/input/event2 /dev/input/event0
68        1812  1766  0  2016 ?        00:00:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root      1834     1  0  2016 ?        00:00:15 automount --pid-file /var/run/autofs.pid
root      1944     1  0  2016 ?        00:00:00 /usr/sbin/mcelog --daemon
root      1961     1  0  2016 ?        00:00:00 /usr/sbin/sshd
ntp       1972     1  0  2016 ?        00:00:02 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root      2024     1  0  2016 ?        00:00:18 sendmail: accepting connections
smmsp     2033     1  0  2016 ?        00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root      2047     1  0  2016 ?        00:00:00 /usr/sbin/abrtd
root      2066     1  0  2016 ?        00:00:23 /usr/sbin/httpd
root      2078     1  0  2016 ?        00:00:09 crond
apache    2122  2066  0  2016 ?        00:00:15 /usr/sbin/httpd
apache    2123  2066  0  2016 ?        00:00:15 /usr/sbin/httpd
apache    2124  2066  0  2016 ?        00:00:15 /usr/sbin/httpd
apache    2125  2066  0  2016 ?        00:00:15 /usr/sbin/httpd
apache    2126  2066  0  2016 ?        00:00:16 /usr/sbin/httpd
apache    2127  2066  0  2016 ?        00:00:15 /usr/sbin/httpd
apache    2128  2066  0  2016 ?        00:00:15 /usr/sbin/httpd
apache    2129  2066  0  2016 ?        00:00:15 /usr/sbin/httpd
root      2156     1  0  2016 ?        00:00:00 /usr/sbin/atd
root      2169     1  0  2016 ?        00:00:00 rhnsd
root      2180     1  0  2016 ?        00:00:00 /usr/bin/rhsmcertd
root      2192     1  0  2016 ?        00:00:00 /usr/sbin/oddjobd -p /var/run/oddjobd.pid -t 300
root      2206     1  0  2016 tty1     00:00:00 /sbin/mingetty /dev/tty1
root      2208     1  0  2016 tty2     00:00:00 /sbin/mingetty /dev/tty2
root      2210     1  0  2016 tty3     00:00:00 /sbin/mingetty /dev/tty3
root      2212     1  0  2016 tty4     00:00:00 /sbin/mingetty /dev/tty4
root      2214     1  0  2016 tty5     00:00:00 /sbin/mingetty /dev/tty5
root      2216     1  0  2016 tty6     00:00:00 /sbin/mingetty /dev/tty6
root      2218   577  0  2016 ?        00:00:00 /sbin/udevd -d
root      2219   577  0  2016 ?        00:00:00 /sbin/udevd -d
apache    9816  2066  0  2016 ?        00:00:08 /usr/sbin/httpd
apache    9817  2066  0  2016 ?        00:00:07 /usr/sbin/httpd
apache    9818  2066  0  2016 ?        00:00:08 /usr/sbin/httpd
nagios   10252     1 10  2016 ?        17:30:52 /usr/bin/java -Xms7966m -Xmx7966m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiating
root     11847  1961  0 Jan05 ?        00:00:01 sshd: root@pts/1
root     11850 11847  0 Jan05 pts/1    00:00:00 -bash
apache   12635  2066  0  2016 ?        00:00:12 /usr/sbin/httpd
root     20248  2078  0 17:29 ?        00:00:00 CROND
root     20249  2078  0 17:29 ?        00:00:00 CROND
nagios   20250 20248  0 17:29 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs > /usr/local/nagioslogserver/var/jobs.log
nagios   20251 20249  0 17:29 ?        00:00:00 /bin/sh -c /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller > /usr/local/nagioslogserver/var/poller
nagios   20252 20250  0 17:29 ?        00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs
nagios   20253 20251  0 17:29 ?        00:00:00 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller
root     20302 11850  0 17:29 pts/1    00:00:00 ps -aef
root     22426     1  0 Jan02 ?        00:00:00 runuser -s /bin/sh -c exec /usr/local/nagioslogserver/logstash/bin/logstash agent -f /usr/local/nagioslogserver/logst
root     22428 22426 12 Jan02 ?        11:43:42 /usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500
root     29054     2  0  2016 ?        00:00:04 [cifsd]

Posted: Fri Jan 06, 2017 3:00 pm
by mcapra
vmesquita wrote: when the time for the scheduled backup comes up, it doesn't work
When you say it doesn't work, does it cause the same GUI problems with things loading slowly? Does the snapshot appear and get listed as failed? Or does the job simply never run?

I would suggest enabling trace logging for the entire snapshot process. This should allow the Elasticsearch logs (/var/log/elasticsearch) to give us richer information about what's happening with snapshots. You can do this with the following command:

Code:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{"transient":{"logger.snapshots":"TRACE"}}'
If this is a multi-node environment, please PM me the entire contents of /var/log/elasticsearch from each node after a scheduled backup has failed.
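A sketch of the surrounding steps, assuming the same localhost:9200 endpoint: confirm the transient setting took effect, pull snapshot-related lines from the logs, and put the logger back to INFO once the logs have been collected (transient settings are also dropped on a full cluster restart). The helper names are hypothetical, and the log glob assumes the log files under /var/log/elasticsearch end in .log.

```shell
#!/bin/sh
# Hypothetical helpers around the cluster-settings call above.
ES_URL="${ES_URL:-http://localhost:9200}"

# JSON body that sets the snapshot logger to the given level (TRACE, INFO, ...).
snapshot_logger_body() {
    printf '{"transient":{"logger.snapshots":"%s"}}' "$1"
}

# Apply a level, e.g. `set_snapshot_logger INFO` once the logs are collected.
set_snapshot_logger() {
    curl -s -XPUT "$ES_URL/_cluster/settings" -d "$(snapshot_logger_body "$1")"
    echo
}

# Show the current transient and persistent settings to confirm the change.
show_cluster_settings() {
    curl -s "$ES_URL/_cluster/settings?pretty"
}

# Print snapshot-related lines from this node's Elasticsearch logs.
snapshot_log_lines() {
    grep -i 'snapshot' /var/log/elasticsearch/*.log
}
```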

Can you also re-send the full ps -aef output? It looks like that output has been truncated:

Code:

/usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500
vmesquita wrote: Is there a better way of cleaning up the repository other than deleting all files?
Unfortunately, no. We first need to identify why the data is being damaged.

Posted: Mon Jan 09, 2017 10:20 am
by vmesquita
mcapra wrote: When you say it doesn't work, does it create the same GUI problems with things loading slowly? Does the snapshot appear and is listed as failed? Does the job simply never run?
The GUI loads really slowly, and when it finally shows up, the manual snapshot is gone and nothing is listed, as if no backup had ever been made.
mcapra wrote: I would suggest enabling trace logging for the entire snapshot process. This should allow the Elasticsearch logs (/var/log/elasticsearch) to give us richer information about what's happening with snapshots. You can do this with the following command:

Code:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{"transient":{"logger.snapshots":"TRACE"}}'
If this is a multi-node environment, please PM me the entire contents of /var/log/elasticsearch from each node after a scheduled backup has failed.
So to be clear, I should:
1) Delete the contents of the backup folder
2) Run a manual backup
3) Wait until a scheduled backup runs and fails
4) PM you with the logs.

Right?
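Parts of this procedure can be checked from the shell, as a sketch; steps 2 and 3 still go through the Log Server's own backup page and scheduler. Both function names are hypothetical. Counting the entries under the repository path (/var/backup in the repository definition posted earlier) after step 1 and again after step 2 confirms that the folder was emptied and then written to.

```shell
#!/bin/sh
# Hypothetical helpers for the procedure above.

# Number of files and directories under a repository path: expect 0 after
# step 1 and more than 0 after the manual backup in step 2.
repo_entry_count() {
    find "$1" -mindepth 1 -print | wc -l | tr -d ' '
}

# Step 4: bundle this node's Elasticsearch logs into one archive and print its path.
bundle_es_logs() {
    out="/tmp/es-logs-$(hostname)-$(date +%Y%m%d%H%M).tar.gz"
    tar czf "$out" -C /var/log elasticsearch && echo "$out"
}
```

For example, `repo_entry_count /var/backup` before and after the manual backup, then `bundle_es_logs` on each node once a scheduled backup has failed.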
mcapra wrote: Can you also re-send the full ps -aef output? It looks like that output has been truncated:

Code:

/usr/bin/java -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500
Apparently ps truncates command lines that are wider than the screen. Do you know how to change this behavior?

Posted: Mon Jan 09, 2017 1:56 pm
by mcapra
vmesquita wrote: So to be clear, I should
1) Delete the contents of the backup folder
2) Run a manual backup
3) Wait until a scheduled backup runs and fails
4) PM you with the logs.
Yup! It looks as if the scheduled backups are hanging for some reason, and that output should tell us specifically where within Elasticsearch things are getting hung up.
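While waiting for the next scheduled run, a few read-only checks can show whether a snapshot is stuck, as a sketch under the same localhost:9200 assumption: `_snapshot/_status` lists snapshots currently in progress, `_cluster/pending_tasks` lists queued cluster-state updates, and the jobs log path is the one the Log Server cron job writes to in the ps output above. Because that cron command redirects with `>`, the file presumably holds only the most recent run. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical read-only checks for a snapshot that appears to hang.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a full Elasticsearch URL from a path.
es_url() {
    printf '%s%s\n' "$ES_URL" "$1"
}

# Snapshots currently running anywhere in the cluster.
running_snapshots() {
    curl -s "$(es_url '/_snapshot/_status?pretty')"
}

# Cluster-level tasks waiting to be processed by the master node.
pending_tasks() {
    curl -s "$(es_url '/_cluster/pending_tasks?pretty')"
}

# Output of the Log Server "jobs" cron command (default: last 50 lines).
recent_jobs_log() {
    tail -n "${1:-50}" /usr/local/nagioslogserver/var/jobs.log
}
```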
vmesquita wrote: Apparently ps truncates command lines that are wider than the screen. Do you know how to change this behavior?
You can send the command's output to a file and share the resulting file:

Code:

ps -aef > /tmp/out.txt
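If the saved file is still cut off, procps ps accepts -w twice (-ww) for unlimited output width. Alternatively, a single process's complete command line can be read from /proc, where the arguments are separated by NUL bytes. A sketch with hypothetical function names, Linux-only because of /proc:

```shell
#!/bin/sh
# Hypothetical helpers for capturing untruncated command lines (Linux).

# Save the full process list; -ww removes the output width limit.
save_ps() {
    ps -aef -ww > "${1:-/tmp/out.txt}"
}

# Print one process's complete command line, turning NUL separators into spaces.
full_cmdline() {
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}
```

For example, `save_ps /tmp/out.txt`, or `full_cmdline 10252` for the Elasticsearch java process in the earlier listing.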

Posted: Wed Jan 11, 2017 3:38 pm
by vmesquita
This time the interface didn't hang when I went to the Backup area; however, the previous manual backup still disappeared, and no backup is shown there. I have attached the ps output from both servers plus the contents of /var/log/elasticsearch from both servers.