Time to start debugging. Let's modify the curator.sh script on all nodes (a subsystem job like this may run on any node), after first making a backup copy:
cp -p /usr/local/nagioslogserver/scripts/curator.sh{,.ORG}
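[root@nls143 ~]# cat /usr/local/nagioslogserver/scripts/curator.sh
#!/bin/sh
# Debug wrapper: run curator as before, then log the timestamp and arguments
date=$(date)
curator "$@"
rc=$?    # keep curator's exit status so the caller still sees failures
echo --- >> /tmp/curatordebug.txt
echo "$date" >> /tmp/curatordebug.txt
echo "$@" >> /tmp/curatordebug.txt
exit $rc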
Then check /tmp/curatordebug.txt on every node. This is what I searched for with grep:
[root@nls143 ~]# grep -B1 '^snapshot' /tmp/curatordebug.txt
Fri Nov 11 22:01:34 CET 2016
snapshot --repository SharedBackupRepo indices --older-than 1 --time-unit days --timestring %Y.%m.%d --ignore_unavailable
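To replay the most recent snapshot call by hand, the newest matching line can be pulled out of the debug log. A minimal sketch, assuming the log layout written by the debug wrapper (a `---` separator, a date line, then the argument line); the function name `last_snapshot_args` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios Log Server): print the newest
# logged curator argument line that starts with "snapshot".
# $1 = path to the debug log (defaults to /tmp/curatordebug.txt)
last_snapshot_args() {
    log=${1:-/tmp/curatordebug.txt}
    # grep keeps only snapshot invocations; tail -n 1 picks the most recent one
    grep '^snapshot' "$log" | tail -n 1
}
```

For example, `curator $(last_snapshot_args)` reruns the newest logged snapshot job. Because the wrapper logs "$@" as one echoed line, the original argument boundaries are lost, so the unquoted expansion only works when no argument contains spaces or glob characters, which holds for the command logged above.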
Running the logged command by hand shows the problem:
[root@nls143 ~]# curator snapshot --repository SharedBackupRepo indices --older-than 1 --time-unit days --timestring %Y.%m.%d --ignore_unavailable
Error: no such option: --ignore_unavailable
[root@nls143 ~]# curator snapshot --help
Usage: curator snapshot [OPTIONS] COMMAND [ARGS]...

  Take snapshots of indices (Backup)

Options:
  --repository TEXT               Repository name.
  --name TEXT                     Override default name.
  --prefix TEXT                   Override default prefix.
  --wait_for_completion BOOLEAN   Wait for snapshot to complete before
                                  returning. [default: True]
  --ignore_unavailable            Ignore unavailable shards/indices.
  --include_global_state BOOLEAN  Store cluster global state with snapshot.
                                  [default: True]
  --partial                       Do not fail if primary shard is unavailable.
  --request_timeout INTEGER       Allow this many seconds before the
                                  transaction times out. [default: 21600]
  --skip-repo-validation          Skip repository access validation.
  --help                          Show this message and exit.

Commands:
  indices  Index selection.
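That explains your problem: --ignore_unavailable is an option of the snapshot command itself, not of its indices subcommand, so it has to come before indices on the command line. Look at this: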
# curator snapshot --repository SharedBackupRepo --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d
2016-11-11 22:18:47,869 INFO Job starting: snapshot indices
(...)
2016-11-11 22:18:50,495 INFO Snapshot name: curator-20161111211850
2016-11-11 22:18:54,476 INFO Snapshot curator-20161111211850 successfully completed.
2016-11-11 22:18:54,476 INFO Job completed successfully.

Additional tip for NLS admins: use Nagios Core/XI to check snapshot backups: https://github.com/jvandermeulen/Nagios ... _backup.sh
In this particular case an UNKNOWN state (UNKNOWN: Unable to determine result within last 25 hours: No snapshots matched provided args.) brought this to my attention. The default threshold is 24+1=25 hours. A missing backup results in UNKNOWN, a failed backup in CRITICAL, and a partial backup in WARNING.
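The state mapping described above can be sketched as a small shell function. This is not the code of the linked plugin; it is a hypothetical illustration that assumes the newest snapshot's state (an Elasticsearch snapshot state such as SUCCESS, PARTIAL or FAILED) and its age in whole hours have already been looked up, and it returns the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `snapshot_status` is made up for this example:

```shell
#!/bin/sh
# Hypothetical sketch of the check logic; NOT the linked plugin's code.
# $1 = state of the newest snapshot (empty if none was found)
# $2 = age of that snapshot in whole hours
# $3 = threshold in hours (default 25 = 24 h backup interval + 1 h grace)
# Prints a status line and returns the matching Nagios exit code.
snapshot_status() {
    state=$1
    age=${2:-0}
    max_age=${3:-25}

    # No snapshot at all, or the newest one is too old: UNKNOWN
    if [ -z "$state" ] || [ "$age" -gt "$max_age" ]; then
        echo "UNKNOWN: no snapshot within the last $max_age hours"
        return 3
    fi

    case $state in
        SUCCESS) echo "OK: snapshot succeeded";     return 0 ;;
        PARTIAL) echo "WARNING: partial snapshot";  return 1 ;;
        FAILED)  echo "CRITICAL: snapshot failed";  return 2 ;;
        *)       echo "UNKNOWN: unexpected snapshot state $state"; return 3 ;;
    esac
}
```

Keeping the lookup (querying the snapshot repository) separate from this mapping makes the thresholds easy to test on their own.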
Jørgen van der Meulen