Snapshot state shown as partial

Post by **cdcsysadmin** » Mon Jan 11, 2021 9:10 pm

The snapshot state has been shown as partial for a few days.
What made the snapshots partial, and how do I get them working again?

Post by **cdienger** » Tue Jan 12, 2021 4:22 pm

This will happen if a primary shard of an index isn't available during the snapshot. This is usually temporary, as the Elasticsearch backend will work to assign unavailable/unassigned shards. You can see a list of unassigned shards with:

Code:

curl 'localhost:9200/_cat/shards?pretty' | grep -i unassigned

The 'p' or 'r' next to each shard indicates whether it is a primary or a replica shard.
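
To narrow that list down to unassigned primaries only (the case that leads to a partial snapshot), a small filter along these lines may help. This is only a sketch: it assumes the default `_cat/shards` column order (index, shard, prirep, state, ...), and the function name `unassigned_primaries` is made up for illustration.

```shell
# Hypothetical helper: reads `_cat/shards` output on stdin and prints
# "index shard" for every primary shard that is currently UNASSIGNED.
# Assumes the default column order: index shard prirep state ...
unassigned_primaries() {
    awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Usage against a local node (same endpoint as the command above):
#   curl -s 'localhost:9200/_cat/shards' | unassigned_primaries
```

No output means every primary was assigned at the moment the command ran.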

If it is happening frequently then there may be a resource or connectivity issue with one or more of the nodes in the cluster. If this is the case, please send me a private message with a profile from each machine.

Post by **cdcsysadmin** » Thu Jan 14, 2021 8:50 pm

Profile has been sent.
Any update?

Post by **cdienger** » Fri Jan 15, 2021 5:03 pm

The profiles show that all the primary shards were assigned at the time the profiles were generated. When was the last time there was a partial snapshot?

Which repository is NLS currently configured to use (Admin > System > Snapshots & Maintenance > Maintenance and Repository Settings > Repository to store snapshots in)? It looks like the machine has multiple repos mounted, and one of them is 100% full. You can see this by running "df -h" on the command line.
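
If the `df` output is long, the full mounts can be picked out with a filter like the sketch below. It assumes POSIX-format output from `df -P` (usage percentage in column 5, mount point in column 6, no spaces in mount paths); the function name `full_mounts` and the 95% default threshold are illustrative choices, not anything NLS provides.

```shell
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# point and usage of every filesystem at or above the given percentage.
full_mounts() {
    threshold="${1:-95}"
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "100%" -> "100"
            if (use + 0 >= t) print $6, $5
        }'
}

# Usage:
#   df -P | full_mounts 95
```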

You can also try running the snapshot manually from the command line, which can give us more information to work with if there is a problem. The command looks like this:

Code:

/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository '$repository' --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d

Replace '$repository' with the name of the repo selected under the maintenance settings page.
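
One shell detail worth keeping in mind here: inside single quotes the shell does not expand variables, so '$repository' typed as-is is passed along as that literal text rather than as a repo name. A minimal illustration, using `Snapshots05` purely as an example value:

```shell
# Example value only; use the repo name selected in the maintenance settings.
repository='Snapshots05'

literal=$(printf '%s' '$repository')    # single quotes: no expansion
expanded=$(printf '%s' "$repository")   # double quotes: variable is expanded

echo "single quotes -> $literal"
echo "double quotes -> $expanded"
```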

Post by **cdcsysadmin** » Sun Jan 17, 2021 9:05 pm

There are 4 repositories.
The mount point I chose to store indexes was full, so I changed it to another mount point.
A new snapshot still could not be created.
Besides, when I tried to clean up the indexes older than 365 days in the Web UI, it failed.
I have no idea how to get the snapshot service working again.

[root@nxlog02 ~]# /usr/local/nagioslogserver/scripts/curator.sh snapshot --repository '$repository' --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d
2021-01-18 09:55:08,127 INFO Job starting: snapshot indices
2021-01-18 09:55:08,127 WARNING Overriding default connection timeout. New timeout: 21600
2021-01-18 09:55:08,149 INFO Action snapshot will be performed on the following indices: [u'logstash-2021.01.11', u'logstash-2021.01.12', u'logstash-2021.01.13', u'logstash-2021.01.14', u'logstash-2021.01.15', u'logstash-2021.01.16', u'logstash-2021.01.17']
2021-01-18 09:55:08,970 ERROR Failed to verify all nodes have repository access.
2021-01-18 09:55:08,970 WARNING Job did not complete successfully.

Post by **ssax** » Mon Jan 18, 2021 6:39 pm

This is saying that not all nodes have access to the repositories, which they need to have:

Code:

ERROR Failed to verify all nodes have repository access.

Check the permissions on the repositories and on their subdirectories.

Code:

ls -la /full/path/to/your/mounts
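
Because `ls -la` only shows the mode bits, it can also be worth confirming that a file can actually be created in each repository path, which would also catch something like a read-only mount. The sketch below is an assumption-laden example: `can_write` is a made-up helper, and it should be run on every node as the user that Elasticsearch runs as.

```shell
# Hypothetical helper: returns 0 if the current user can create and remove
# a file in the given directory, non-zero otherwise.
can_write() {
    dir="$1"
    probe="$dir/.write_probe.$$"
    # The subshell keeps a failed redirection from aborting the caller.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Usage (the path is a placeholder, as in the command above):
#   can_write /full/path/to/your/mounts && echo writable || echo "NOT writable"
```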

Post by **cdcsysadmin** » Mon Jan 18, 2021 8:40 pm

It does not seem to be a permission issue.

[root@nxlog02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 788M 7.0G 10% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/centos-root 2.1T 1.4T 548G 73% /
/dev/sda1 976M 197M 713M 22% /boot
/dev/mapper/vg_01-lv_01 4.9T 4.7T 0 100% /Snapshots05
tmpfs 1.6G 0 1.6G 0% /run/user/0
10.10.110.225:/volume4/nagioslog01 11T 9.4T 1.5T 87% /Snapshots01
10.10.110.225:/volume4/nagioslog03 11T 9.4T 1.5T 87% /Snapshots03
10.10.110.225:/volume3/nagioslog04 28T 28T 74G 100% /Snapshots04
tmpfs 1.6G 0 1.6G 0% /run/user/1000
tmpfs 1.6G 0 1.6G 0% /run/user/48
...
[root@nxlog02 ~]# ls -la /Snapshots01
total 1316
drwxrwxrwx 4 nagios nagios 40960 Jan 18 20:09 .
...
[root@nxlog02 ~]# ls -la /Snapshots03
total 156
drwxrwxrwx 4 nagios nagios 4096 Jan 6 11:41 .
...
[root@nxlog02 ~]# ls -la /Snapshots04
total 1268
drwxrwxrwx 4 nagios nagios 20480 Jan 6 11:43 .
...
[root@nxlog02 ~]# ls -la /Snapshots05
total 168
drwxrwxrwx 4 nagios nagios 4096 Jan 17 20:10 .
...

Post by **cdienger** » Tue Jan 19, 2021 5:01 pm

Which repository is NLS currently configured to use (Admin > System > Snapshots & Maintenance > Maintenance and Repository Settings > Repository to store snapshots in)?

If Snapshots05 is selected then this would be the command to run:

Code:

/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository 'Snapshots05' --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d

Nagios Support Forum
