The snapshot state has been shown as partial for a few days.
What made it partial, and how can I resume it?
Snapshot state shown as partial
Re: Snapshot state shown as partial
This will happen if a primary shard of an index isn't available during the snapshot. This is usually temporary, as the Elasticsearch backend will work to assign unavailable/unassigned shards. You can see a list of unassigned shards with:
Code:
curl 'localhost:9200/_cat/shards?pretty' | grep -i unassigned
The 'p' or 'r' next to each shard indicates whether it is a primary or replica shard.
If it is happening frequently then there may be a resource or connectivity issue with one or more of the nodes in the cluster. If this is the case, please send me a private message with a profile from each machine.
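Since only unassigned primaries lead to a partial snapshot, it can help to count primaries and replicas separately. The following is a minimal sketch, not part of Nagios Log Server: the function name `summarize_unassigned` is hypothetical, and it assumes the columns are requested in the order index, shard, prirep, state via the `h` parameter of `_cat/shards`.

```shell
# Summarize unassigned shards from `_cat/shards` output read on stdin.
# Assumes four whitespace-separated columns: index shard prirep state
summarize_unassigned() {
  awk '$4 == "UNASSIGNED" { if ($3 == "p") p++; else if ($3 == "r") r++ }
       END { printf "primary=%d replica=%d\n", p, r }'
}

# Example use (assumes Elasticsearch listens on localhost:9200, as in the command above):
# curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state' | summarize_unassigned
```

A non-zero primary count at snapshot time would be consistent with a partial snapshot; unassigned replicas alone should not cause one.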
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
cdcsysadmin
- Posts: 55
- Joined: Tue Dec 04, 2018 9:52 pm
Re: Snapshot state shown as partial
Profile has been sent.
Any update?
Re: Snapshot state shown as partial
The profiles show that all the primary shards were assigned at the time they were generated. When was the last time there was a partial snapshot?
What repository is NLS currently configured to use (Admin > System > Snapshots & Maintenance > Maintenance and Repository Settings > Repository to store snapshots in)? It looks like the machine has multiple repos mounted, and one of them is 100% full. You can see this by running "df -h" on the command line.
You can also try running the snapshot manually from the command line, which can give us more information to work with if there is a problem. The command looks like this:
Code:
/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository '$repository' --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d
'$repository' is a placeholder: replace it with the name of the repo selected under the maintenance settings page.
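To spot full mounts without reading the whole "df -h" table, the output can be filtered by the Use% column. This is only a sketch: the function name `full_mounts` and the default threshold of 95% are assumptions, and it expects each filesystem on a single line (as `df -P` guarantees).

```shell
# Print mount points whose usage is at or above a threshold (default 95%).
# Reads df output on stdin: Use% is field 5, the mount point is field 6.
full_mounts() {
  awk -v t="${1:-95}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                 # strip the trailing percent sign
    if (use + 0 >= t + 0) print $6, $5
  }'
}

# Example use:
# df -P | full_mounts 95
```

Any snapshot repository that shows up in this list is a likely candidate for failed or partial snapshots.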
cdcsysadmin
Re: Snapshot state shown as partial
There are 4 repositories.
The mountpoint I chose to store indexes was full, so I changed it to another mountpoint.
A new snapshot still could not be created.
Also, when I tried to clean up the indexes older than 365 days in the Web UI, it failed.
I have no idea how to get the snapshot service working again.
[root@nxlog02 ~]# /usr/local/nagioslogserver/scripts/curator.sh snapshot --repository '$repository' --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d
2021-01-18 09:55:08,127 INFO Job starting: snapshot indices
2021-01-18 09:55:08,127 WARNING Overriding default connection timeout. New timeout: 21600
2021-01-18 09:55:08,149 INFO Action snapshot will be performed on the following indices: [u'logstash-2021.01.11', u'logstash-2021.01.12', u'logstash-2021.01.13', u'logstash-2021.01.14', u'logstash-2021.01.15', u'logstash-2021.01.16', u'logstash-2021.01.17']
2021-01-18 09:55:08,970 ERROR Failed to verify all nodes have repository access.
2021-01-18 09:55:08,970 WARNING Job did not complete successfully.
Re: Snapshot state shown as partial
Code:
ERROR Failed to verify all nodes have repository access.
This is saying that not all nodes have access to the repositories. Every node needs access to them.
Check the permissions on the mounts and in their subdirectories:
Code:
ls -la /full/path/to/your/mounts
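Mode bits from "ls -la" do not always tell the whole story (for example on NFS exports or full filesystems), so a direct write probe can complement the listing. The sketch below is an assumption-laden helper, not an NLS tool: `check_writable` is a hypothetical name, and it tests write access for whichever user runs it, so it would need to be run on each node as the user the Elasticsearch process runs as.

```shell
# Report whether the current user can create and remove a file in each given directory.
check_writable() {
  for dir in "$@"; do
    probe="$dir/.write_test_$$"       # temporary probe file, removed again on success
    if touch "$probe" 2>/dev/null; then
      rm -f "$probe"
      echo "OK   $dir"
    else
      echo "FAIL $dir"
    fi
  done
}

# Example use (mount point names taken from this thread):
# check_writable /Snapshots01 /Snapshots03 /Snapshots04 /Snapshots05
```

A FAIL on any node for the selected repository would match the "Failed to verify all nodes have repository access" error.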
cdcsysadmin
Re: Snapshot state shown as partial
This does not seem to be a permission issue.
[root@nxlog02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 0 7.8G 0% /dev/shm
tmpfs 7.8G 788M 7.0G 10% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/mapper/centos-root 2.1T 1.4T 548G 73% /
/dev/sda1 976M 197M 713M 22% /boot
/dev/mapper/vg_01-lv_01 4.9T 4.7T 0 100% /Snapshots05
tmpfs 1.6G 0 1.6G 0% /run/user/0
10.10.110.225:/volume4/nagioslog01 11T 9.4T 1.5T 87% /Snapshots01
10.10.110.225:/volume4/nagioslog03 11T 9.4T 1.5T 87% /Snapshots03
10.10.110.225:/volume3/nagioslog04 28T 28T 74G 100% /Snapshots04
tmpfs 1.6G 0 1.6G 0% /run/user/1000
tmpfs 1.6G 0 1.6G 0% /run/user/48
...
[root@nxlog02 ~]# ls -la /Snapshots01
total 1316
drwxrwxrwx 4 nagios nagios 40960 Jan 18 20:09 .
...
[root@nxlog02 ~]# ls -la /Snapshots03
total 156
drwxrwxrwx 4 nagios nagios 4096 Jan 6 11:41 .
...
[root@nxlog02 ~]# ls -la /Snapshots04
total 1268
drwxrwxrwx 4 nagios nagios 20480 Jan 6 11:43 .
...
[root@nxlog02 ~]# ls -la /Snapshots05
total 168
drwxrwxrwx 4 nagios nagios 4096 Jan 17 20:10 .
...
Re: Snapshot state shown as partial
What repository is NLS currently configured to use (Admin > System > Snapshots & Maintenance > Maintenance and Repository Settings > Repository to store snapshots in)?
If Snapshots05 is selected, then this would be the command to run:
Code:
/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository 'Snapshots05' --ignore_unavailable indices --older-than 1 --time-unit days --timestring %Y.%m.%d
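In the earlier run in this thread, the command was executed with '$repository' still in single quotes, so the shell passed the literal text $repository to curator instead of a repository name. A small guard like the following can catch that before the job starts. It is a sketch only: `snapshot_cmd` is a hypothetical helper that prints the command rather than running it, and the script path and flags are copied unchanged from the posts above.

```shell
# Print the curator snapshot command for a given repository name.
# Rejects an empty name or one that still contains a "$" placeholder.
snapshot_cmd() {
  repo="$1"
  case "$repo" in
    ''|*'$'*)
      echo "error: pass a real repository name, not a placeholder" >&2
      return 1
      ;;
  esac
  printf "/usr/local/nagioslogserver/scripts/curator.sh snapshot --repository '%s' --ignore_unavailable indices --older-than 1 --time-unit days --timestring %%Y.%%m.%%d\n" "$repo"
}

# Example use:
# snapshot_cmd Snapshots05
```

Printing the command first makes it easy to review the repository name before running it by hand.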