Snapshot to NAS failed

Posted: Sun May 24, 2020 10:04 pm
by cdcsysadmin
I have mounted the NAS path on the log server:

[root@nxlog03 ~]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
devtmpfs                            7.8G     0  7.8G   0% /dev
tmpfs                               7.8G  460M  7.4G   6% /run
tmpfs                               7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/centos-root             5.2T  5.0T  2.6G 100% /
/dev/sda1                           976M  197M  713M  22% /boot
10.10.110.225:/volume4/nagioslog01   11T  8.3T  2.6T  76% /usr/local/nagioslogserver/snapshots01
10.10.110.225:/volume4/nagioslog03   11T  8.3T  2.6T  76% /usr/local/nagioslogserver/snapshots03
tmpfs                               1.6G     0  1.6G   0% /run/user/1000

The following error messages appeared when the snapshot job ran against the NAS path:

2020-05-25 10:41:33,017 ERROR Failed to verify all nodes have repository access.
2020-05-25 10:41:33,017 WARNING Job did not complete successfully.

Re: Snapshot to NAS failed

Posted: Tue May 26, 2020 11:04 am
by cdienger
What is the output of the 'mount' command?

Re: Snapshot to NAS failed

Posted: Wed May 27, 2020 12:39 am
by cdcsysadmin
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=8117000k,nr_inodes=2029250,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
/dev/mapper/centos-root on / type ext4 (rw,relatime,data=ordered)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
/dev/sda1 on /boot type ext4 (rw,relatime,data=ordered)
10.10.110.225:/volume4/nagioslog01 on /usr/local/nagioslogserver/snapshots01 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.10.110.225,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=10.10.110.225)
10.10.110.225:/volume4/nagioslog03 on /usr/local/nagioslogserver/snapshots03 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.10.110.225,mountvers=3,mountport=892,mountproto=udp,local_lock=none,addr=10.10.110.225)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=1625740k,mode=700,uid=1000,gid=100)
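
Of the output above, only the two NFS lines matter for the snapshot repository. As a minimal sketch (the function name nfs_mounts is made up for this example and is not part of Nagios Log Server), the mount output can be narrowed to the NFS entries and their options:

```shell
# Hypothetical helper: read `mount` output on stdin and print
# "source mountpoint options" for every NFS (nfs or nfs4) entry.
# Assumes the usual "SRC on DIR type FSTYPE (OPTS)" layout with no
# spaces in the paths.
nfs_mounts() {
    awk '$2 == "on" && $4 == "type" && $5 ~ /^nfs4?$/ { print $1, $3, $6 }'
}
```

Run it as `mount | nfs_mounts` on each node that takes part in the snapshot, so the mount points and options can be compared side by side.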

Re: Snapshot to NAS failed

Posted: Wed May 27, 2020 12:56 pm
by cdienger
Check that the nagios user's UID and GID on the NLS machine match the UID and GID on the NFS server. https://assets.nagios.com/downloads/nag ... ations.pdf has details on NFS setups. To check the IDs on the NLS server:

Code:

grep nagios /etc/passwd /etc/group
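
That grep shows the account and group entries; what has to agree between the NLS node and the NFS export is the numeric IDs. A rough sketch, assuming GNU coreutils `stat` (as shipped with CentOS) and using helper names invented for this example, compares the nagios account's numeric UID:GID with the owner of the mounted snapshot directory:

```shell
# Hypothetical helpers, not part of Nagios Log Server.

# Print "uid:gid" for a local account name; fail if the account is unknown.
account_ids() {
    u=$(id -u "$1") && g=$(id -g "$1") && printf '%s:%s\n' "$u" "$g"
}

# Print the numeric "uid:gid" owner of a path (GNU stat syntax).
owner_ids() {
    stat -c '%u:%g' "$1"
}

# Report whether an account's IDs match the owner of a directory.
# Returns 0 on match, 1 on mismatch, 2 if either lookup fails.
check_owner() {
    acct=$(account_ids "$1") || return 2
    own=$(owner_ids "$2") || return 2
    if [ "$acct" = "$own" ]; then
        echo "match: $acct"
    else
        echo "mismatch: account=$acct owner=$own"
        return 1
    fi
}
```

For example, `check_owner nagios /usr/local/nagioslogserver/snapshots01` on the NLS node. Matching ownership is only one way access can be granted (group or other permissions and export squash options also play a part), so treat a mismatch as a lead to follow up rather than a verdict.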

Re: Snapshot to NAS failed

Posted: Tue Jun 02, 2020 2:59 am
by cdcsysadmin
Snapshots can now be written to the NFS path.
However, the state shown under 'Snapshots & Maintenance' is PARTIAL instead of SUCCESS.
How do I troubleshoot the partial snapshot state?

Re: Snapshot to NAS failed

Posted: Tue Jun 02, 2020 3:47 pm
by ssax
Please run this tail command (as root) and leave it running:

Code:

tail -Fn0 /usr/local/nagioslogserver/var/jobs.log

Then go to Admin > Command Subsystem and manually run the snapshots_maintenance job. Once it's complete, please send us the full output of the tail command.

Then PM us a copy of your system profile as well. You can download it from Admin > System Status by clicking the Download System Profile button.
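
In the tail output, the lines to look at first are the ERROR and WARNING entries, like the two quoted in the first post. A small sketch for pulling those out of a saved copy of the output (the function name is a placeholder, and the "DATE TIME LEVEL message" layout is assumed from those two lines):

```shell
# Hypothetical helper: print only ERROR/WARNING lines from jobs.log-style
# input on stdin. Assumes the level is the third whitespace-separated
# field, as in "2020-05-25 10:41:33,017 ERROR Failed to verify ...".
job_problems() {
    awk '$3 == "ERROR" || $3 == "WARNING"'
}
```

With the tail output saved to a file of your choosing, `job_problems < saved_output.txt` lists just the failures. The full output is still what should be posted, since the surrounding lines carry context.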

Thank you

Re: Snapshot to NAS failed

Posted: Mon Jun 08, 2020 11:38 am
by ssax
Please run this tail command (as root) and leave it running:

Code:

tail -Fn0 /usr/local/nagioslogserver/var/jobs.log

Then go to Admin > Command Subsystem and manually run the snapshots_maintenance job. Once it's complete, please send us the full output of the tail command.

I'm seeing this:

Code:

[2020-06-08 03:20:26,175][WARN ][cluster.routing.allocation.decider] [645f050f-df0b-4ddf-996b-662d84179901] high disk watermark [90%] exceeded on [OTexgAUgTo6npAoXlUC35g][645f050f-df0b-4ddf-996b-662d84179901] free: 434gb[8.1%], shards will be relocated away from this node

Code:

df -h
--------------------------

Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/centos-root             5.2T  4.7T  373G  93% /
devtmpfs                            7.8G     0  7.8G   0% /dev
tmpfs                               7.8G     0  7.8G   0% /dev/shm
tmpfs                               7.8G     0  7.8G   0% /sys/fs/cgroup
tmpfs                               7.8G  788M  7.0G  10% /run
/dev/sda1                           976M  197M  713M  22% /boot
10.X.X.X:/volume4/nagioslog01        11T  9.3T  1.6T  86% /usr/local/nagioslogserver/snapshots01
10.X.X.X:/volume4/nagioslog03        11T  9.3T  1.6T  86% /usr/local/nagioslogserver/snapshots03
tmpfs                               1.6G     0  1.6G   0% /run/user/1000
tmpfs                               1.6G     0  1.6G   0% /run/user/0

See here to adjust the low/high watermark:

https://support.nagios.com/kb/article/n ... h-469.html

See here to expand the disk:

https://support.nagios.com/kb/article.p ... tegory=128
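
The warning fires because the root filesystem is past the 90% high watermark: the log line reports 8.1% free, and df shows / at 93% used. A quick sketch for checking where a node stands, with helper names invented for this example, the 90% figure taken from the log line above, and the assumption that the Elasticsearch data sits on /:

```shell
# Hypothetical helpers, not part of Nagios Log Server or Elasticsearch.

# Read `df -P` output on stdin and print the used percentage (digits only)
# from the first data row.
used_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage (percent) is above the watermark (percent).
above_watermark() {
    [ "$1" -gt "$2" ]
}

# Example use:
#   pct=$(df -P / | used_pct)
#   if above_watermark "$pct" 90; then
#       echo "/ is at ${pct}% used, above the high watermark"
#   fi
```

Once usage drops back under the watermark, either by freeing or adding space or by raising the threshold as the first link describes, the relocation warning should stop and the snapshot job can be run again.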

Re: Snapshot to NAS failed

Posted: Mon Jun 08, 2020 8:21 pm
by cdcsysadmin
tail log attached

Re: Snapshot to NAS failed

Posted: Tue Jun 09, 2020 4:42 pm
by ssax
Did you resolve the watermark issue before doing that? If not, please do.