Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Upstream, Lustre 2.15.0
    • Environment: Lustre filesystem with ZFS as the backend filesystem.
    • 9223372036854775807

    Description

      lctl snapshot_create does not work if one of the nodes is not reachable:

      The lctl snapshot commands do not work if the resources fail over to the partner node, even though the partner node details are listed in ldev.conf.

      A sample ldev.conf is as follows:

      cslmo4702       cslmo4703       testfs-MDT0000  zfs:/dev/pool-mds65/mdt65 - -
      cslmo4703       cslmo4702       testfs-MDT0001  zfs:/dev/pool-mds66/mdt66 - -
      cslmo4704       cslmo4705       testfs-OST0000  zfs:/dev/pool-oss0/ost0 - -
      cslmo4705       cslmo4704       testfs-OST0001  zfs:/dev/pool-oss1/ost1 - -
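
      For reference, each ldev.conf line names the local host, the foreign (failover) host, the target label, and the device path; an annotated sketch of the first line above, mirroring the header comment shown in the reproduction further down:

      # local-host  foreign-host    label           [md|zfs:]device-path      journal raidtab
      cslmo4702     cslmo4703       testfs-MDT0000  zfs:/dev/pool-mds65/mdt65 -       -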
      

      For example, suppose there are two nodes, cslmo4704 and cslmo4705, each the failover partner of the other: cslmo4704 has dataset zfs:/dev/pool-oss0/ost0 and cslmo4705 has dataset zfs:/dev/pool-oss1/ost1. If host cslmo4705 is failed or powered off, its dataset /dev/pool-oss1/ost1 correctly fails over to cslmo4704, which then holds both datasets. In this situation, creating a Lustre snapshot with the "lctl snapshot_create" command fails on the dataset /dev/pool-oss1/ost1:

      [root@cslmo4702 ~]# lctl snapshot_create -F testfs -n snap_test5
      ssh: connect to host cslmo4705 port 22: No route to host
      ssh: connect to host cslmo4705 port 22: No route to host
      ssh: connect to host cslmo4705 port 22: No route to host
      Can't create the snapshot snap_test5
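
      As a quick sanity check in this state, one can confirm from another node that cslmo4705 is really unreachable and that cslmo4704 has imported both pools; a minimal sketch using the hostnames from this example (the ConnectTimeout value is an arbitrary choice):

      # Confirm the failed node is down and the survivor holds both datasets.
      ssh -o ConnectTimeout=5 cslmo4705 true || echo "cslmo4705 unreachable"
      ssh cslmo4704 zpool list -H -o name        # expect pool-oss0 and pool-oss1
      ssh cslmo4704 zfs list -H -o name | grep -E 'ost0$|ost1$'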


          Activity

            [LU-16072] snapshot support to foreign host
            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48226/
            Subject: LU-16072 utils: snapshot support to foreign host
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 815ca64afc8e54f9707ed9f458e14a9c99629ed7

            akash-b Akash B added a comment -

            CR/Patch: https://review.whamcloud.com/#/c/48226/

            Subject: LU-16072 utils: snapshot support to foreign host

            Project: fs/lustre-release
            Branch: master

            pjones Peter Jones added a comment -

            Cory

            I think that was JIRA's helpful error message to tell you that Akash B was not a valid selection. I have added Akash B into the developers group, and tickets can now be assigned OK.

            Peter

            spitzcor Cory Spitz added a comment -

            pjones, can you please assign this to akash-b? Jira gives me some "communications breakdown" error when I try to do it.

            akash-b Akash B added a comment - - edited

            HPE bug-id: LUS-10648

            Reproduced with Lustre 2.15:

            snapshot config used:

            [root@cslmo1602 ~]# cat /etc/ldev.conf
            #local foreign/- label [md|zfs:]device-path [journal-path]/- [raidtab]
            cslmo1602 cslmo1603 testfs-MDT0000 zfs:pool-mds65/mdt65
            cslmo1603 cslmo1602 testfs-MDT0001 zfs:pool-mds66/mdt66
            cslmo1604 cslmo1605 testfs-OST0000 zfs:pool-oss0/ost0
            cslmo1605 cslmo1604 testfs-OST0001 zfs:pool-oss1/ost1
            cslmo1606 cslmo1607 testfs-OST0002 zfs:pool-oss0/ost0
            cslmo1607 cslmo1606 testfs-OST0003 zfs:pool-oss1/ost1
            

            Lustre targets when nodes are in a failed-over state:

            [root@cslmo1600 ~]# pdsh -g lustre mount -t lustre | sort
            cslmo1602: pool-mds65/mdt65 on /data/mdt65 type lustre (ro,svname=testfs-MDT0000,mgs,osd=osd-zfs)
            cslmo1602: pool-mds66/mdt66 on /data/mdt66 type lustre (ro,svname=testfs-MDT0001,mgsnode=:10.230.26.5@o2ib,10.230.26.6@o2ib:10.230.26.7@o2ib,10.230.26.8@o2ib,osd=osd-zfs)
            cslmo1605: pool-oss0/ost0 on /data/ost0 type lustre (ro,svname=testfs-OST0000,mgsnode=:10.230.26.5@o2ib,10.230.26.6@o2ib:10.230.26.7@o2ib,10.230.26.8@o2ib,osd=osd-zfs)
            cslmo1605: pool-oss1/ost1 on /data/ost1 type lustre (ro,svname=testfs-OST0001,mgsnode=:10.230.26.5@o2ib,10.230.26.6@o2ib:10.230.26.7@o2ib,10.230.26.8@o2ib,osd=osd-zfs)
            cslmo1606: pool-oss0/ost0 on /data/ost0 type lustre (ro,svname=testfs-OST0002,mgsnode=:10.230.26.5@o2ib,10.230.26.6@o2ib:10.230.26.7@o2ib,10.230.26.8@o2ib,osd=osd-zfs)
            cslmo1606: pool-oss1/ost1 on /data/ost1 type lustre (ro,svname=testfs-OST0003,mgsnode=:10.230.26.5@o2ib,10.230.26.6@o2ib:10.230.26.7@o2ib,10.230.26.8@o2ib,osd=osd-zfs)
            
            

            Before PATCH:
            =========

            snapshot create:

            [root@cslmo1602 ~]# lctl snapshot_create -F testfs -n snap1
            ssh: connect to host cslmo1604 port 22: No route to host
            Can't create the snapshot snap1
             
            [root@cslmo1602 ~]# lctl snapshot_list -F testfs
             
            filesystem_name: testfs
            snapshot_name: pre_snap
            create_time: Thu Jul  7 16:19:49 2022
            modify_time: Thu Jul  7 16:19:49 2022
            snapshot_fsname: 01a29921
            status: not mount
            [root@cslmo1602 ~]#
            

            snapshot list:

            [root@cslmo1602 ~]# lctl snapshot_list -F testfs -d
             
            filesystem_name: testfs
            snapshot_name: pre_snap
             
            snapshot_role: MDT0000
            create_time: Thu Jul  7 16:19:49 2022
            modify_time: Thu Jul  7 16:19:49 2022
            snapshot_fsname: 01a29921
            status: not mount
             
            snapshot_role: MDT0001
            cannot open 'pool-mds66/mdt66@pre_snap': dataset does not exist
            status: not mount
             
            snapshot_role: OST0000
            cannot open 'pool-oss0/ost0@pre_snap': dataset does not exist
            status: not mount
             
            snapshot_role: OST0001
            create_time: Thu Jul  7 16:19:49 2022
            modify_time: Thu Jul  7 16:19:49 2022
            snapshot_fsname: 01a29921
            status: not mount
             
            snapshot_role: OST0002
            snapshot_fsname: 01a29921
            modify_time: Thu Jul  7 16:19:49 2022
            create_time: Thu Jul  7 16:19:49 2022
            status: not mount
             
            snapshot_role: OST0003
            cannot open 'pool-oss1/ost1@pre_snap': dataset does not exist
            status: not mount
             

            snapshot mount/umount:

            [root@cslmo1602 ~]# lctl snapshot_mount -F testfs -n pre_snap
            mount.lustre: pool-mds66/mdt66@pre_snap has not been formatted with mkfs.lustre or the backend filesystem type is not supported by this tool
            mount.lustre: pool-oss0/ost0@pre_snap has not been formatted with mkfs.lustre or the backend filesystem type is not supported by this tool
            mount.lustre: pool-oss1/ost1@pre_snap has not been formatted with mkfs.lustre or the backend filesystem type is not supported by this tool
            3 of 6 pieces of the snapshot pre_snap can't be mounted: No such device
             

            snapshot modify:

            [root@cslmo1602 ~]# lctl snapshot_modify -F testfs -n pre_snap -N mod_snap
            cannot open 'pool-oss0/ost0@pre_snap': dataset does not exist
            cannot open 'pool-mds66/mdt66@pre_snap': dataset does not exist
            cannot open 'pool-oss1/ost1@pre_snap': dataset does not exist
            Can't modify the snapshot pre_snap
             

            snapshot destroy:

            [root@cslmo1602 ~]# lctl snapshot_destroy -F testfs -n pre_snap
            Miss snapshot piece on the OST0000. Use '-f' option if want to destroy it by force.
            Can't destroy the snapshot pre_snap
            [root@cslmo1602 ~]#
            

            After PATCH:
            ========

            Applied Lustre fix:

            snapshot create:

            [root@cslmo1602 ~]# lctl snapshot_create -F testfs -n snap1
            [root@cslmo1602 ~]#
            

            snapshot list:

            [root@cslmo1602 ~]# lctl snapshot_list -F testfs
             
            filesystem_name: testfs
            snapshot_name: snap1
            modify_time: Fri Jul  8 14:06:05 2022
            create_time: Fri Jul  8 14:06:05 2022
            snapshot_fsname: 4d4aaffb
            status: not mount
             
            filesystem_name: testfs
            snapshot_name: pre_snap
            create_time: Thu Jul  7 16:19:49 2022
            modify_time: Thu Jul  7 16:35:34 2022
            snapshot_fsname: 01a29921
            status: not mount
            [root@cslmo1602 ~]#
            [root@cslmo1602 ~]# lctl snapshot_list -F testfs -n snap1 -d
             
            filesystem_name: testfs
            snapshot_name: snap1
             
            snapshot_role: MDT0000
            modify_time: Fri Jul  8 14:06:05 2022
            create_time: Fri Jul  8 14:06:05 2022
            snapshot_fsname: 4d4aaffb
            status: not mount
             
            snapshot_role: MDT0001
            modify_time: Fri Jul  8 14:06:05 2022
            create_time: Fri Jul  8 14:06:05 2022
            snapshot_fsname: 4d4aaffb
            status: not mount
             
            snapshot_role: OST0000
            create_time: Fri Jul  8 14:06:05 2022
            modify_time: Fri Jul  8 14:06:05 2022
            snapshot_fsname: 4d4aaffb
            status: not mount
             
            snapshot_role: OST0001
            snapshot_fsname: 4d4aaffb
            create_time: Fri Jul  8 14:06:05 2022
            modify_time: Fri Jul  8 14:06:05 2022
            status: not mount
             
            snapshot_role: OST0002
            snapshot_fsname: 4d4aaffb
            create_time: Fri Jul  8 14:06:05 2022
            modify_time: Fri Jul  8 14:06:05 2022
            status: not mount
             
            snapshot_role: OST0003
            create_time: Fri Jul  8 14:06:05 2022
            modify_time: Fri Jul  8 14:06:05 2022
            snapshot_fsname: 4d4aaffb
            status: not mount
            [root@cslmo1602 ~]#
            

            snapshot mount/umount:

            [root@cslmo1602 ~]# lctl snapshot_mount -F testfs -n snap1
            mounted the snapshot snap1 with fsname 4d4aaffb
            [root@cslmo1602 ~]#
            [root@cslmo1602 ~]# lctl snapshot_umount -F testfs -n snap1
            [root@cslmo1602 ~]#
            

            snapshot modify:

            [root@cslmo1602 ~]# lctl snapshot_modify -F testfs -n snap1 -N Snap1
            [root@cslmo1602 ~]#
            

            snapshot destroy:

            [root@cslmo1602 ~]# lctl snapshot_destroy -F testfs -n Snap1
            [root@cslmo1602 ~]#
            [root@cslmo1602 ~]# lctl snapshot_list -F testfs -n Snap1
            Can't list the snapshot Snap1
            [root@cslmo1602 ~]#
            

            With the fix applied, a Lustre snapshot can be created, destroyed, modified, listed, mounted, and unmounted even when Lustre targets have failed over to the partner nodes defined in the /etc/ldev.conf configuration file. Previously these operations would fail because the foreign-host field in /etc/ldev.conf was ignored.
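
            The change lives in the lctl snapshot utilities; purely as an illustration of the fallback behaviour described above (not the actual implementation), a bash sketch that tries a target's local host first and retries on the foreign host from /etc/ldev.conf when the local host is unreachable, using hostnames and a dataset from the reproduction config:

            run_on_target_host() {
                local local_host=$1 foreign_host=$2 cmd=$3

                # Try the target's primary host first; a short connect timeout
                # keeps an unreachable node from stalling the whole operation.
                if ssh -o ConnectTimeout=5 "$local_host" "$cmd"; then
                    return 0
                fi

                # Primary host down: the dataset may have failed over, so retry
                # the same command on the partner (foreign) host.
                ssh -o ConnectTimeout=5 "$foreign_host" "$cmd"
            }

            # Example: snapshot one OST dataset from the ldev.conf above.
            run_on_target_host cslmo1604 cslmo1605 "zfs snapshot pool-oss0/ost0@snap1"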


            People

              Assignee: Akash B
              Reporter: Akash B
              Votes: 0
              Watchers: 10
