
"lfs getstripe" on directory does not show default root layout

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.12.0, Lustre 2.10.8

    Description

      Running "lfs getstripe" on a directory only shows the simple filesystem default layout and not the root default layout that will actually be inherited by new files.

      # lfs setstripe -p ssd /mnt/testfs
      # lfs getstripe -d /mnt/testfs
      stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 pool: ssd
      # mkdir /mnt/testfs/tmp/should_be_old
      # lfs getstripe -d /mnt/testfs/tmp/should_be_old
      stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
      

          Activity

            [LU-11656] "lfs getstripe" on directory does not show default root layout

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36609/
            Subject: LU-11656 llite: fetch default layout for a directory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3e8fa8a7396cd029cb0d7714a324343eed7f535e

            Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36609
            Subject: LU-11656 utils: fix "lfs getstripe" to fetch default layout for a directory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 497c3bd1bd884f4f2c60870a096b19c1a8ceb9fd

            yujian Jian Yu added a comment -

            Hi Andreas,

            As per your suggestions, fetching the default layout for a directory will be done in lfs_getstripe()->...->cb_getstripe(). If get_lmd_info() returns -ENODATA for a directory that does not have a trusted.lov xattr, cb_getstripe() should get the lov_user_md data from the ROOT FID, and then print the layout information in llapi_lov_dump_user_lmm().

            One solution would be to special-case the [FID_SEQ_ROOT, FID_OID_ROOT, 0] lookup in the directory in ioctl(IOC_MDC_GETFILESTRIPE) to redirect this ioctl to the actual ROOT FID, since this is out of the critical code path and is unlikely to cause any problems for existing files. It should treat -ENOENT as -ENODATA, in case it is running on an old client that does not have this feature, so that it just keeps the same behaviour as today.

            That will probably involve moving the core of ll_lov_getstripe_ea_info() into a separate helper function that allows passing the ROOT FID, while the original ll_lov_getstripe_ea_info() gets the FID from the inode.

            One upcoming issue with explicitly using the ROOT FID is that it will complicate the patch https://review.whamcloud.com/28972 "LU-9982 lustre: Clients striping from mapped FID in nodemap", since that feature will allow clients to inherit layouts from a non-ROOT directory based on their nodemap. In that case, it might make sense to map the ROOT FID getstripe to the nodemap root layout (if set) so that the "lfs getstripe" output is consistent.
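
The error handling described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the merged patch: `struct fb_layout`, `fb_fetch_fn`, `fb_getstripe_dir()` and the three stub fetchers are hypothetical names standing in for the real get_lmd_info() / IOC_MDC_GETFILESTRIPE paths.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified layout record (not a real Lustre struct). */
struct fb_layout {
	unsigned stripe_count;
	unsigned stripe_size;
	const char *pool;	/* NULL when no pool is set */
};

/* Hypothetical fetch callback: returns 0 and fills *out, or a negative errno. */
typedef int (*fb_fetch_fn)(struct fb_layout *out);

/*
 * Try the directory's own layout first. Only when it has none (-ENODATA)
 * fall back to the ROOT FID. A client without ROOT FID support is assumed
 * to answer -ENOENT, which is mapped to -ENODATA so callers see the same
 * result as before the feature existed.
 */
static int fb_getstripe_dir(fb_fetch_fn fetch_dir, fb_fetch_fn fetch_root,
			    struct fb_layout *out, int *from_root)
{
	int rc;

	assert(fetch_dir && fetch_root && out && from_root);
	*from_root = 0;

	rc = fetch_dir(out);
	if (rc != -ENODATA)
		return rc;		/* success, or an unrelated error */

	rc = fetch_root(out);
	if (rc == -ENOENT)
		rc = -ENODATA;		/* old client: keep legacy behaviour */
	if (rc == 0)
		*from_root = 1;
	return rc;
}

/* Stub fetchers used only to exercise the sketch. */
static int fb_dir_no_ea(struct fb_layout *out)
{
	(void)out;
	return -ENODATA;		/* directory has no trusted.lov xattr */
}

static int fb_root_with_pool(struct fb_layout *out)
{
	out->stripe_count = 1;
	out->stripe_size = 1048576;
	out->pool = "ssd";
	return 0;
}

static int fb_root_old_client(struct fb_layout *out)
{
	(void)out;
	return -ENOENT;			/* ROOT FID lookup not supported */
}
```

Calling `fb_getstripe_dir(fb_dir_no_ea, fb_root_with_pool, &l, &from_root)` fills `l` from the root stub, while substituting `fb_root_old_client` yields -ENODATA, which a caller would handle exactly as it does today.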

            adilger Andreas Dilger added a comment -

            Jian, is the default layout for the directory going to be fetched from the ROOT FID inside the kernel or from the "lfs getstripe" command? Fetching it from the kernel simplifies some issues for userspace applications, but has the drawback that tools like "tar" will make a copy of the default trusted.lov xattr for every directory in the filesystem which doesn't have its own xattr.

            Doing it in "lfs getstripe" avoids this issue, but may have other issues (e.g. $MOUNT/.lustre/fid is only accessible to root, and is not available in subdirectory mounts).
            yujian Jian Yu added a comment -

            According to Andreas' comments, I'm working on the patch to fetch the layout from the ROOT FID [FID_SEQ_ROOT, FID_OID_ROOT, 0].

            yujian Jian Yu added a comment -

            Hi Nathan,
            Let me look into this and work on it.

            dauchy Nathan Dauchy (Inactive) added a comment -

            Jian,

            I just wanted to make sure you were aware that this is still a problem, even with LU-10629 fixed or after deleting the stripe info on a directory so that it should implicitly inherit the filesystem default. I agree with Andreas' statement: "We need some way to fetch the actual default layout that will be used when new files will be created in that directory."

            Thanks,
            Nathan
            yujian Jian Yu added a comment -

            Thank you, Nathan.

            However, "lfs setstripe -d /lfstmp/toplevel" will delete both the striping pattern and OST pool (fixed in LU-10629) on the directory. So, the above test results are correct.

            dauchy Nathan Dauchy (Inactive) added a comment -

            Jian, I wanted to make sure the "explicitly inheriting" issue was clear, as it explains why your test showed different results than the original bug report. You did everything in the top level. So, to reproduce and see that "lfs getstripe" does not actually report the filesystem default, try something like this:

            # lfs getstripe -d /lfstmp
            stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 pool: OLD
            # mkdir /lfstmp/toplevel
            # lfs getstripe -d /lfstmp/toplevel
            stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 pool: OLD
               (that is the root-level explicit inheritance)
            
            # lfs setstripe -d /lfstmp/toplevel
            # lfs getstripe -d /lfstmp/toplevel
            stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
            # mkdir /lfstmp/toplevel/should_be_old
            # lfs getstripe -d /lfstmp/toplevel/should_be_old/
            stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
              (neither the top level nor the new dirs display "OLD", but they should)
            

            Hope this helps!
            -Nathan

            adilger Andreas Dilger added a comment -

            After LU-11739, we shouldn't be explicitly inheriting the layout from the root directory anymore.

            However, I think that "lfs getstripe" on directories that do not have an actual LOV EA will only print the stripe_count, stripe_size, and stripe_index that are fetched from the /sys/fs/lustre/lov values.

            We need some way to fetch the actual default layout that will be used when new files will be created in that directory. We can't use the layout from the directory of the mountpoint, since that might be a subdirectory mount. One option would be to fetch the layout from the ROOT FID?
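
To make the gap concrete, the lookup order this comment asks for can be modelled as a small pure function. This is a sketch under assumptions, not Lustre code: `struct eff_layout` and `eff_default_layout()` are hypothetical names, and the precedence (directory's own LOV EA, then the root default, then the lov tunables) is inferred from the discussion in this ticket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layout record (not a real Lustre struct). */
struct eff_layout {
	unsigned stripe_count;
	unsigned stripe_size;
	int stripe_offset;
	const char *pool;	/* NULL when no pool is set */
};

/*
 * Pick the layout that new files in a directory would be assumed to get:
 *   1. the directory's own default layout (LOV EA), if it has one;
 *   2. otherwise the default layout stored on the filesystem root;
 *   3. otherwise the plain lov tunables.
 * Pass NULL for a level that is not set. The tunables must always exist.
 */
static const struct eff_layout *
eff_default_layout(const struct eff_layout *dir_ea,
		   const struct eff_layout *root_ea,
		   const struct eff_layout *lov_tunables)
{
	assert(lov_tunables != NULL);

	if (dir_ea != NULL)
		return dir_ea;
	if (root_ea != NULL)
		return root_ea;
	return lov_tunables;
}
```

The behaviour reported in this ticket corresponds to skipping step 2: a directory without its own LOV EA is displayed with the tunables only, even though the root layout (including its pool) is what new files will use.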
            yujian Jian Yu added a comment -

            Here is the test result on Lustre 2.10.5_ddn6 and master branch:

            # lfs getstripe -d /mnt/lustre
            stripe_count:  1 stripe_size:   1048576 pattern:       0 stripe_offset: -1
            
            # mkdir /mnt/lustre/olddir
            # lfs getstripe -d /mnt/lustre/olddir
            stripe_count:  1 stripe_size:   1048576 pattern:       0 stripe_offset: -1
            
            # lfs setstripe -p OLD /mnt/lustre
            # lfs getstripe -d /mnt/lustre
            stripe_count:  1 stripe_size:   1048576 pattern:       raid0 stripe_offset: -1 pool:          OLD
            
            # lfs getstripe -d /mnt/lustre/olddir
            stripe_count:  1 stripe_size:   1048576 pattern:       0 stripe_offset: -1
            
            # mkdir /mnt/lustre/newdir
            # lfs getstripe -d /mnt/lustre/newdir
            stripe_count:  1 stripe_size:   1048576 pattern:       raid0 stripe_offset: -1 pool:          OLD
            

            The result is different from the description of this ticket. Running "lfs getstripe" on a directory created after the pool name was set on its parent directory does show the pool name. Only a directory created before the pool name was set on its parent directory has the pool name printing issue.

            While running "lfs getstripe", the following code produced the different outputs for "/mnt/lustre" and "/mnt/lustre/olddir":

            lov_dump_plain_user_lmm()
            {
                    __u32 magic = *(__u32 *)&param->fp_lmd->lmd_lmm;
                    /* ...... */
                    if (magic == LOV_USER_MAGIC_V1) {
                            /* "/mnt/lustre/olddir" had LOV_USER_MAGIC_V1 */
                            lov_dump_user_lmm_v1v3(...);
                    } else {
                            /* "/mnt/lustre" had LOV_USER_MAGIC_V3 */
                            char pool_name[LOV_MAXPOOLNAME + 1];
                            struct lov_user_ost_data_v1 *objects;
                            struct lov_user_md_v3 *lmmv3 = (void *)&param->fp_lmd->lmd_lmm;

                            snprintf(pool_name, sizeof(pool_name), "%s",
                                     lmmv3->lmm_pool_name);
                            objects = lmmv3->lmm_objects;
                            lov_dump_user_lmm_v1v3(...);
                    }
            }
            

            After running "lfs setstripe -p OLD" on "/mnt/lustre", its magic was changed from LOV_USER_MAGIC_V1 to LOV_USER_MAGIC_V3 in lod_xattr_set(). However, the previously created "/mnt/lustre/olddir" still had magic of LOV_USER_MAGIC_V1 and was not changed.
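
The effect of the two magic values on the printed output can be reproduced with a small standalone model. This is a sketch under assumptions, not the liblustreapi code: `struct magic_layout`, `magic_set_pool()` and `magic_format_layout()` are hypothetical names, the record is a simplified stand-in for lov_user_md_v1/v3, and the magic constants and pool name length are given as I believe the Lustre headers define them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values assumed to match LOV_USER_MAGIC_V1/V3 and LOV_MAXPOOLNAME. */
#define MAGIC_SKETCH_V1		0x0BD10BD0u
#define MAGIC_SKETCH_V3		0x0BD30BD0u
#define MAGIC_SKETCH_MAXPOOL	15

/* Hypothetical, simplified stand-in for lov_user_md_v1/v3. */
struct magic_layout {
	uint32_t magic;
	uint32_t stripe_size;
	uint16_t stripe_count;
	int16_t stripe_offset;
	char pool_name[MAGIC_SKETCH_MAXPOOL + 1];	/* meaningful for V3 only */
};

/* Setting a pool upgrades the record to V3, as described for lod_xattr_set(). */
static int magic_set_pool(struct magic_layout *l, const char *pool)
{
	if (strlen(pool) > MAGIC_SKETCH_MAXPOOL)
		return -1;
	snprintf(l->pool_name, sizeof(l->pool_name), "%s", pool);
	l->magic = MAGIC_SKETCH_V3;
	return 0;
}

/* Format one layout; the pool is emitted only for a V3 record. */
static int magic_format_layout(const struct magic_layout *l, char *buf, size_t len)
{
	int n;

	assert(l != NULL && buf != NULL && len > 0);
	n = snprintf(buf, len,
		     "stripe_count: %u stripe_size: %u stripe_offset: %d",
		     (unsigned)l->stripe_count, (unsigned)l->stripe_size,
		     (int)l->stripe_offset);
	if (n < 0 || (size_t)n >= len)
		return n;	/* error or truncated output */
	if (l->magic == MAGIC_SKETCH_V3 && l->pool_name[0] != '\0')
		n += snprintf(buf + n, len - (size_t)n, " pool: %s", l->pool_name);
	return n;
}
```

In this model a directory record copied before `magic_set_pool()` is called on its parent stays V1, so formatting it never shows a pool, which mirrors the "/mnt/lustre/olddir" output above.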

            People

              yujian Jian Yu
              adilger Andreas Dilger