Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2809

fix ll_setxattr() to always ignore ll_stripe_offset

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0, Lustre 2.1.5
    • 3
    • 6799

    Description

      With the introduction of lum_layout_version field to lov_user_md, using "tar --xattr" to backup and restore the Lustre layout (LOV EA) will incorrectly pass the lum_layout_version value back to ll_setxattr() as lum_stripe_offset. This will incorrectly turn the layout version into the starting OST index, for cases where the version is within [1,max_ost_idx].

      This may cause, for example, all files that have been migrated once to be restored by tar onto OST1.

      Attachments

        Issue Links

          Activity

            [LU-2809] fix ll_setxattr() to always ignore ll_stripe_offset

            Also this solution won't last long because v4 version of layout is on its way, at that time, layout gen will no longer share field with others.

            In that case I agree with your solution. Thanks

            nedbass Ned Bass (Inactive) added a comment - Also this solution won't last long because v4 version of layout is on its way, at that time, layout gen will no longer share field with others. In that case I agree with your solution. Thanks

            getxattr should take the returning data as opaque unless the extended attributes are system defined. Ioctl and getxattr are already two separate interfaces so I tend to think it's okay to return different data, if we have defined the return value clearly.

            I worked out this patch because it'd be the simplest way to keep compatibility. Also this solution won't last long because v4 version of layout is on its way, at that time, layout gen will no longer share field with others.

            jay Jinshan Xiong (Inactive) added a comment - getxattr should take the returning data as opaque unless the extended attributes are system defined. Ioctl and getxattr are already two separate interfaces so I tend to think it's okay to return different data, if we have defined the return value clearly. I worked out this patch because it'd be the simplest way to keep compatibility. Also this solution won't last long because v4 version of layout is on its way, at that time, layout gen will no longer share field with others.

            Jinshan, my reservation about that approach is that you are removing functionality from the getxattr interface. On platforms that use function shipping, [gs]etxattr may be the only option because the needed ioctls are not supported. Also, having inconsistent semantics between ioctl and getxattr is confusing.

            Having said all that, I don't know whether applications on function-shipping clients would ever need to know they layout generation number. But in general it seems better to leave open the possibility in case we need it later.

            nedbass Ned Bass (Inactive) added a comment - Jinshan, my reservation about that approach is that you are removing functionality from the getxattr interface. On platforms that use function shipping, [gs] etxattr may be the only option because the needed ioctls are not supported. Also, having inconsistent semantics between ioctl and getxattr is confusing. Having said all that, I don't know whether applications on function-shipping clients would ever need to know they layout generation number. But in general it seems better to leave open the possibility in case we need it later.

            patch is at: http://review.whamcloud.com/5664

            In the above patch, layout_gen is cleared for getxattr

            jay Jinshan Xiong (Inactive) added a comment - patch is at: http://review.whamcloud.com/5664 In the above patch, layout_gen is cleared for getxattr

            hmm.. do you think it's a good idea to pack layout_gen into low 12bit of lmm_stripe_size. Usually we don't need to know the exact version of layout_gen because it's only used to detect layout change. That being said, we don't need to keep layout_gen when restoring a file.

            jay Jinshan Xiong (Inactive) added a comment - hmm.. do you think it's a good idea to pack layout_gen into low 12bit of lmm_stripe_size. Usually we don't need to know the exact version of layout_gen because it's only used to detect layout change. That being said, we don't need to keep layout_gen when restoring a file.

            Ned, yes this does seem like a good path forward. There have been requests to allow specifying the exact OST index for all of the stripes.

            adilger Andreas Dilger added a comment - Ned, yes this does seem like a good path forward. There have been requests to allow specifying the exact OST index for all of the stripes.
            emoly.liu Emoly Liu added a comment -

            Ned, thanks for reminder! We have "lfs swap_layouts".

            emoly.liu Emoly Liu added a comment - Ned, thanks for reminder! We have "lfs swap_layouts".
            nedbass Ned Bass (Inactive) added a comment - - edited

            Andreas, yes we could do this in the context on LU-2182. I was thinking we could handle the simple case first, which is to set the offset of stripe 0. llapi_layout_ost_index_set() would store the OST index of stripe N in lum.lmm_objects[N].l_ost_idx. For now, ll_setxattr would copy lmm_objects[0].l_ost_idx to lum.lum_stripe_offset if lmm_magic has the flag set. This approach would let us match the functionality that exists today without changing any server code. Eventually we can extend this to support stripes > 0 without changing the API, but that would be more invasive (i.e. requiring changes to lod). Does this approach sound reasonable?

            nedbass Ned Bass (Inactive) added a comment - - edited Andreas, yes we could do this in the context on LU-2182 . I was thinking we could handle the simple case first, which is to set the offset of stripe 0. llapi_layout_ost_index_set() would store the OST index of stripe N in lum.lmm_objects [N] .l_ost_idx. For now, ll_setxattr would copy lmm_objects [0] .l_ost_idx to lum.lum_stripe_offset if lmm_magic has the flag set. This approach would let us match the functionality that exists today without changing any server code. Eventually we can extend this to support stripes > 0 without changing the API, but that would be more invasive (i.e. requiring changes to lod). Does this approach sound reasonable?

            Emoly, you can reproduce the problem without modifying liblustreapi by swapping the layout of two files with the llapi_swap_layouts() function, which will force their generation numbers to increment. Then tarring and untarring them with --xattr will force their offset to be at index 1.

            nedbass Ned Bass (Inactive) added a comment - Emoly, you can reproduce the problem without modifying liblustreapi by swapping the layout of two files with the llapi_swap_layouts() function, which will force their generation numbers to increment. Then tarring and untarring them with --xattr will force their offset to be at index 1.

            Emoly, I think your reproducer exactly demonstrates the problem - if lum_layout_gen is non-zero it will be confused with lum_stripe_offset.

            Ned, I think your solution is interesting, and is also related to LOV_MAGIC_V[13]_DEF magic that already exists in the code. Should this new flag be implemented in the context of LU-2182?

            adilger Andreas Dilger added a comment - Emoly, I think your reproducer exactly demonstrates the problem - if lum_layout_gen is non-zero it will be confused with lum_stripe_offset. Ned, I think your solution is interesting, and is also related to LOV_MAGIC_V [13] _DEF magic that already exists in the code. Should this new flag be implemented in the context of LU-2182 ?

            People

              jay Jinshan Xiong (Inactive)
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: