[LU-12852] growing a PFL file with last stripe as -1 fails

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Affects Version/s: Lustre 2.10.8
    • Labels: None
    • Severity: 2

    Description

      If a file or directory has a PFL layout whose last component has its stripe count set to -1, growing the file fails.

      $  lfs setstripe -E 256M -c 1 -E 16G -c 4 -E -1 -S 4M -c -1 pfldir
      $  echo hello > pfldir/test
      $ echo helpo >> pfldir/test 
      -bash: echo: write error: No space left on device
      
      $  lfs getstripe pfldir/test
      pfldir/test
        lcm_layout_gen:    3
        lcm_mirror_count:  1
        lcm_entry_count:   3
          lcme_id:             1
          lcme_mirror_id:      0
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   268435456
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 313
            lmm_objects:
            - 0: { l_ost_idx: 313, l_fid: [0x101390000:0x110967ab:0x0] }
      
          lcme_id:             2
          lcme_mirror_id:      0
          lcme_flags:          0
          lcme_extent.e_start: 268435456
          lcme_extent.e_end:   17179869184
            lmm_stripe_count:  4
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: -1
      
          lcme_id:             3
          lcme_mirror_id:      0
          lcme_flags:          0
          lcme_extent.e_start: 17179869184
          lcme_extent.e_end:   EOF
            lmm_stripe_count:  -1
            lmm_stripe_size:   4194304
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: -1
      
      $  lfs setstripe -E 256M -c 1 -E 16G -c 4 -E -1 -S 4M -c 10 pfldir
      $  echo hello > pfldir/test
      $  echo helpo >> pfldir/test 
      
      This worked: with an explicit stripe count (-c 10) on the last component, the append succeeded.
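In the getstripe listing above, only component 1 carries the init flag, while components 2 and 3 show lcme_flags: 0 and have no lmm_objects. A minimal sketch of pulling that summary out of `lfs getstripe` output follows; the function name list_components is hypothetical, and the sample input is trimmed from the listing in this ticket.

```shell
#!/bin/sh
# list_components: read `lfs getstripe` output on stdin and print one line
# per component in the form "<lcme_id> <lcme_flags> <lmm_stripe_count>".
# (Hypothetical helper; it relies only on the field labels shown in the listing.)
list_components() {
    awk '
        $1 == "lcme_id:"          { id = $2 }
        $1 == "lcme_flags:"       { flags = $2 }
        $1 == "lmm_stripe_count:" { print id, flags, $2 }
    '
}

# Sample input trimmed from the listing in this ticket.
sample='lcme_id:             1
lcme_flags:          init
lmm_stripe_count:  1
lcme_id:             2
lcme_flags:          0
lmm_stripe_count:  4
lcme_id:             3
lcme_flags:          0
lmm_stripe_count:  -1'

printf '%s\n' "$sample" | list_components
```

On a client, the same function could be fed directly with `lfs getstripe pfldir/test | list_components`.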
      

      Attachments

        1. mdt.debug.gz
          147.23 MB
        2. mdt.pid.34027.out.gz
          167 kB
        3. r417i2n16.debug2.out.gz
          36 kB
        4. r417i2n16.out
          276 kB

        Activity


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36947/
          Subject: LU-12852 pfl: restrict the stripe count correctly
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6dc37759cfb22727ac5d776c38b72e8638563fd8

          gerrit Gerrit Updater added a comment -

          Emoly Liu (emoly@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36947
          Subject: LU-12852 pfl: restrict the stripe count correctly
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 8ab6646313ba7c7391ed19bd705ca5e694b5823d

          pjones Peter Jones added a comment -

          Could you please create a patch to address this issue?


          adilger Andreas Dilger added a comment -

          It looks like there is already a function, "lod_get_stripe_count()", that is supposed to check the maximum xattr size and restrict the stripe count to that limit. The calculation may be slightly off (e.g. not taking the xattr overhead into account), so a small change like this might work:

          -        max_stripes = lov_mds_md_max_stripe_count(easize, LOV_MAGIC_V3);
          +        max_stripes = lov_mds_md_max_stripe_count(easize, LOV_MAGIC_V3) - 1;

          but I haven't tested this yet.
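The arithmetic behind such a limit can be sketched in shell. This is only an illustration: the 48-byte header and 24-byte per-stripe sizes are assumptions about the on-disk LOV_MAGIC_V3 layout and are not stated in this ticket, the function names are hypothetical, and the 4096-byte easize is a round example value (the real usable xattr space is smaller).

```shell
#!/bin/sh
# Assumed on-disk sizes; check lustre_user.h before relying on them.
LOV_V3_HDR=48      # assumed size of the V3 layout header, without objects
LOV_OST_ENTRY=24   # assumed size of one per-stripe object entry

# max_stripes EASIZE: number of stripe entries that fit after the header.
max_stripes() {
    easize=$1
    if [ "$easize" -lt "$LOV_V3_HDR" ]; then
        echo 0
        return
    fi
    echo $(( (easize - LOV_V3_HDR) / LOV_OST_ENTRY ))
}

# max_stripes_with_margin EASIZE: the untested "- 1" variant from the comment.
# Unlike the one-line diff, this sketch does not go below zero.
max_stripes_with_margin() {
    n=$(max_stripes "$1")
    if [ "$n" -gt 0 ]; then
        n=$(( n - 1 ))
    fi
    echo "$n"
}

max_stripes 4096
max_stripes_with_margin 4096
```

Subtracting one stripe only leaves 24 bytes of slack under these assumptions, so whether it covers the xattr overhead depends on how easize itself is derived.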

          mhanafi Mahmoud Hanafi added a comment -

          By "correct number of stripes" I mean whatever can fit, as in the non-PFL case. It shouldn't fail.

          adilger Andreas Dilger added a comment -

          Mahmoud, could you clarify what you mean by "correct number of stripes" in this case? Without the "ea_inode" feature, a PFL file will just not have as much space to store stripes as a non-PFL file. Hopefully by "correct number of stripes" you mean "whatever will still fit into the remaining xattr space", which is probably about 150 in your case, but will vary based on the number and size of the previous components. If you enable the "ea_inode" feature then you would actually be able to store the full 342 stripes in the last component.

          mhanafi Mahmoud Hanafi added a comment -

          I think this is a bug. PFL should create the correct number of stripes, as with a non-PFL file.

          adilger Andreas Dilger added a comment -

          Part of the problem is that with PFL layouts, there is not room for the full 165 stripes to fit into the ~4KiB xattr space, because each component consumes some space (maybe 3 stripes' worth each), and the layouts within those components also consume space (1 and 4 stripes respectively). That means it would be possible to declare a layout that could use maybe 150 stripes in the third component without exceeding the 4KiB xattr limit.

          Alternatively, the ea_inode feature can be enabled on the MDT using "tune2fs -O ea_inode /dev/<mdtdev>" in order to allow larger layouts (up to 2000 stripes). While this could potentially be done from the ldiskfs point of view while the MDT is mounted, the MDS code does not check whether the maximum xattr size changes while it is mounted. Since this would need an MDT remount to take effect anyway, it may as well be done while the MDT is unmounted. In this case, e2fsprogs should be at least 1.44.5.wc1, but preferably the most recent version, 1.45.2.wc1.

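To see why wider layouts need ea_inode, the xattr space a plain (non-PFL) layout would occupy can be estimated for the stripe counts mentioned in this thread. This is a sketch under assumed on-disk sizes (48-byte header, 24 bytes per stripe), neither of which comes from this ticket; layout_bytes is a hypothetical helper.

```shell
#!/bin/sh
# Assumed on-disk sizes; check lustre_user.h before relying on them.
LOV_V3_HDR=48      # assumed size of the V3 layout header, without objects
LOV_OST_ENTRY=24   # assumed size of one per-stripe object entry

# layout_bytes COUNT: bytes needed to store a plain layout with COUNT stripes.
layout_bytes() {
    echo $(( LOV_V3_HDR + $1 * LOV_OST_ENTRY ))
}

# Stripe counts mentioned in this thread.
for n in 165 342 2000; do
    echo "$n stripes: $(layout_bytes "$n") bytes"
done
```

With these sizes, 165 stripes stay just under 4KiB, while 342 and 2000 stripes need roughly 8KiB and 47KiB respectively.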
          mhanafi Mahmoud Hanafi added a comment - - edited

          lfs setstripe -c -1 gives 165 stripes.

          # lfs getstripe  /nobackupp2/mhanafi/WIDESTRIPE/test
          /nobackupp2/mhanafi/WIDESTRIPE/test
          lmm_stripe_count:  165
          lmm_stripe_size:   1048576
          lmm_pattern:       raid0
          lmm_layout_gen:    0
          lmm_stripe_offset: 242
                  obdidx           objid           objid           group
                     242       286064211     0x110cfe53                0
                     243       287444989     0x11220ffd                0
                     204       166993747      0x9f41f53                0
                     143       168014735      0xa03b38f                0
          
           

          This is what I expect with PFL too, but PFL tries to set more than this.

          We don't have ea_inode feature enabled.

          Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota
          
           

          I thought I had tested PFL in 2.10.6 and it used to work. We don't have 2.10.6 running anymore, so I can't test.

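Checking a feature list like the one quoted above can be scripted. The sketch below uses a hypothetical helper, has_feature, on the exact list from this comment; on a live system the list would typically come from the "Filesystem features:" line that `dumpe2fs -h` prints for the MDT device.

```shell
#!/bin/sh
# has_feature FEATURE "LIST": succeed if FEATURE appears as a whole word
# in the space-separated LIST. (Hypothetical helper for illustration.)
has_feature() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Feature list copied from the comment above.
features="has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota"

if has_feature ea_inode "$features"; then
    echo "ea_inode enabled"
else
    echo "ea_inode not enabled"
fi
```

Matching on whole words avoids false positives from feature names that merely contain the search string.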

          adilger Andreas Dilger added a comment -

          Are you able to create a non-PFL file with "lfs setstripe -c -1" in this filesystem? With 342 OSTs this exceeds the normal 4KB limit for xattrs (160 stripes) unless the MDT has the "ea_inode" feature enabled.

          mhanafi Mahmoud Hanafi added a comment -

          Better debug: mdt.pid.34027.out.gz, r417i2n16.debug2.out.gz

          People

            Assignee: emoly.liu Emoly Liu
            Reporter: mhanafi Mahmoud Hanafi
            Votes: 0
            Watchers: 5
