LU-12852: growing a PFL file with last stripe as -1 fails

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Affects Version/s: Lustre 2.10.8
    • Labels: None
    • Severity: 2

    Description

      If a file/dir is striped with the last component's stripe count set to -1, growing the file fails.

      $  lfs setstripe -E 256M -c 1 -E 16G -c 4 -E -1 -S 4M -c -1 pfldir
      $  echo hello > pfldir/test
      $ echo helpo >> pfldir/test 
      -bash: echo: write error: No space left on device
      
      $  lfs getstripe pfldir/test
      pfldir/test
        lcm_layout_gen:    3
        lcm_mirror_count:  1
        lcm_entry_count:   3
          lcme_id:             1
          lcme_mirror_id:      0
          lcme_flags:          init
          lcme_extent.e_start: 0
          lcme_extent.e_end:   268435456
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 313
            lmm_objects:
            - 0: { l_ost_idx: 313, l_fid: [0x101390000:0x110967ab:0x0] }
      
          lcme_id:             2
          lcme_mirror_id:      0
          lcme_flags:          0
          lcme_extent.e_start: 268435456
          lcme_extent.e_end:   17179869184
            lmm_stripe_count:  4
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: -1
      
          lcme_id:             3
          lcme_mirror_id:      0
          lcme_flags:          0
          lcme_extent.e_start: 17179869184
          lcme_extent.e_end:   EOF
            lmm_stripe_count:  -1
            lmm_stripe_size:   4194304
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: -1
      
      $  lfs setstripe -E 256M -c 1 -E 16G -c 4 -E -1 -S 4M -c 10 pfldir
      $  echo hello > pfldir/test
      $  echo helpo >> pfldir/test 
      
      This worked.
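
      As a quick cross-check (a sketch only; the mount point below is a placeholder), the same symptom can be narrowed down by counting the OSTs and confirming whether a plain, non-PFL fully wide-striped file can be created at all:

      $ lfs df /mnt/lustre | grep -c OST           # rough count of OSTs in the filesystem
      $ lfs setstripe -c -1 /mnt/lustre/widefile   # plain wide-striped file, no PFL
      $ lfs getstripe -c /mnt/lustre/widefile      # stripe count that was actually used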
      

      Attachments

        1. mdt.debug.gz
          147.23 MB
        2. mdt.pid.34027.out.gz
          167 kB
        3. r417i2n16.debug2.out.gz
          36 kB
        4. r417i2n16.out
          276 kB

        Activity

          pjones Peter Jones added a comment -

          Could you please create a patch to address this issue?


          adilger Andreas Dilger added a comment -

          It looks like there is already a function "lod_get_stripe_count()" that is supposed to be checking the maximum xattr size and restricting the stripe count to this limit. It may be that the calculation is slightly incorrect (e.g. not taking into account the xattr overhead), so changing this slightly would work:

          -        max_stripes = lov_mds_md_max_stripe_count(easize, LOV_MAGIC_V3);
          +        max_stripes = lov_mds_md_max_stripe_count(easize, LOV_MAGIC_V3) - 1;
          

          but I haven't tested this yet.
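
          If that change is applied on the MDS, a quick way to verify it (a sketch, reusing the reproducer from the description) is to repeat the append and check the stripe count that actually gets instantiated for the last component:

          $ echo hello > pfldir/test
          $ echo helpo >> pfldir/test                          # should no longer return ENOSPC
          $ lfs getstripe pfldir/test | grep lmm_stripe_count  # last component should show a reduced, positive count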

          mhanafi Mahmoud Hanafi added a comment -

          "correct number of stripes" meaning whatever it can fit, like the non-PFL case. It shouldn't fail.


          adilger Andreas Dilger added a comment -

          Mahmoud, could you clarify what you mean by "correct number of stripes" in this case? Without the "ea_inode" feature, PFL will just not have as much space to store stripes as a non-PFL file. Hopefully by "correct number of stripes" you mean "whatever will still fit into the remaining xattr space", which is probably about 150 in your case, but will vary based on the number and size of the previous components. If you enable the "ea_inode" feature then you would actually be able to store the full 342 stripes in the last component.


          mhanafi Mahmoud Hanafi added a comment -

          I think this is a bug. PFL should create the correct number of stripes as with a non-PFL file.


          adilger Andreas Dilger added a comment -

          Part of the problem is that with PFL layouts, there is not room for the full 165 stripes to fit into the ~4KiB xattr space, because each component consumes some space (maybe 3 stripes worth each), and the layouts within those components also consume space (1 and 4 stripes respectively). That means it would be possible to declare a layout that could use maybe 150 stripes in the third component without exceeding the 4KB xattr limit (a rough arithmetic sketch follows this comment).

          Alternately, the ea_inode feature can be enabled on the MDT using "tune2fs -O ea_inode /dev/<mdtdev>" in order to allow larger layouts (up to 2000 stripes). While this could potentially be done from the ldiskfs point of view while the MDT is mounted, the MDS code does not check for the maximum xattr size to change while it is mounted. Since this would need an MDT remount to take effect anyway, it may as well be done while the MDT is unmounted. In this case, e2fsprogs should be at least 1.44.5.wc1, but preferably the most recent version 1.45.2.wc1.
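
          Rough back-of-the-envelope for the above, as a sketch only; the per-structure sizes used here are approximations and are not taken from the source:

          # ~4096 bytes of inode xattr space, minus an assumed ~32-byte composite-layout
          # header, ~96 bytes of per-component overhead for the 3 components, and the
          # object entries (~24 bytes each) already used by the 1- and 4-stripe components:
          $ echo $(( (4096 - 32 - 3*96 - (1 + 4)*24) / 24 ))
          152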

          mhanafi Mahmoud Hanafi added a comment - edited

          lfs setstripe -c -1 gives 165 stripes.

          # lfs getstripe  /nobackupp2/mhanafi/WIDESTRIPE/test
          /nobackupp2/mhanafi/WIDESTRIPE/test
          lmm_stripe_count:  165
          lmm_stripe_size:   1048576
          lmm_pattern:       raid0
          lmm_layout_gen:    0
          lmm_stripe_offset: 242
                  obdidx           objid           objid           group
                     242       286064211     0x110cfe53                0
                     243       287444989     0x11220ffd                0
                     204       166993747      0x9f41f53                0
                     143       168014735      0xa03b38f                0
          
           

          This is what I expect with PFL, but it tries to set more than this.

          We don't have the ea_inode feature enabled (a quick way to re-check this is sketched after this comment).

          Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota
          
           

           I thought I had tested PFL in 2.10.6 and it used to work. We don't have 2.10.6 running anymore, so I can't test.
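
          For reference, the feature list above can be re-checked directly against the MDT device (a sketch; the device path is a placeholder):

          # dumpe2fs -h /dev/<mdtdev> 2>/dev/null | grep -i 'Filesystem features'
          # tune2fs -l /dev/<mdtdev> | grep -io ea_inode   # no output means the feature is not enabled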


          adilger Andreas Dilger added a comment -

          Are you able to create a non-PFL file with "lfs setstripe -c -1" in this filesystem? With 342 OSTs this exceeds the normal 4KB limit for xattrs (160 stripes) unless the MDT has the "ea_inode" feature enabled.

          mhanafi Mahmoud Hanafi added a comment - Better debug: mdt.pid.34027.out.gz, r417i2n16.debug2.out.gz
          mhanafi Mahmoud Hanafi added a comment - edited

          There are 342 OSTs. We don't have LU-9341.

          client nid is 10.151.11.62@o2ib

          r417i2n16.out

          mdt.debug.gz

          When you try lfs migrate you get an error:

          # lfs migrate text.txt
          lfs migrate: cannot get group lock: No space left on device (28)
          error: lfs migrate: /nobackupp2/whzhu/text.txt: cannot get group lock: No space left on device
          r417i2n16 ~ #
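
          One possible interim workaround, sketched here on the assumption that lfs migrate accepts the usual setstripe layout options: rewrite the affected file with an explicit stripe count that fits into the xattr space, and cap the last PFL component on the directory the same way (150 is only an estimate based on the arithmetic above):

          # lfs migrate -c 150 text.txt                                        # plain 150-stripe layout, no PFL
          # lfs setstripe -E 256M -c 1 -E 16G -c 4 -E -1 -S 4M -c 150 pfldir   # cap the last component for new files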

          adilger Andreas Dilger added a comment -

          Hi Mahmoud, could you please collect the console logs from the client and MDS around the time that the error is hit, if there is anything printed. If nothing interesting is shown, please collect "lctl dk" logs from the client and MDS around this time (see the sketch after this comment).

          I tested this with my local 2.10.6 client and didn't have any problems.

          $ lfs setstripe -E 32M -c 1 -S 1M -E 10G -c 4 -E -1 -c -1 -S 4M /myth/tmp/tmp/pfl
          $ echo hello > /myth/tmp/tmp/pfl
          $ echo hellop >> /myth/tmp/tmp/pfl
          $ lfs getstripe /myth/tmp/tmp/pfl2
          /myth/tmp/tmp/pfl2
            lcm_layout_gen:  4
            lcm_entry_count: 3
              lcme_id:             1
              lcme_flags:          init
              lcme_extent.e_start: 0
              lcme_extent.e_end:   33554432
                lmm_stripe_count:  1
                lmm_stripe_size:   1048576
                lmm_pattern:       1
                lmm_layout_gen:    0
                lmm_stripe_offset: 4
                lmm_objects:
                - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x3121d7:0x0] }
          
              lcme_id:             2
              lcme_flags:          init
              lcme_extent.e_start: 33554432
              lcme_extent.e_end:   10737418240
                lmm_stripe_count:  4
                lmm_stripe_size:   1048576
                lmm_pattern:       1
                lmm_layout_gen:    0
                lmm_stripe_offset: 0
                lmm_objects:
                - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x22cb28:0x0] }
                - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x1b19ec:0x0] }
                - 2: { l_ost_idx: 2, l_fid: [0x100020000:0x26cea4:0x0] }
                - 3: { l_ost_idx: 3, l_fid: [0x100030000:0x2223f3:0x0] }
          
              lcme_id:             3
              lcme_flags:          init
              lcme_extent.e_start: 10737418240
              lcme_extent.e_end:   EOF
                lmm_stripe_count:  5
                lmm_stripe_size:   4194304
                lmm_pattern:       1
                lmm_layout_gen:    0
                lmm_stripe_offset: 0
                lmm_objects:
                - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x22cb29:0x0] }
                - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x1b19ed:0x0] }
                - 2: { l_ost_idx: 2, l_fid: [0x100020000:0x26cea5:0x0] }
                - 3: { l_ost_idx: 3, l_fid: [0x100030000:0x2223f4:0x0] }
                - 4: { l_ost_idx: 4, l_fid: [0x100040000:0x3121d8:0x0] }
          

          How many OSTs in the filesystem? Is there any chance that you have the patch https://review.whamcloud.com/35617 "LU-9341 lod: Add special O_APPEND striping" applied?
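
          A minimal way to capture the logs requested above (a sketch; run on both the client and the MDS, and the output path is a placeholder):

          # lctl clear                         # drop the current debug buffer
          # lctl set_param debug=-1            # temporarily enable full debugging
          #   ...reproduce the failing append...
          # lctl dk > /tmp/lustre-debug.log    # dump the kernel debug log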


          People

            Assignee: emoly.liu Emoly Liu
            Reporter: mhanafi Mahmoud Hanafi