[LU-12852] growing a PFL file with last stripe as -1 fails Created: 12/Oct/19  Updated: 20/May/20  Resolved: 08/Feb/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.8
Fix Version/s: Lustre 2.14.0, Lustre 2.12.5

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mdt.debug.gz     File mdt.pid.34027.out.gz     File r417i2n16.debug2.out.gz     File r417i2n16.out    
Issue Links:
Related
Severity: 2

 Description   

If a file/dir is striped with the last component's stripe count set to -1, growing the file fails.

$  lfs setstripe -E 256M -c 1 -E 16G -c 4 -E -1 -S 4M -c -1 pfldir
$  echo hello > pfldir/test
$ echo helpo >> pfldir/test 
-bash: echo: write error: No space left on device

$  lfs getstripe pfldir/test
pfldir/test
  lcm_layout_gen:    3
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   268435456
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 313
      lmm_objects:
      - 0: { l_ost_idx: 313, l_fid: [0x101390000:0x110967ab:0x0] }

    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 268435456
    lcme_extent.e_end:   17179869184
      lmm_stripe_count:  4
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

    lcme_id:             3
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 17179869184
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  -1
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

$  lfs setstripe -E 256M -c 1 -E 16G -c 4 -E -1 -S 4M -c 10 pfldir
$  echo hello > pfldir/test
$  echo helpo >> pfldir/test 

This worked.


 Comments   
Comment by Andreas Dilger [ 12/Oct/19 ]

Hi Mahmoud, could you please collect the console logs from the client and MDS around the time that the error is hit, if there is anything printed. If nothing interesting is shown, please collect "lctl dk" logs from the client and MDS around this time.
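
A minimal capture sequence, run on both the client and the MDS, might look like the following (the debug mask and output path are only illustrative, adjust to taste):

# lctl clear                      # empty the kernel debug buffer first
# lctl set_param debug=-1         # enable all debug flags (very verbose)
# echo hello >> pfldir/test       # reproduce the ENOSPC failure (client side)
# lctl dk /tmp/lustre.debug.out   # dump the accumulated debug log to a file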

I tested this with my local 2.10.6 client and didn't have any problems.

$ lfs setstripe -E 32M -c 1 -S 1M -E 10G -c 4 -E -1 -c -1 -S 4M /myth/tmp/tmp/pfl
$ echo hello > /myth/tmp/tmp/pfl
$ echo hellop >> /myth/tmp/tmp/pfl
$ lfs getstripe /myth/tmp/tmp/pfl2
/myth/tmp/tmp/pfl2
  lcm_layout_gen:  4
  lcm_entry_count: 3
    lcme_id:             1
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   33554432
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       1
      lmm_layout_gen:    0
      lmm_stripe_offset: 4
      lmm_objects:
      - 0: { l_ost_idx: 4, l_fid: [0x100040000:0x3121d7:0x0] }

    lcme_id:             2
    lcme_flags:          init
    lcme_extent.e_start: 33554432
    lcme_extent.e_end:   10737418240
      lmm_stripe_count:  4
      lmm_stripe_size:   1048576
      lmm_pattern:       1
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x22cb28:0x0] }
      - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x1b19ec:0x0] }
      - 2: { l_ost_idx: 2, l_fid: [0x100020000:0x26cea4:0x0] }
      - 3: { l_ost_idx: 3, l_fid: [0x100030000:0x2223f3:0x0] }

    lcme_id:             3
    lcme_flags:          init
    lcme_extent.e_start: 10737418240
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  5
      lmm_stripe_size:   4194304
      lmm_pattern:       1
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x22cb29:0x0] }
      - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x1b19ed:0x0] }
      - 2: { l_ost_idx: 2, l_fid: [0x100020000:0x26cea5:0x0] }
      - 3: { l_ost_idx: 3, l_fid: [0x100030000:0x2223f4:0x0] }
      - 4: { l_ost_idx: 4, l_fid: [0x100040000:0x3121d8:0x0] }

How many OSTs are in the filesystem? Is there any chance that you have the patch https://review.whamcloud.com/35617 "LU-9341 lod: Add special O_APPEND striping" applied?

Comment by Mahmoud Hanafi [ 12/Oct/19 ]

There are 342 OSTs. We don't have LU-9341.

The client NID is 10.151.11.62@o2ib.

r417i2n16.out

mdt.debug.gz

When you try lfs migrate, you get an error:

 # lfs migrate  text.txt
lfs migrate: cannot get group lock: No space left on device (28)
error: lfs migrate: /nobackupp2/whzhu/text.txt: cannot get group lock: No space left on device
r417i2n16 ~ # 
Comment by Mahmoud Hanafi [ 12/Oct/19 ]

Better debug

mdt.pid.34027.out.gz

r417i2n16.debug2.out.gz

Comment by Andreas Dilger [ 12/Oct/19 ]

Are you able to create a non-PFL file with "lfs setstripe -c -1" in this filesystem? With 342 OSTs this exceeds the normal 4KB limit for xattrs (160 stripes) unless the MDT has the "ea_inode" feature enabled.

Comment by Mahmoud Hanafi [ 12/Oct/19 ]

lfs setstripe -c -1 gives 165 stripes.

# lfs getstripe  /nobackupp2/mhanafi/WIDESTRIPE/test
/nobackupp2/mhanafi/WIDESTRIPE/test
lmm_stripe_count:  165
lmm_stripe_size:   1048576
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 242
        obdidx           objid           objid           group
           242       286064211     0x110cfe53                0
           243       287444989     0x11220ffd                0
           204       166993747      0x9f41f53                0
           143       168014735      0xa03b38f                0

This is what I expect with PFL as well, but it tries to set more stripes than this.

We don't have ea_inode feature enabled.

Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota
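
(For reference, one way to list the ldiskfs feature flags on the MDT device; the device path is a placeholder:

# dumpe2fs -h /dev/mdtdev 2>/dev/null | grep 'Filesystem features')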

I thought I had tested PFL in 2.10.6 and it used to work. We don't have 2.10.6 running anymore, so I can't test.

Comment by Andreas Dilger [ 12/Oct/19 ]

Part of the problem is that with PFL layouts there is not room for the full 165 stripes to fit into the ~4KiB xattr space, because each component consumes some space (maybe 3 stripes' worth each), and the layouts within those components also consume space (1 and 4 stripes' worth, respectively). That means it would be possible to declare a layout that could use maybe 150 stripes in the third component without exceeding the 4KB xattr limit.
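
As a back-of-the-envelope check, here is a sketch of that arithmetic; all of the struct sizes below are assumptions for illustration, not values taken from the Lustre headers:

# Rough xattr budget for the 3-component layout above (assumed sizes).
COMP_HDR=32      # composite layout header (assumed)
COMP_ENTRY=48    # per-component entry (assumed)
LMM_HDR=48       # per-component lov_mds_md_v3 header (assumed)
OST_DATA=24      # per-stripe OST object entry (assumed)
XATTR=4096       # usable layout xattr space (assumed)
USED=$((COMP_HDR + 3*COMP_ENTRY + (LMM_HDR + 1*OST_DATA) + (LMM_HDR + 4*OST_DATA)))
echo $(( (XATTR - USED - LMM_HDR) / OST_DATA ))   # prints 152, in line with the ~150 estimate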

Alternately, the ea_inode feature can be enabled on the MDT using "tune2fs -O ea_inode /dev/<mdtdev>" in order to allow larger layouts (up to 2000 stripes). While this could potentially be done from the ldiskfs point of view while the MDT is mounted, the MDS code does not re-check the maximum xattr size while it is mounted. Since this would need an MDT remount to take effect anyway, it may as well be done while the MDT is unmounted. In this case, e2fsprogs should be at least 1.44.5.wc1, but preferably the most recent version, 1.45.2.wc1.
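
A possible sequence, with the MDT taken down as suggested (the device and mount point are placeholders):

# umount /mnt/mdt                     # stop the MDT
# tune2fs -O ea_inode /dev/mdtdev     # needs e2fsprogs 1.44.5.wc1 or newer
# mount -t lustre /dev/mdtdev /mnt/mdt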

Comment by Mahmoud Hanafi [ 12/Oct/19 ]

I think this is a bug. PFL should create the correct number of stripes, just as with the non-PFL file.

Comment by Andreas Dilger [ 15/Oct/19 ]

Mahmoud, could you clarify what you mean by "correct number of stripes" in this case? Without the "ea_inode" feature, PFL will just not have as much space to store stripes as a non-PFL file. Hopefully by "correct number of stripes" you mean "whatever will still fit into the remaining xattr space", which is probably about 150 in your case, but will vary based on the number and size of the previous components. If you enable the "ea_inode" feature then you would actually be able to store the full 342 stripes in the last component.

Comment by Mahmoud Hanafi [ 16/Oct/19 ]

"Correct number of stripes" meaning whatever it can fit, like in the non-PFL case. It shouldn't fail.

Comment by Andreas Dilger [ 17/Oct/19 ]

It looks like there is already a function "lod_get_stripe_count()" that is supposed to be checking the maximum xattr size and restricting the stripe count to that limit. It may be that the calculation is slightly incorrect (e.g. not taking the xattr overhead into account), so a slight change like this might work:

-        max_stripes = lov_mds_md_max_stripe_count(easize, LOV_MAGIC_V3);
+        max_stripes = lov_mds_md_max_stripe_count(easize, LOV_MAGIC_V3) - 1;

but I haven't tested this yet.

Comment by Peter Jones [ 05/Dec/19 ]

Could you please create a patch to address this issue?

Comment by Gerrit Updater [ 06/Dec/19 ]

Emoly Liu (emoly@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36947
Subject: LU-12852 pfl: restrict the stripe count correctly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8ab6646313ba7c7391ed19bd705ca5e694b5823d

Comment by Gerrit Updater [ 08/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36947/
Subject: LU-12852 pfl: restrict the stripe count correctly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6dc37759cfb22727ac5d776c38b72e8638563fd8

Comment by Peter Jones [ 08/Feb/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 10/Feb/20 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37512
Subject: LU-12852 pfl: restrict the stripe count correctly
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 6a95331bcb4d4f0dd9c7c8a152509e1a6652042c

Comment by Gerrit Updater [ 01/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37512/
Subject: LU-12852 pfl: restrict the stripe count correctly
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 454771e39c556ac5a4b290d2bbf603dc7f308fdf
