[LU-14549] getxattr for lustre.lov and trusted.lov sometimes wrong after mirror split Created: 24/Mar/21 Updated: 16/Jul/21 Resolved: 27/May/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bjhpflr |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In 4 out of 10 runs of the following sequence, getfattr returns the old layout:

rm -f f0
echo XXX > f0
lfs mirror extend -N -p fast f0
cat f0
lfs mirror split --delete --pool=fast f0
getfattr -n trusted.lov -e text --only-values f0 | hexdump -C
lfs getstripe f0

# rm -f f0
# echo XXX > f0
# lfs mirror extend -N -p fast f0
# cat f0
XXX
# lfs mirror split --delete --pool=fast f0
# getfattr -n trusted.lov -e text --only-values f0 | hexdump -C
00000000 d0 0b d6 0b 20 01 00 00 02 00 00 00 01 00 02 00 |.... ...........|
00000010 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 01 00 01 00 10 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 ff ff ff ff ff ff ff ff 80 00 00 00 38 00 00 00 |............8...|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 01 00 02 00 10 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 ff ff ff ff ff ff ff ff b8 00 00 00 48 00 00 00 |............H...|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 d0 0b d1 0b 01 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 00 10 00 01 00 00 00 |................|
000000a0 23 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |#...............|
000000b0 00 00 00 00 01 00 00 00 d0 0b d3 0b 01 00 00 00 |................|
000000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000d0 00 00 10 00 01 00 00 00 66 61 73 74 00 00 00 00 |........fast....|
000000e0 00 00 00 00 00 00 00 00 24 00 00 00 00 00 00 00 |........$.......|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000120
# lfs getstripe f0
f0
lcm_layout_gen: 3
lcm_mirror_count: 1
lcm_entry_count: 1
lcme_id: 65537
lcme_mirror_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
lmm_objects:
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x23:0x0] }
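
The mismatch is easier to see once the first bytes of the dump are decoded. The following is a minimal sketch, not part of the original report. It assumes the composite layout header (struct lov_comp_md_v1 in lustre_user.h) begins with little-endian lcm_magic, lcm_size, lcm_layout_gen (32-bit each) followed by lcm_flags, lcm_entry_count and lcm_mirror_count (16-bit each), and that lcm_mirror_count stores the number of mirrors minus one. The helper name decode_comp_header is hypothetical.

```python
import struct

# First 32 bytes of the trusted.lov value, copied from the hexdump above.
DUMP = bytes.fromhex(
    "d00bd60b 20010000 02000000 01000200"
    "01000000 00000000 00000000 00000000"
)

# Assumed header layout: three little-endian u32 fields, then three u16 fields.
HEADER_FMT = "<IIIHHH"


def decode_comp_header(data: bytes) -> dict:
    """Decode the leading fields of a composite layout xattr (hypothetical helper)."""
    magic, size, gen, flags, entries, mirrors_raw = struct.unpack_from(HEADER_FMT, data, 0)
    return {
        "lcm_magic": magic,
        "lcm_size": size,
        "lcm_layout_gen": gen,
        "lcm_flags": flags,
        "lcm_entry_count": entries,
        # Assumption: the on-disk value is (number of mirrors - 1).
        "lcm_mirror_count": mirrors_raw + 1,
    }


if __name__ == "__main__":
    for name, value in decode_comp_header(DUMP).items():
        print(f"{name}: {value:#x}" if name == "lcm_magic" else f"{name}: {value}")
```

Under these assumptions the xattr still describes layout generation 2 with two entries and two mirrors, while lfs getstripe reports lcm_layout_gen 3 with a single entry and a single mirror, which is the stale layout this ticket is about.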
|
| Comments |
| Comment by Gerrit Updater [ 17/May/21 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/43716 |
| Comment by Andreas Dilger [ 17/May/21 ] |
|
Strangely, when I tested this on my local filesystem (2.12 client, 2.14 server, without the 43718 patch), the layout generation was reset to "1" after "lfs mirror extend":

$ lfs getstripe /myth/tmp/adilger/flr-multi
lcm_layout_gen: 8
lcm_mirror_count: 2
lcm_entry_count: 4
lcme_id: 65537
lcme_mirror_id: 1
lmm_layout_gen: 0
:
lcme_id: 65538
lcme_mirror_id: 1
lmm_layout_gen: 0
:
lcme_id: 65539
lcme_mirror_id: 1
lmm_layout_gen: 0
:
lcme_id: 131076
lcme_mirror_id: 2
:
$ lfs mirror extend -N -c 1 /myth/tmp/adilger/flr-multi
$ getfattr -n lustre.lov --only-values /myth/tmp/adilger/flr-multi | od -tx4 |& awk '/000000 / { print "0x"$4; exit; }'
0000000 0bd60bd0 000002a8 00000001 00050001
$ lfs getstripe /myth/tmp/adilger/flr-multi
/myth/tmp/adilger/flr-multi
lcm_layout_gen: 1
lcm_mirror_count: 3
lcm_entry_count: 5
Somehow lcm_layout_gen went from 8 to 1 on a file that hadn't been modified in a long time (created 2021-04-07; both client and server have been rebooted since then). If I add new mirrors to that file now, lcm_layout_gen is incremented properly, but it seems it isn't being read from disk the first time? |
| Comment by Zhenyu Xu [ 19/May/21 ] |
|
My theory for why the layout gen went from 8 to 1 is this: in the current LOD code, the layout merge increments the layout generation and writes it to the LOVEA:

lod_declare_layout_merge
...
3379         /* fixup layout information */
3380         lod_obj_inc_layout_gen(lo);
3381         lcm->lcm_layout_gen = cpu_to_le32(lo->ldo_layout_gen);
3382         lcm->lcm_size = cpu_to_le32(size);
3383         lcm->lcm_entry_count = cpu_to_le16(cur_entry_count + merge_entry_count);
3384         lcm->lcm_mirror_count = cpu_to_le16(mirror_count);
3385         if ((le16_to_cpu(lcm->lcm_flags) & LCM_FL_FLR_MASK) == LCM_FL_NONE)
3386                 lcm->lcm_flags = cpu_to_le32(LCM_FL_RDONLY);
3387
3388         rc = lod_striping_reload(env, lo, buf);
3389         if (rc)
3390                 GOTO(out, rc);

Here the lod_object (@lo) may not be initialized yet, so line 3380 increments layout_gen to 1, and line 3388 then sets the lod_object's in-memory layout. Patch https://review.whamcloud.com/43472/ fixes this: it sets the lod_object's in-memory layout first and then increments its layout generation:

lod_declare_layout_merge in #43472
3409         /* fixup layout information */
3410         lcm->lcm_size = cpu_to_le32(size);
3411         lcm->lcm_entry_count = cpu_to_le16(cur_entry_count + merge_entry_count);
3412         lcm->lcm_mirror_count = cpu_to_le16(mirror_count);
3413         if ((le16_to_cpu(lcm->lcm_flags) & LCM_FL_FLR_MASK) == LCM_FL_NONE)
3414                 lcm->lcm_flags = cpu_to_le32(LCM_FL_RDONLY);
3415
3416         rc = lod_striping_reload(env, lo, buf);
3417         if (rc)
3418                 GOTO(out, rc);
3419
3420         lod_obj_inc_layout_gen(lo);
3421         lcm->lcm_layout_gen = cpu_to_le32(lo->ldo_layout_gen); |
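
The ordering argument above can be illustrated with a toy model. This is only a sketch of the theory, not the LOD code: ToyLodObject, striping_reload, inc_layout_gen and the two merge functions are hypothetical stand-ins. It assumes that an uninitialized in-memory object carries generation 0 and that reloading the striping adopts whatever generation is stored in the buffer.

```python
from dataclasses import dataclass


@dataclass
class ToyLodObject:
    """Stand-in for the in-memory lod_object; layout_gen 0 models 'not initialized'."""
    layout_gen: int = 0


def striping_reload(lo: ToyLodObject, buf: dict) -> None:
    # Toy lod_striping_reload(): the in-memory object adopts the buffer's generation.
    lo.layout_gen = buf["lcm_layout_gen"]


def inc_layout_gen(lo: ToyLodObject) -> None:
    # Toy lod_obj_inc_layout_gen(): bump the in-memory generation by one.
    lo.layout_gen += 1


def merge_old_order(lo: ToyLodObject, on_disk_gen: int) -> int:
    """Increment first, then reload (the order in the first excerpt)."""
    buf = {"lcm_layout_gen": on_disk_gen}
    inc_layout_gen(lo)                      # 0 -> 1 if lo was never loaded
    buf["lcm_layout_gen"] = lo.layout_gen   # overwrites the on-disk generation
    striping_reload(lo, buf)
    return buf["lcm_layout_gen"]


def merge_new_order(lo: ToyLodObject, on_disk_gen: int) -> int:
    """Reload first, then increment (the order in the second excerpt)."""
    buf = {"lcm_layout_gen": on_disk_gen}
    striping_reload(lo, buf)                # pick up the on-disk generation first
    inc_layout_gen(lo)
    buf["lcm_layout_gen"] = lo.layout_gen
    return buf["lcm_layout_gen"]


if __name__ == "__main__":
    print("old order:", merge_old_order(ToyLodObject(), on_disk_gen=8))
    print("new order:", merge_new_order(ToyLodObject(), on_disk_gen=8))
```

In this model an uninitialized object with on-disk generation 8 ends up with generation 1 under the old order and 9 under the new order, matching the observed reset from 8 to 1. An object that has already been loaded behaves the same under both orders, which would be consistent with later mirror additions incrementing the generation properly.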
| Comment by Gerrit Updater [ 27/May/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43716/ |
| Comment by Peter Jones [ 27/May/21 ] |
|
Landed for 2.15 |