[LU-17261] stat(2) should be able to use a good replica Created: 03/Nov/23  Updated: 11/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Alex Zhuravlev Assignee: Alex Zhuravlev
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A replica representation in the LOV EA can be broken, as in this example:

lcm_layout_gen: 7
lcm_mirror_count: 2
lcm_entry_count: 2
lcme_id: 65538
lcme_mirror_id: 1
lcme_flags: init,stale
lcme_extent.e_start: 134217728
lcme_extent.e_end: 1073741824
lmm_stripe_count: 16
lmm_stripe_size: 16777216
lmm_pattern: 40000001
lmm_layout_gen: 1
lmm_stripe_offset: 4294967295
lmm_objects:
- 0: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 1: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 2: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 3: { l_ost_idx: 5, l_fid: [0xbc0000406:0x42fce0eb:0x0] }
- 4: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 5: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 6: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 7: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 8: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 9: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 10: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 11: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 12: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 13: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 14: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 15: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
 
lcme_id: 131073
lcme_mirror_id: 2
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_count: 16
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 5
lmm_pool: hdd-pool
lmm_objects:
- 0: { l_ost_idx: 5, l_fid: [0xbc0000406:0x42feb0aa:0x0] }
- 1: { l_ost_idx: 8, l_fid: [0x8c0000402:0x3bf10cb:0x0] }
- 2: { l_ost_idx: 15, l_fid: [0x9c0000402:0x11f1d8f:0x0] }
- 3: { l_ost_idx: 13, l_fid: [0x900000402:0x77529c35:0x0] }
- 4: { l_ost_idx: 0, l_fid: [0x300000403:0x3beded4:0x0] }
- 5: { l_ost_idx: 7, l_fid: [0xa80000402:0x11e9898:0x0] }
- 6: { l_ost_idx: 12, l_fid: [0x880000402:0x3bef34f:0x0] }
- 7: { l_ost_idx: 10, l_fid: [0xa40000402:0x11d9d9d:0x0] }
- 8: { l_ost_idx: 14, l_fid: [0xa00000402:0x11e4d68:0x0] }
- 9: { l_ost_idx: 2, l_fid: [0xb80000402:0x11d545a:0x0] }
- 10: { l_ost_idx: 6, l_fid: [0xb40000400:0x11f22d9:0x0] }
- 11: { l_ost_idx: 4, l_fid: [0x2c0000403:0x4016eb6:0x0] }
- 12: { l_ost_idx: 9, l_fid: [0x940000402:0xaf7b184a:0x0] }
- 13: { l_ost_idx: 11, l_fid: [0x980000402:0x11dc273:0x0] }
- 14: { l_ost_idx: 1, l_fid: [0xac0000404:0x3015313c:0x0] }
- 15: { l_ost_idx: 3, l_fid: [0xb00000400:0x11dd9bb:0x0] } 

However, a regular stat(2) should use any valid replica instead of returning an error as soon as it meets a bogus one.
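The requested behaviour can be sketched as a mirror-selection loop. This is a minimal illustration only. It assumes a hypothetical `struct mirror_view` that records each mirror's id, whether it is flagged stale, and the result of unpacking its components. These names are not Lustre's real `lov_stripe_md` structures. Preferring a non-stale mirror, and falling back to an intact stale one, is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one mirror after layout unpacking.
 * This is NOT a real Lustre structure. */
struct mirror_view {
	unsigned int mirror_id;	/* lcme_mirror_id */
	bool stale;		/* lcme_flags contains "stale" */
	int unpack_rc;		/* 0, or -errno if a component failed to unpack */
};

/*
 * Return the index of a usable mirror: the first non-stale mirror that
 * unpacked cleanly, else the first stale one that unpacked cleanly.
 * Return -EINVAL only if every mirror is broken.
 */
static int pick_usable_mirror(const struct mirror_view *m, size_t count)
{
	int stale_fallback = -1;
	size_t i;

	assert(m != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (m[i].unpack_rc != 0)
			continue;		/* broken: skip it instead of failing */
		if (!m[i].stale)
			return (int)i;		/* clean, in-sync mirror */
		if (stale_fallback < 0)
			stale_fallback = (int)i;
	}
	return stale_fallback >= 0 ? stale_fallback : -EINVAL;
}
```

For the layout above, mirror 1 (stale and failing to unpack) would be skipped and mirror 2 chosen.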



 Comments   
Comment by Andreas Dilger [ 03/Nov/23 ]

I think we prefer to stat both mirrors so that the blocks count is updated. However, it is OK to stat only one copy if the other is broken.
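Stat'ing every mirror while tolerating a broken one could look like this sketch. It sums the block counts of every usable mirror and skips the rest. `struct mirror_stat` and `merge_mirror_blocks()` are hypothetical names. Summing per-mirror block counts is an assumption about how the blocks count of a mirrored file is meant to be aggregated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mirror result of stat'ing that mirror's OST objects. */
struct mirror_stat {
	bool usable;		/* layout unpacked and objects were reachable */
	uint64_t blocks;	/* sum of st_blocks over this mirror's objects */
};

/*
 * Add up st_blocks over every usable mirror (each replica consumes its own
 * space). Broken mirrors are skipped; -EINVAL is returned only when no
 * mirror is usable, in which case *blocks is left at 0.
 */
static int merge_mirror_blocks(const struct mirror_stat *ms, size_t count,
			       uint64_t *blocks)
{
	bool found = false;
	size_t i;

	assert(blocks != NULL);
	assert(ms != NULL || count == 0);
	*blocks = 0;
	for (i = 0; i < count; i++) {
		if (!ms[i].usable)
			continue;
		*blocks += ms[i].blocks;
		found = true;
	}
	return found ? 0 : -EINVAL;
}
```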

Comment by Alex Zhuravlev [ 03/Nov/23 ]

This is literally what happens:

00020000:00000001:4.0:1698862926.643457:0:3746:0:(lov_ea.c:251:lsme_unpack()) Process leaving via out_lsme (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
00020000:00000010:4.0:1698862926.643458:0:3746:0:(lov_ea.c:290:lsme_unpack()) slab-freed '(lsme->lsme_oinfo[i])': 112 at ffff9d87eab4c480.
00020000:00000010:4.0:1698862926.643459:0:3746:0:(lov_ea.c:292:lsme_unpack()) kfreed 'lsme': 128 at ffff9d87eab4ce80.
00020000:00000001:4.0:1698862926.643460:0:3746:0:(lov_ea.c:487:lsm_unpackmd_comp_md_v1()) Process leaving via out_lsm (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
00020000:00000010:4.0:1698862926.643461:0:3746:0:(lov_ea.c:522:lsm_unpackmd_comp_md_v1()) kfreed 'lsm': 72 at ffff9d8e3015c360.
00020000:00000001:4.0:1698862926.643462:0:3746:0:(lov_ea.c:524:lsm_unpackmd_comp_md_v1()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00020000:00000001:4.0:1698862926.643463:0:3746:0:(lov_pack.c:345:lov_unpackmd()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00020000:00000001:4.0:1698862926.643464:0:3746:0:(lov_object.c:1299:lov_object_init()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00020000:00000001:4.0:1698862926.643467:0:3746:0:(lov_object.c:1386:lov_object_delete()) Process entered
00020000:00000002:4.0:1698862926.643467:0:3746:0:(lov_object.c:1125:lov_conf_freeze()) To take share lov(ffff9d8c2ce92170) owner           (null)/ffff9d82c6d80000
00020000:00000001:5.0:1698862926.643490:0:3746:0:(lov_object.c:1191:lov_layout_wait()) Process entered
00020000:00000001:5.0:1698862926.643491:0:3746:0:(lov_object.c:1201:lov_layout_wait()) Process leaving (rc=0 : 0 : 0)
00020000:00000002:5.0:1698862926.643492:0:3746:0:(lov_object.c:1133:lov_conf_thaw()) To release share lov(ffff9d8c2ce92170) owner           (null)/ffff9d82c6d80000
00020000:00000001:5.0:1698862926.643493:0:3746:0:(lov_object.c:1388:lov_object_delete()) Process leaving
00020000:00000001:5.0:1698862926.643494:0:3746:0:(lov_object.c:1395:lov_object_free()) Process entered
00020000:00000002:5.0:1698862926.643494:0:3746:0:(lov_object.c:1125:lov_conf_freeze()) To take share lov(ffff9d8c2ce92170) owner           (null)/ffff9d82c6d80000
00020000:00000002:5.0:1698862926.643495:0:3746:0:(lov_object.c:1133:lov_conf_thaw()) To release share lov(ffff9d8c2ce92170) owner           (null)/ffff9d82c6d80000
00020000:00000010:5.0:1698862926.643497:0:3746:0:(lov_object.c:1398:lov_object_free()) slab-freed '(lov)': 184 at ffff9d8c2ce92170.
00020000:00000001:5.0:1698862926.643499:0:3746:0:(lov_object.c:1399:lov_object_free()) Process leaving
00000080:00000010:5.0:1698862926.643500:0:3746:0:(vvp_object.c:272:vvp_object_free()) slab-freed '(vob)': 176 at ffff9d92e4f2cdc0.
00000020:00000001:5.0:1698862926.643502:0:3746:0:(lu_object.c:814:lu_object_find_at()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000080:00020000:5.0:1698862926.643504:0:3746:0:(lcommon_cl.c:197:cl_file_inode_init()) nbp19: failed to initialize cl_object [0x3c005662a:0xc4f1:0x0]: rc = -22
00000020:00001000:5.0:1698862926.646200:0:3746:0:(cl_object.c:832:cl_env_put()) 1@ffff9d86304433f0
00000080:00000001:5.0:1698862926.646203:0:3746:0:(llite_lib.c:2260:ll_read_inode2()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)

The very first error leads to a user-visible error.
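The trace prints each return code in three forms: as an unsigned 64-bit integer, as a signed integer, and in hex. So `rc=18446744073709551594` is simply -EINVAL (-22 on Linux). This sketch reproduces that formatting. `format_rc()` is a hypothetical helper, not the libcfs debug macro that produced the trace.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a return code the way the trace lines show it:
 * "rc=<unsigned 64-bit> : <signed> : <hex>".
 * Returns the snprintf() length.
 */
static int format_rc(long rc, char *buf, size_t len)
{
	/* Reinterpret the negative errno as an unsigned 64-bit value. */
	uint64_t u = (uint64_t)(int64_t)rc;

	assert(buf != NULL);
	return snprintf(buf, len, "rc=%" PRIu64 " : %ld : %" PRIx64, u, rc, u);
}
```

Calling `format_rc(-22L, buf, sizeof(buf))` yields `rc=18446744073709551594 : -22 : ffffffffffffffea`. This matches the trace, although some lines add a `0x` prefix to the hex value.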

Comment by Gerrit Updater [ 05/Nov/23 ]

"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52996
Subject: LU-17261 lov: ignore broken components
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 716fc5db86bebaee42c6c61ebc5d1098d741c453
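The idea in the patch subject can be sketched as follows: broken components are ignored instead of failing the whole layout. This is not the code of change 52996. `struct comp`, `unpack_layout()`, and the rule that at least one mirror must stay fully intact are assumptions for illustration. `unpack_rc` stands in for the result of `lsme_unpack()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified component entry (not lov_comp_md_entry_v1). */
struct comp {
	uint16_t mirror_id;	/* lcme_mirror_id */
	int unpack_rc;		/* stand-in for the lsme_unpack() result */
	bool broken;		/* output: component ignored */
};

/* A mirror is intact if none of its components is marked broken. */
static bool mirror_intact(const struct comp *c, size_t n, uint16_t id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].mirror_id == id && c[i].broken)
			return false;
	return true;
}

/*
 * Mark failing components as broken instead of aborting on the first
 * -EINVAL (which, in the trace, propagated up to ll_read_inode2()).
 * Fail only if no mirror is left with all of its components intact.
 */
static int unpack_layout(struct comp *c, size_t n)
{
	size_t i;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++)
		c[i].broken = (c[i].unpack_rc != 0);

	for (i = 0; i < n; i++)
		if (mirror_intact(c, n, c[i].mirror_id))
			return 0;
	return -EINVAL;
}
```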

Comment by Andreas Dilger [ 29/Nov/23 ]

I tried to reproduce the exact situation seen in the broken files with an added test case in patch https://review.whamcloud.com/53276 "LU-17261 tests: delete partial broken mirror is OK", with mixed success.

When LFSCK re-attached the orphan OST object to the mirrored file, it created a different layout than the one seen on the original system. Originally, the layout components look like:

      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_objects:
      -   0: { l_ost_idx:   0, l_fid: [0x280000401:0x3:0x0] }
      -   1: { l_ost_idx:   1, l_fid: [0x2c0000401:0x4:0x0] }
      lmm_pattern:       40000001
      lmm_layout_gen:    1
      lmm_objects:
      -   0: { l_ost_idx:   0, l_fid: [0x280000401:0x3:0x0] }
      -   1: { l_ost_idx:   0, l_fid: [0x100000000:0x0:0x0] }

This used "LOV_PATTERN_F_HOLE = 0x40000000" and a different FID for the missing OST objects, instead of "l_ost_idx: -1, l_fid: [0:0x0:0x0]" seen on the other files. The usage of LOV_PATTERN_F_HOLE was added for PFL files in commit v2_9_55_0-27-ge2cdf469b0 and for plain files in commit v2_5_59_0-34-g754bf71c65.

So why were the mirrors recreated by LFSCK with "l_ost_idx: -1" in the original case?
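The hex `lmm_pattern` value `40000001` in these dumps decodes as `LOV_PATTERN_F_HOLE` (0x40000000) ORed with the RAID0 pattern. In the description's broken component, the unset OST index also appears in two forms: as `-1` in `lmm_objects` and as `4294967295` in `lmm_stripe_offset`. Both are 0xffffffff, read as signed versus unsigned. The helpers below are hypothetical. The value 0x1 for `LOV_PATTERN_RAID0` and the 32-bit width of `l_ost_idx` reflect our understanding of `lustre_user.h` and should be checked against the headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match lustre_user.h; F_HOLE is quoted in the comment above. */
#define LOV_PATTERN_RAID0	0x00000001U
#define LOV_PATTERN_F_HOLE	0x40000000U

static_assert((LOV_PATTERN_F_HOLE & LOV_PATTERN_RAID0) == 0,
	      "flag bit must not overlap the base pattern");

/* True if the layout was created with missing (hole) objects. */
static bool pattern_has_hole(uint32_t pattern)
{
	return (pattern & LOV_PATTERN_F_HOLE) != 0;
}

/* Strip the hole flag to get the base striping pattern. */
static uint32_t pattern_base(uint32_t pattern)
{
	return pattern & ~LOV_PATTERN_F_HOLE;
}

/* An OST index of 0xffffffff (printed as -1 or 4294967295) means "unset". */
static bool ost_idx_unset(uint32_t ost_idx)
{
	return ost_idx == UINT32_MAX;
}
```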

Comment by Gerrit Updater [ 20/Dec/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52996/
Subject: LU-17261 lov: ignore broken components
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 902fe290e51dccdee89380fb725ae6e3c1802e2b

Generated at Sat Feb 10 03:33:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.