[LU-17261] stat(2) should be able to use a good replica Created: 03/Nov/23 Updated: 11/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Alex Zhuravlev | Assignee: | Alex Zhuravlev |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
A replica's representation in the LOV EA can be broken, like this one:
lcm_layout_gen: 7
lcm_mirror_count: 2
lcm_entry_count: 2
lcme_id: 65538
lcme_mirror_id: 1
lcme_flags: init,stale
lcme_extent.e_start: 134217728
lcme_extent.e_end: 1073741824
lmm_stripe_count: 16
lmm_stripe_size: 16777216
lmm_pattern: 40000001
lmm_layout_gen: 1
lmm_stripe_offset: 4294967295
lmm_objects:
- 0: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 1: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 2: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 3: { l_ost_idx: 5, l_fid: [0xbc0000406:0x42fce0eb:0x0] }
- 4: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 5: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 6: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 7: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 8: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 9: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 10: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 11: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 12: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 13: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 14: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
- 15: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
lcme_id: 131073
lcme_mirror_id: 2
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lmm_stripe_count: 16
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 5
lmm_pool: hdd-pool
lmm_objects:
- 0: { l_ost_idx: 5, l_fid: [0xbc0000406:0x42feb0aa:0x0] }
- 1: { l_ost_idx: 8, l_fid: [0x8c0000402:0x3bf10cb:0x0] }
- 2: { l_ost_idx: 15, l_fid: [0x9c0000402:0x11f1d8f:0x0] }
- 3: { l_ost_idx: 13, l_fid: [0x900000402:0x77529c35:0x0] }
- 4: { l_ost_idx: 0, l_fid: [0x300000403:0x3beded4:0x0] }
- 5: { l_ost_idx: 7, l_fid: [0xa80000402:0x11e9898:0x0] }
- 6: { l_ost_idx: 12, l_fid: [0x880000402:0x3bef34f:0x0] }
- 7: { l_ost_idx: 10, l_fid: [0xa40000402:0x11d9d9d:0x0] }
- 8: { l_ost_idx: 14, l_fid: [0xa00000402:0x11e4d68:0x0] }
- 9: { l_ost_idx: 2, l_fid: [0xb80000402:0x11d545a:0x0] }
- 10: { l_ost_idx: 6, l_fid: [0xb40000400:0x11f22d9:0x0] }
- 11: { l_ost_idx: 4, l_fid: [0x2c0000403:0x4016eb6:0x0] }
- 12: { l_ost_idx: 9, l_fid: [0x940000402:0xaf7b184a:0x0] }
- 13: { l_ost_idx: 11, l_fid: [0x980000402:0x11dc273:0x0] }
- 14: { l_ost_idx: 1, l_fid: [0xac0000404:0x3015313c:0x0] }
- 15: { l_ost_idx: 3, l_fid: [0xb00000400:0x11dd9bb:0x0] }
but a regular stat(2) should use any valid replica rather than return an error as soon as a bogus one is met. |
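The requested behaviour can be sketched as follows. This is a minimal Python model, not the Lustre client code: `Component`, `unpack_component` and `unpack_layout` are hypothetical names, and the validity check (any object slot with `l_ost_idx` of -1 makes the component unusable) is an assumption read off the dump above, not the real check in `lsme_unpack()`. Only the control flow matters: a component that fails to unpack disqualifies its own mirror, and the error reaches the caller only when no mirror is left.

```python
import errno
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical, simplified view of one layout component."""
    mirror_id: int
    ost_indices: list = field(default_factory=list)


def unpack_component(comp):
    """Return 0 if the component looks sane, -EINVAL otherwise (assumed check)."""
    if any(idx < 0 for idx in comp.ost_indices):
        return -errno.EINVAL
    return 0


def unpack_layout(components):
    """Return the sorted ids of the mirrors that unpacked cleanly.

    A broken component disqualifies only its own mirror. EINVAL is
    raised only when every mirror turns out to be broken.
    """
    seen = set()
    broken = set()
    for comp in components:
        seen.add(comp.mirror_id)
        if unpack_component(comp) != 0:
            broken.add(comp.mirror_id)
    usable = sorted(seen - broken)
    if not usable:
        raise OSError(errno.EINVAL, "no valid replica in layout")
    return usable


# Shape of the layout in the description: mirror 1 has a single real
# object among empty slots, mirror 2 is fully populated.
layout = [
    Component(mirror_id=1, ost_indices=[-1, -1, -1, 5] + [-1] * 12),
    Component(mirror_id=2,
              ost_indices=[5, 8, 15, 13, 0, 7, 12, 10, 14, 2, 6, 4, 9, 11, 1, 3]),
]
usable_mirrors = unpack_layout(layout)
```

With this model the layout from the description yields mirror 2 as the only usable replica instead of failing the whole unpack.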
| Comments |
| Comment by Andreas Dilger [ 03/Nov/23 ] |
|
I think we prefer to stat both mirrors, so that the blocks count is updated. However, it is OK to stat only one copy if the other is broken. |
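A rough model of that preference, in Python. It assumes the reported block count is the sum over the mirrors that could be queried, which is an interpretation of the comment above rather than something the ticket states; `stat_blocks` and the per-mirror callables are hypothetical.

```python
import errno


def stat_blocks(mirror_stat_fns):
    """Sum block counts over all mirrors whose stat succeeds.

    Each element of `mirror_stat_fns` is a callable that returns the
    block count of one mirror or raises OSError if that mirror is broken.
    The call fails only if no mirror could be queried at all.
    """
    total = 0
    succeeded = 0
    last_err = None
    for stat_fn in mirror_stat_fns:
        try:
            total += stat_fn()
            succeeded += 1
        except OSError as err:
            last_err = err  # remember the failure, keep trying other mirrors
    if succeeded == 0:
        if last_err is not None:
            raise last_err
        raise OSError(errno.ENODATA, "no mirrors to stat")
    return total
```

Both mirrors contribute when both are healthy; a broken one is skipped without turning the whole stat into an error.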
| Comment by Alex Zhuravlev [ 03/Nov/23 ] |
|
This is what happens, literally:
00020000:00000001:4.0:1698862926.643457:0:3746:0:(lov_ea.c:251:lsme_unpack()) Process leaving via out_lsme (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
00020000:00000010:4.0:1698862926.643458:0:3746:0:(lov_ea.c:290:lsme_unpack()) slab-freed '(lsme->lsme_oinfo[i])': 112 at ffff9d87eab4c480.
00020000:00000010:4.0:1698862926.643459:0:3746:0:(lov_ea.c:292:lsme_unpack()) kfreed 'lsme': 128 at ffff9d87eab4ce80.
00020000:00000001:4.0:1698862926.643460:0:3746:0:(lov_ea.c:487:lsm_unpackmd_comp_md_v1()) Process leaving via out_lsm (rc=18446744073709551594 : -22 : 0xffffffffffffffea)
00020000:00000010:4.0:1698862926.643461:0:3746:0:(lov_ea.c:522:lsm_unpackmd_comp_md_v1()) kfreed 'lsm': 72 at ffff9d8e3015c360.
00020000:00000001:4.0:1698862926.643462:0:3746:0:(lov_ea.c:524:lsm_unpackmd_comp_md_v1()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00020000:00000001:4.0:1698862926.643463:0:3746:0:(lov_pack.c:345:lov_unpackmd()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00020000:00000001:4.0:1698862926.643464:0:3746:0:(lov_object.c:1299:lov_object_init()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00020000:00000001:4.0:1698862926.643467:0:3746:0:(lov_object.c:1386:lov_object_delete()) Process entered
00020000:00000002:4.0:1698862926.643467:0:3746:0:(lov_object.c:1125:lov_conf_freeze()) To take share lov(ffff9d8c2ce92170) owner (null)/ffff9d82c6d80000
00020000:00000001:5.0:1698862926.643490:0:3746:0:(lov_object.c:1191:lov_layout_wait()) Process entered
00020000:00000001:5.0:1698862926.643491:0:3746:0:(lov_object.c:1201:lov_layout_wait()) Process leaving (rc=0 : 0 : 0)
00020000:00000002:5.0:1698862926.643492:0:3746:0:(lov_object.c:1133:lov_conf_thaw()) To release share lov(ffff9d8c2ce92170) owner (null)/ffff9d82c6d80000
00020000:00000001:5.0:1698862926.643493:0:3746:0:(lov_object.c:1388:lov_object_delete()) Process leaving
00020000:00000001:5.0:1698862926.643494:0:3746:0:(lov_object.c:1395:lov_object_free()) Process entered
00020000:00000002:5.0:1698862926.643494:0:3746:0:(lov_object.c:1125:lov_conf_freeze()) To take share lov(ffff9d8c2ce92170) owner (null)/ffff9d82c6d80000
00020000:00000002:5.0:1698862926.643495:0:3746:0:(lov_object.c:1133:lov_conf_thaw()) To release share lov(ffff9d8c2ce92170) owner (null)/ffff9d82c6d80000
00020000:00000010:5.0:1698862926.643497:0:3746:0:(lov_object.c:1398:lov_object_free()) slab-freed '(lov)': 184 at ffff9d8c2ce92170.
00020000:00000001:5.0:1698862926.643499:0:3746:0:(lov_object.c:1399:lov_object_free()) Process leaving
00000080:00000010:5.0:1698862926.643500:0:3746:0:(vvp_object.c:272:vvp_object_free()) slab-freed '(vob)': 176 at ffff9d92e4f2cdc0.
00000020:00000001:5.0:1698862926.643502:0:3746:0:(lu_object.c:814:lu_object_find_at()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000080:00020000:5.0:1698862926.643504:0:3746:0:(lcommon_cl.c:197:cl_file_inode_init()) nbp19: failed to initialize cl_object [0x3c005662a:0xc4f1:0x0]: rc = -22
00000020:00001000:5.0:1698862926.646200:0:3746:0:(cl_object.c:832:cl_env_put()) 1@ffff9d86304433f0
00000080:00000001:5.0:1698862926.646203:0:3746:0:(llite_lib.c:2260:ll_read_inode2()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
The very first error leads to a user-visible error. |
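For readers unfamiliar with the `rc=18446744073709551594 : -22 : 0xffffffffffffffea` triple: it is the same return code printed as unsigned decimal, signed decimal and hex. A small Python sketch of the conversion (`decode_rc` is a helper name made up for this example):

```python
import errno


def decode_rc(raw):
    """Interpret an unsigned 64-bit value as a signed (two's complement) return code."""
    if raw >= 1 << 63:
        raw -= 1 << 64
    return raw


rc = decode_rc(18446744073709551594)
# Map the positive errno number back to its symbolic name.
name = errno.errorcode.get(-rc, "unknown")
```

Here `rc` is -22, which is `EINVAL`, the error that `lsme_unpack()` returns first in the trace and that is passed unchanged all the way up to `ll_read_inode2()`.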
| Comment by Gerrit Updater [ 05/Nov/23 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52996 |
| Comment by Andreas Dilger [ 29/Nov/23 ] |
|
I tried to reproduce the exact situation seen in the broken files with an added test case in patch https://review.whamcloud.com/53276 "LU-17261 tests: delete partial broken mirror is OK" with mixed success. When LFSCK re-attached the orphan OST object to the mirrored file, it created a different layout than the one seen on the original system. Originally the layout component looks like:
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x280000401:0x3:0x0] }
- 1: { l_ost_idx: 1, l_fid: [0x2c0000401:0x4:0x0] }
lmm_pattern: 40000001
lmm_layout_gen: 1
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x280000401:0x3:0x0] }
- 1: { l_ost_idx: 0, l_fid: [0x100000000:0x0:0x0] }
This used "LOV_PATTERN_F_HOLE = 0x40000000" and a different FID for the missing OST objects, instead of the "l_ost_idx: -1, l_fid: [0:0x0:0x0]" seen on the other files. The usage of LOV_PATTERN_F_HOLE was added for PFL files in commit v2_9_55_0-27-ge2cdf469b0 and for plain files in commit v2_5_59_0-34-g754bf71c65. So it isn't clear why the mirrors were recreated by LFSCK with "l_ost_idx: -1" in the original case. |
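To make the `lmm_pattern: 40000001` value easier to read, here is a small Python sketch that splits it into the base pattern and the hole flag. `LOV_PATTERN_F_HOLE` is the value quoted above; `LOV_PATTERN_RAID0 = 0x001` is taken from the Lustre user headers as best I recall and should be treated as an assumption, and `describe_pattern` is a hypothetical helper.

```python
LOV_PATTERN_RAID0 = 0x001        # assumed value of the plain raid0 pattern
LOV_PATTERN_F_HOLE = 0x40000000  # flag value quoted in the comment above


def describe_pattern(lmm_pattern):
    """Return (base pattern name, has_hole) for a raw lmm_pattern value."""
    has_hole = bool(lmm_pattern & LOV_PATTERN_F_HOLE)
    base = lmm_pattern & ~LOV_PATTERN_F_HOLE
    base_name = "raid0" if base == LOV_PATTERN_RAID0 else hex(base)
    return base_name, has_hole


# The dumps print the flagged pattern as bare hex, without a 0x prefix.
flagged = describe_pattern(int("40000001", 16))
plain = describe_pattern(LOV_PATTERN_RAID0)
```

So `40000001` is a raid0 component marked as having holes, while the intact component prints simply as `raid0`.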
| Comment by Gerrit Updater [ 20/Dec/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52996/ |