[LU-11238] sanity-flr test 47 fails with “component 131075 objects allocated on 0, shouldn't on OST0” Created: 13/Aug/18 Updated: 29/Mar/21 Resolved: 28/Aug/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | DNE | ||
| Environment: |
DNE |
||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
sanity-flr test_47 started failing on August 10, 2018 with Lustre version 2.11.53.66 build #3776. So far, we have seen this failure only in DNE testing. From the full test session logs at https://testing.whamcloud.com/test_sets/3f6ba1b6-9dd4-11e8-87f3-52540065bddc, the test_log for this test is:
== sanity-flr test 47: Verify mirror obj alloc ======================================================= 19:07:06 (1533928026)
striped dir -i3 -c2 /mnt/lustre/d47.sanity-flr
3+0 records in
3+0 records out
3145728 bytes (3.1 MB) copied, 0.0718565 s, 43.8 MB/s
/mnt/lustre/d47.sanity-flr/f47.sanity-flr
lcm_layout_gen: 8
lcm_mirror_count: 2
lcm_entry_count: 4
lcme_id: 65537
lcme_mirror_id: 1
lcme_flags: init,prefer
lcme_extent.e_start: 0
lcme_extent.e_end: 2097152
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4063:0x0] }
lcme_id: 65538
lcme_mirror_id: 1
lcme_flags: init,prefer
lcme_extent.e_start: 2097152
lcme_extent.e_end: EOF
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
lmm_objects:
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x4083:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4043:0x0] }
lcme_id: 131075
lcme_mirror_id: 2
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 2097152
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4064:0x0] }
lcme_id: 131076
lcme_mirror_id: 2
lcme_flags: init
lcme_extent.e_start: 2097152
lcme_extent.e_end: EOF
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_objects:
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3fd9:0x0] }
sanity-flr test_47: @@@@@@ FAIL: component 131075 objects allocated on 0, shouldn't on OST0
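The collision reported in the failure line can be checked offline from the layout dump. The following is a minimal sketch, not the check in sanity-flr itself: it assumes the intent is that components of different mirrors with overlapping extents should not share an OST (inferred from the failure message), and the helper names parse_components and find_conflicts are hypothetical. SAMPLE is an abbreviated copy of the dump above.

```python
import re
from itertools import combinations

EOF = float("inf")  # stand-in for the "EOF" extent end in the dump

# Abbreviated copy of the layout dump above (only the fields the check needs).
SAMPLE = """
lcme_id: 65537
lcme_mirror_id: 1
lcme_extent.e_start: 0
lcme_extent.e_end: 2097152
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4063:0x0] }
lcme_id: 65538
lcme_mirror_id: 1
lcme_extent.e_start: 2097152
lcme_extent.e_end: EOF
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x4083:0x0] }
- 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4043:0x0] }
lcme_id: 131075
lcme_mirror_id: 2
lcme_extent.e_start: 0
lcme_extent.e_end: 2097152
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4064:0x0] }
lcme_id: 131076
lcme_mirror_id: 2
lcme_extent.e_start: 2097152
lcme_extent.e_end: EOF
- 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3fd9:0x0] }
"""


def parse_components(text):
    """Collect id, mirror, extent and OST indices for each component (hypothetical helper)."""
    comps = []
    for line in text.splitlines():
        line = line.strip()
        if m := re.match(r"lcme_id:\s+(\d+)", line):
            comps.append({"id": int(m.group(1)), "osts": set()})
        elif m := re.match(r"lcme_mirror_id:\s+(\d+)", line):
            comps[-1]["mirror"] = int(m.group(1))
        elif m := re.match(r"lcme_extent\.e_(start|end):\s+(\S+)", line):
            value = m.group(2)
            comps[-1][m.group(1)] = EOF if value == "EOF" else int(value)
        elif m := re.search(r"l_ost_idx:\s+(\d+)", line):
            comps[-1]["osts"].add(int(m.group(1)))
    return comps


def find_conflicts(comps):
    """Return (id_a, id_b, shared_osts) for overlapping components of different mirrors."""
    conflicts = []
    for a, b in combinations(comps, 2):
        overlap = a["start"] < b["end"] and b["start"] < a["end"]
        shared = a["osts"] & b["osts"]
        if a["mirror"] != b["mirror"] and overlap and shared:
            conflicts.append((a["id"], b["id"], sorted(shared)))
    return conflicts


print(find_conflicts(parse_components(SAMPLE)))
```

On this dump the only pair flagged is components 65537 and 131075, both covering the extent 0 to 2097152 and both placed on OST index 0, which matches the component named in the failure line.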
Here are logs for a few of these sanity-flr failures: |
| Comments |
| Comment by Andreas Dilger [ 13/Aug/18 ] |
|
This test was added in patch https://review.whamcloud.com/32404, which landed on July 24:
LU-9007 lod: improve obj alloc for FLR file
* add lod_layout_component::llc_ost_indices to track the map
of dt_object to its OST index.
* add lod_device::lod_avoid to collect information of objects on other
mirrors which overlapped the target component
* lod_should_avoid_ost() use the avoid guidance information to avoid
allocating objects on the same OST for different mirrors.
Change-Id: Ib7e155e4b02c2e25d3955aa9a4acff7569ab7d8f
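The idea described in that commit message can be illustrated with a small model. This is only a sketch of the concept and not the lod code: the function names, the dictionary layout, and the fallback to avoided OSTs when too few others remain are all assumptions made for illustration.

```python
EOF = float("inf")  # stand-in for an extent that runs to end of file


def overlaps(a, b):
    """True when the [start, end) extents of two components intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def collect_avoid(target, layout):
    """OSTs already used by components of OTHER mirrors that overlap the target."""
    avoid = set()
    for comp in layout:
        if comp["mirror"] != target["mirror"] and overlaps(comp, target):
            avoid |= comp["osts"]
    return avoid


def pick_osts(target, layout, all_osts, count):
    """Prefer OSTs outside the avoid set; fall back to avoided ones only if needed.

    The fallback is an assumption of this sketch, not a statement about lod.
    """
    avoid = collect_avoid(target, layout)
    preferred = [ost for ost in all_osts if ost not in avoid]
    fallback = [ost for ost in all_osts if ost in avoid]
    return (preferred + fallback)[:count]


# Mirror 1 as it appears in the reported layout: OST 0, then OSTs 1 and 2.
mirror1 = [
    {"mirror": 1, "start": 0, "end": 2097152, "osts": {0}},
    {"mirror": 1, "start": 2097152, "end": EOF, "osts": {1, 2}},
]
new_comp = {"mirror": 2, "start": 0, "end": 2097152, "osts": set()}
print(pick_osts(new_comp, mirror1, all_osts=[0, 1, 2, 3], count=1))
```

With this model, the first component of mirror 2 avoids OST 0, whereas the reported layout shows it allocated on OST 0.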
But the failure seems related to the landing of patch https://review.whamcloud.com/32813 on August 9:
LU-9007 lod: get rid of comp ost in use array
Use lod_layout_component::llc_ost_indices to serve the same purpose.
Change-Id: I66c89fe6349b48b89593e34e9e985ec6ea5a1758
If we can't find a quick fix, we should revert that patch. |
| Comment by Gerrit Updater [ 14/Aug/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/32995 |
| Comment by Gerrit Updater [ 28/Aug/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32995/ |
| Comment by Peter Jones [ 28/Aug/18 ] |
|
Landed for 2.12 |
| Comment by Alex Zhuravlev [ 29/Mar/21 ] |
|
I see this happening very frequently on a local setup:
== sanity-flr test 47: Verify mirror obj alloc ======================================================= 15:46:28 (1617032788)
Lustre: DEBUG MARKER: == sanity-flr test 47: Verify mirror obj alloc ======================================================= 15:46:28 (1617032788)
striped dir -i1 -c2 -H crush /mnt/lustre/d47.sanity-flr
3+0 records in
3+0 records out
3145728 bytes (3.1 MB, 3.0 MiB) copied, 0.0429161 s, 73.3 MB/s
3+0 records in
3+0 records out
3145728 bytes (3.1 MB, 3.0 MiB) copied, 0.0219156 s, 144 MB/s
/mnt/lustre/d47.sanity-flr/f47.sanity-flr
lcm_layout_gen: 12
lcm_mirror_count: 4
lcm_entry_count: 8
lcme_id: 65537
lcme_mirror_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 1048576
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x280000400:0xab:0x0] }
lcme_id: 65538
lcme_mirror_id: 1
lcme_flags: init
lcme_extent.e_start: 1048576
lcme_extent.e_end: EOF
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
lmm_objects:
- 0: { l_ost_idx: 1, l_fid: [0x2c0000400:0xeb:0x0] }
lcme_id: 131075
lcme_mirror_id: 2
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 1048576
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 2
lmm_objects:
- 0: { l_ost_idx: 2, l_fid: [0x300000400:0x6c:0x0] }
lcme_id: 131076
lcme_mirror_id: 2
lcme_flags: init
lcme_extent.e_start: 1048576
lcme_extent.e_end: EOF
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 2
lmm_objects:
- 0: { l_ost_idx: 2, l_fid: [0x300000400:0x6e:0x0] }
lcme_id: 196613
lcme_mirror_id: 3
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 1048576
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 2
lmm_objects:
- 0: { l_ost_idx: 2, l_fid: [0x300000400:0x6d:0x0] }
lcme_id: 196614
lcme_mirror_id: 3
lcme_flags: init
lcme_extent.e_start: 1048576
lcme_extent.e_end: EOF
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_objects:
- 0: { l_ost_idx: 3, l_fid: [0x340000400:0xc:0x0] }
lcme_id: 262151
lcme_mirror_id: 4
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 1048576
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_objects:
- 0: { l_ost_idx: 3, l_fid: [0x340000400:0xb:0x0] }
lcme_id: 262152
lcme_mirror_id: 4
lcme_flags: init
lcme_extent.e_start: 1048576
lcme_extent.e_end: EOF
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x280000400:0xac:0x0] }
sanity-flr test_47: @@@@@@ FAIL: component 65537, 131075, 196613 have objects allocated on duplicated OSTs
Should we re-open the ticket? |
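To read the component IDs in that failure line, it helps to split each lcme_id into a mirror number and a per-file component number. The sketch below assumes the mirror ID sits in the bits above bit 16. That assumption is consistent with every lcme_id and lcme_mirror_id pair in the dump above but is not taken from the Lustre headers, and the helper names are hypothetical. The OST indices are copied from the dump.

```python
from collections import defaultdict

MIRROR_ID_SHIFT = 16  # assumption: matches the lcme_id / lcme_mirror_id pairs in the dump


def split_lcme_id(lcme_id):
    """Return (mirror_id, component_number) for a composite lcme_id."""
    return lcme_id >> MIRROR_ID_SHIFT, lcme_id & 0xFFFF


# The three components named in the failure line, with their l_ost_idx from the dump.
NAMED_COMPONENTS = {65537: 0, 131075: 2, 196613: 2}


def duplicated_osts(comp_to_ost):
    """Group component IDs by OST index and keep only OSTs used more than once."""
    by_ost = defaultdict(list)
    for comp_id, ost in comp_to_ost.items():
        by_ost[ost].append(comp_id)
    return {ost: ids for ost, ids in by_ost.items() if len(ids) > 1}


for comp_id in NAMED_COMPONENTS:
    print(comp_id, split_lcme_id(comp_id))
print(duplicated_osts(NAMED_COMPONENTS))
```

Among the three named components, 131075 (mirror 2) and 196613 (mirror 3) both sit on OST index 2, while 65537 (mirror 1) is alone on OST index 0.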