[LU-11238] sanity-flr test 47 fails with “component 131075 objects allocated on 0, shouldn't on OST0” Created: 13/Aug/18  Updated: 29/Mar/21  Resolved: 28/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: DNE
Environment:

DNE


Issue Links:
Related
is related to LU-9007 Improved object allocator for FLR com... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-flr test_47 started failing on August 10, 2018 with Lustre version 2.11.53.66 build #3776. So far, we have seen this failure only in DNE testing.

From the full test session logs at https://testing.whamcloud.com/test_sets/3f6ba1b6-9dd4-11e8-87f3-52540065bddc, the following is the test_log for this test:

== sanity-flr test 47: Verify mirror obj alloc ======================================================= 19:07:06 (1533928026)
striped dir -i3 -c2 /mnt/lustre/d47.sanity-flr
3+0 records in
3+0 records out
3145728 bytes (3.1 MB) copied, 0.0718565 s, 43.8 MB/s
/mnt/lustre/d47.sanity-flr/f47.sanity-flr
  lcm_layout_gen:    8
  lcm_mirror_count:  2
  lcm_entry_count:   4
    lcme_id:             65537
    lcme_mirror_id:      1
    lcme_flags:          init,prefer
    lcme_extent.e_start: 0
    lcme_extent.e_end:   2097152
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4063:0x0] }

    lcme_id:             65538
    lcme_mirror_id:      1
    lcme_flags:          init,prefer
    lcme_extent.e_start: 2097152
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x4083:0x0] }
      - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x4043:0x0] }

    lcme_id:             131075
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   2097152
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4064:0x0] }

    lcme_id:             131076
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 2097152
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 3
      lmm_objects:
      - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x3fd9:0x0] }

 sanity-flr test_47: @@@@@@ FAIL: component 131075 objects allocated on 0,  shouldn't on OST0 

Here are logs for a few of these sanity-flr failures:
https://testing.whamcloud.com/test_sets/7470cf3e-9c95-11e8-b0aa-52540065bddc
https://testing.whamcloud.com/test_sets/3f6ba1b6-9dd4-11e8-87f3-52540065bddc
https://testing.whamcloud.com/test_sets/c116620a-9df7-11e8-b0aa-52540065bddc
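
The conflict reported in the failure message can be reproduced by hand from the layout dump above. The following is a minimal Python sketch, not the test script itself: it assumes the intent of the check is that components of different mirrors with overlapping extents should not have objects on the same OST. The `Component` tuple and both function names are hypothetical; the values are copied from the dump.

```python
from collections import namedtuple

# Values copied from the layout dump in the description.
# e_end=None stands for EOF. The field names are illustrative only.
Component = namedtuple("Component", "lcme_id mirror_id e_start e_end ost_idxs")

LAYOUT = [
    Component(65537, 1, 0, 2097152, (0,)),
    Component(65538, 1, 2097152, None, (1, 2)),
    Component(131075, 2, 0, 2097152, (0,)),
    Component(131076, 2, 2097152, None, (3,)),
]


def extents_overlap(a, b):
    """True if the half-open extents [e_start, e_end) intersect."""
    a_end = float("inf") if a.e_end is None else a.e_end
    b_end = float("inf") if b.e_end is None else b.e_end
    return a.e_start < b_end and b.e_start < a_end


def shared_ost_conflicts(layout):
    """Return (lcme_id, lcme_id, ost_idx) for each pair of overlapping
    components from different mirrors that share an OST."""
    conflicts = []
    for i, a in enumerate(layout):
        for b in layout[i + 1:]:
            if a.mirror_id == b.mirror_id or not extents_overlap(a, b):
                continue
            for ost in sorted(set(a.ost_idxs) & set(b.ost_idxs)):
                conflicts.append((a.lcme_id, b.lcme_id, ost))
    return conflicts


print(shared_ost_conflicts(LAYOUT))  # [(65537, 131075, 0)]
```

Under that assumption, the only conflict in this dump is between components 65537 and 131075, both on OST0, which matches the component named in the failure message.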



 Comments   
Comment by Andreas Dilger [ 13/Aug/18 ]

This test was added in patch https://review.whamcloud.com/32404, which landed on July 24:

 LU-9007 lod: improve obj alloc for FLR file
    
    * add lod_layout_component::llc_ost_indices to track the map
      of dt_object to its OST index.
    * add lod_device::lod_avoid to collect information of objects on other
      mirrors which overlapped the target component
    * lod_should_avoid_ost() use the avoid guidance information to avoid
      allocating objects on the same OST for different mirrors.
    
    Change-Id: Ib7e155e4b02c2e25d3955aa9a4acff7569ab7d8f

However, the failure seems related to patch https://review.whamcloud.com/32813, which landed on August 9:

LU-9007 lod: get rid of comp ost in use array
    
    Use lod_layout_component::llc_ost_indices to serve the same purpose.
    
    Change-Id: I66c89fe6349b48b89593e34e9e985ec6ea5a1758

If we can't find a quick fix, we should revert that patch.
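
As a rough illustration of the avoidance idea described in the LU-9007 commit message quoted above, here is a small Python model. It is only a sketch and is not the lod code: `collect_avoid` and `pick_ost` are hypothetical names, components are plain tuples, and the fallback used when every candidate OST is in the avoid set is an assumption.

```python
EOF = float("inf")


def collect_avoid(existing, mirror_id, e_start, e_end):
    """Collect OST indices used by components of other mirrors whose
    extent overlaps the target extent [e_start, e_end).

    existing: iterable of (mirror_id, e_start, e_end, ost_idxs) tuples.
    """
    avoid = set()
    for m, s, e, osts in existing:
        if m != mirror_id and s < e_end and e_start < e:
            avoid.update(osts)
    return avoid


def pick_ost(candidates, avoid):
    """Return the first candidate OST not in the avoid set.

    Assumption: if every candidate must be avoided, fall back to the
    first candidate rather than fail the allocation.
    """
    for ost in candidates:
        if ost not in avoid:
            return ost
    return candidates[0]


# Mirror 1 as it appears in the layout dump in the description.
existing = [(1, 0, 2097152, (0,)), (1, 2097152, EOF, (1, 2))]

# Allocate the first component of mirror 2, extent [0, 2097152).
avoid = collect_avoid(existing, mirror_id=2, e_start=0, e_end=2097152)
print(avoid, pick_ost([0, 1, 2, 3], avoid))  # {0} 1
```

In this model the first component of mirror 2 would skip OST0, which is the outcome test_47 expects and which the failing run did not produce.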

Comment by Gerrit Updater [ 14/Aug/18 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/32995
Subject: LU-11238 lod: refine obj avoid collect for FLR
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0685e3715e0b0d512164862c49632c7ad24e27cd

Comment by Gerrit Updater [ 28/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32995/
Subject: LU-11238 lod: refine obj avoid collect for FLR
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fd774a0dc30b4078956ff6022f5d008b287c276d

Comment by Peter Jones [ 28/Aug/18 ]

Landed for 2.12

Comment by Alex Zhuravlev [ 29/Mar/21 ]

I see this happening very frequently on a local setup:

== sanity-flr test 47: Verify mirror obj alloc ======================================================= 15:46:28 (1617032788)
Lustre: DEBUG MARKER: == sanity-flr test 47: Verify mirror obj alloc ======================================================= 15:46:28 (1617032788)
striped dir -i1 -c2 -H crush /mnt/lustre/d47.sanity-flr
3+0 records in
3+0 records out
3145728 bytes (3.1 MB, 3.0 MiB) copied, 0.0429161 s, 73.3 MB/s
3+0 records in
3+0 records out
3145728 bytes (3.1 MB, 3.0 MiB) copied, 0.0219156 s, 144 MB/s
/mnt/lustre/d47.sanity-flr/f47.sanity-flr
  lcm_layout_gen:    12
  lcm_mirror_count:  4
  lcm_entry_count:   8
    lcme_id:             65537
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x280000400:0xab:0x0] }

    lcme_id:             65538
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x2c0000400:0xeb:0x0] }

    lcme_id:             131075
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x300000400:0x6c:0x0] }

    lcme_id:             131076
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x300000400:0x6e:0x0] }

    lcme_id:             196613
    lcme_mirror_id:      3
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x300000400:0x6d:0x0] }

    lcme_id:             196614
    lcme_mirror_id:      3
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 3
      lmm_objects:
      - 0: { l_ost_idx: 3, l_fid: [0x340000400:0xc:0x0] }

    lcme_id:             262151
    lcme_mirror_id:      4
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 3
      lmm_objects:
      - 0: { l_ost_idx: 3, l_fid: [0x340000400:0xb:0x0] }

    lcme_id:             262152
    lcme_mirror_id:      4
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x280000400:0xac:0x0] }

 sanity-flr test_47: @@@@@@ FAIL: component 65537, 131075, 196613 have objects  allocated on duplicated OSTs 

Should we re-open the ticket?
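
The duplicate named in this failure message can be confirmed from the dump above. In both dumps in this ticket the lcme_id values are consistent with the mirror ID sitting in the bits above bit 15 (65537 = 0x10001 for mirror 1, 131075 = 0x20003 for mirror 2). That encoding is inferred from the dumps here, and the helper names below are hypothetical. A minimal Python sketch:

```python
from collections import defaultdict


def split_lcme_id(lcme_id):
    """Split an lcme_id into (mirror_id, low 16 bits), following the
    pattern seen in the dumps in this ticket."""
    return lcme_id >> 16, lcme_id & 0xFFFF


# First components of mirrors 1 to 3 in the 29/Mar/21 dump,
# mapped as lcme_id -> l_ost_idx.
FIRST_COMPONENTS = {65537: 0, 131075: 2, 196613: 2}


def duplicated_osts(comp_to_ost):
    """Return {ost_idx: [lcme_id, ...]} for OSTs used by more than one
    of the given components."""
    by_ost = defaultdict(list)
    for lcme_id, ost in comp_to_ost.items():
        by_ost[ost].append(lcme_id)
    return {ost: ids for ost, ids in by_ost.items() if len(ids) > 1}


print([split_lcme_id(i) for i in FIRST_COMPONENTS])  # [(1, 1), (2, 3), (3, 5)]
print(duplicated_osts(FIRST_COMPONENTS))  # {2: [131075, 196613]}
```

Components 131075 and 196613 cover the same extent in mirrors 2 and 3 and both have their object on OST2, which is the duplication the test reports.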
