stat(2) should be able to use a good replica

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

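      A replica's representation in the LOV EA can be broken, as in this layout, where mirror 1 is stale and all but one of its objects have l_ost_idx -1 and a zero FID: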

      lcm_layout_gen: 7
      lcm_mirror_count: 2
      lcm_entry_count: 2
      lcme_id: 65538
      lcme_mirror_id: 1
      lcme_flags: init,stale
      lcme_extent.e_start: 134217728
      lcme_extent.e_end: 1073741824
      lmm_stripe_count: 16
      lmm_stripe_size: 16777216
      lmm_pattern: 40000001
      lmm_layout_gen: 1
      lmm_stripe_offset: 4294967295
      lmm_objects:
      - 0: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 1: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 2: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 3: { l_ost_idx: 5, l_fid: [0xbc0000406:0x42fce0eb:0x0] }
      - 4: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 5: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 6: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 7: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 8: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 9: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 10: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 11: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 12: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 13: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 14: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
      - 15: { l_ost_idx: -1, l_fid: [0:0x0:0x0] }
       
      lcme_id: 131073
      lcme_mirror_id: 2
      lcme_flags: init
      lcme_extent.e_start: 0
      lcme_extent.e_end: EOF
      lmm_stripe_count: 16
      lmm_stripe_size: 1048576
      lmm_pattern: raid0
      lmm_layout_gen: 0
      lmm_stripe_offset: 5
      lmm_pool: hdd-pool
      lmm_objects:
      - 0: { l_ost_idx: 5, l_fid: [0xbc0000406:0x42feb0aa:0x0] }
      - 1: { l_ost_idx: 8, l_fid: [0x8c0000402:0x3bf10cb:0x0] }
      - 2: { l_ost_idx: 15, l_fid: [0x9c0000402:0x11f1d8f:0x0] }
      - 3: { l_ost_idx: 13, l_fid: [0x900000402:0x77529c35:0x0] }
      - 4: { l_ost_idx: 0, l_fid: [0x300000403:0x3beded4:0x0] }
      - 5: { l_ost_idx: 7, l_fid: [0xa80000402:0x11e9898:0x0] }
      - 6: { l_ost_idx: 12, l_fid: [0x880000402:0x3bef34f:0x0] }
      - 7: { l_ost_idx: 10, l_fid: [0xa40000402:0x11d9d9d:0x0] }
      - 8: { l_ost_idx: 14, l_fid: [0xa00000402:0x11e4d68:0x0] }
      - 9: { l_ost_idx: 2, l_fid: [0xb80000402:0x11d545a:0x0] }
      - 10: { l_ost_idx: 6, l_fid: [0xb40000400:0x11f22d9:0x0] }
      - 11: { l_ost_idx: 4, l_fid: [0x2c0000403:0x4016eb6:0x0] }
      - 12: { l_ost_idx: 9, l_fid: [0x940000402:0xaf7b184a:0x0] }
      - 13: { l_ost_idx: 11, l_fid: [0x980000402:0x11dc273:0x0] }
      - 14: { l_ost_idx: 1, l_fid: [0xac0000404:0x3015313c:0x0] }
      - 15: { l_ost_idx: 3, l_fid: [0xb00000400:0x11dd9bb:0x0] } 
      

      However, a regular stat(2) should use any valid replica rather than return an error as soon as it meets a bogus one.
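The desired behaviour can be illustrated with a minimal sketch. This is not the Lustre implementation: `Component`, `component_usable`, `pick_good_mirror` and the `ost_count=16` value are hypothetical names and assumptions chosen to mirror the dump above, and the sketch ignores extents (it does not check that a mirror covers the whole file).

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical, simplified view of one layout component."""
    mirror_id: int
    flags: set          # e.g. {"init", "stale"}
    ost_indices: list   # l_ost_idx per stripe object; -1 marks a missing object


def component_usable(comp, ost_count):
    """A component is usable if it is not stale and every index is in range."""
    if "stale" in comp.flags:
        return False
    return all(0 <= idx < ost_count for idx in comp.ost_indices)


def pick_good_mirror(components, ost_count):
    """Return the id of the first mirror whose components are all usable, else None."""
    by_mirror = {}
    for comp in components:
        by_mirror.setdefault(comp.mirror_id, []).append(comp)
    for mirror_id in sorted(by_mirror):
        if all(component_usable(c, ost_count) for c in by_mirror[mirror_id]):
            return mirror_id
    return None


# Values modelled on the dump above; ost_count=16 is an assumption.
broken = Component(1, {"init", "stale"}, [-1, -1, -1, 5] + [-1] * 12)
good = Component(2, {"init"}, [5, 8, 15, 13, 0, 7, 12, 10, 14, 2, 6, 4, 9, 11, 1, 3])

print(pick_good_mirror([broken, good], ost_count=16))
```

With the two components above, the broken mirror 1 is skipped and mirror 2 is selected instead of the whole lookup failing.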

      Attachments

        1. debug.txt.gz
          439 kB
          Andreas Dilger

          Activity

            [LU-17261] stat(2) should be able to use a good replica
            pjones Peter Jones added a comment -

            Everything appears to be merged for 2.16

            pjones Peter Jones added a comment -

            There is still a test patch left to land.
            pjones Peter Jones added a comment -

            Merged for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54544/
            Subject: LU-17261 lov: unlink can handle bogus striping
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4ae823762db40d790ddd00c29e969b5c8e376430

            bzzz Alex Zhuravlev added a comment -

            With the patch above, I see that unlink can handle such a file with a broken ostidx.

            gerrit Gerrit Updater added a comment -

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54544
            Subject: LU-17261 tests: unlink can handle bogus striping
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c9a4f962c8c6832169b29b84de6b1d0714b0cf57

            adilger Andreas Dilger added a comment -

            Well, I used LOV_V1_INSANE_STRIPE_COUNT=65532 in my proposed code above. This covers all "valid" OST numbers but excludes "-1=4294967295" that is seen in this case.
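A rough sketch of the three-way split discussed here, assuming the stored index is treated as an unsigned 32-bit value. The function name `classify_ost_index`, the category labels, and the exact comparison at the 65532 boundary (`<` rather than `<=`) are assumptions for illustration, not taken from the proposed patch.

```python
LOV_V1_INSANE_STRIPE_COUNT = 65532  # value quoted in the comment above


def classify_ost_index(raw_idx, ost_count):
    """Classify an OST index as 'known', 'maybe-new' or 'impossible'.

    raw_idx may be given as a signed value (-1) or as its unsigned
    32-bit form (4294967295); both are treated identically.
    """
    idx = raw_idx & 0xFFFFFFFF  # view the index as a u32
    if idx < ost_count:
        return "known"          # the client already has this OST configured
    if idx < LOV_V1_INSANE_STRIPE_COUNT:
        return "maybe-new"      # plausible index: waiting for the OST could help
    return "impossible"         # e.g. -1: no point in waiting


print(classify_ost_index(5, ost_count=16))
print(classify_ost_index(100, ost_count=16))
print(classify_ost_index(-1, ost_count=16))
```

Under these assumptions only the middle category would justify waiting for a new OST to connect; an index such as -1 would be handled immediately.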

            bzzz Alex Zhuravlev added a comment -

            What would be the "border" for an OST index to become "invalid"? I.e., which index do we consider "potentially good" and wait for a corresponding OST to appear, and which index do we declare "impossible" and handle in a special way?

            adilger Andreas Dilger added a comment -

            The "wait 30s while client connects to new OST" delay is because of the OST index check in lmv_tgt_retry(), and it would be gone with the proposed change above.

            bzzz Alex Zhuravlev added a comment -

            I did a simple test with unlink:

            == sanity-flr test 210b: handle broken mirrored lovea (unlink) ========================================================== 07:30:01 (1711179001)
            before dd
            lustre-OST0000_UUID      1818580        1524     1700672   1% /mnt/lustre[OST:0]
            lustre-OST0001_UUID      1818580        1524     1700672   1% /mnt/lustre[OST:1]
            [   28.527289] Lustre: DEBUG MARKER: == sanity-flr test 210b: handle broken mirrored lovea (unlink) ========================================================== 07:30:01 (1711179001)
            20+0 records in
            20+0 records out
            20971520 bytes (21 MB, 20 MiB) copied, 0.032342 s, 648 MB/s
            after dd
            lustre-OST0000_UUID      1818580       22004     1659592   2% /mnt/lustre[OST:0]
            lustre-OST0001_UUID      1818580        1524     1700672   1% /mnt/lustre[OST:1]
            fail_loc=0x1428
            [   31.585672] Lustre: *** cfs_fail_loc=1428, val=0***
            [   31.670467] Lustre: 6901:0:(lov_ea.c:299:lsme_unpack()) lustre-clilov_UUID: FID 0x280000401:2 OST index -1 more than OST count 2
            [   31.670469] Lustre: lustre-clilov_UUID: wait 30s while client connects to new OST
            [   61.791678] Lustre: 6901:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x3:1026, magic 0x0bd10bd0, pattern 0x1
            [   61.791683] Lustre: 6901:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
            [   61.791685] Lustre: 6901:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 4294967295 subobj 0x280000401:2
            after mirror extend
            [   91.951586] Lustre: 6901:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x3:1026, magic 0x0bd10bd0, pattern 0x1
            [   91.951591] Lustre: 6901:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
            [   91.951594] Lustre: 6901:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 4294967295 subobj 0x280000401:2
            [  121.071619] Lustre: 6905:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x3:1026, magic 0x0bd10bd0, pattern 0x1
            [  121.071624] Lustre: 6905:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
            [  121.071627] Lustre: 6905:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 4294967295 subobj 0x280000401:2
            now list directory
            total 0
            ===
            [  151.231585] Lustre: 6905:0:(lov_pack.c:57:lov_dump_lmm_common()) objid 0x3:1026, magic 0x0bd10bd0, pattern 0x1
            [  151.231590] Lustre: 6905:0:(lov_pack.c:61:lov_dump_lmm_common()) stripe_size 4194304, stripe_count 1, layout_gen 0
            [  151.231592] Lustre: 6905:0:(lov_pack.c:81:lov_dump_lmm_objects()) stripe 0 idx 4294967295 subobj 0x280000401:2
            Waiting for MDT destroys to complete
            after removal
            lustre-OST0000_UUID      1818580        1524     1700672   1% /mnt/lustre[OST:0]
            lustre-OST0001_UUID      1818580        1524     1700672   1% /mnt/lustre[OST:1]
            PASS 210b (134s)

            That seems to work (at least the space is back), but it takes very long. Not sure whether this is OK.
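To put a number on "takes very long", the kernel timestamps in the log can be compared directly. The sketch below uses only timestamps copied from the log; reading each gap as one 30 s wait for a new OST is an inference, since only the first wait is announced explicitly.

```python
# Timestamp of the "wait 30s while client connects to new OST" message.
wait_start = 31.670469
# Timestamps of the first line of each subsequent lov_dump_lmm_common() dump.
dump_starts = [61.791678, 91.951586, 121.071619, 151.231585]

marks = [wait_start] + dump_starts
# Gap between each event and the next one, in seconds.
gaps = [later - earlier for earlier, later in zip(marks, marks[1:])]
total = sum(gaps)

for gap in gaps:
    print(f"gap: {gap:.2f} s")
print(f"total: {total:.1f} s of the 134 s test run")
```

Each gap comes out close to 30 s, so roughly 120 s of the 134 s run appears to be spent waiting rather than doing work.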

            People

              bzzz Alex Zhuravlev