deprecate use of OST FID SEQ 0 for MDT0000

    Description

      Since Lustre 2.4.0 and DNE1, it has been possible to create OST objects using a different FID SEQ range for each MDT, to avoid contention during MDT object precreation.

      Objects that are created by MDT0000 are put into FID SEQ 0 (O/0/d*) on all OSTs and have a filename that is the decimal FID OID in ASCII. However, SEQ=0 objects are remapped to IDIF FID SEQ (0x100000000 | (ost_idx << 16)) so that they are unique across all OSTs.
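The remapping in the paragraph above can be sketched as plain arithmetic. This is only an illustration of the formula quoted in the description, not the Lustre implementation; the function names and the 16-bit limit on the OST index are assumptions made for the example.

```python
# Sketch of the SEQ=0 -> IDIF remapping, using only the formula from the
# description: 0x100000000 | (ost_idx << 16).
FID_SEQ_IDIF = 0x100000000  # base of the IDIF range, as quoted in the description


def idif_seq(ost_idx: int) -> int:
    """Return the IDIF SEQ for a legacy SEQ=0 object on the given OST.

    Assumption: the OST index must fit in 16 bits so that the shifted
    value does not spill into the 0x100000000 base bit.
    """
    if not 0 <= ost_idx <= 0xFFFF:
        raise ValueError("ost_idx is assumed to fit in 16 bits")
    return FID_SEQ_IDIF | (ost_idx << 16)


def legacy_object_name(oid: int) -> str:
    """SEQ=0 objects are named by the decimal OID in ASCII."""
    return str(oid)


print(hex(idif_seq(0)))        # 0x100000000
print(hex(idif_seq(1)))        # 0x100010000
print(legacy_object_name(42))  # 42
```

The value for OST index 1 matches the 0x100010000 sequence that appears for lustre-OST0001 in the console log quoted in the comments.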

      Objects that are created by other MDTs (or MDT0000 after 2^48 objects are created in SEQ 0) use a unique SEQ in the FID_SEQ_NORMAL range (> 0x200000400), and use a filename that is the hexadecimal FID OID in ASCII.
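The two naming schemes can be compared with a short sketch. The constant is taken from the description; the helper name, the inclusive lower bound, and the bare lowercase hex format (no "0x" prefix) are assumptions for illustration only.

```python
# Sketch: choose the object filename format from the FID SEQ.
FID_SEQ_NORMAL = 0x200000400  # start of the normal range, from the description


def object_filename(seq: int, oid: int) -> str:
    """Return the filename used for an OST object, per the description.

    SEQ 0 objects use the decimal OID; objects in the FID_SEQ_NORMAL
    range use the hexadecimal OID. The inclusive comparison and the
    lowercase hex without a prefix are assumptions of this sketch.
    """
    if seq == 0:
        return str(oid)
    if seq >= FID_SEQ_NORMAL:
        return format(oid, "x")
    raise ValueError(f"SEQ {seq:#x} is not covered by this sketch")


print(object_filename(0, 255))            # 255
print(object_filename(0x2c0000401, 255))  # ff
```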

      For compatibility with pre-DNE MDTs and OSTs, the use of SEQ=0 by MDT0000 has been kept until now, but there is no reason to keep this compatibility for new filesystems. It would be better to have MDT0000 assigned a "regular" FID SEQ range at startup, so that the SEQ=0 compatibility can eventually be removed. That would ensure OST objects have "proper and unique" FIDs, and avoid the complexity of mapping between the old SEQ=0 48-bit OID values and the IDIF FIDs.

      Older filesystems using SEQ=0 would eventually delete old objects in this range and/or could be forced to migrate to using new objects to clean up the remaining usage, if necessary.

      Attachments

        1. serial.txt
          778 kB
        2. stdout.txt
          484 kB

          Activity

            [LU-14692] deprecate use of OST FID SEQ 0 for MDT0000

            adilger Andreas Dilger added a comment -

            The test already has something similar:

            # make sure new superblock labels are sync'd before disabling writes
            sync_all_data
            sleep 5

            so adding a file create on all OSTs is reasonable.

            adilger Andreas Dilger added a comment -

            Can the test be updated to do something simple like "lfs setstripe -i -1 $DIR/$tfile.tmp" to force the sequence update before the replay barrier?

            dongyang Dongyang Li added a comment -

            From the console log:

            [ 6684.760425] Lustre: Mounted lustre-client
            [ 6686.111594] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x00000002c0000400-0x0000000300000400]:1:ost
            [ 6686.127442] Lustre: Skipped 2 previous similar messages
            [ 6686.127744] Lustre: cli-lustre-OST0001-super: Allocated super-sequence [0x00000002c0000400-0x0000000300000400]:1:ost]
            [ 6686.127895] Lustre: Skipped 1 previous similar message
            [ 6691.011345] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
            [ 6691.028634] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
            [ 6691.211977] Lustre: lustre-OST0001-osc-MDT0000: update sequence from 0x100010000 to 0x2c0000401
            [ 6692.967973] systemd[1]: mnt-lustre\x2dmds1.mount: Succeeded.
            [ 6693.003490] Lustre: Failing over lustre-MDT0000
            

            The sequence update from 0x100010000 to 0x2c0000401 was lost after replay_barrier.
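To make the two values in the log easier to read, the following sketch decodes the "update sequence" line. It assumes that an IDIF SEQ has the value 1 in bits 32 and up with the OST index in bits 16-31, and that the super-sequence range printed in the log is half-open; the helper names are hypothetical and not part of Lustre.

```python
import re

FID_SEQ_NORMAL = 0x200000400  # start of the normal range, from the description

UPDATE_RE = re.compile(
    r"update sequence from (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)")


def parse_seq_update(line: str):
    """Return (old_seq, new_seq) from an 'update sequence' log line, or None."""
    match = UPDATE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def is_idif(seq: int) -> bool:
    """Assumption: IDIF SEQs are exactly those whose bits 32+ equal 1."""
    return (seq >> 32) == 1


def idif_ost_index(seq: int) -> int:
    """Assumption: the OST index sits in bits 16-31 of an IDIF SEQ."""
    return (seq >> 16) & 0xFFFF


line = ("[ 6691.211977] Lustre: lustre-OST0001-osc-MDT0000: "
        "update sequence from 0x100010000 to 0x2c0000401")
old_seq, new_seq = parse_seq_update(line)

print(is_idif(old_seq), idif_ost_index(old_seq))  # True 1
print(new_seq >= FID_SEQ_NORMAL)                  # True
# Range taken from the "Allocated super-sequence" line in the same log.
print(0x2c0000400 <= new_seq < 0x300000400)       # True
```

Read this way, the log shows the OSP for OST0001 moving from its IDIF SEQ to the first SEQ of the newly allocated super-sequence just after the replay barrier.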

            bzzz Alex Zhuravlev added a comment -

            It's always conf-sanity/84; the stdout and console logs are attached.

            adilger Andreas Dilger added a comment -

            Alex, what test was running when the failure was hit? There was some discussion about this issue with Dongyang, basically that replay_barrier is discarding the SEQ update (which is sync on the server and otherwise atomic) because the underlying storage was marked read-only.

            The open question was whether this LASSERT() should be relaxed to handle the case of write loss (e.g. due to controller cache failure) at the same time as SEQ rollover? The SEQ rollover definitely is going to happen more often now (once per 32M OST objects vs. once per 4B objects), but if the storage is losing sync writes then there are a lot of things that will go badly.

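As a rough illustration of the rollover figures mentioned above, the sketch below reads "32M" as 2^25 and "4B" as 2^32 objects per SEQ (both readings are assumptions) and uses a purely hypothetical create rate to show how the time between rollovers scales.

```python
# Sketch: how much more often SEQ rollover happens with the smaller SEQ width.
OBJECTS_PER_SEQ_OLD = 2**32  # "4B" objects, read as 2^32 (assumption)
OBJECTS_PER_SEQ_NEW = 2**25  # "32M" objects, read as 32 * 2^20 (assumption)


def rollover_ratio(old: int = OBJECTS_PER_SEQ_OLD,
                   new: int = OBJECTS_PER_SEQ_NEW) -> int:
    """How many times more frequent rollover is with the new SEQ width."""
    return old // new


def hours_until_rollover(objects_per_seq: int, creates_per_second: float) -> float:
    """Time to exhaust one SEQ at a steady create rate on one OST."""
    return objects_per_seq / creates_per_second / 3600


print(rollover_ratio())  # 128

# Hypothetical steady rate of 1000 object creates per second on one OST.
rate = 1000.0
print(hours_until_rollover(OBJECTS_PER_SEQ_NEW, rate))
print(hours_until_rollover(OBJECTS_PER_SEQ_OLD, rate))
```

Under these assumptions rollover becomes 128 times more frequent, which is why the interaction between rollover and lost writes is more likely to be seen in testing.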
            dongyang Dongyang Li added a comment -

            Alex, could you share the vmcore-dmesg from the crash?
            I wonder whether the change to "normal SEQ" happened after replay_barrier; when the MDT starts again for recovery, it would then see the old IDIF SEQ from disk.

            bzzz Alex Zhuravlev added a comment -

            Not sure, but I haven't seen the following problem before the last wave of landings which include LU-14692:

            LustreError: 343158:0:(osp_internal.h:530:osp_fid_diff()) ASSERTION( fid_seq(fid1) == fid_seq(fid2) ) failed: fid1:[0x2c0000401:0x2:0x0], fid2:[0x100010000:0x1:0x0] in conf-sanity / 84
            ...
            PID: 343158  TASK: ffff8b7824a605c0  CPU: 1   COMMAND: "tgt_recover_0"
             #0 [ffff8b783d273578] panic at ffffffff8f0b9786
                /tmp/kernel/kernel/panic.c: 299
             #1 [ffff8b783d2735f8] osp_create at ffffffffc1198895 [osp]
                /home/lustre/master-mine/lustre/osp/osp_internal.h: 529
             #2 [ffff8b783d273680] lod_sub_create at ffffffffc113534e [lod]
                /home/lustre/master-mine/lustre/include/dt_object.h: 2333
             #3 [ffff8b783d2736f0] lod_striped_create at ffffffffc112076b [lod]
                /home/lustre/master-mine/lustre/lod/lod_object.c: 6338
             #4 [ffff8b783d273760] lod_xattr_set at ffffffffc1128200 [lod]
                /home/lustre/master-mine/lustre/lod/lod_object.c: 5068
             #5 [ffff8b783d273810] mdd_create_object at ffffffffc0f76a93 [mdd]
                /home/lustre/master-mine/lustre/include/dt_object.h: 2832
             #6 [ffff8b783d273940] mdd_create at ffffffffc0f81f98 [mdd]
                /home/lustre/master-mine/lustre/mdd/mdd_dir.c: 2827
             #7 [ffff8b783d273a40] mdt_reint_open at ffffffffc1038328 [mdt]
                /home/lustre/master-mine/lustre/mdt/mdt_open.c: 1574
             #8 [ffff8b783d273bf8] mdt_reint_rec at ffffffffc102731f [mdt]
                /home/lustre/master-mine/lustre/mdt/mdt_reint.c: 3240
             #9 [ffff8b783d273c20] mdt_reint_internal at ffffffffc0ff6ef6 [mdt]
                /home/lustre/master-mine/libcfs/include/libcfs/libcfs_debug.h: 155
            #10 [ffff8b783d273c58] mdt_intent_open at ffffffffc1002982 [mdt]
                /home/lustre/master-mine/lustre/mdt/mdt_handler.c: 4826
            #11 [ffff8b783d273c98] mdt_intent_policy at ffffffffc0fffe79 [mdt]
                /home/lustre/master-mine/lustre/mdt/mdt_handler.c: 4971
            #12 [ffff8b783d273cf8] ldlm_lock_enqueue at ffffffffc08bdbdf [ptlrpc]
                /home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lock.c: 1794
            #13 [ffff8b783d273d60] ldlm_handle_enqueue0 at ffffffffc08e5046 [ptlrpc]
                /home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lockd.c: 1441
            #14 [ffff8b783d273dd8] tgt_enqueue at ffffffffc091fd1f [ptlrpc]
                /home/lustre/master-mine/lustre/ptlrpc/../../lustre/target/tgt_handler.c: 1446
            #15 [ffff8b783d273df0] tgt_request_handle at ffffffffc0926147 [ptlrpc]
                /home/lustre/master-mine/lustre/include/lu_target.h: 645
            #16 [ffff8b783d273e68] handle_recovery_req at ffffffffc08c8c3c [ptlrpc]
                /home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lib.c: 2418
            #17 [ffff8b783d273e98] target_recovery_thread at ffffffffc08d1300 [ptlrpc]
                /home/lustre/master-mine/lustre/ptlrpc/../../lustre/ldlm/ldlm_lib.c: 2677
            #18 [ffff8b783d273f10] kthread at ffffffff8f0d5199
                /tmp/kernel/kernel/kthread.c: 340

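The condition named in the assertion message can be modelled with a small sketch. This is not the Lustre code for osp_fid_diff(); it only mirrors the check printed in the log (both FIDs must share a SEQ before their OIDs are compared), and the Fid type and function name are hypothetical.

```python
from typing import NamedTuple


class Fid(NamedTuple):
    """Minimal stand-in for a FID as printed in the log: [seq:oid:ver]."""
    seq: int
    oid: int
    ver: int


def fid_diff(fid1: Fid, fid2: Fid) -> int:
    """Return the OID distance between two FIDs in the same SEQ.

    Mirrors the logged condition fid_seq(fid1) == fid_seq(fid2); comparing
    OIDs across different SEQs is treated as meaningless in this sketch.
    """
    if fid1.seq != fid2.seq:
        raise AssertionError(
            f"SEQ mismatch: {fid1.seq:#x} != {fid2.seq:#x}")
    return fid1.oid - fid2.oid


# The two FIDs from the failed assertion above.
fid1 = Fid(0x2c0000401, 0x2, 0x0)
fid2 = Fid(0x100010000, 0x1, 0x0)

try:
    fid_diff(fid1, fid2)
except AssertionError as err:
    print(err)  # SEQ mismatch: 0x2c0000401 != 0x100010000
```

In the crash, fid1 carries the new normal SEQ while fid2 still carries the IDIF SEQ for OST0001, so the two values cannot be compared.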
            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/45822/
            Subject: LU-14692 osp: deprecate IDIF sequence for MDT0000
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6d2e7d191a7b27cde62b605dbed14488cfd4d410

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49720/
            Subject: LU-14692 tests: restore sanity/312 to always_except
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8767d2e44110fc19e624e963d5ebc788409339d3

            gerrit Gerrit Updater added a comment -

            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49754
            Subject: LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set: 1
            Commit: 6b69a998e14917656556e62c6a4e4f33f80e2b4b

            People

              dongyang Dongyang Li
              adilger Andreas Dilger