
deprecate use of OST FID SEQ 0 for MDT0000

Details


    Description

      Since Lustre 2.4.0 and DNE1, it has been possible to create OST objects using a different FID SEQ range for each MDT, to avoid contention during MDT object precreation.

      Objects that are created by MDT0000 are put into FID SEQ 0 (O/0/d*) on all OSTs and have a filename that is the decimal FID OID in ASCII. However, SEQ=0 objects are remapped to IDIF FID SEQ (0x100000000 | (ost_idx << 16)) so that they are unique across all OSTs.
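The IDIF remapping above is simple bit arithmetic; a sketch in shell (the OST index value here is only an example, not taken from the ticket):

```shell
# Sketch of the SEQ=0 -> IDIF remapping described above: an object in
# O/0/d* on OST ost_idx gets the IDIF sequence 0x100000000 | (ost_idx << 16),
# which makes the FID unique across all OSTs.
ost_idx=1
idif_seq=$(( 0x100000000 | (ost_idx << 16) ))
printf '0x%x\n' "$idif_seq"   # -> 0x100010000 (the IDIF SEQ for OST0001)
```

This matches the console log further down in this ticket, where lustre-OST0001 starts out on SEQ 0x100010000 before switching to a normal SEQ.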

      Objects that are created by other MDTs (or MDT0000 after 2^48 objects are created in SEQ 0) use a unique SEQ in the FID_SEQ_NORMAL range (> 0x200000400), and use a filename that is the hexadecimal FID OID in ASCII.
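Putting the naming schemes side by side, a rough classifier of a SEQ value might look like this (the range boundaries follow the description above; the helper name is invented for illustration):

```shell
# Rough classification of an OST object's SEQ, following the ranges
# described above (classify_seq is an illustrative helper, not Lustre code).
classify_seq() {
    local seq=$(( $1 ))
    if (( seq == 0 )); then
        echo "SEQ0: object name is the decimal OID (O/0/d*)"
    elif (( seq >= 0x100000000 && seq < 0x200000000 )); then
        echo "IDIF: SEQ=0 object remapped with the OST index"
    elif (( seq >= 0x200000400 )); then
        echo "NORMAL: object name is the hexadecimal OID"
    else
        echo "reserved/special"
    fi
}
classify_seq 0x100010000
classify_seq 0x2c0000401
```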

      For compatibility with pre-DNE MDTs and OSTs, the use of SEQ=0 by MDT0000 was kept until now, but there has not been a reason to keep this compatibility for new filesystems. It would be better to have MDT0000 assigned a "regular" FID SEQ range at startup, so that the SEQ=0 compatibility can eventually be removed. That would ensure OST objects have "proper and unique" FIDs, and avoid the complexity of mapping between the old SEQ=0 48-bit OID values and the IDIF FIDs.

      Older filesystems using SEQ=0 would eventually delete old objects in this range and/or could be forced to migrate to using new objects to clean up the remaining usage, if necessary.

      Attachments

        1. serial.txt
          778 kB
        2. stdout.txt
          484 kB


          Activity


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50477/
            Subject: LU-14692 tests: wait for osp in conf-sanity/84
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a9b7d73964b8b655c6c628820464342309f11356


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49754/
            Subject: LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000
            Project: fs/lustre-release
            Branch: b2_15
            Current Patch Set:
            Commit: 1a337b4a5b138eb2846ed12b25f5e1725a647670


I hit the same issue as bzzz did (in test replay-single 70c):
            https://testing.whamcloud.com/test_sets/cbcbb9b2-656c-44bd-b324-31c9dc39539e

            I have opened a new ticket for this: LU-16692

eaujames Etienne Aujames added a comment

            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50477
            Subject: LU-14692 tests: wait for osp in conf-sanity/84
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2816476614a92ba675418c7434001d946c8ec81e

            dongyang Dongyang Li added a comment -

            I will update conf-sanity/84.
Alex, the new crash is a different issue, mostly caused by the landing of https://review.whamcloud.com/c/fs/lustre-release/+/38424/
That patch introduces a SEQ width of 16384 in Maloo, so SEQ changes will happen more frequently and at unpredictable points.
To make sure a SEQ change doesn't happen after replay_barrier, patch 38424 actually has force_new_seq to switch the SEQ before test suites like replay-single start. From the log it did change the SEQ,
but I think a SEQ width of 16384 is not enough for the whole of replay-single: with only 2 OSTs, more objects are created on each OST.

I think there are two things we could do: use force_new_seq for every replay_barrier, which I think is a bit too heavy, or enlarge the default SEQ width of 16384 according to the number of OSTs.

Note we don't really need force_new_seq for conf-sanity/84: the change from the IDIF SEQ to a normal SEQ happens as soon as the OSP connects, so we just need to wait for that before using replay_barrier.
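A minimal sketch of that wait follows. The parameter name osp.*.prealloc_last_seq is an assumption about where the current SEQ would be visible, so it is stubbed out here and the loop logic stands alone:

```shell
# Sketch: wait until the OSP's allocated SEQ has left the IDIF range
# before taking a replay barrier. get_last_seq stands in for something
# like `lctl get_param -n osp.*-osc-MDT0000.prealloc_last_seq` (assumed name).
get_last_seq() { echo 0x2c0000401; }    # stub: already a normal SEQ

seq_is_idif() {
    local seq=$(( $1 ))
    # IDIF SEQs occupy [0x100000000, 0x1ffffffff]
    (( seq >= 0x100000000 && seq < 0x200000000 ))
}

wait_normal_seq() {
    local i
    for i in {1..30}; do
        seq_is_idif "$(get_last_seq)" || return 0
        sleep 1
    done
    return 1
}

wait_normal_seq && echo "SEQ is normal, safe to replay_barrier"
```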


            this time in Maloo: https://testing.whamcloud.com/test_sets/b36df675-87ec-4fb5-9c8b-57add55397ec

            [11176.501594] LustreError: 567675:0:(osp_internal.h:538:osp_fid_diff()) ASSERTION( fid_seq(fid1) == fid_seq(fid2) ) failed: fid1:[0x280000bd1:0x2c6d:0x0], fid2:[0x280000bd0:0x2c6c:0x0]

bzzz Alex Zhuravlev added a comment

            The test already has something similar:

            # make sure new superblock labels are sync'd before disabling writes
            sync_all_data
            sleep 5
            

            so adding a file create on all OSTs is reasonable.

adilger Andreas Dilger added a comment

            Can the test be updated to do something simple like "lfs setstripe -i -1 $DIR/$tfile.tmp" to force the sequence update before the replay barrier?

adilger Andreas Dilger added a comment
            dongyang Dongyang Li added a comment -

            From the console log:

            [ 6684.760425] Lustre: Mounted lustre-client
            [ 6686.111594] Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x00000002c0000400-0x0000000300000400]:1:ost
            [ 6686.127442] Lustre: Skipped 2 previous similar messages
            [ 6686.127744] Lustre: cli-lustre-OST0001-super: Allocated super-sequence [0x00000002c0000400-0x0000000300000400]:1:ost]
            [ 6686.127895] Lustre: Skipped 1 previous similar message
            [ 6691.011345] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
            [ 6691.028634] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
            [ 6691.211977] Lustre: lustre-OST0001-osc-MDT0000: update sequence from 0x100010000 to 0x2c0000401
            [ 6692.967973] systemd[1]: mnt-lustre\x2dmds1.mount: Succeeded.
            [ 6693.003490] Lustre: Failing over lustre-MDT0000
            

            The sequence update from 0x100010000 to 0x2c0000401 was lost after replay_barrier.


It's always conf-sanity/84; stdout and console logs are attached.

bzzz Alex Zhuravlev added a comment

            People

              dongyang Dongyang Li
              adilger Andreas Dilger
Votes: 0
              Watchers: 8
