LU-16692

replay-single: test_70c osp_fid_diff()) ASSERTION( fid_seq(fid1) == fid_seq(fid2) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3

    Description

      This issue was created by maloo for eaujames <eaujames@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/cbcbb9b2-656c-44bd-b324-31c9dc39539e

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/93413 - 4.18.0-348.7.1.el8_5.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/93413 - 4.18.0-348.23.1.el8_lustre.x86_64

      The MDS crashed with the following stack:

      [11121.479852] LustreError: 567908:0:(osp_internal.h:538:osp_fid_diff()) ASSERTION( fid_seq(fid1) == fid_seq(fid2) ) failed: fid1:[0x280000bd1:0x2da4:0x0], fid2:[0x280000bd0:0x2da3:0x0]
      [11121.482473] LustreError: 567908:0:(osp_internal.h:538:osp_fid_diff()) LBUG
      [11121.483602] Pid: 567908, comm: tgt_recover_0 4.18.0-348.23.1.el8_lustre.x86_64 #1 SMP Thu Mar 2 00:54:25 UTC 2023
      [11121.485233] Call Trace TBD:
      [11121.485974] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
      [11121.486859] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
      [11121.487726] [<0>] osp_create+0x1dc/0xba0 [osp]
      [11121.488585] [<0>] lod_sub_create+0x24a/0x4e0 [lod]
      [11121.489407] [<0>] lod_striped_create+0x1a6/0x590 [lod]
      [11121.490282] [<0>] lod_xattr_set+0x3f1/0x1300 [lod]
      [11121.491160] [<0>] mdo_xattr_set+0x101/0x470 [mdd]
      [11121.491967] [<0>] mdd_create_object+0x3cc/0x920 [mdd]
      [11121.492821] [<0>] mdd_create+0xe62/0x1a30 [mdd]
      [11121.493769] [<0>] mdt_reint_open+0x2d1a/0x3180 [mdt]
      [11121.494617] [<0>] mdt_reint_rec+0x117/0x270 [mdt]
      [11121.495442] [<0>] mdt_reint_internal+0x4bf/0x7d0 [mdt]
      [11121.496318] [<0>] mdt_intent_open+0x137/0x420 [mdt]
      [11121.497164] [<0>] mdt_intent_opc+0x12c/0xbf0 [mdt]
      [11121.497980] [<0>] mdt_intent_policy+0x207/0x3a0 [mdt]
      [11121.499224] [<0>] ldlm_lock_enqueue+0x47b/0xb20 [ptlrpc]
      [11121.500246] [<0>] ldlm_handle_enqueue0+0x634/0x1520 [ptlrpc]
      [11121.501234] [<0>] tgt_enqueue+0xa4/0x210 [ptlrpc]
      [11121.502079] [<0>] tgt_request_handle+0xcbf/0x1920 [ptlrpc]
      [11121.503032] [<0>] handle_recovery_req+0x140/0x270 [ptlrpc]
      [11121.504009] [<0>] replay_request_or_update.isra.31+0x2fa/0xa20 [ptlrpc]
      [11121.505135] [<0>] target_recovery_thread+0x736/0x1320 [ptlrpc]
      [11121.506146] [<0>] kthread+0x116/0x130
      [11121.506798] [<0>] ret_from_fork+0x35/0x40
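
      The two FIDs in the assertion message differ in both sequence and object ID, which is the situation osp_fid_diff() refuses to handle. The sketch below restates that check in shell for triage purposes. It is an illustration only: the `sketch_*` helper names are hypothetical, and it covers just the invariant named in the assertion message, not the kernel function as a whole.

```shell
# Hypothetical triage helpers; names are illustrative, not Lustre APIs.

# Print the sequence field of a FID such as "[0x280000bd1:0x2da4:0x0]".
sketch_fid_seq() {
	local fid=${1#\[} seq rest
	fid=${fid%\]}
	IFS=: read -r seq rest <<< "$fid"
	echo "$seq"
}

# Print the object ID field of a FID.
sketch_fid_oid() {
	local fid=${1#\[} seq oid rest
	fid=${fid%\]}
	IFS=: read -r seq oid rest <<< "$fid"
	echo "$oid"
}

# Print the object ID difference, but only when both FIDs share a sequence;
# otherwise report the mismatch on stderr and return non-zero.
sketch_fid_diff() {
	local seq1 seq2
	seq1=$(sketch_fid_seq "$1")
	seq2=$(sketch_fid_seq "$2")
	if (( seq1 != seq2 )); then
		echo "sequence mismatch: $seq1 vs $seq2" >&2
		return 1
	fi
	echo $(( $(sketch_fid_oid "$1") - $(sketch_fid_oid "$2") ))
}

# FIDs copied from the crash log above: the sequences differ, so this fails.
sketch_fid_diff "[0x280000bd1:0x2da4:0x0]" "[0x280000bd0:0x2da3:0x0]" ||
	echo "FIDs are in different sequences"
```

      An object ID difference is only meaningful inside one sequence, which is why the sketch returns an error instead of a number when the sequences differ.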
      

      A comment by Dongyang Li in LU-14692 says that this could be linked to https://review.whamcloud.com/c/fs/lustre-release/+/38424/ ("LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH").

          Activity

            pjones Peter Jones added a comment -

            All patches merged for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54840/
            Subject: LU-16692 tests: force_new_seq_all interop version checking
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f539e6fbcaef1eefcc9a365fea23400d132cefb8

            gerrit Gerrit Updater added a comment -

            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54840
            Subject: LU-16692 tests: force_new_seq_all interop version checking
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 944c6d7017c08cc81d72b43cc4fc73a820111dd1

            dongyang Dongyang Li added a comment -

            It's tricky because no specific test case triggers this assert; it's random and depends on the test environment config.
            That's why we had force_new_seq_all at the beginning of the test suites to rectify this.
            This particular assert

            osp_precreate_rollover_new_seq()) ASSERTION( fid_seq(fid) != fid_seq(last_fid) ) failed: fid [0x240000bd0:0x1:0x0], last_fid [0x240000bd0:0x4924:0x0]
            

            was the reason we added force_new_seq to replay-ost-single.sh.

            I guess we need to add force_new_seq back with a version check:

                    (( MDS1_VERSION >= $(version_code v2_15_61-226-gf00d2467fc) )) ||
                            force_new_seq_all
            

            to those test suites.
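
            The version gate proposed above can be sketched as follows. This is a rough illustration, assuming the version string is reduced to its first four numeric fields and packed one byte per field. `sketch_version_code` is a hypothetical stand-in, not the `version_code` helper from the Lustre test framework, and the sample MDS versions in the loop are made up.

```shell
# Hypothetical stand-in for illustration only; not the test-framework helper.
sketch_version_code() {
	local fields
	# Replace every non-digit with a space, then split on whitespace:
	# "v2_15_61-226-gf00d2467fc" -> fields 2 15 61 226 00 2467
	read -r -a fields <<< "${1//[!0-9]/ }"
	# Pack the first four fields one byte each (assumes each is <= 255);
	# the 10# prefix keeps fields with leading zeros from being read as octal.
	echo $(( (10#${fields[0]:-0} << 24) | (10#${fields[1]:-0} << 16) |
		 (10#${fields[2]:-0} << 8) | 10#${fields[3]:-0} ))
}

# Version tag of the fix, as quoted in the comment above.
fixed=$(sketch_version_code v2_15_61-226-gf00d2467fc)

# Made-up MDS versions, one older and one matching the fix.
for mds in 2.15.0 2.15.61.226; do
	if (( $(sketch_version_code "$mds") >= fixed )); then
		echo "$mds: fix present, no workaround needed"
	else
		echo "$mds: older server, would call force_new_seq_all"
	fi
done
```

            Comparing packed integers rather than strings keeps the gate a single arithmetic test, at the cost of assuming no field exceeds 255.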


            adilger Andreas Dilger added a comment -

            Dongyang, now that this patch has landed, I've seen an interop test failure triggering the removed assertion running against an old server (e.g. large-scale test_3a, https://testing.whamcloud.com/test_sets/891ed21b-4a9e-454b-836d-a82455ce177d):

            osp_precreate_rollover_new_seq()) ASSERTION( fid_seq(fid) != fid_seq(last_fid) ) failed: fid [0x240000bd0:0x1:0x0], last_fid [0x240000bd0:0x4924:0x0]

            Can you please add a check into the tests that used to call force_new_seq_all() so that they don't start failing randomly during interop testing:

                    (( MDS1_VERSION >= $(version_code v2_15_61-226-gf00d2467fc) )) ||
                            skip "need MDS >= 2.15.61.226 to avoid SEQ rollover assertion"

            pjones Peter Jones added a comment -

            Merged for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54433/
            Subject: LU-16692 tests: remove force_new_seq from some test suites
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9ef186b71b350127e7cfb67be5729f9e0bd39c79

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54020/
            Subject: LU-16692 osp: do not assert on seq got over network
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f00d2467fc7c5ebd8a313683e039bf945a4b7094

            gerrit Gerrit Updater added a comment -

            "Li Dongyang <dongyangli@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54433
            Subject: LU-16692 tests: remove force_new_seq from some test suites
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 56b5836a74870452792d3ce5ab649cecb6f37baf

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54087/
            Subject: LU-16692 osp: osp_fid_diff vs rollover_new_seq race
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bc256c25631960e1386f3359bb6c85cfe6481fb7

            People

              Assignee: Dongyang Li
              Reporter: Maloo