sanity-lnet test_0: Failed to export global yaml 139

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0

    Description

      This issue was created by maloo for S Buisson <sbuisson@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f56ad3b7-3cb1-4525-8857-832acd3b7a67

      test_0 failed with the following error:

      Failed to export global yaml 139
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/103051 - 5.14.21-150500.55.39-default
      servers: https://build.whamcloud.com/job/lustre-reviews/103051 - 4.18.0-513.9.1.el8_lustre.x86_64

      Many other sanity-lnet tests failed with this same error during this session.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lnet test_0 - Failed to export global yaml 139

          Activity

            [LU-17638] sanity-lnet test_0: Failed to export global yaml 139
            simmonsja James A Simmons added a comment - - edited

            Peter, please don't close the ticket once the current patch lands, since it is only a hot fix. I'm working on a proper fix. We can downgrade the ticket from blocker, though.

            simmonsja James A Simmons added a comment - - edited

            The crash is happening in the userland code, so we can just do a partial revert of lnetctl.c until we figure it out. I have a theory about the failures, based on my debugging patches from last night, since they seem to be passing. I think you need to run all the tests to see the failure. If you run sanity-lnet on its own, I bet it passes.
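
One possible reading of the trailing 139 in the error message, assuming it is the shell exit status ($?) of the failing export command (the ticket does not state this explicitly): shells report 128 plus the signal number when a process is killed by a signal, and 139 - 128 = 11, which is SIGSEGV on Linux. That would be consistent with a crash in the userland tool. A minimal sketch of the decoding follows; describe_exit_status is a hypothetical helper, not part of the Lustre test framework.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (the value of $?).

    Assumption: values above 128 mean "killed by signal (status - 128)".
    A program can also exit with such a code on purpose, so this is a hint,
    not proof.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```

If that reading is right, the useful artifact to look for is a core dump or backtrace from the command-line tool rather than a kernel log.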

            gerrit Gerrit Updater added a comment

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54436
            Subject: LU-17638 util: remove newer lnetctl export handling
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bd583ccbe0e8f7c096fc8a3e64699cbd779d6521

            gerrit Gerrit Updater added a comment

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54435
            Subject: LU-17638 lnet: debug export breakage
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 132e3031d1429598e4a75efbb76d20d2e3213e29

            gerrit Gerrit Updater added a comment - - edited

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54432
            Subject: LU-17638 revert: "LU-9680 lnet: add NLM_F_DUMP_FILTERED support"
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ce6012610e0b1f80a26d1675af552d0b086a3638

            gerrit Gerrit Updater added a comment - - edited

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54429
            Subject: LU-17638 tests: test sanity-lnet on 65e0802f2a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: cb05cf7068ccfc1068275299bf7cdd9c84a64176


            gerrit Gerrit Updater added a comment

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54428
            Subject: LU-17638 tests: test sanity-lnet on 7f8cde3b77
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1f6c8aaa05f5234b5bf79e7f61bb8f143979892b

            gerrit Gerrit Updater added a comment

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54427
            Subject: LU-17638 tests: test sanity-lnet on 0a0e881d88
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 44a20cdef374d78033b74874561bc111cd1722b6

            gerrit Gerrit Updater added a comment

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54426
            Subject: LU-17638 tests: test sanity-lnet on 11d851c51c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 83369e60602298d9265f14307f4f2f1a340b7979

            adilger Andreas Dilger added a comment

            My recommendation to debug this quickly would be to push test patches with parents at different locations within the recent landings to bisect the failure. The recent landings are:

            4ef6bcb0cd LU-17600 lnet: delete lbstats and lnetunload                  *BROKEN*
            bf2257d47f LU-16752 test: improve sanity 413a/b reliability              UNLIKELY
            53f1c60bb7 LU-16695 llite: remove O_APPEND check for sync                UNLIKELY
            65e0802f2a LU-17566 mdt: remove duplicate call to mdt_init_ucred_reint() UNLIKELY
            7101742b45 LU-17490 tests: verify fanotify works for lustre              UNLIKELY
            5b07dce19b LU-17434 lmv: add exclude list for remote dir                 UNLIKELY
            414467762f LU-17175 gss: start lsvcgssd from l_getauth                   UNLIKELY
            7e1fb1a296 LU-17179 tests: check the system is clean                     UNLIKELY
            7f8cde3b77 LU-9859 lnet: move CPT handling to LNet                       *****
            fd4c531bbb LU-14361 statahead: add connect flag check for batch RPC      UNLIKELY
            76325fbb0d LU-8066 obdclass: fix module load locking.                    *****
            ea0a446576 LU-6142 mdd: Fix style issues for mdd_dir.c                   UNLIKELY
            ab7e5929f1 LU-6142 lfsck: Fix style issues for lfsck_striped_dir.c       UNLIKELY
            5dc34116fe LU-6142 lfsck: Fix style issues for lfsck_namespace.c         UNLIKELY
            595e10784d LU-6142 lfsck: Fix style issues for lfsck_engine.c            UNLIKELY
            b0c1ede625 LU-6142 lfsck: Fix style issues under lustre/lfsck            UNLIKELY
            0a0e881d88 LU-17578 lnet: fix &the_lnet.ln_mt_peerNIRecovq race          *****
            d3ef8f6993 LU-9680 lnet: add NLM_F_DUMP_FILTERED support                 *****
            1546a179a2 LU-13814 osc: Remove oap_request                              UNLIKELY
            11d851c51c LU-10391 obd: Update lmd_parse to handle IPv6 NIDs            *****
            e502638050 LU-16011 lnet: use preallocate bulk for server                *****
            f45a0288b0 LU-17611 utils: fix wrong static declarations                 *WORKS*

            This leaves us with half a dozen likely candidates, and a dozen unlikely patches early in the series, so it probably makes sense to push patches on 11d851c51c, 0a0e881d88, and 7f8cde3b77 to see which one breaks and then narrow it down from there.
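
The choice of probe commits in the comment above can be sketched as a small parallel bisection: split the likely candidates (the commits marked *****), oldest first, into contiguous chunks and test at the newest commit of each chunk. This is only an illustration under two assumptions: the failure stays present once it is introduced, and the culprit is one of the starred commits. The function names and the example results are hypothetical and are not taken from the ticket.

```python
import math

# Likely candidates from the list above (marked *****), oldest first.
SUSPECTS = [
    "e502638050",
    "11d851c51c",
    "d3ef8f6993",
    "0a0e881d88",
    "76325fbb0d",
    "7f8cde3b77",
]


def probe_points(suspects, n_probes):
    """Return at most n_probes commits to use as parents for test patches.

    The suspects are split into contiguous chunks of equal size (the last
    chunk may be shorter) and the newest commit of each chunk is returned.
    """
    if n_probes < 1 or not suspects:
        return []
    n_probes = min(n_probes, len(suspects))
    size = math.ceil(len(suspects) / n_probes)
    return [
        suspects[min(start + size, len(suspects)) - 1]
        for start in range(0, len(suspects), size)
    ]


def implicated(suspects, results):
    """Narrow the suspects given probe results.

    results maps a probed commit to True if sanity-lnet passed there.
    Returns the suspects between the last passing probe (exclusive) and the
    oldest failing probe (inclusive), or [] if no probe failed.
    """
    cleared = 0
    for idx, commit in enumerate(suspects):
        if commit not in results:
            continue
        if results[commit]:
            cleared = idx + 1  # everything up to this commit is cleared
        else:
            return suspects[cleared:idx + 1]
    return []


if __name__ == "__main__":
    probes = probe_points(SUSPECTS, 3)
    print(probes)
    # Hypothetical outcome, for illustration only.
    example = {probes[0]: True, probes[1]: False, probes[2]: False}
    print(implicated(SUSPECTS, example))
```

With six candidates and three test patches, each probe bounds two candidates, so one round of test sessions leaves at most two commits to examine. If every probe passes, the culprit would have to be among the commits not listed as suspects.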

            adilger Andreas Dilger added a comment

            Is there a recently-landed patch that can be reverted? It looks like this is causing every review test session to fail sanity-lnet every time.

            Given that we didn't see this during the pre-landing review testing (which runs sanity-lnet even if a patch is marked "trivial"), I suspect it is a bad interaction between two separate patches that modify related, but not overlapping, code.

            People

              simmonsja James A Simmons
              maloo Maloo