
sanity-sec test_16: mgs and c0 idmap mismatch, 10 attempts

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.9.0

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/c7f684c6-2a8e-11e6-acf3-5254006e85c2.

      This has failed quite a number of times, but was overshadowed by LU-8279 also causing a large number of sanity-sec failures.

      The sub-test test_16 failed with the following error:

      mgs and c0 idmap mismatch, 10 attempts
      

      It looks like the OST rebooted for some reason, since the modules are not loaded and the dmesg log is empty:

      trevis-54vm8: opening /dev/lnet failed: No such device
      trevis-54vm8: hint: the kernel modules may not be loaded
      trevis-54vm8: IOC_LIBCFS_GET_NI error 19: No such device
      /usr/lib64/lustre/tests/sanity-sec.sh: line 981: [: ==: unary operator expected
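
      The last line of that log is what bash prints when an unquoted variable inside a `[ ... ]` test expands to nothing, which is consistent with a query to the unreachable node returning an empty string. The sketch below reproduces that class of error in isolation. It is not the code at line 981 of sanity-sec.sh; the variable names and the sample idmap value are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical stand-ins: remote_idmap mimics a value fetched from a node
# whose Lustre modules are not loaded (so the query returns nothing);
# local_idmap is an arbitrary sample string, not taken from the test.
remote_idmap=""
local_idmap="0:60003:60000"

# Unquoted expansion: the empty variable disappears, so bash evaluates
# `[ == 0:60003:60000 ]` and reports "[: ==: unary operator expected"
# with exit status 2 (the message is silenced here).
unquoted_status=0
[ $remote_idmap == "$local_idmap" ] 2>/dev/null || unquoted_status=$?

# Quoted expansion: the empty string is kept as an argument, so the test
# is a well-formed comparison that simply evaluates to false (status 1).
quoted_status=0
[ "$remote_idmap" == "$local_idmap" ] || quoted_status=$?

echo "unquoted exit status: $unquoted_status"
echo "quoted exit status: $quoted_status"
```

      Quoting only turns the syntax error into an ordinary mismatch; the comparison still fails while the node is down, so the underlying node failure remains the thing to investigate.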
      

      Info required for matching: sanity-sec 16

          Activity

            [LU-8287] sanity-sec test_16: mgs and c0 idmap mismatch, 10 attempts
            standan Saurabh Tandan (Inactive) added a comment (edited) - Reopening as this issue is still seen on master: https://testing.hpdd.intel.com/test_sets/abdb13dc-a627-11e6-964e-5254006e85c2

            kit.westneat Kit Westneat (Inactive) added a comment -

            The root cause in that case is the OSS kernel panicking; you can see that the Lustre modules are no longer loaded:
            onyx-35vm4: opening /dev/lnet failed: No such device
            onyx-35vm4: hint: the kernel modules may not be loaded
            onyx-35vm4: IOC_LIBCFS_GET_NI error 19: No such device

            Unfortunately, the console logs don't appear to have been saved. Is there a way to make sure these are saved?

            standan Saurabh Tandan (Inactive) added a comment -

            This issue is still seen on master, build #3468 for Full EL7.2 Server/EL7.2 Client - ZFS, causing the following tests to fail:
            https://testing.hpdd.intel.com/test_sets/e2e0de84-a258-11e6-bf05-5254006e85c2
            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20954/
            Subject: LU-8287 nodemap: don't stop config lock when target stops
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 45ce1044cd7b94621e1161cd23c600f8e1c18317
            yujian Jian Yu added a comment - One more failure instance on master: https://testing.hpdd.intel.com/test_sets/7d5b308e-594f-11e6-b5b1-5254006e85c2
            yong.fan nasf (Inactive) added a comment - Another failure instance on master: https://testing.hpdd.intel.com/test_sets/aa77a6ac-53c2-11e6-a39e-5254006e85c2
            yujian Jian Yu added a comment - One more failure instance on master branch: https://testing.hpdd.intel.com/test_sets/6b55672c-4fa6-11e6-bf87-5254006e85c2
            yujian Jian Yu added a comment -

            This is affecting patch review testing on master branch.

            yujian Jian Yu added a comment - One more failure instance on master branch: https://testing.hpdd.intel.com/test_sets/d6c0079e-403b-11e6-acf3-5254006e85c2

            gerrit Gerrit Updater added a comment -

            Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/20954
            Subject: LU-8287 nodemap: don't stop config lock when target stops
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 59939efdad721ef3e48a7104e020b627383ffa88

            People

              Assignee: kit.westneat Kit Westneat (Inactive)
              Reporter: maloo Maloo