
Failover Test failure on test suite mmp, subtest test_5

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.6
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.4, Lustre 2.4.1, Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.5
    • 3
    • 4082

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/da5b8f26-f92d-11e1-a1b8-52540035b04c.

      The sub-test test_5 failed with the following error:

      test_5 failed with 1

      Lustre: DEBUG MARKER: == mmp test 5: mount during unmount of the first filesystem ========================================== 07:54:47 (1347029687)
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-MDS\/P1 on mds1...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-MDS/P1 on mds1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Unmounting \/dev\/lvm-MDS\/P1 on mds1...
      Lustre: DEBUG MARKER: Unmounting /dev/lvm-MDS/P1 on mds1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-MDS\/P1 on mds1failover...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-MDS/P1 on mds1failover...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Unmounting \/dev\/lvm-OSS\/P1 on ost1...
      Lustre: DEBUG MARKER: Unmounting /dev/lvm-OSS/P1 on ost1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1failover...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1failover...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  mmp test_5: @@@@@@ FAIL: mount during unmount of the first filesystem should fail
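
The expected failure comes from Lustre's multiple-mount protection (MMP): while the first filesystem is mounted, or its unmount is still in flight, a second mount of the same device must be refused. The mutual-exclusion pattern the test exercises can be sketched stand-alone; in the sketch below, flock(1) on a temporary file stands in for the on-disk MMP block (the lock file, fd number, and outcome variable are illustrative, not part of mmp.sh):

```shell
#!/bin/sh
# Illustrative sketch only: flock(1) on a temp file stands in for the
# on-disk MMP block; this is not the real mmp.sh test code.
LOCK=$(mktemp)

# "First mount": take the lock, as a mounted filesystem holds MMP.
exec 9>"$LOCK"
flock 9

# Slow "unmount" in the background: release the lock after a delay.
( sleep 1; flock -u 9 ) &

# "Second mount" attempted during the unmount window: a non-blocking
# attempt must fail while the first holder has not yet released.
if flock -n "$LOCK" -c true; then
    outcome=mounted            # the race this ticket's failures hit
else
    outcome=refused            # the behavior test_5 expects
fi
echo "second mount: $outcome"
wait
rm -f "$LOCK"
```

If the "unmount" finishes before the second mount attempt starts, the lock is free and the mount succeeds, which is exactly the timing window test_5 trips over on loaded test nodes.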
      

    Attachments

    Issue Links

    Activity

            jcasper James Casper (Inactive) added a comment - 2.9.57, b3575: https://testing.hpdd.intel.com/test_sessions/1f3eb5f9-bb00-4ea9-be10-4318eabbff04
            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found on b2_8 for failover testing, build #6:
            https://testing.hpdd.intel.com/test_sessions/0aed3028-da39-11e5-a8a6-5254006e85c2
            https://testing.hpdd.intel.com/test_sessions/eaf85780-d65e-11e5-afe8-5254006e85c2
            https://testing.hpdd.intel.com/test_sessions/eb9f29ec-d8da-11e5-83e2-5254006e85c2
            https://testing.hpdd.intel.com/test_sessions/2f0aa9f6-d5a5-11e5-9cc2-5254006e85c2
            https://testing.hpdd.intel.com/test_sessions/c5a8e44c-d9c7-11e5-85dd-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment -

            Another instance found for hardfailover: EL6.7 Server/Client, tag 2.7.66, master build 3314
            https://testing.hpdd.intel.com/test_sessions/7c5e8006-cb2d-11e5-b3e8-5254006e85c2

            Another instance found for hardfailover: EL7 Server/Client, tag 2.7.66, master build 3314
            https://testing.hpdd.intel.com/test_sessions/8d13249a-ca8f-11e5-9609-5254006e85c2

            Another instance found for hardfailover: EL7 Server/SLES11 SP3 Client, tag 2.7.66, master build 3316
            https://testing.hpdd.intel.com/test_sessions/2fbf67e4-cd4c-11e5-b1fa-5254006e85c2
            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for hardfailover: EL7 Server/Client
            https://testing.hpdd.intel.com/test_sets/3506468a-bc00-11e5-a592-5254006e85c2

            Another instance found for hardfailover: EL7 Server/SLES11 SP3 Client
            build# 3303
            https://testing.hpdd.intel.com/test_sets/900a5b64-bb2b-11e5-b3d5-5254006e85c2

            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for hardfailover: EL6.7 Server/Client
            https://testing.hpdd.intel.com/test_sets/47839e28-bc93-11e5-8f65-5254006e85c2

            Another instance found for hardfailover: EL6.7 Server/SLES11 SP3 Clients
            https://testing.hpdd.intel.com/test_sets/96492436-ba4c-11e5-9a07-5254006e85c2

            sarah Sarah Liu added a comment - hit this in 2.7.0-rc2 testing https://testing.hpdd.intel.com/test_sets/0ee08f32-c157-11e4-9d86-5254006e85c2
            sarah Sarah Liu added a comment - another instance in 2.6 https://maloo.whamcloud.com/test_sets/e8249cd2-a418-11e3-8c6a-52540035b04c
            sarah Sarah Liu added a comment -

            also seen in tag-2.3.54 testing:

            server and client: lustre-master build # 1837 RHEL6 ldiskfs

            https://maloo.whamcloud.com/test_sets/a6b7e1f4-8411-11e3-a854-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_4_1_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/45/
            Distro/Arch: RHEL6.4/x86_64
            Testgroup: failover

            mmp test 5 failed again:
            https://maloo.whamcloud.com/test_sets/21323846-194e-11e3-bb73-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_4_0_RC1
            Lustre Build: http://build.whamcloud.com/job/lustre-master/1501/
            Distro/Arch: RHEL6.4/x86_64
            Test Group: failover

            The issue occurred again:
            https://maloo.whamcloud.com/test_sets/b0bf3f88-c00f-11e2-8398-52540035b04c

            Unmounting /dev/lvm-OSS/P1 on ost1 (client-31vm4):

            11:55:51:Lustre: DEBUG MARKER: Unmounting /dev/lvm-OSS/P1 on ost1...
            11:55:52:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
            11:55:52:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1failover...
            11:55:52:Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1failover...
            11:55:52:Lustre: DEBUG MARKER: umount -d /mnt/ost1
            11:55:52:Lustre: Failing over lustre-OST0000
            11:55:52:Lustre: server umount lustre-OST0000 complete

            Mounting /dev/lvm-OSS/P1 on ost1failover (client-31vm8):

            11:55:52:Mounting /dev/lvm-OSS/P1 on ost1failover...
            11:55:52:CMD: client-31vm4 grep -c /mnt/ost1' ' /proc/mounts
            11:55:52:Stopping /mnt/ost1 (opts on client-31vm4
            11:55:52:CMD: client-31vm4 umount -d /mnt/ost1
            11:55:52:CMD: client-31vm8 mkdir -p /mnt/ost1failover
            11:55:53:CMD: client-31vm8 test -b /dev/lvm-OSS/P1
            11:55:53:CMD: client-31vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            11:55:53:Starting ost1failover: /dev/lvm-OSS/P1 /mnt/ost1failover
            11:55:53:CMD: client-31vm8 mkdir -p /mnt/ost1failover; mount -t lustre /dev/lvm-OSS/P1 /mnt/ost1failover
            11:55:53:CMD: client-31vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super\" \"all -lnet -lnd -pinger\" 48
            11:55:54:CMD: client-31vm8 e2label /dev/lvm-OSS/P1 2>/dev/null
            11:55:54:Started lustre-OST0000

            This is a test script issue.
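
The timestamps above show the race: the umount on ost1 completed ("server umount lustre-OST0000 complete" at 11:55:52) before the failover mount ran at 11:55:53, so MMP had already been released and the second mount legitimately succeeded. The suite already probes the server with grep -c /mnt/ost1' ' /proc/mounts; the same check, applied immediately before the racing mount, would reveal whether the unmount had already finished. The helper below is a hypothetical sketch against a sample /proc/mounts-style listing, not actual mmp.sh code:

```shell
#!/bin/sh
# Hypothetical helper, not the real mmp.sh: decide whether the first
# filesystem was still mounted when the racing mount was attempted.
# Sample data below; a real script would run the grep on the server
# node, as the suite's "grep -c /mnt/ost1' ' /proc/mounts" does.
mounts_snapshot='/dev/lvm-OSS/P1 /mnt/ost1 lustre rw 0 0
/dev/lvm-MDS/P1 /mnt/mds1 lustre rw 0 0'

still_mounted() {
    # $1 = mount point; print the number of matching entries
    # (0 means the unmount already completed)
    printf '%s\n' "$mounts_snapshot" | grep -c " $1 "
}

count=$(still_mounted /mnt/ost1)
if [ "$count" -eq 0 ]; then
    echo "unmount already finished: the must-fail assertion is invalid"
else
    echo "unmount still in flight: second mount must fail"
fi
```

With such a guard the test could skip or retry the must-fail assertion when it loses the race, instead of reporting a false failure.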


            adilger Andreas Dilger added a comment -

            Yu Jian,
            it looks like this is potentially a race in the test case. We don't run mmp.sh test_5 very often - only in "failover" mode. I was working on http://review.whamcloud.com/3715 to try to get these tests to run in more cases, but I haven't completed it. In theory, we should be able to run the MMP tests all the time, since the VM nodes are all on the same hosts.

            We're reducing this from a blocker, since I think the problem is in the test case and not in the code. Please change it back if you disagree.

    People

      Assignee: Andreas Dilger (adilger)
      Reporter: Maloo (maloo)
      Votes: 0
      Watchers: 11
