Lustre / LU-1895

Failover Test failure on test suite mmp, subtest test_5

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0, Lustre 2.10.6
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.4, Lustre 2.4.1, Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.5

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/da5b8f26-f92d-11e1-a1b8-52540035b04c.

      The sub-test test_5 failed with the following error:

      test_5 failed with 1

      Lustre: DEBUG MARKER: == mmp test 5: mount during unmount of the first filesystem ========================================== 07:54:47 (1347029687)
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-MDS\/P1 on mds1...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-MDS/P1 on mds1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Unmounting \/dev\/lvm-MDS\/P1 on mds1...
      Lustre: DEBUG MARKER: Unmounting /dev/lvm-MDS/P1 on mds1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-MDS\/P1 on mds1failover...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-MDS/P1 on mds1failover...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Unmounting \/dev\/lvm-OSS\/P1 on ost1...
      Lustre: DEBUG MARKER: Unmounting /dev/lvm-OSS/P1 on ost1...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1failover...
      Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1failover...
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  mmp test_5: @@@@@@ FAIL: mount during unmount of the first filesystem should fail
      

Attachments

Issue Links

Activity

            standan Saurabh Tandan (Inactive) added a comment - - edited

            Another instance found for hardfailover: EL6.7 Server/Client
            https://testing.hpdd.intel.com/test_sets/47839e28-bc93-11e5-8f65-5254006e85c2

            Another instance found for hardfailover: EL6.7 Server/SLES11 SP3 Clients
            https://testing.hpdd.intel.com/test_sets/96492436-ba4c-11e5-9a07-5254006e85c2

            sarah Sarah Liu added a comment - hit this in 2.7.0-rc2 testing https://testing.hpdd.intel.com/test_sets/0ee08f32-c157-11e4-9d86-5254006e85c2
            sarah Sarah Liu added a comment - another instance in 2.6 https://maloo.whamcloud.com/test_sets/e8249cd2-a418-11e3-8c6a-52540035b04c
            sarah Sarah Liu added a comment -

            also seen in tag-2.3.54 testing:

            server and client: lustre-master build # 1837 RHEL6 ldiskfs

            https://maloo.whamcloud.com/test_sets/a6b7e1f4-8411-11e3-a854-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_4_1_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/45/
            Distro/Arch: RHEL6.4/x86_64
            Testgroup: failover

            mmp test 5 failed again:
            https://maloo.whamcloud.com/test_sets/21323846-194e-11e3-bb73-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_4_0_RC1
            Lustre Build: http://build.whamcloud.com/job/lustre-master/1501/
            Distro/Arch: RHEL6.4/x86_64
            Test Group: failover

            The issue occurred again:
            https://maloo.whamcloud.com/test_sets/b0bf3f88-c00f-11e2-8398-52540035b04c

            Unmounting /dev/lvm-OSS/P1 on ost1 (client-31vm4):

            11:55:51:Lustre: DEBUG MARKER: Unmounting /dev/lvm-OSS/P1 on ost1...
            11:55:52:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
            11:55:52:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1failover...
            11:55:52:Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1failover...
            11:55:52:Lustre: DEBUG MARKER: umount -d /mnt/ost1
            11:55:52:Lustre: Failing over lustre-OST0000
            11:55:52:Lustre: server umount lustre-OST0000 complete

            Mounting /dev/lvm-OSS/P1 on ost1failover (client-31vm8):

            11:55:52:Mounting /dev/lvm-OSS/P1 on ost1failover...
            11:55:52:CMD: client-31vm4 grep -c /mnt/ost1' ' /proc/mounts
            11:55:52:Stopping /mnt/ost1 (opts on client-31vm4
            11:55:52:CMD: client-31vm4 umount -d /mnt/ost1
            11:55:52:CMD: client-31vm8 mkdir -p /mnt/ost1failover
            11:55:53:CMD: client-31vm8 test -b /dev/lvm-OSS/P1
            11:55:53:CMD: client-31vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            11:55:53:Starting ost1failover: /dev/lvm-OSS/P1 /mnt/ost1failover
            11:55:53:CMD: client-31vm8 mkdir -p /mnt/ost1failover; mount -t lustre /dev/lvm-OSS/P1 /mnt/ost1failover
            11:55:53:CMD: client-31vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super\" \"all -lnet -lnd -pinger\" 48
            11:55:54:CMD: client-31vm8 e2label /dev/lvm-OSS/P1 2>/dev/null
            11:55:54:Started lustre-OST0000

            This is a test script issue.
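            The interleaving above suggests the failure mode: the background umount completes before the failover mount is actually attempted, so MMP has already released the device and the second mount legitimately succeeds. A minimal sketch of that race, using a lock file as a stand-in for the MMP sequence (illustrative only; this is not the actual mmp.sh code):

            ```shell
            #!/bin/sh
            # Hypothetical sketch of the suspected race in mmp.sh test_5.
            # A lock file stands in for the device's active MMP sequence.

            LOCK=/tmp/mmp_demo.$$.lock

            slow_unmount() {            # stands in for "umount -d /mnt/ost1"
                sleep 1
                rm -f "$LOCK"           # MMP sequence released when umount finishes
            }

            try_mount() {               # stands in for the failover-node mount
                if [ -e "$LOCK" ]; then
                    echo "blocked"      # MMP still active: mount must fail
                else
                    echo "succeeded"    # unmount already done: mount goes through
                fi
            }

            touch "$LOCK"               # filesystem is "mounted"
            slow_unmount &              # unmount runs in the background
            during=$(try_mount)         # attempted while unmount is in flight
            wait                        # let the background unmount finish
            after=$(try_mount)          # attempted after unmount completed
            echo "during=$during after=$after"
            ```

            If the test can guarantee the second mount starts while the unmount is still in flight (the "during" case), MMP should reject it; when the unmount wins the race, the "after" case is what the logs above show, and the test reports a false failure.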


            adilger Andreas Dilger added a comment -

            Yu Jian,
            it looks like this is potentially a race in the test case. We don't run mmp.sh test_5 very often - only in "failover" mode. I was working on http://review.whamcloud.com/3715 to try to get these tests running in more configurations, but I haven't completed it. In theory, we should be able to run the MMP tests all the time, since the VM nodes are all on the same hosts.

            We're reducing this from a blocker, since I think the problem is in the test case and not in the code. Please change it back if you disagree.

            jlevi Jodi Levi (Inactive) added a comment -

            Per analysis from Andreas, reducing from blocker.
            yujian Jian Yu added a comment -

            Lustre Branch: master
            Distro/Arch: RHEL6.4/x86_64
            Test Group: failover

            Lustre Build: http://build.whamcloud.com/job/lustre-master/1486
            https://maloo.whamcloud.com/test_sets/9192eca4-bb19-11e2-8824-52540035b04c

            Lustre Build: http://build.whamcloud.com/job/lustre-master/1481
            https://maloo.whamcloud.com/test_sets/e7c55d7c-b8f9-11e2-891d-52540035b04c
            sarah Sarah Liu added a comment -

            mmp passed in the latest tag 2.3.62 failover testing:

            https://maloo.whamcloud.com/test_sets/9e700da0-8b3d-11e2-965f-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_1_4_RC1
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/159/
            Distro/Arch: RHEL6.3/x86_64

            This issue occurred again: https://maloo.whamcloud.com/test_sets/1ce16246-4bb6-11e2-aa80-52540035b04c


            People

              Assignee: adilger Andreas Dilger
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 11
