[LU-1895] Failover Test failure on test suite mmp, subtest test_5 Created: 11/Sep/12  Updated: 11/Sep/18  Resolved: 15/Aug/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.4, Lustre 2.4.1, Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0, Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.12.0, Lustre 2.10.5
Fix Version/s: Lustre 2.12.0, Lustre 2.10.6

Type: Bug Priority: Major
Reporter: Maloo Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: yuc2

Severity: 3
Rank (Obsolete): 4082

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/da5b8f26-f92d-11e1-a1b8-52540035b04c.

The sub-test test_5 failed with the following error:

test_5 failed with 1

Lustre: DEBUG MARKER: == mmp test 5: mount during unmount of the first filesystem ========================================== 07:54:47 (1347029687)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-MDS\/P1 on mds1...
Lustre: DEBUG MARKER: Mounting /dev/lvm-MDS/P1 on mds1...
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Unmounting \/dev\/lvm-MDS\/P1 on mds1...
Lustre: DEBUG MARKER: Unmounting /dev/lvm-MDS/P1 on mds1...
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-MDS\/P1 on mds1failover...
Lustre: DEBUG MARKER: Mounting /dev/lvm-MDS/P1 on mds1failover...
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1...
Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1...
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Unmounting \/dev\/lvm-OSS\/P1 on ost1...
Lustre: DEBUG MARKER: Unmounting /dev/lvm-OSS/P1 on ost1...
Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1failover...
Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1failover...
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  mmp test_5: @@@@@@ FAIL: mount during unmount of the first filesystem should fail
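
For context: the mmp test suite exercises the ldiskfs multiple mount protection (MMP) feature, and test_5 expects a mount of the same target from the failover node to fail while the primary node is still in the middle of unmounting it. A minimal illustration of the intended behaviour, using the device and mount point names from the log above (this is not the test script itself):

# On either server node: confirm MMP is enabled on the target.
dumpe2fs -h /dev/lvm-OSS/P1 2>/dev/null | grep -iE 'features|mmp'

# On the failover node, while the primary node is still unmounting the
# device, the mount is expected to fail with an MMP conflict:
mkdir -p /mnt/ost1failover
mount -t lustre /dev/lvm-OSS/P1 /mnt/ost1failover && echo "unexpected: mount succeeded"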


 Comments   
Comment by Jian Yu [ 12/Oct/12 ]

Lustre Tag: v2_3_0_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_3/32
Distro/Arch: RHEL6.3/x86_64(server), RHEL5.8/x86_64(client)
Test Group: failover

The issue occurred again: https://maloo.whamcloud.com/test_sets/37aa5cec-146e-11e2-af8d-52540035b04c

Comment by Jian Yu [ 21/Dec/12 ]

Lustre Tag: v2_1_4_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/159/
Distro/Arch: RHEL6.3/x86_64

This issue occurred again: https://maloo.whamcloud.com/test_sets/1ce16246-4bb6-11e2-aa80-52540035b04c

Comment by Sarah Liu [ 19/Mar/13 ]

mmp passed in failover testing of the latest tag, 2.3.62:

https://maloo.whamcloud.com/test_sets/9e700da0-8b3d-11e2-965f-52540035b04c

Comment by Jian Yu [ 13/May/13 ]

Lustre Branch: master
Distro/Arch: RHEL6.4/x86_64
Test Group: failover

Lustre Build: http://build.whamcloud.com/job/lustre-master/1486
https://maloo.whamcloud.com/test_sets/9192eca4-bb19-11e2-8824-52540035b04c

Lustre Build: http://build.whamcloud.com/job/lustre-master/1481
https://maloo.whamcloud.com/test_sets/e7c55d7c-b8f9-11e2-891d-52540035b04c

Comment by Jodi Levi (Inactive) [ 13/May/13 ]

Per analysis from Andreas, reducing from blocker.

Comment by Andreas Dilger [ 13/May/13 ]

Yu Jian,
it looks like this is potentially a race in the test case. We don't run mmp.sh test_5 very often - only in "failover" mode. I was working on http://review.whamcloud.com/3715 to try to get these tests to run in more cases, but I haven't completed it. In theory, we should be able to run the MMP tests all the time, since the VM nodes are all on the same hosts.

We're reducing this from a blocker, since I think the problem is in the test case and not in the code. Please change it back if you disagree.
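
To make the suspected race concrete, here is a rough reduction of the test_5 flow (illustrative only, not the actual mmp.sh code; do_node and error are test-framework helpers, and the variable names here are made up):

# Begin unmounting the target on the primary node in the background.
do_node $PRIMARY "umount -d $PRIMARY_MNT" &
UMOUNT_PID=$!

# Race: if the background umount finishes before this mount is issued,
# MMP sees no concurrent user, the mount succeeds, and the test reports
# a failure even though nothing is actually wrong.
if do_node $FAILOVER "mount -t lustre $DEVICE $FAILOVER_MNT"; then
    error "mount during unmount of the first filesystem should fail"
fi
wait $UMOUNT_PID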

Comment by Jian Yu [ 20/May/13 ]

Lustre Tag: v2_4_0_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-master/1501/
Distro/Arch: RHEL6.4/x86_64
Test Group: failover

The issue occurred again:
https://maloo.whamcloud.com/test_sets/b0bf3f88-c00f-11e2-8398-52540035b04c

Unmounting /dev/lvm-OSS/P1 on ost1 (client-31vm4):

11:55:51:Lustre: DEBUG MARKER: Unmounting /dev/lvm-OSS/P1 on ost1...
11:55:52:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
11:55:52:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Mounting \/dev\/lvm-OSS\/P1 on ost1failover...
11:55:52:Lustre: DEBUG MARKER: Mounting /dev/lvm-OSS/P1 on ost1failover...
11:55:52:Lustre: DEBUG MARKER: umount -d /mnt/ost1
11:55:52:Lustre: Failing over lustre-OST0000
11:55:52:Lustre: server umount lustre-OST0000 complete

Mounting /dev/lvm-OSS/P1 on ost1failover (client-31vm8):

11:55:52:Mounting /dev/lvm-OSS/P1 on ost1failover...
11:55:52:CMD: client-31vm4 grep -c /mnt/ost1' ' /proc/mounts
11:55:52:Stopping /mnt/ost1 (opts:) on client-31vm4
11:55:52:CMD: client-31vm4 umount -d /mnt/ost1
11:55:52:CMD: client-31vm8 mkdir -p /mnt/ost1failover
11:55:53:CMD: client-31vm8 test -b /dev/lvm-OSS/P1
11:55:53:CMD: client-31vm4 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
11:55:53:Starting ost1failover: /dev/lvm-OSS/P1 /mnt/ost1failover
11:55:53:CMD: client-31vm8 mkdir -p /mnt/ost1failover; mount -t lustre /dev/lvm-OSS/P1 /mnt/ost1failover
11:55:53:CMD: client-31vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super\" \"all -lnet -lnd -pinger\" 48
11:55:54:CMD: client-31vm8 e2label /dev/lvm-OSS/P1 2>/dev/null
11:55:54:Started lustre-OST0000

This is a test script issue.
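
The timeline above shows the race directly: the umount on ost1 has already completed at 11:55:52 ("server umount lustre-OST0000 complete") by the time the failover mount runs at 11:55:53, so MMP has nothing to object to and the mount succeeds. One way a test could tolerate this (a sketch only, reusing the hypothetical helper names from the earlier comment; the actual fix is in the patch linked in the later comments) is to treat a successful failover mount as benign when the primary copy had already finished unmounting:

# Only treat a successful failover mount as a test failure if the
# primary copy was demonstrably still mounted at that point.
if do_node $FAILOVER "mount -t lustre $DEVICE $FAILOVER_MNT"; then
    still_mounted=$(do_node $PRIMARY "grep -c '$PRIMARY_MNT ' /proc/mounts")
    if [ "$still_mounted" -gt 0 ]; then
        error "mount during unmount of the first filesystem should fail"
    else
        echo "umount won the race; the second mount legitimately succeeded"
    fi
fi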

Comment by Jian Yu [ 09/Sep/13 ]

Lustre Tag: v2_4_1_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/45/
Distro/Arch: RHEL6.4/x86_64
Test Group: failover

mmp test 5 failed again:
https://maloo.whamcloud.com/test_sets/21323846-194e-11e3-bb73-52540035b04c

Comment by Sarah Liu [ 30/Jan/14 ]

Also seen in tag 2.3.54 testing:

Server and client: lustre-master build #1837, RHEL6, ldiskfs

https://maloo.whamcloud.com/test_sets/a6b7e1f4-8411-11e3-a854-52540035b04c

Comment by Sarah Liu [ 17/Mar/14 ]

Another instance in 2.6:
https://maloo.whamcloud.com/test_sets/e8249cd2-a418-11e3-8c6a-52540035b04c

Comment by Sarah Liu [ 03/Mar/15 ]

Hit this in 2.7.0-rc2 testing:

https://testing.hpdd.intel.com/test_sets/0ee08f32-c157-11e4-9d86-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 20/Jan/16 ]

Another instance found for hardfailover: EL6.7 Server/Client
https://testing.hpdd.intel.com/test_sets/47839e28-bc93-11e5-8f65-5254006e85c2

Another instance found for hardfailover: EL6.7 Server/SLES11 SP3 Clients
https://testing.hpdd.intel.com/test_sets/96492436-ba4c-11e5-9a07-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 20/Jan/16 ]

Another instance found for hardfailover: EL7 Server/Client
https://testing.hpdd.intel.com/test_sets/3506468a-bc00-11e5-a592-5254006e85c2

Another instance found for hardfailover: EL7 Server/SLES11 SP3 Client
build# 3303
https://testing.hpdd.intel.com/test_sets/900a5b64-bb2b-11e5-b3d5-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 09/Feb/16 ]

Another instance found for hardfailover: EL6.7 Server/Client, tag 2.7.66, master build 3314
https://testing.hpdd.intel.com/test_sessions/7c5e8006-cb2d-11e5-b3e8-5254006e85c2

Another instance found for hardfailover: EL7 Server/Client, tag 2.7.66, master build 3314
https://testing.hpdd.intel.com/test_sessions/8d13249a-ca8f-11e5-9609-5254006e85c2

Another instance found for hardfailover: EL7 Server/SLES11 SP3 Client, tag 2.7.66, master build 3316
https://testing.hpdd.intel.com/test_sessions/2fbf67e4-cd4c-11e5-b1fa-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ]

Another instance found on b2_8 for failover testing, build #6.
https://testing.hpdd.intel.com/test_sessions/0aed3028-da39-11e5-a8a6-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/eaf85780-d65e-11e5-afe8-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/eb9f29ec-d8da-11e5-83e2-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/2f0aa9f6-d5a5-11e5-9cc2-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/c5a8e44c-d9c7-11e5-85dd-5254006e85c2

Comment by James Casper [ 24/May/17 ]

2.9.57, b3575:
https://testing.hpdd.intel.com/test_sessions/1f3eb5f9-bb00-4ea9-be10-4318eabbff04

Comment by James Casper [ 26/Sep/17 ]

2.10.1 b26:
https://testing.hpdd.intel.com/test_sessions/2ecbd0a0-6511-479f-b014-0222fd7009b1

Comment by Gerrit Updater [ 11/May/18 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/32355
Subject: LU-1895 tests: don't fail mmp test_5 due to race
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a0d9f2db4987fcf44176f60a6d060d4bba788b13

Comment by Gerrit Updater [ 15/Aug/18 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33007
Subject: LU-1895 tests: don't fail mmp test_5 due to race
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 63b5d72a43765d8f54fba884f752d658e9a0daea

Comment by Gerrit Updater [ 15/Aug/18 ]

Andreas Dilger (adilger@whamcloud.com) merged in patch https://review.whamcloud.com/32355/
Subject: LU-1895 tests: don't fail mmp test_5 due to race
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7e7a4c9402d464d6b3e67a29a1b5c7e02acc4836

Comment by Peter Jones [ 15/Aug/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 11/Sep/18 ]

John L. Hammond (jhammond@whamcloud.com) merged in patch https://review.whamcloud.com/33007/
Subject: LU-1895 tests: don't fail mmp test_5 due to race
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 9632e97b907fd55e0ec1f652fa936ecd24720e25
