[LU-9487] mmp test_2: test_2 failed with 22 Created: 10/May/17 Updated: 03/Jun/17 Resolved: 03/Jun/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | James Casper | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
trevis-45, failover |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/adcc6618-5344-4cda-b96f-04bee182c49b New for tag 2.9.57 - not seen in past 6 months (Nov '16 - May '17). The same failure message is seen with 7 subsequent subtests. From test_log: CMD: trevis-45vm3 mkdir -p /mnt/lustre-mds1failover; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1failover
trevis-45vm3 (MDS1-1): mount.lustre: according to /etc/mtab /dev/mapper/lvm--Role_MDS-P1 is already mounted on /mnt/lustre-mds1failover
Start of /dev/lvm-Role_MDS/P1 on mds1failover failed 17
trevis-45vm7 (MDS1-2): mount.lustre: mount /dev/mapper/lvm--Role_MDS-P1 at /mnt/lustre-mds1 failed: Invalid argument
trevis-45vm7: This may have multiple causes.
trevis-45vm7: Are the mount options correct?
trevis-45vm7: Check the syslog for more info.
Start of /dev/lvm-Role_MDS/P1 on mds1 failed 22
mmp test_2: @@@@@@ FAIL: mount failure on failover pair mds1,mds1failover
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4927:error_noexit()
= /usr/lib64/lustre/tests/mmp.sh:284:mount_after_interval_sub()
= /usr/lib64/lustre/tests/mmp.sh:296:mount_after_interval()
= /usr/lib64/lustre/tests/mmp.sh:446:test_2()
= /usr/lib64/lustre/tests/test-framework.sh:5215:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5254:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5101:run_test()
= /usr/lib64/lustre/tests/mmp.sh:449:main()
Dumping lctl log to /test_logs/2017-05-08/lustre-master-el7-x86_64--failover--1_17_1__3575__-70321399293120-051547/mmp.test_2.*.1494260419.log
CMD: trevis-45vm1.trevis.hpdd.intel.com,trevis-45vm5,trevis-45vm6,trevis-45vm7,trevis-45vm8 /usr/sbin/lctl dk > /test_logs/2017-05-08/lustre-master-el7-x86_64--failover--1_17_1__3575__-70321399293120-051547/mmp.test_2.debug_log.\$(hostname -s).1494260419.log;
dmesg > /test_logs/2017-05-08/lustre-master-el7-x86_64--failover--1_17_1__3575__-70321399293120-051547/mmp.test_2.dmesg.\$(hostname -s).1494260419.log
Resetting fail_loc on all nodes...CMD: trevis-45vm1.trevis.hpdd.intel.com,trevis-45vm5,trevis-45vm6,trevis-45vm7,trevis-45vm8 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
done.
mmp test_2: @@@@@@ FAIL: test_2 failed with 22
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4939:error()
= /usr/lib64/lustre/tests/test-framework.sh:5215:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5254:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5101:run_test()
= /usr/lib64/lustre/tests/mmp.sh:449:main()
|
| Comments |
| Comment by James Casper [ 10/May/17 ] |
|
The test set that follows mmp (replay-single) also fails with the same failure message. replay-single is the last test set in the failover group. |
| Comment by Andreas Dilger [ 11/May/17 ] |
|
It looks like the MDS didn't clean up properly at the end of the previous test? This needs to be investigated more closely. |
| Comment by James Casper [ 11/May/17 ] |
|
A clarification on this statement: "New for tag 2.9.57 - not seen in past 6 months (Nov '16 - May '17)." This has not been seen in tag testing since November 10, 2016 (master branch, tags 51-56). There are 55 occurrences outside of tag testing (all in failover group testing). |
| Comment by Peter Jones [ 16/May/17 ] |
|
Alex is looking into this one |
| Comment by Alex Zhuravlev [ 17/May/17 ] |
|
the log looks a bit strange.. |
| Comment by Alex Zhuravlev [ 18/May/17 ] |
|
== mmp test 1: two mounts at the same time =========================================================== 16:19:03 (1494260343) for whatever reason MDS wasn't stopped by test 1, though I see in the scripts this: going to add a debugging |
| Comment by Gerrit Updater [ 18/May/17 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: https://review.whamcloud.com/27192 |
| Comment by Alex Zhuravlev [ 23/May/17 ] |
|
I think the root cause is that test #2 expects the primary MDS to be mounted: the test was developed by Jian Yu, it'd helpful to get any hints on this. |
| Comment by Gerrit Updater [ 23/May/17 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: https://review.whamcloud.com/27253 |
| Comment by Jian Yu [ 23/May/17 ] |
|
Yes, Alex. In mmp.sh test_1(), it's likely the failover service is started before the primary one, I just pushed a patch to check and stop the failover service. |
| Comment by Gerrit Updater [ 03/Jun/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27253/ |
| Comment by Peter Jones [ 03/Jun/17 ] |
|
Landed for 2.10 |