[LU-5759] replay-dual test_21b: Restart of mds0 failed Created: 16/Oct/14 Updated: 02/Jul/15 Resolved: 08/Jan/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
client and server: lustre-master build #2690 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 16158 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/aff6c30c-5427-11e4-abcf-5254006e85c2.
The sub-test test_21b failed with the following error:
Restart of mds0 failed!
08:51:44:Lustre: DEBUG MARKER: == replay-dual test 21b: commit on sharing, two clients == 08:51:21 (1413301881)
08:51:44:LustreError: 21059:0:(qsd_reint.c:54:qsd_reint_completion()) lustre-MDT0001: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
08:51:44:LustreError: 21059:0:(qsd_reint.c:54:qsd_reint_completion()) Skipped 3 previous similar messages
08:51:44:Lustre: DEBUG MARKER: /usr/sbin/lctl mark replay-dual test_21b: @@@@@@ FAIL: Restart of mds0 failed!
08:51:44:Lustre: DEBUG MARKER: replay-dual test_21b: @@@@@@ FAIL: Restart of mds0 failed!
Info required for matching: replay-dual 21b |
| Comments |
| Comment by Jodi Levi (Inactive) [ 17/Oct/14 ] |
|
Mike, |
| Comment by Andreas Dilger [ 17/Oct/14 ] |
|
It looks like this started failing 100% on 2014-10-06, but wasn't noticed because replay-dual doesn't run as part of the per-patch review tests. According to the maloo test results the last passing test was commit 0b4b33592c09 " |
| Comment by Mikhail Pershin [ 21/Oct/14 ] |
|
The failure is caused by a wrong MDS index in the test:
Starting mds0: /mnt/mds0
CMD: onyx-45vm3 mkdir -p /mnt/mds0; mount -t lustre /mnt/mds0
onyx-45vm3: Usage: mount -V : print version
onyx-45vm3: mount -h : print this help
onyx-45vm3: mount : list mounted filesystems
onyx-45vm3: mount -l : idem, including volume labels
onyx-45vm3: So far the informational part. Next the mounting.
Further investigation shows that the get_mds_dir() function in test-framework.sh was broken by commit 745c19c70319. Unfortunately, regular testing did not catch that regression. It affects replay-dual.sh test_21b and several tests in the test_27 group of sanity.sh |
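The mount command in the log above has only a mount point and no device, which appears to be why mount printed its usage text instead of mounting. A minimal sketch of a guard against that failure mode follows. The helper name build_mds_mount_cmd and its arguments are hypothetical and are not part of test-framework.sh; the sketch only assumes that a bad facet name such as mds0 resolves to an empty device.

```shell
# Hypothetical helper (not in test-framework.sh): build the mount command for
# an MDS facet, and fail early when no device was resolved for that facet.
build_mds_mount_cmd() (
    facet=$1    # facet name, e.g. mds1
    device=$2   # backing device resolved for the facet; empty for a bad facet
    mntpt=$3    # mount point, e.g. /mnt/mds1

    if [ -z "$device" ]; then
        # Without this check the command degrades to
        # "mount -t lustre <mntpt>", as seen in the log.
        echo "no device configured for facet '$facet'" >&2
        exit 1
    fi
    echo "mkdir -p $mntpt; mount -t lustre $device $mntpt"
)
```

Called as build_mds_mount_cmd mds0 "" /mnt/mds0, the helper returns a non-zero status with a message naming the facet, which points at the wrong index more directly than the mount usage output does.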
| Comment by Andreas Dilger [ 22/Oct/14 ] |
|
See also http://review.whamcloud.com/12149 to clean up this code a bit more. |
| Comment by Mikhail Pershin [ 27/Oct/14 ] |
|
http://review.whamcloud.com/12363 - the patch is derived from Andreas's fix but eliminates get_mds_num() entirely and uses "lfs getstripe -M" in the tests instead |
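As a rough illustration of the "lfs getstripe -M" approach, the sketch below converts an MDT index into a facet name. It assumes that MDT indices are 0-based while MDS facet names are 1-based (mds1, mds2, ...), which would explain why mds0 cannot be started. The helper name mdt_index_to_facet is hypothetical and is not taken from the patch.

```shell
# Hypothetical helper (not from the patch): map a 0-based MDT index, such as
# the one printed by "lfs getstripe -M <dir>", to a 1-based MDS facet name.
mdt_index_to_facet() (
    idx=$1
    case "$idx" in
        # Reject empty or non-numeric input rather than building a bad facet.
        ''|*[!0-9]*) echo "invalid MDT index: '$idx'" >&2; exit 1 ;;
    esac
    echo "mds$((idx + 1))"
)

# Intended use on a Lustre client (assumption, not executed here):
#   facet=$(mdt_index_to_facet "$(lfs getstripe -M "$dir")")
```

With this mapping, a directory on MDT index 0 yields mds1, so a test would restart mds1 rather than the nonexistent mds0.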
| Comment by Jian Yu [ 11/Nov/14 ] |
|
We need to add replay-dual back into the patch review test group. However, this failure is preventing replay-dual from passing on the master branch, so I am raising the priority of this ticket to blocker. |
| Comment by Gerrit Updater [ 08/Jan/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12363/ |
| Comment by Jodi Levi (Inactive) [ 08/Jan/15 ] |
|
Patch landed to Master. |