[LU-1362] replay-dual test_16 fails to remount mdt Created: 02/May/12 Updated: 05/Oct/12 Resolved: 05/Oct/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jay Lan (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Server: 2.1.1 in centos 6.2, kernel 2.6.32-220.4.1.el6, x86_64, lustre server 2.1.1-0.2nasS 1 mds/mgs (service360) The lustre git repo can be found at |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6409 |
| Description |
|
Replay-dual test 16 failed: == replay-dual test 16: fail MDS during recovery (3571) == 17:38:52 (1335573532) The "Invalid argument" was about extents, which we do not enable on the MDS. The replay-dual.test_16.dmesg.service360.1335573548.log seemed to suggest the problem
REPLAY_DUAL-16.tgz is attached. The failure is 100% reproducible. Could the data corruption problem be caused by trying to fail over the MDS to the same node? |
| Comments |
| Comment by Peter Jones [ 02/May/12 ] |
|
Lai Could you please look into this one? Thanks Peter |
| Comment by Jay Lan (Inactive) [ 03/May/12 ] |
|
Attached two replay-dual-*.tar.bz2, one for test_14b and the other test_16. |
| Comment by Jay Lan (Inactive) [ 03/May/12 ] |
|
Sorry, I ended up attaching replay-dual-14b.tar.bz2 and replay-dual-16.tar.bz2 multiple times. Please clean up the extra copies. This site formats the MDS without the extents option, but the OSS with the extents option. To get the "Invalid argument" errors out of the testing, I added "noextents" to MDS_MOUNT_OPTIONS and reran the tests. BTW, I always failed on tests 14b, 16, 20, and 21a. I used to think they were caused by the same problem, but it appears that tests 14b and 20 have one failure signature, while tests 16 and 21a share another. So I attached the test logs of test 14b and test 16. |
| Comment by Lai Siyao [ 18/May/12 ] |
|
This looks to be the same issue of |
| Comment by Jay Lan (Inactive) [ 18/May/12 ] |
|
No, the test system was not running VM. |
| Comment by Lai Siyao [ 20/May/12 ] |
|
Will it fail if the MDS failover node is a different node? |
| Comment by Jay Lan (Inactive) [ 22/May/12 ] |
|
I do not have an extra machine to use as an MDS failover node. |
| Comment by Lai Siyao [ 29/May/12 ] |
|
Jay, replay_barrier() calls mcreate after syncing the target, which looks suspicious. I have a patch at http://review.whamcloud.com/#change,2931; could you help verify it? |
| Comment by Jay Lan (Inactive) [ 30/May/12 ] |
|
Hi Siyao, unfortunately I have to report that the patch did not help. |
| Comment by Brian Murrell (Inactive) [ 04/Oct/12 ] |
|
Jay, you didn't happen to be using an iSCSI device as your target, did you? Was it a Linux iSCSI target or one from some vendor? |
| Comment by Jay Lan (Inactive) [ 05/Oct/12 ] |
|
The target was ATA HDS725050KLA360. |
| Comment by Jay Lan (Inactive) [ 05/Oct/12 ] |
|
I set up my test machines and reran the test. The test passed with a 2.1.3 server and both the sles11sp1 2.1.3 client and the centos6.3 2.1.3 client. The failure I reported was on a 2.1.1 centos client. We can close the case. |
| Comment by Peter Jones [ 05/Oct/12 ] |
|
ok thanks Jay! |