[LU-5900] replay-dual test_11: rm: cannot remove `/mnt/lustre/f11.replay-dual-[1-5]': No such file or directory Created: 11/Nov/14 Updated: 01/Dec/14 Resolved: 01/Dec/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.5.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: |
Lustre Build: https://build.hpdd.intel.com/job/lustre-b2_5/100/ |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 16486 |
| Description |
|
replay-dual test 11 failed as follows:

rm: cannot remove `/mnt/lustre/f11.replay-dual-[1-5]': No such file or directory
replay-dual test_11: @@@@@@ FAIL: test_11 failed with 1

Dmesg on client node:

Lustre: 15249:0:(client.c:2752:ptlrpc_replay_interpret()) @@@ Version mismatch during replay req@ffff88006a414400 x1484130260889212/t515396075526(515396075526) o36->lustre-MDT0000-mdc-ffff88006aa09400@10.1.4.66@tcp:12/10 lens 520/416 e 1 to 0 dl 1415377944 ref 2 fl Interpret:R/4/0 rc -75/-75
LustreError: 15249:0:(client.c:2740:ptlrpc_replay_interpret()) request replay timed out, restarting recovery
LustreError: 167-0: lustre-MDT0000-mdc-ffff880037fb4400: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
LustreError: 1678:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -5
Lustre: lustre-MDT0000-mdc-ffff880037fb4400: Connection restored to lustre-MDT0000 (at 10.1.4.66@tcp)
LustreError: 1678:0:(dir.c:378:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at 0: rc -5
LustreError: 1678:0:(dir.c:584:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at 0: rc -5

Dmesg on MDS node:

Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
Lustre: lustre-MDT0000: disconnecting 1 stale clients
Lustre: 18536:0:(ldlm_lib.c:2092:target_recovery_thread()) too long recovery - read logs
LustreError: dumping log to /tmp/lustre-log.1415377850.18536

Maloo report: https://testing.hpdd.intel.com/test_sets/a6c1b3de-68c5-11e4-a63a-5254006e85c2 |
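For readers triaging similar logs, the return codes in the dmesg excerpts above are negated Linux errno values: -75 is EOVERFLOW (reported here alongside the "Version mismatch during replay" message) and -5 is EIO. The following is a minimal sketch, not part of the Lustre test suite, that pulls those codes out of a log line; the function name `decode_return_codes`, the regular expression, and the hard-coded errno table are all assumptions made for illustration.

```python
import re

# Linux errno values that appear in the log excerpts above. They are
# hard-coded (an assumption for this sketch) so that the result does not
# depend on the errno table of the machine running the script.
LINUX_ERRNO = {5: "EIO", 75: "EOVERFLOW"}

# Matches fragments such as "rc -5", "rc -75/-75" and "ldlm_cli_enqueue: -5".
# Only the first code of an "rc -75/-75" pair is captured.
RC_PATTERN = re.compile(r"(?:\brc|ldlm_cli_enqueue:)\s+(-\d+)")


def decode_return_codes(line):
    """Return (code, errno_name) pairs for each return code found in line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        results.append((code, LINUX_ERRNO.get(-code, "UNKNOWN")))
    return results


if __name__ == "__main__":
    sample = "LustreError: 1678:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -5"
    print(decode_return_codes(sample))
```

Codes outside the small table are reported as "UNKNOWN" rather than guessed.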
| Comments |
| Comment by Jian Yu [ 11/Nov/14 ] |
|
The same failure also occurred in a SLES11SP3/x86_64 client + RHEL6.5/x86_64 server test session: It is a regression introduced by Lustre b2_5 build #100. Unfortunately, I found that replay-dual is not in the autotest review test groups, so the failure was not detected in patch review testing. |
| Comment by Jian Yu [ 11/Nov/14 ] |
|
Here is a for-test-only patch trying to reproduce the failure on Lustre b2_5 build #100: http://review.whamcloud.com/11611 |
| Comment by Jian Yu [ 11/Nov/14 ] |
|
More instances on Lustre b2_5 build #100: |
| Comment by Jian Yu [ 12/Nov/14 ] |
|
The same regression failure also occurred on the master branch: |
| Comment by Jian Yu [ 12/Nov/14 ] |
|
It was the patches http://review.whamcloud.com/11213 (master) and http://review.whamcloud.com/12365 (b2_5) for |
| Comment by Peter Jones [ 01/Dec/14 ] |
|
As per Yu Jian this can be closed as a duplicate of |