[LU-994] 1.8<->2.1.54 Test failure on test suite replay-single 62 Created: 15/Jan/12 Updated: 19/Mar/12 Resolved: 19/Mar/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6488 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/c460154a-3e73-11e1-b417-5254004bbbd3. |
| Comments |
| Comment by Peter Jones [ 15/Jan/12 ] |
|
Bobi, could you look into this one, please? Thanks, Peter |
| Comment by Zhenyu Xu [ 19/Jan/12 ] |
|
Just for the record: MDS recovery started at 1326480370.284875, with the recovery window set to 60 seconds. The client's replay request x1390912056905666/t304942678018 (LDLM_ENQUEUE) was issued at 1326480370.283516 with a timeout of 62 seconds. At 1326480430.284992 the MDS closed its recovery window, while the replay request did not time out until 1326480432.535461, so the client was evicted and the test failed. The failure occurred because the replay request's timeout extends beyond the end of the MDS recovery window. |
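The timing argument above comes down to simple arithmetic. The following is a minimal sketch, not Lustre code: it plugs in the timestamps quoted in the comment and assumes the window end is simply recovery start + 60 s and the client's deadline is issue time + 62 s. The constant and function names are hypothetical.

```python
# Minimal sketch (not Lustre code): does the replay request's deadline fall
# after the MDS recovery window closes? Timestamps are the ones quoted in the
# comment; all names are hypothetical and the model is deliberately simplified.

RECOVERY_START = 1326480370.284875  # MDS recovery start (epoch seconds)
RECOVERY_WINDOW = 60.0              # recovery window length (seconds)
REPLAY_SENT = 1326480370.283516     # LDLM_ENQUEUE replay issued (epoch seconds)
REPLAY_TIMEOUT = 62.0               # replay request timeout (seconds)


def window_end(start: float, window: float) -> float:
    """Time at which the server stops waiting for replays (simplified)."""
    return start + window


def replay_deadline(sent: float, timeout: float) -> float:
    """Time at which the client gives up on the replay request."""
    return sent + timeout


def overshoot(start: float, window: float, sent: float, timeout: float) -> float:
    """Seconds by which the replay deadline lies past the window end.

    A positive value means the window can close while the client is still
    waiting for its replay, which is the failure described above.
    """
    return replay_deadline(sent, timeout) - window_end(start, window)


if __name__ == "__main__":
    gap = overshoot(RECOVERY_START, RECOVERY_WINDOW, REPLAY_SENT, REPLAY_TIMEOUT)
    print(f"window ends at  {window_end(RECOVERY_START, RECOVERY_WINDOW):.6f}")
    print(f"replay deadline {replay_deadline(REPLAY_SENT, REPLAY_TIMEOUT):.6f}")
    print(f"overshoot       {gap:.3f} s")
```

With the quoted numbers the computed deadline lands roughly 2 s after the window closes; the observed timeout (1326480432.535461) came about 0.25 s after that computed deadline, a delay this sketch does not model.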
| Comment by Zhenyu Xu [ 30/Jan/12 ] |
|
I cannot reproduce it on my VMs. Sarah, would you mind checking which timeout values are used on the server and the client, and whether they match? Is AT (adaptive timeouts) enabled on the client side? |
| Comment by Sarah Liu [ 16/Feb/12 ] |
|
I use the default values for both obd_timeout and ldlm_timeout, so I guess they match? Yes, AT is enabled on the client side. TIMEOUT in my config file is also set to 20. |
| Comment by Zhenyu Xu [ 16/Feb/12 ] |
|
dup of |
| Comment by Peter Jones [ 17/Mar/12 ] |
|
Are you sure that this is a duplicate of LU-1036? It seems to relate to a different test (62 rather than 52) and still seems to be occurring in RC1... |
| Comment by Zhenyu Xu [ 19/Mar/12 ] |
|
It looks like a recovery timeout extension issue. Does the build that hit this again in RC1 contain the patch of |
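The idea behind "recovery timeout extension" can be illustrated with a toy model: when a replay request arrives whose client-side timeout would outlast the current recovery window, the server pushes the window end out far enough to cover that deadline, up to some hard cap. This is a hypothetical sketch under those assumptions only; it is not the actual Lustre mechanism or the patch in question, and `RecoveryWindow`, `extend_for` and `hard_limit` are invented names.

```python
from dataclasses import dataclass


@dataclass
class RecoveryWindow:
    """Toy model of a server recovery window (hypothetical, not Lustre code)."""
    start: float       # when recovery began (epoch seconds)
    end: float         # current time at which recovery will close
    hard_limit: float  # latest time the window may ever be extended to

    def extend_for(self, sent: float, timeout: float) -> float:
        """Extend the window so the server does not close recovery while a
        client that sent a replay at `sent` is still waiting `timeout`
        seconds for it. The extension is capped at `hard_limit`. Returns
        the (possibly unchanged) window end."""
        deadline = sent + timeout
        if deadline > self.end:
            self.end = min(deadline, self.hard_limit)
        return self.end

    def accepts(self, t: float) -> bool:
        """True if time `t` still falls inside the recovery window."""
        return self.start <= t <= self.end


if __name__ == "__main__":
    start = 1326480370.284875
    # Assumed values: 60 s window as in the test log, 180 s cap chosen arbitrarily.
    w = RecoveryWindow(start=start, end=start + 60.0, hard_limit=start + 180.0)
    print("open at +61 s before extension:", w.accepts(start + 61.0))
    w.extend_for(1326480370.283516, 62.0)  # the LDLM_ENQUEUE replay from the log
    print("open at +61 s after extension: ", w.accepts(start + 61.0))
```

The cap matters in this model: without it, a single client with a very long timeout could hold recovery open indefinitely for everyone else.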
| Comment by Peter Jones [ 19/Mar/12 ] |
| Comment by Peter Jones [ 19/Mar/12 ] |
|
As per Oleg, OK to close: this failure appears because the 1.8.x version being tested against does not contain this fix. |