[LU-3749] Failure on test suite replay-dual test_8: test_8 failed with 2 Created: 13/Aug/13 Updated: 02/Oct/13 Resolved: 02/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client: tag-2.4.90 RHEL6 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9672 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: http://maloo.whamcloud.com/test_sets/fd1709ea-02c9-11e3-b384-52540035b04c. The sub-test test_8 failed with the following error:
MDS console:
22:33:28:Lustre: DEBUG MARKER: == replay-dual test 8: replay of resent request == 22:33:24 (1376199204)
22:33:28:Lustre: DEBUG MARKER: sync; sync; sync
22:33:28:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
22:33:28:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
22:33:28:LustreError: 14713:0:(osd_handler.c:1194:osd_ro()) *** setting lustre-MDT0000 read-only ***
22:33:28:LustreError: 14713:0:(osd_handler.c:1194:osd_ro()) Skipped 3 previous similar messages
22:33:28:Turning device dm-0 (0xfd00000) read-only
22:33:28:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
22:33:28:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
22:33:28:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x119
22:33:29:Lustre: *** cfs_fail_loc=119, val=2147483648***
22:33:29:LustreError: 14098:0:(ldlm_lib.c:2409:target_send_reply_msg()) @@@ dropping reply req@ffff88005c311c00 x1443048921768268/t498216206346(0) o36->c0662422-4c9b-ce8c-77e9-eb8caf0a37b9@10.10.4.206@tcp:0/0 lens 496/448 e 0 to 0 dl 1376199231 ref 1 fl Interpret:/0/0 rc 0/0 |
| Comments |
| Comment by Sarah Liu [ 13/Aug/13 ] |
|
SLES11 SP2 client also hit this issue: |
| Comment by Keith Mannthey (Inactive) [ 13/Aug/13 ] |
|
The client says this:
22:34:35:LustreError: 166-1: MGC10.10.4.208@tcp: Connection to MGS (at 10.10.4.208@tcp) was lost; in progress operations using this service will fail
22:34:35:Lustre: Evicted from MGS (at 10.10.4.208@tcp) after server handle changed from 0x2a56bfb73f955d8a to 0x2a56bfb73f95659b
22:35:07:Lustre: 19509:0:(client.c:2652:ptlrpc_replay_interpret()) @@@ Version mismatch during replay
22:35:07: req@ffff88007c91a800 x1443048921768268/t498216206346(498216206346) o36->lustre-MDT0000-mdc-ffff880062d92800@10.10.4.208@tcp:12/10 lens 496/416 e 0 to 0 dl 1376199343 ref 2 fl Interpret:R/4/0 rc -75/-75
22:35:07:Lustre: lustre-MDT0000-mdc-ffff880065203400: Connection restored to lustre-MDT0000 (at 10.10.4.208@tcp)
22:35:07:Lustre: Skipped 6 previous similar messages
22:35:48:Lustre: 19509:0:(import.c:1209:completed_replay_interpret()) lustre-MDT0000-mdc-ffff880062d92800: version recovery fails, reconnecting
I wonder why we get a "Version mismatch during replay". The client and server appear to be the same version. |
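For anyone triaging similar logs: the request whose reply the MDS dropped and the request the client later failed to replay can be matched by xid and transno. The following is only a sketch. The regular expression and the field names (`xid`, `transno`, `opcode`, `rc`) are assumptions inferred from the two request dumps quoted in this ticket, not an official description of the Lustre log format, and `parse_req` is a hypothetical helper.

```python
import re

# Request dumps copied verbatim from the MDS console and client logs in this ticket.
MDS_LINE = (
    "@@@ dropping reply req@ffff88005c311c00 "
    "x1443048921768268/t498216206346(0) "
    "o36->c0662422-4c9b-ce8c-77e9-eb8caf0a37b9@10.10.4.206@tcp:0/0 "
    "lens 496/448 e 0 to 0 dl 1376199231 ref 1 fl Interpret:/0/0 rc 0/0"
)
CLIENT_LINE = (
    "req@ffff88007c91a800 "
    "x1443048921768268/t498216206346(498216206346) "
    "o36->lustre-MDT0000-mdc-ffff880062d92800@10.10.4.208@tcp:12/10 "
    "lens 496/416 e 0 to 0 dl 1376199343 ref 2 fl Interpret:R/4/0 rc -75/-75"
)

# Assumed layout: x<xid>/t<transno>(...) o<opcode>->... rc <rc>/<...>
# This pattern is inferred from the two lines above only.
REQ_RE = re.compile(
    r"x(?P<xid>\d+)/t(?P<transno>\d+)\(\d+\)\s+"
    r"o(?P<opcode>\d+)->.*\brc (?P<rc>-?\d+)/-?\d+"
)


def parse_req(line):
    """Extract xid, transno, opcode and rc from one request dump (hypothetical helper)."""
    m = REQ_RE.search(line)
    if m is None:
        raise ValueError("not a request dump line")
    return {name: int(value) for name, value in m.groupdict().items()}


mds = parse_req(MDS_LINE)
client = parse_req(CLIENT_LINE)

# True when both dumps describe the same request.
same_request = (mds["xid"], mds["transno"]) == (client["xid"], client["transno"])
print(same_request, client["rc"])
```

Both dumps carry the same xid and transno, so the replay that failed with rc -75 (errno 75 is EOVERFLOW on Linux) is the request whose reply was dropped after `fail_loc=0x119` was set.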
| Comment by Oleg Drokin [ 14/Aug/13 ] |
|
Keith, it's a file data version mismatch. |
| Comment by Jodi Levi (Inactive) [ 14/Aug/13 ] |
|
Mike, |
| Comment by Andreas Dilger [ 24/Sep/13 ] |
|
Mike, any chance to look at this? |
| Comment by Mikhail Pershin [ 26/Sep/13 ] |
|
yes, I am looking at it |
| Comment by Mikhail Pershin [ 27/Sep/13 ] |
|
http://review.whamcloud.com/7786 - patch to fix that problem |
| Comment by Peter Jones [ 02/Oct/13 ] |
|
Landed for 2.5.0 |