[LU-3750] Failure on test suite replay-vbr test_1b: client not evicted Created: 13/Aug/13 Updated: 02/Oct/13 Resolved: 02/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client: tag 2.4.90 RHEL6 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9673 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/4d4d45be-02ca-11e3-b384-52540035b04c. The sub-test test_1b failed with the following error:
client dmesg:
Lustre: DEBUG MARKER: == replay-vbr test 1b: open (O_CREAT) checks version of parent == 23:02:02 (1376200922)
Lustre: DEBUG MARKER: mkdir -p -m 755 /mnt/lustre/d0.replay-vbr/d1
Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: openfile -f O_RDWR:O_CREAT /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b
LustreError: 11-0: lustre-MDT0000-mdc-ffff8800615be000: Communicating with 10.10.4.208@tcp, operation obd_ping failed with -107.
Lustre: lustre-MDT0000-mdc-ffff8800615be000: Connection to lustre-MDT0000 (at 10.10.4.208@tcp) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 11-0: lustre-MDT0000-mdc-ffff88007d22a400: Communicating with 10.10.4.208@tcp, operation obd_ping failed with -107.
Lustre: lustre-MDT0000-mdc-ffff88007d22a400: Connection to lustre-MDT0000 (at 10.10.4.208@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 30938:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376200926/real 1376200926] req@ffff88007ac65000 x1443051231773588/t0(0) o400->MGC10.10.4.208@tcp@10.10.4.208@tcp:26/25 lens 224/224 e 0 to 1 dl 1376200933 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 166-1: MGC10.10.4.208@tcp: Connection to MGS (at 10.10.4.208@tcp) was lost; in progress operations using this service will fail
Lustre: 30937:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376200933/real 1376200933] req@ffff88007ac65000 x1443051231773696/t0(0) o250->MGC10.10.4.208@tcp@10.10.4.208@tcp:26/25 lens 400/544 e 0 to 1 dl 1376200939 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: Evicted from MGS (at 10.10.4.208@tcp) after server handle changed from 0x2a56bfb73f95e89c to 0x2a56bfb73f95eff7
Lustre: MGC10.10.4.208@tcp: Connection restored to MGS (at 10.10.4.208@tcp)
Lustre: 30937:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376200938/real 1376200938] req@ffff88003dd53000 x1443051231773700/t0(0) o38->lustre-MDT0000-mdc-ffff8800615be000@10.10.4.208@tcp:12/10 lens 400/544 e 0 to 1 dl 1376200949 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 30937:0:(client.c:2695:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff880023a91000 x1443051231773580/t566935683100(566935683100) o101->lustre-MDT0000-mdc-ffff8800615be000@10.10.4.208@tcp:12/10 lens 584/544 e 0 to 0 dl 1376201028 ref 2 fl Interpret:R/4/0 rc 301/301 |
| Comments |
| Comment by Sarah Liu [ 13/Aug/13 ] |
|
SLES11 SP2 client also hit this issue: |
| Comment by Jodi Levi (Inactive) [ 14/Aug/13 ] |
|
Mike, |
| Comment by Andreas Dilger [ 24/Sep/13 ] |
|
Mike, any chance to look at this? |
| Comment by Andreas Dilger [ 24/Sep/13 ] |
|
It looks like replay-vbr is totally failing: |
| Comment by Mikhail Pershin [ 27/Sep/13 ] |
|
http://review.whamcloud.com/7787 - patch restores VBR functionality. |
| Comment by Peter Jones [ 02/Oct/13 ] |
|
Landed for 2.5.0 |