[LU-3750] Failure on test suite replay-vbr test_1b: client not evicted
Created: 13/Aug/13  Updated: 02/Oct/13  Resolved: 02/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: Mikhail Pershin
Resolution: Fixed
Votes: 0
Labels: None
Environment: server and client: tag 2.4.90 RHEL6

Severity: 3
Rank (Obsolete): 9673

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/4d4d45be-02ca-11e3-b384-52540035b04c.

The sub-test test_1b failed with the following error:

client-12vm1.lab.whamcloud.com not evicted

client dmesg:

Lustre: DEBUG MARKER: == replay-vbr test 1b: open (O_CREAT) checks version of parent == 23:02:02 (1376200922)
Lustre: DEBUG MARKER: mkdir -p -m 755 /mnt/lustre/d0.replay-vbr/d1
Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: openfile -f O_RDWR:O_CREAT /mnt/lustre/d0.replay-vbr/d1/f.replay-vbr.1b
LustreError: 11-0: lustre-MDT0000-mdc-ffff8800615be000: Communicating with 10.10.4.208@tcp, operation obd_ping failed with -107.
Lustre: lustre-MDT0000-mdc-ffff8800615be000: Connection to lustre-MDT0000 (at 10.10.4.208@tcp) was lost; in progress operations using this service will wait for recovery to complete
LustreError: 11-0: lustre-MDT0000-mdc-ffff88007d22a400: Communicating with 10.10.4.208@tcp, operation obd_ping failed with -107.
Lustre: lustre-MDT0000-mdc-ffff88007d22a400: Connection to lustre-MDT0000 (at 10.10.4.208@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 30938:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376200926/real 1376200926]  req@ffff88007ac65000 x1443051231773588/t0(0) o400->MGC10.10.4.208@tcp@10.10.4.208@tcp:26/25 lens 224/224 e 0 to 1 dl 1376200933 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 166-1: MGC10.10.4.208@tcp: Connection to MGS (at 10.10.4.208@tcp) was lost; in progress operations using this service will fail
Lustre: 30937:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376200933/real 1376200933]  req@ffff88007ac65000 x1443051231773696/t0(0) o250->MGC10.10.4.208@tcp@10.10.4.208@tcp:26/25 lens 400/544 e 0 to 1 dl 1376200939 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: Evicted from MGS (at 10.10.4.208@tcp) after server handle changed from 0x2a56bfb73f95e89c to 0x2a56bfb73f95eff7
Lustre: MGC10.10.4.208@tcp: Connection restored to MGS (at 10.10.4.208@tcp)
Lustre: 30937:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1376200938/real 1376200938]  req@ffff88003dd53000 x1443051231773700/t0(0) o38->lustre-MDT0000-mdc-ffff8800615be000@10.10.4.208@tcp:12/10 lens 400/544 e 0 to 1 dl 1376200949 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 30937:0:(client.c:2695:ptlrpc_replay_interpret()) @@@ status 301, old was 0  req@ffff880023a91000 x1443051231773580/t566935683100(566935683100) o101->lustre-MDT0000-mdc-ffff8800615be000@10.10.4.208@tcp:12/10 lens 584/544 e 0 to 0 dl 1376201028 ref 2 fl Interpret:R/4/0 rc 301/301


Comments
Comment by Sarah Liu [ 13/Aug/13 ]

SLES11 SP2 client also hit this issue:
https://maloo.whamcloud.com/test_sets/c3a31540-029d-11e3-b384-52540035b04c

Comment by Jodi Levi (Inactive) [ 14/Aug/13 ]

Mike,
Would you be able to look into this one as well?
Thank you!

Comment by Andreas Dilger [ 24/Sep/13 ]

Mike, any chance to look at this?

Comment by Andreas Dilger [ 24/Sep/13 ]

It looks like replay-vbr is totally failing:
https://maloo.whamcloud.com/test_sets/6a46a070-24f8-11e3-b47a-52540035b04c

Comment by Mikhail Pershin [ 27/Sep/13 ]

http://review.whamcloud.com/7787 - patch restores VBR functionality.

Comment by Peter Jones [ 02/Oct/13 ]

Landed for 2.5.0

Generated at Sat Feb 10 01:36:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.