Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Affects Version/s: Lustre 2.8.0
    • Fix Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3

    Description

      During a 24-hour DNE failover test, one of the clients fails because of a version mismatch during replay.

      Lustre: 7977:0:(client.c:2828:ptlrpc_replay_interpret()) @@@ Version mismatch during replay
        req@ffff8806698209c0 x1508081941826144/t21475879788(21475879788) o36->lustre-MDT0001-mdc-ffff880821cfdc00@192.168.2.126@o2ib:12/10 lens 608/424 e 1 to 0 dl 1438262329 ref 2 fl Interpret:R/4/0 rc -75/-75
      Lustre: 7977:0:(import.c:1301:completed_replay_interpret()) lustre-MDT0001-mdc-ffff880821cfdc00: version recovery fails, reconnecting
      LustreError: 167-0: lustre-MDT0001-mdc-ffff880821cfdc00: This client was evicted by lustre-MDT0001; in progress operations using this service will fail.
      LustreError: 9213:0:(vvp_io.c:1475:vvp_io_init()) lustre: refresh file layout [0x240002341:0x1d74:0x0] error -5.
      LustreError: 29913:0:(lmv_obd.c:1332:lmv_fid_alloc()) Can't alloc new fid, rc -19
      Lustre: lustre-MDT0001-mdc-ffff880821cfdc00: Connection restored to lustre-MDT0001 (at 192.168.2.126@o2ib)
      Lustre: DEBUG MARKER: ==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=43433 DURATION=86400 PERIOD=1800
      Lustre: DEBUG MARKER: Client load failed on node c01, rc=1
      Lustre: DEBUG MARKER: Duration: 86400
      Lustre: Unmounted lustre-client
      

    Attachments

    Issue Links

    Activity

            [LU-6928] Version mismatch during DNE replay
            di.wang Di Wang added a comment -

            Duplicate with LU-5951


            niu Niu Yawei (Inactive) added a comment -

            Yes, the patch for LU-5951 (http://review.whamcloud.com/#/c/15473/) will fix this problem.

            di.wang Di Wang added a comment -

            Niu: could you please confirm if your patch can fix this problem? Thanks


            gerrit Gerrit Updater added a comment -

            wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/15812
            Subject: LU-6928 ptlrpc: Do not pack min XID for connect req
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: acb892354ac119dfe80edf60f5f2a2923cdcc980

            di.wang Di Wang added a comment - - edited

            Hmm, I think this is what happened.

            During replay:

            1. The client sends a replay request to MDS01, and MDS01 does not handle it in time.
            2. The client does not get a reply, so the request expires and the client reconnects the import. When sending this CONNECT request, the client first finds the lowest unreplied XID, but it only searches the sending and delayed lists, which is wrong, because the replay request that just timed out is on neither list (see ptlrpc_check_set()). So it usually ends up with the XID of the CONNECT request itself, which is obviously larger than the XID of the replay request.

                    /* find the lowest unreplied XID */
                    list_for_each(tmp, &imp->imp_delayed_list) {
                            struct ptlrpc_request *r;
                            r = list_entry(tmp, struct ptlrpc_request, rq_list);
                            if (r->rq_xid < min_xid)
                                    min_xid = r->rq_xid;
                    }
                    list_for_each(tmp, &imp->imp_sending_list) {
                            struct ptlrpc_request *r;
                            r = list_entry(tmp, struct ptlrpc_request, rq_list);
                            if (r->rq_xid < min_xid)
                                    min_xid = r->rq_xid;
                    }
            

            The request is deleted from the sending list after it fails:

                /* Request already may be not on sending or delaying list. This
                 * may happen in the case of marking it erroneous for the case
                 * ptlrpc_import_delay_req(req, status) find it impossible to
                 * allow sending this rpc and returns *status != 0. */
                if (!list_empty(&req->rq_list)) {
                        list_del_init(&req->rq_list);
                        atomic_dec(&imp->imp_inflight);
                }
                spin_unlock(&imp->imp_lock);


            3. The MDT handles the replay request from step 1 at this time, which adds the lrd (reply data) of this request to the reply list.
            4. Then the CONNECT request arrives with a larger "lowest unreplied" XID, which causes the lrd created in step 3 to be deleted.
            5. After reconnecting, the client resends the replay request, but the server cannot recognize it as an already-processed request because its lrd was deleted in step 4. The replay request is therefore executed twice, which causes this version mismatch.

            So the fix might be in step 2: either consider the expired (timed-out) replay request when finding the lowest unreplied XID, or simply do not pack the lowest XID into the CONNECT request at all. A sketch of the first option follows below.
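
            To make the first option concrete, here is a minimal, hypothetical userspace sketch (the struct and list names are simplified stand-ins, not the actual ptlrpc code or data structures) of a lowest-unreplied-XID scan that also walks the replay list, so that a replay request which just timed out, and is therefore no longer on the sending or delayed lists, is still counted. Note that the patch above (http://review.whamcloud.com/15812, "Do not pack min XID for connect req") takes the second option instead.

                #include <stdint.h>
                #include <stdio.h>

                /* simplified stand-in for a ptlrpc request: only the XID and a link */
                struct fake_req {
                        uint64_t         rq_xid;
                        struct fake_req *next;
                };

                /* lower *min_xid if any request on the list has a smaller XID */
                static void scan_list(const struct fake_req *head, uint64_t *min_xid)
                {
                        const struct fake_req *r;

                        for (r = head; r != NULL; r = r->next)
                                if (r->rq_xid < *min_xid)
                                        *min_xid = r->rq_xid;
                }

                int main(void)
                {
                        /* the replay request that just timed out: it sits on the
                         * replay list only, not on the sending or delayed lists */
                        struct fake_req replay  = { .rq_xid = 100, .next = NULL };
                        /* the CONNECT request about to be sent, with a newer (larger) XID */
                        struct fake_req connect = { .rq_xid = 205, .next = NULL };

                        struct fake_req *delayed_list = NULL;
                        struct fake_req *sending_list = &connect;
                        struct fake_req *replay_list  = &replay;

                        uint64_t min_xid = UINT64_MAX;

                        scan_list(delayed_list, &min_xid);
                        scan_list(sending_list, &min_xid);
                        /* without this extra scan min_xid would be 205, and the server
                         * would prune the reply data (lrd) kept for request 100 */
                        scan_list(replay_list, &min_xid);

                        printf("lowest unreplied XID packed into CONNECT: %llu\n",
                               (unsigned long long)min_xid);
                        return 0;
                }

            Compiled and run, this prints 100 rather than 205, i.e. the timed-out replay request keeps its XID visible to the server, so its reply data would not be pruned.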

            di.wang Di Wang added a comment -

            Yes, it looks like the same issue. I believe it is a bug caused by the multiple-slot patch. I will cook a patch.


            jamesanunez James Nunez (Inactive) added a comment -

            Di - I'm seeing a very similar error in replay-single test 48 in review-dne-part-2. Do you think this is the same issue? The logs are at: https://testing.hpdd.intel.com/test_sets/8281843c-365f-11e5-830b-5254006e85c2

            The client console shows:

            18:10:31:Lustre: DEBUG MARKER: == replay-single test 48: MDS->OSC failure during precreate cleanup (2824) == 18:08:48 (1438193328)
            18:10:31:Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
            18:10:31:Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
            18:10:31:Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
            18:10:31:LustreError: 7733:0:(client.c:2816:ptlrpc_replay_interpret()) request replay timed out, restarting recovery
            18:10:31:Lustre: 7733:0:(client.c:2828:ptlrpc_replay_interpret()) @@@ Version mismatch during replay
            18:10:31:  req@ffff8800379e8c80 x1508053244555028/t197568495717(197568495717) o36->lustre-MDT0000-mdc-ffff88007b65b000@10.1.4.62@tcp:12/10 lens 640/416 e 2 to 0 dl 1438193365 ref 1 fl Interpret:R/6/0 rc -75/-75
            18:10:31:Lustre: 7733:0:(import.c:1306:completed_replay_interpret()) lustre-MDT0000-mdc-ffff88007b65b000: version recovery fails, reconnecting
            18:10:31:LustreError: 167-0: lustre-MDT0000-mdc-ffff88007b65b000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
            18:10:31:LustreError: Skipped 1 previous similar message
            18:10:31:LustreError: 13949:0:(lmv_obd.c:1473:lmv_statfs()) can't stat MDS #0 (lustre-MDT0000-mdc-ffff88007b65b000), error -5
            18:10:31:LustreError: 13949:0:(llite_lib.c:1707:ll_statfs_internal()) md_statfs fails: rc = -5
            18:10:31:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-single test_48: @@@@@@ FAIL: client_up failed 
            18:10:31:Lustre: DEBUG MARKER: replay-single test_48: @@@@@@ FAIL: client_up failed
            

            People

              niu Niu Yawei (Inactive)
              di.wang Di Wang
              Votes: 0
              Watchers: 5
