Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.3.0
    • Affects Version: None
    • Components: None
    • Severity: 3
    • Rank: 4565

    Description

      Recently I found the following scenario, which may lead to cascading client reconnects, lock timeouts, evictions, etc.:

      1. The MDS is overloaded with enqueues; they consume all the threads on the MDS_REQUEST portal.
      2. Some RPC times out on one client, which leads to its reconnection. This client still holds some locks to cancel, and the MDS is waiting for those cancels.
      3. The client sends MDS_CONNECT, but there is no free thread to handle it.
      4. Other clients are waiting for their enqueue completions and ping the MDS to check whether it is still alive. But PING is also sent to the MDS_REQUEST portal and, despite being a high-priority RPC, the service has no special handler for it (srv_hpreq_handler == NULL), so a 2nd thread is not reserved for high-priority RPCs on such services:

      static int ptlrpc_server_allow_normal(struct ptlrpc_service *svc, int force)
      {
      #ifndef __KERNEL__
              if (1) /* always allow to handle normal request for liblustre */
                      return 1;
      #endif
              /* plenty of idle threads: always admit a normal request */
              if (force ||
                  svc->srv_n_active_reqs < svc->srv_threads_running - 2)
                      return 1;

              /* only the last thread is left: keep it for high-priority RPCs */
              if (svc->srv_n_active_reqs >= svc->srv_threads_running - 1)
                      return 0;

              /* second-to-last thread: grant it to a normal request only if a
               * high-priority one is already being served, or if the service
               * never classifies high-priority requests at all
               * (srv_hpreq_handler == NULL) */
              return svc->srv_n_active_hpreq > 0 || svc->srv_hpreq_handler == NULL;
      }
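
      To illustrate, here is a minimal, standalone trace of the predicate above (the thread counts are made up for illustration and struct svc is a stand-in, not the real ptlrpc_service):

      #include <stdio.h>

      /* stand-in for the few ptlrpc_service fields the predicate reads */
      struct svc {
              int   n_active_reqs;    /* requests currently being handled */
              int   threads_running;  /* service threads started */
              int   n_active_hpreq;   /* high-priority requests in handling */
              void *hpreq_handler;    /* NULL on ping-only services */
      };

      static int allow_normal(const struct svc *svc, int force)
      {
              if (force || svc->n_active_reqs < svc->threads_running - 2)
                      return 1;
              if (svc->n_active_reqs >= svc->threads_running - 1)
                      return 0;
              return svc->n_active_hpreq > 0 || svc->hpreq_handler == NULL;
      }

      int main(void)
      {
              /* 126 of 128 threads already busy with enqueues, no HP active */
              struct svc no_handler = { 126, 128, 0, NULL };
              struct svc hp_handler = { 126, 128, 0, (void *)1 };

              /* prints 1: the second-to-last thread goes to yet another
               * normal enqueue, leaving no reserve for incoming pings */
              printf("no handler:   %d\n", allow_normal(&no_handler, 0));
              /* prints 0: with a handler the thread is held in reserve */
              printf("with handler: %d\n", allow_normal(&hp_handler, 0));
              return 0;
      }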
      

      5. With no thread left to handle pings, the other clients' RPCs time out as well.
      6. Once one LDLM lock times out, an enqueue completes and an MDS_CONNECT may be taken into handling. However, this client is likely to still have an enqueue RPC in processing on the MDS, so it gets -EBUSY (sketched below) and will retry only after some delay, whereas other clients try to reconnect and consume the MDS threads with enqueues again. This is being discussed in LU-7, but it is not the main issue here.
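
      For reference, the server-side refusal in item 6 amounts to something like the following (a sketch; treating exp_rpc_count as the per-export count of requests still in processing is an assumption here, not a quote of the connect handler):

      /* sketch: refuse a reconnect while the client's previous requests
       * are still being processed; the client backs off and retries */
      static int reconnect_check(struct obd_export *exp)
      {
              if (atomic_read(&exp->exp_rpc_count) > 0)
                      return -EBUSY;
              return 0;
      }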

      Fixes:
      1) reserve an extra thread on services which expect PINGs to come;
      2) make CONNECTs high-priority RPCs (see the sketch below);
      3) LU-7 to address item 6.
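
      A minimal sketch of what fix 2 amounts to (condensed for illustration: the real srv_hpreq_handler attaches per-request callbacks rather than returning a verdict directly, and the landed change is http://review.whamcloud.com/2355, not this):

      /* hypothetical classifier: treat connects and pings as high
       * priority so the reserved threads can pick them up */
      static int hpreq_check_connect(struct ptlrpc_request *req)
      {
              switch (lustre_msg_get_opc(req->rq_reqmsg)) {
              case MDS_CONNECT:
              case OST_CONNECT:
              case MGS_CONNECT:
              case OBD_PING:
                      return 1;  /* high-priority queue */
              default:
                      return 0;  /* normal queue */
              }
      }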

      Attachments

      Issue Links

      Activity

      [LU-1239] cascading client evictions

            nrutman Nathan Rutman added a comment - Xyratex MRP-455
            spitzcor Cory Spitz added a comment - Chris, yes the patch is suitable for 2.1. Cray initially found this bug on 2.1 and Vitaly developed the fix for Xyratex's 2.1+patches: https://github.com/Xyratex/lustre-stable/commit/afcf3cf1091c67d076ef36dc0d73cd649f84421e

            morrone Christopher Morrone (Inactive) added a comment - Is this patch suitable for 2.1? I think we're seeing the same cascading client evictions there.
            pjones Peter Jones added a comment - Landed for 2.3

            nrutman Nathan Rutman added a comment - sorry, one month. I didn't realize it had gone through some revisions.

            nrutman Nathan Rutman added a comment - This patch has been sitting here for two months with no review - what should we do with it?

            nrutman Nathan Rutman added a comment -

            Lustre 2.1.0 server, Lustre 1.8.6 clients. 4000 nodes simultaneously trying to mkdir -p the same directory in the lustre root dir and create a (distinct) file in that dir.

            > The MDS_CONNECT and OST_CONNECT RPCs should only be high priority if they are reconnects, not if they are initial connects. Please add a check for MSG_CONNECT_RECONNECT instead of just making all CONNECT requests high priority.

            Why can't they all be high priority? They should take 0 time to process on the MDS relative to anything else, and we want a responsive client mount command. And it's simpler code.
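
            For reference, the check asked for above might look like this (an illustrative sketch, not the landed change; lustre_msg_get_op_flags() and MSG_CONNECT_RECONNECT are the existing wire-protocol accessor and flag):

            /* a reconnect carries MSG_CONNECT_RECONNECT in its op flags,
             * an initial connect does not */
            static int connect_is_reconnect(struct ptlrpc_request *req)
            {
                    return !!(lustre_msg_get_op_flags(req->rq_reqmsg) &
                              MSG_CONNECT_RECONNECT);
            }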

            adilger Andreas Dilger added a comment - What version of Lustre hit this problem, and what kind of workload blocked all of the MDS threads?
            vitaly_fertman Vitaly Fertman added a comment - http://review.whamcloud.com/2355

            People

              Assignee: wc-triage WC Triage
              Reporter: vitaly_fertman Vitaly Fertman
              Votes: 0
              Watchers: 6
