[LU-3443] performance impact of mdc_rpc_lock serialization Created: 06/Jun/13  Updated: 16/Oct/13  Resolved: 16/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Minor
Reporter: Ned Bass Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: performance

Issue Links:
Related
is related to LU-3442 MDS performance degraded by reading o... Resolved
Severity: 3
Rank (Obsolete): 8584

 Description   

Serialization of in-flight RPCs on mdc_rpc_lock makes non-modifying operations, such as calling open() on a directory, vulnerable to blocking due to a slow backend MDS filesystem. In particular, users may see long delays running ls when LDLM_ENQUEUE requests get blocked behind long-lived metadata-modifying MDS_REINT requests.

For example, in LU-3442, RPCs involving writing llog records experienced long service times due to a misbehaving backend filesystem. Therefore RPCs for operations like create(), unlink() and rename() would stay in flight for many seconds on the client. Unfortunately, these long-lived in-flight RPCs prevent LDLM_ENQUEUE requests for open() on a directory from being sent, due to the mdc_get_rpc_lock() call in mdc_enqueue(). Once issued, the LDLM_ENQUEUE request completes almost immediately since it doesn't involve synchronous I/O on the backend. It would be desirable if such non-modifying operations could be shielded from the effects of slow synchronous operations.

To that end, it would be helpful to clarify what the mdc_rpc_lock is protecting. mdc_enqueue() has this to say:

812         /* It is important to obtain rpc_lock first (if applicable), so that
813          * threads that are serialised with rpc_lock are not polluting our
814          * rpcs in flight counter. We do not do flock request limiting, though*/
815         if (it) {
816                 mdc_get_rpc_lock(obddev->u.cli.cl_rpc_lock, it);
817                 rc = mdc_enter_request(&obddev->u.cli);

but it's not clear to me what is meant by "polluting", and why the counter can't be protected by a separate lock that need not be held across the entire network request.
I also observe that the in-flight RPC counter for an OBD import rarely exceeds 1 or 2, and never approaches the upper limit of 8. So it seems we are not doing a good job of keeping a full pipeline of in-flight RPCs.

LLNL-bug-id: TOSS-2084



 Comments   
Comment by Alex Zhuravlev [ 07/Jun/13 ]

this semaphore is needed to implement "execute-once" semantics on the MDS. The MDS remembers the last result/XID for every client in the last_rcvd file, so if the client hasn't got a reply, it can just resend the request and the MDS will reconstruct the reply, being aware that the request has already been executed.
if we didn't have this semaphore, then concurrent requests would overwrite that information in last_rcvd and the MDS wouldn't be able to detect execution status, which is pretty wrong in a number of cases (as many metadata requests are not idempotent).

this is a known issue we're going to address with a multislot last_rcvd at some point. it'll likely require small changes to the protocol so that the client can say which specific slot to use, like NFSv4 does (if I remember correctly).

Comment by Andreas Dilger [ 07/Jun/13 ]

Ned, in LU-2613 there is work being done to avoid allocating a new transno to the open request from the client. It might also be possible to send multiple open requests from the client in parallel if they do not have the O_CREAT flag set. However, Alex's point is also valid - the open requests still change the state of the MDS in memory. If the client resends the open request and the MDS has already opened it and does not detect this correctly, then it will leak an open refcount and the file will not be closed or deleted when it is unlinked.

It might be possible to detect this based on the handle used for the open, but that would further increase the complexity of the recovery process, and only fix the non-O_CREAT open() case. The generic solution for allowing multiple modifying RPCs in flight is to allow multiple slots on the last_rcvd file for a single client.

The exact mechanism for this has not been designed at this point, and is not on any development roadmap. To be honest, given the glacial speed of the OpenSFS RFP process, it is unclear how any feature development projects will be started in the foreseeable future.

Comment by Ned Bass [ 07/Jun/13 ]

Alex, Andreas, thank you for filling in the details here. It sounds like there's nothing to do in the short term for this issue. Though if you are amenable, I'll submit a patch adding a comment block above the definition of struct mdc_rpc_lock to capture some of the useful information here.

Comment by Ned Bass [ 07/Jun/13 ]

Since it's just a comment, I'll post it for review here first, to avoid wasting time for hudson/maloo.

--- a/lustre/include/lustre_mdc.h
+++ b/lustre/include/lustre_mdc.h
@@ -69,9 +69,27 @@ struct obd_export;
 struct ptlrpc_request;
 struct obd_device;
 
+/**
+ * Serializes in-flight MDT-modifying RPC requests to preserve idempotency.
+ *
+ * This mutex is used to implement execute-once semantics on the MDT.
+ * The MDT stores the last transaction ID and result for every client in
+ * its last_rcvd file. If the client doesn't get a reply, it can safely
+ * resend the request and the MDT will reconstruct the reply being aware
+ * that the request has already been executed. Without this lock,
+ * execution status of concurrent in-flight requests would be
+ * overwritten.
+ *
+ * This design limits the extent to which we can keep a full pipeline of
+ * in-flight requests from a single client.  This limitation could be
+ * overcome by allowing multiple slots per client in the last_rcvd file.
+ */
 struct mdc_rpc_lock {
+       /** Lock protecting in-flight RPC concurrency. */
        struct mutex            rpcl_mutex;
+       /** Intent associated with currently executing request. */
        struct lookup_intent    *rpcl_it;
+       /** Used for MDS/RPC load testing purposes. */
        int                     rpcl_fakes;
 };

Comment by Prakash Surya (Inactive) [ 07/Jun/13 ]

To be honest, given the glacial speed of the OpenSFS RFP process, it is unclear how any feature development projects will be started in the foreseeable future.

Should I open a new ticket for this?

Comment by Ned Bass [ 07/Jun/13 ]

Patch to add above comment:

http://review.whamcloud.com/6593

Comment by Jodi Levi (Inactive) [ 16/Oct/13 ]

Patch landed to Master. Please let me know if more work is needed in this ticket and I will reopen.
