[LU-10275] ptlrpc reply acknowledgement Created: 21/Dec/14  Updated: 18/Dec/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor
Reporter: Liang Zhen (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: lnet, performance

Attachments: Microsoft Word ack_perf_5.xlsx     Microsoft Word pltrpc_reply_resend_2.docx    
Issue Links:
Related
is related to LU-10274 LNet event add-on function Closed
Rank (Obsolete): 16881

 Description   

Because most ptlrpc messages do not have ACK , RPC client cannot distinguish message loss from long service time. Also, in current implementation, message-resend can only be triggered by RPC client after service timeout, no matter which message is lost in lifecycle of RPC.

To improve Lustre RAS against message loss, we should allow message resend for any step of RPC lifecycle. However, current RPC client already has request message timeout/resend protocol and adaptive timeout, it may need fundamental changes if we want to have ACK for request message and use network timeout instead of service time to trigger request message resend. This may require a lot more efforts and resources, so it is not covered by this document.

Reply-resend is relatively simple and more practicable, RPC server can repeatedly resend reply at fix time interval (e.g. 20 seconds), which should be sufficient even for latency in environment with router. Reply-resend can be stopped when there is an ACK for reply message, or client is evicted/disconnected.



 Comments   
Comment by Liang Zhen (Inactive) [ 24/Dec/14 ]

I created a dedicated patch for lnet event add-on: INTL-166

Comment by Liang Zhen (Inactive) [ 04/Jan/15 ]

patch list:
http://review.whamcloud.com/#/c/13203
http://review.whamcloud.com/#/c/13204
http://review.whamcloud.com/#/c/13219
http://review.whamcloud.com/#/c/13220
http://review.whamcloud.com/#/c/13227
http://review.whamcloud.com/#/c/13228
http://review.whamcloud.com/#/c/13489

Comment by Andreas Dilger [ 25/Feb/15 ]

Liang, have you done any testing on this to determine how much it improves reliability, recoverability, etc? Any idea on what kind of performance impact it has on normal operation? What kind of interoperability is needed for this (i.e. can it be done on a per-client basis, or do all clients need it, or what)?

Comment by Liang Zhen (Inactive) [ 26/Feb/15 ]

Andreas, because this patch can only improve reliability on one direction so far, so I did some tests with enabling message drop for reply portals only, without this patch, I got client eviction time to time while running random workload. After I applied this patch (because it is a small cluster, so I set reply-resend interval to a small value like 2-4 seconds instead of default value, so it can resend within AT), there was almost no client eviction.Of course we can't expect improvement like this in real world because message drop can be on both directions, but I think it is a good step to start.
This feature can be applied on a per-client basis, or let's say we can upgrade any node without breaking interoperability, this feature is enabled only when both end of ptlrpc connection has this patch. Also, it can be enabled/disabled at runtime.

I will attach some data, these data are not from this patch because I collected them before I worked this patch, but from a simple patch to enable ACK for all ptlrpc messages, so I think they are essentially same. From these data, we lost about 10% performance for lightweight metadata operations (0-stripe) when we have ACK for all messages (both request & reply), so I assume we may lose 5% performance with reply-ack only.

Comment by Liang Zhen (Inactive) [ 26/Feb/15 ]

performance data for ACKed lnet messages.

Comment by Liang Zhen (Inactive) [ 11/Mar/15 ]

Andreas, do you have any concern/comment on these data?

Comment by Andreas Dilger [ 12/Mar/15 ]

Liang, thanks for the data. It looks like the overhead is noticeable, but not so bad that the change is unusable.

Is it possible to make this feature optional, so that we can turn it on or off to debug?

Comment by Liang Zhen (Inactive) [ 14/Mar/15 ]

Andreas, yes I think I can do this, I will update the patch to make it optional.

Comment by Liang Zhen (Inactive) [ 19/Mar/15 ]

I will work on CORAL soon, so have to reassign this ticket to bobijam. I will maintain patches for a while, but can't finish the landing process.

Comment by Andreas Dilger [ 17/Dec/18 ]

Amir, how does this relate to our recent discussions about LNet Health and reply timeouts? Are these patches still useful?

Comment by Amir Shehata (Inactive) [ 18/Dec/18 ]

This looks like it's covered with the LNet Health work. I'll take a look at the docs in more detail to see what he had intended.

Generated at Sat Feb 10 02:33:36 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.