[LU-12444] Remove ambiguous request flag of no_resend Created: 17/Jun/19  Updated: 17/Jun/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor
Reporter: Li Xi Assignee: Li Xi
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12378 sanity-quota test 1 fails with 'proje... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

The two flags rq_no_delay and rq_no_resend might not both be necessary. We don't have precise definitions or usage conditions that distinguish them.

I think rq_no_delay means the request should quit and return whenever there is any possibility of being blocked, whether that is caused by reconnection or by other conditions. rq_no_resend should then always be set when rq_no_delay is set, which is true in a lot of places but not all. It is not clear why rq_no_resend is necessary. Even if there is a case in which rq_no_delay is not suitable, a more precise flag or a better mechanism should be used.
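The relationship proposed above (rq_no_delay set implies rq_no_resend set) can be written as a small consistency check. This is only a minimal sketch and not Lustre code: `struct flag_pair` and `flag_pair_consistent()` are hypothetical names introduced for illustration, and only the two flag names come from this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two flags discussed here. */
struct flag_pair {
	unsigned int rq_no_delay:1;
	unsigned int rq_no_resend:1;
};

/*
 * Returns true when the flags satisfy the proposed rule:
 * if rq_no_delay is set, rq_no_resend must be set as well.
 * rq_no_resend on its own is allowed by this rule.
 */
static bool flag_pair_consistent(const struct flag_pair *fp)
{
	assert(fp != NULL);
	return !fp->rq_no_delay || fp->rq_no_resend;
}
```

Under this rule the only rejected combination is rq_no_delay set with rq_no_resend clear, which corresponds to the "not all" places mentioned above.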



 Comments   
Comment by Gerrit Updater [ 17/Jun/19 ]

Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/35244
Subject: LU-12444 ptlrpc: remove no_resend flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8ecd22a39c25365d54b7febb1a2b5eef7e3ccae9

Comment by Patrick Farrell (Inactive) [ 17/Jun/19 ]

Li Xi,

Thank you very much for diving in and trying this...

Comment by Andreas Dilger [ 17/Jun/19 ]

IMHO, no_resend has a clear meaning - the RPC may be queued on the client, but it gets one chance to be sent and if it times out there is no reason to resend it. I don't think this is the same as "never" blocking an RPC for no_delay. What constitutes "blocking"? Local memory allocation, queue delay in the network, other?
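One way to read the distinction drawn in this comment is as two separate decision points: what happens when the request cannot be sent yet, and what happens after a sent request times out. The sketch below is a toy model under that assumption, not the ptlrpc implementation. `struct model_request`, `enum model_action` and both functions are hypothetical, and whether no_delay should also cover other kinds of blocking is exactly the open question raised here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two flags discussed here. */
struct model_request {
	unsigned int rq_no_delay:1;
	unsigned int rq_no_resend:1;
};

enum model_action {
	MODEL_QUEUE,	/* wait on the client until the request can be sent */
	MODEL_RESEND,	/* send the request again */
	MODEL_FAIL,	/* give up and return an error to the caller */
};

/* The request cannot be sent right now, e.g. during reconnection. */
static enum model_action model_on_not_ready(const struct model_request *req)
{
	assert(req != NULL);
	return req->rq_no_delay ? MODEL_FAIL : MODEL_QUEUE;
}

/* The request was sent once and timed out. */
static enum model_action model_on_timeout(const struct model_request *req)
{
	assert(req != NULL);
	return req->rq_no_resend ? MODEL_FAIL : MODEL_RESEND;
}
```

In this model a request with only rq_no_resend set may still be queued before its single send attempt, which is the behaviour a request with rq_no_delay set would not get.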

Comment by Li Xi [ 17/Jun/19 ]

OK. Then "blocking" needs a more detailed definition. But I think the use case of no_delay seems clear: quit whenever the request hits a problem or failure while trying to proceed, or sees a high possibility of a problem or failure if it proceeds. So, yes, on a memory allocation failure a no_delay request would quit. And if the request handler foresees a high possibility of slow or failed memory allocation, a no_delay request would quit too.

"no_resend has a clear meaning - the RPC may be queued on the client, but it gets one chance to be sent and if it times out there is no reason to resend it."

This sounds like part of no_delay's functionality. I am wondering whether no_delay could be reused for most of the cases where no_resend is used. I know there might be some cases where rq_no_delay is not suitable or not enough, and my intention here is to find out what these cases are. "req->rq_no_resend = req->rq_no_delay = 1" is written in a lot of places, so I feel this might simplify the logic in general.
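As a smaller step than removing a flag, the repeated assignment quoted above could be gathered behind one helper. The following is a sketch only: `struct demo_request` and `demo_request_set_no_wait()` are hypothetical names and do not exist in the Lustre tree as far as this ticket shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in holding only the two flags discussed here. */
struct demo_request {
	unsigned int rq_no_delay:1;
	unsigned int rq_no_resend:1;
};

/*
 * Hypothetical helper for call sites that currently open-code
 * "req->rq_no_resend = req->rq_no_delay = 1".
 */
static void demo_request_set_no_wait(struct demo_request *req)
{
	assert(req != NULL);
	req->rq_no_resend = 1;
	req->rq_no_delay = 1;
}
```

Call sites that set only one of the two flags would stay open-coded, which would make the cases worth examining easier to find.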

At the least, even if cleaning up rq_no_resend turns out to be too complex to carry out, we still need a patch that adds comments explicitly explaining the difference between these two flags, to avoid future confusion.
