[LU-10889] Inconsistent request deadline between client and server. Created: 09/Apr/18  Updated: 06/Jun/21  Resolved: 19/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Major
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
Severity: 3
Epic: timeout
Rank (Obsolete): 9223372036854775807

 Description   

A client can resend a request, and the server is able to detect the duplicate. When it does, the server drops the resent request and keeps processing the original. As a result, the client's deadline for the request differs from the server's deadline for the same request.
For example:

00000100:00080000:11.1F:1468964870.796746:0:27071:0:(service.c:1556:ptlrpc_server_check_resend_in_progress()) @@@ Found duplicate req in processing  req@ffff880d9168b080 x1539226900119556/t0(0) o101->08998e08-887e-a620-6fb3-36cb6d9403ee@2145@gni1:-1/-1 lens 576/0 e 0 to 0 dl 1468965375 ref 1 fl New:/2/ffffffff rc 0/-1
00000100:00080000:11.1:1468964870.796752:0:27071:0:(service.c:1557:ptlrpc_server_check_resend_in_progress()) @@@ Request being processed  req@ffff880cd7f5dcc0 x1539226900119556/t0(0) o101->08998e08-887e-a620-6fb3-36cb6d9403ee@2145@gni1:-1/-1 lens 576/0 e 0 to 0 dl 1468965030 ref 1 fl New:/0/ffffffff rc 0/-1

On the client, request x1539226900119556 has deadline 1468965375; on the server, the same request has deadline 1468965030. So, in the worst case, the client will wait 345 seconds longer than the server for this request.
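The 345-second figure can be checked directly from the two log lines. The following is a minimal sketch, not part of Lustre: the helper names `parse_req` and `deadline_gap` are hypothetical, and the regular expression assumes the request description keeps the `x<xid>/t... dl <deadline>` layout seen in the lines quoted above.

```python
import re

# Request descriptions copied from the two log lines quoted above.
RESENT_REQ = (
    "req@ffff880d9168b080 x1539226900119556/t0(0) "
    "o101->08998e08-887e-a620-6fb3-36cb6d9403ee@2145@gni1:-1/-1 "
    "lens 576/0 e 0 to 0 dl 1468965375 ref 1 fl New:/2/ffffffff rc 0/-1"
)
IN_PROGRESS_REQ = (
    "req@ffff880cd7f5dcc0 x1539226900119556/t0(0) "
    "o101->08998e08-887e-a620-6fb3-36cb6d9403ee@2145@gni1:-1/-1 "
    "lens 576/0 e 0 to 0 dl 1468965030 ref 1 fl New:/0/ffffffff rc 0/-1"
)

# Assumed layout: "x<xid>/t<transno>(<n>) ... dl <deadline> ..."
REQ_RE = re.compile(r"\bx(?P<xid>\d+)/t\d+\(\d+\).*?\bdl (?P<deadline>\d+)\b")


def parse_req(text):
    """Return (xid, deadline) extracted from one request description."""
    match = REQ_RE.search(text)
    if match is None:
        raise ValueError("no xid/deadline found in: %r" % text)
    return int(match.group("xid")), int(match.group("deadline"))


def deadline_gap(resent, in_progress):
    """Seconds by which the resent copy's deadline exceeds the original's."""
    resent_xid, resent_dl = parse_req(resent)
    orig_xid, orig_dl = parse_req(in_progress)
    if resent_xid != orig_xid:
        raise ValueError("lines describe different requests")
    return resent_dl - orig_dl


print(deadline_gap(RESENT_REQ, IN_PROGRESS_REQ))  # 345
```

Both lines carry the same xid, so the difference between their `dl` fields is the extra time the client may wait after the server has given up on the original copy.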
 



 Comments   
Comment by Gerrit Updater [ 09/Apr/18 ]

Alexandr Boyko (c17825@cray.com) uploaded a new patch: https://review.whamcloud.com/31910
Subject: LU-10889 ptlrpc: update req timeout if resending happened
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 745ac133737b43e88c8bdbaa250850a46de8116c

Comment by Andreas Dilger [ 12/Apr/18 ]

This is only an issue if the server fails, and the client does not get IR notification of this and waits longer before a resend?

On a related note (which is what I'd thought your patch was fixing), if you are working on this area of code you might consider fixing LU-8750.

Comment by Gerrit Updater [ 19/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31910/
Subject: LU-10889 ptlrpc: update req timeout if resending happened
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b4603a9e81239b4e6021c640c1d24e4ed8f8fc4b

Comment by Peter Jones [ 19/Apr/18 ]

Landed for 2.12

Generated at Sat Feb 10 02:39:04 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.