[LU-10776] Large grant requests still don't work, resulting in small write RPCs Created: 05/Mar/18  Updated: 07/Nov/19  Resolved: 09/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.12.0, Lustre 2.10.7

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Patrick Farrell (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Previously, LU-2049 introduced a bug with older clients and larger grant requests stemming from 4 MiB RPCs. That was resolved by a simple tweak to interop behavior, but unfortunately there are still at least two bugs in the grant handling code that show up with 16 MiB RPCs.

Specifically, the server automatically refuses grant requests > 2 GiB in size, which is a problem because the client-side grant request is calculated like this:
bytes_per_page * pages_per_rpc * max_rpcs_in_flight + various padding (extent tax, etc.)
(see osc_announce_cached)

Consider 16 MiB RPCs (4096 pages of 4096 bytes each) with 128 RPCs in flight:
4096*4096*128 = 2 GiB, and that + tax is > 2 GiB. The server refuses these requests, which results in terrible performance.
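
The arithmetic above can be checked with a small sketch. This is only an illustration of the formula as described in this ticket, not the code in osc_announce_cached. The function name `requested_grant`, the constant `SERVER_LIMIT`, and the 1 MiB padding value are assumptions made for the example; the real extent tax depends on the configuration.

```python
PAGE_SIZE = 4096          # bytes_per_page
MIB = 1 << 20
GIB = 1 << 30
SERVER_LIMIT = 2 * GIB    # requests larger than this are refused (per the ticket)


def requested_grant(pages_per_rpc, max_rpcs_in_flight, padding_bytes):
    """Grant request as described in the ticket: payload plus padding."""
    return PAGE_SIZE * pages_per_rpc * max_rpcs_in_flight + padding_bytes


# 16 MiB RPCs (4096 pages) with 128 RPCs in flight.
payload = requested_grant(4096, 128, 0)
# Hypothetical padding; any non-zero tax pushes the request past the limit.
with_tax = requested_grant(4096, 128, 1 * MIB)

print(payload == SERVER_LIMIT)   # payload alone is exactly 2 GiB
print(with_tax > SERVER_LIMIT)   # payload + tax exceeds 2 GiB
```

Because the payload alone already equals 2 GiB, the request exceeds the server limit for any positive amount of padding.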

There is a different but also "fun" bug when we go higher. Consider 16 MiB RPCs with 256 RPCs in flight:
4096*4096*256 = 4 GiB, and that + tax is > 4 GiB, which overflows the 32-bit unsigned o_undirty field used to communicate the grant request to the server. So we effectively request only the extent tax and other padding. This results in a maximum grant request of ~19 MiB, which works quite badly.
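
A minimal sketch of the wraparound, assuming the padding is about 19 MiB as the figure above suggests. Python integers do not overflow, so the 32-bit unsigned field is emulated with a mask; the helper `as_u32` and the constant `PADDING` are hypothetical names used only for this illustration.

```python
PAGE_SIZE = 4096
MIB = 1 << 20
U32_MASK = 0xFFFFFFFF     # emulates a 32-bit unsigned field such as o_undirty


def as_u32(value):
    """Truncate a value the way storing it in a 32-bit unsigned field would."""
    return value & U32_MASK


PADDING = 19 * MIB        # assumed; the ticket only gives "~19 MiB"

# 16 MiB RPCs with 256 RPCs in flight: the payload is exactly 2**32 bytes.
wanted = PAGE_SIZE * 4096 * 256 + PADDING
sent = as_u32(wanted)

print(wanted // MIB)      # intended request in MiB
print(sent // MIB)        # what actually fits in the 32-bit field, in MiB
```

The payload term is exactly 2^32 bytes, so it vanishes entirely after truncation and only the padding reaches the server.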

The result of all this is badly degraded performance at very high RPC sizes and max_rpcs_in_flight.

The suggested patch limits the grant request to < 2 GiB.
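
The idea of capping the request can be sketched as follows. This is not the patch itself: the cap value `MAX_GRANT_REQUEST` (2 GiB minus one byte) and the function `clamp_grant_request` are assumptions chosen for illustration, and the real change may pick a different, aligned limit. The point is only that the clamp has to happen before the value is stored in the 32-bit field.

```python
PAGE_SIZE = 4096
MIB = 1 << 20
GIB = 1 << 30
U32_MASK = 0xFFFFFFFF

# Hypothetical cap: anything strictly below 2 GiB satisfies the server.
MAX_GRANT_REQUEST = 2 * GIB - 1


def clamp_grant_request(wanted_bytes):
    """Cap the request below 2 GiB before it is narrowed to 32 bits."""
    return min(wanted_bytes, MAX_GRANT_REQUEST) & U32_MASK


padding = 19 * MIB  # assumed padding, as in the ~19 MiB figure in the ticket

for rpcs_in_flight in (64, 128, 256):
    wanted = PAGE_SIZE * 4096 * rpcs_in_flight + padding
    print(rpcs_in_flight, clamp_grant_request(wanted))
```

With a clamp like this, both the 128 and 256 RPCs-in-flight cases request just under 2 GiB instead of being refused or wrapping around, while smaller requests pass through unchanged.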



 Comments   
Comment by Gerrit Updater [ 05/Mar/18 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/31533
Subject: LU-10776 osc: Do not request more than 2GiB grant
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 329aa190e3790c1485c0622fc5d0a110918f5d56

Comment by Gerrit Updater [ 09/Apr/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31533/
Subject: LU-10776 osc: Do not request more than 2GiB grant
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c0246d8878097a618efa9296d90896ac2cc2e9e4

Comment by Peter Jones [ 09/Apr/18 ]

Landed for 2.12

Comment by Gerrit Updater [ 17/Jan/19 ]

Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/34051
Subject: LU-10776 osc: Do not request more than 2GiB grant
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 3accbc1867e03375e82718ab37b31be967e4a757

Comment by Gerrit Updater [ 15/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34051/
Subject: LU-10776 osc: Do not request more than 2GiB grant
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 5ad464969157aee2d868f09a4dc302e01d0221c6
