Details
Type: Bug
Resolution: Fixed
Priority: Minor
Description
Previously, LU-2049 introduced a bug with older clients and larger grant requests stemming from 4 MiB RPCs. This was resolved by a simple tweak to interop behavior, but there are still at least two bugs in the grant handling code with 16 MiB RPCs.
Specifically, the server automatically refuses grant requests larger than 2 GiB, which is a problem because the client-side grant request is calculated like this:
bytes_per_page*pages_per_rpc*max_rpcs_in_flight + various padding (extent tax, etc)
(see osc_announce_cached)
Consider 16 MiB RPCs with 128 RPCs in flight:
4096*4096*128 = 2 GiB (4096 bytes per page * 4096 pages per RPC * 128 RPCs in flight), and that plus the tax is > 2 GiB. The server refuses those requests, which results in terrible performance.
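As a rough illustration of the arithmetic above, here is a minimal sketch in C. It is not the actual osc_announce_cached code: the function names, the `padding` parameter (standing in for the extent tax and other padding) and the limit macro are all hypothetical, and the check only mirrors the "> 2 GiB is refused" behavior described in this ticket.

```c
#include <stdint.h>

/* Hypothetical constant: the 2 GiB threshold above which the server
 * refuses a grant request, as described in this ticket. */
#define GRANT_REFUSE_LIMIT (2ULL << 30)

/* Hypothetical helper: the client-side grant request, computed in 64 bits.
 * 'padding' stands in for the extent tax and other padding. */
uint64_t grant_wanted(uint64_t bytes_per_page, uint64_t pages_per_rpc,
                      uint64_t max_rpcs_in_flight, uint64_t padding)
{
    return bytes_per_page * pages_per_rpc * max_rpcs_in_flight + padding;
}

/* Hypothetical helper: nonzero if the server would refuse the request. */
int server_refuses(uint64_t wanted)
{
    return wanted > GRANT_REFUSE_LIMIT;
}
```

With 4096-byte pages, 4096 pages per RPC and 128 RPCs in flight, `grant_wanted(4096, 4096, 128, 0)` is exactly 2 GiB, so any nonzero padding pushes the request over the limit.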
There is a different but also "fun" bug when we go higher:
16 MiB RPCs with 256 RPCs in flight:
4096*4096*256 = 4 GiB, and that plus the tax is > 4 GiB, which overflows the 32-bit unsigned o_undirty field used to communicate the grant request to the server. So we effectively request only the extent tax and other padding. This results in a maximum grant request of ~19 MiB, which works quite badly.
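The wraparound can be reproduced with a short C sketch. The function name is hypothetical and the 19 MiB figure is just the approximate padding quoted above; the only point is that storing a value above 4 GiB in a 32-bit unsigned field keeps it modulo 2^32.

```c
#include <stdint.h>

/* Hypothetical helper: what a 32-bit unsigned field such as o_undirty
 * would carry if a 64-bit request is stored into it.
 * Conversion to uint32_t reduces the value modulo 2^32. */
uint32_t undirty_on_wire(uint64_t wanted)
{
    return (uint32_t)wanted;
}
```

For example, `undirty_on_wire(4096ULL * 4096 * 256 + 19ULL * 1024 * 1024)` yields 19 MiB: the 4 GiB component wraps to zero and only the padding survives.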
The result of all this is badly degraded performance at very high RPC sizes and max_rpcs_in_flight.
The suggested patch limits the grant request to < 2 GiB.
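One way such a limit could look is sketched below in C. This is an assumption about the shape of the fix, not the actual patch: the macro and function names are hypothetical, and the cap of 2 GiB minus one byte is chosen only because it is the largest value that stays below 2 GiB.

```c
#include <stdint.h>

/* Hypothetical cap: the largest request that is still < 2 GiB. */
#define GRANT_REQ_MAX ((2ULL << 30) - 1)

/* Hypothetical helper: clamp the 64-bit request before it is stored in a
 * 32-bit field, so it can neither be refused for size nor wrap around. */
uint32_t clamp_grant_request(uint64_t wanted)
{
    if (wanted > GRANT_REQ_MAX)
        wanted = GRANT_REQ_MAX;
    return (uint32_t)wanted;
}
```

Doing the clamp in 64 bits before the narrowing conversion matters: clamping after the value has already been truncated to 32 bits would not catch the 256 RPCs in flight case.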