Allocating the maximum reply buffer size was previously critical, since LNet would drop any reply too large for the buffer on the floor and the client would never see it. Now, with RPC resending, it is less of a concern that we always allocate the maximum reply buffer size, and we can instead find a balance between "too large" (expensive allocations and high RAM usage) and "too small" (too many resends). ACLs are definitely in the "too large" territory today: many files never even have an ACL, and those that do typically store only a few extra groups.
Having some kind of common and relatively light-weight helper routine for each of these buffers that behaves like a decaying average, but works like a "median" rather than a "mean", would be ideal. That keeps each buffer component large enough to receive a full-sized reply if full-sized replies are commonly seen, while not averaging down to "slightly smaller than the useful size", which is what would otherwise happen when some replies do not need the full-sized buffer. It should check the buffer sizes after receiving each reply, so that decisions are based on the actually-needed buffer sizes rather than the client's estimate of what the sizes should be.
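To make the shape of such a helper concrete, here is a minimal interface sketch; the struct rbe, rbe_observe() and rbe_suggest() names are assumptions for illustration, not existing Lustre code, and one possible policy behind them is sketched after the implementation paragraph below.

```c
/*
 * Hypothetical per-buffer-component estimator; the rbe_* names are
 * illustrative only, not existing Lustre code.
 */
struct rbe;

/* After each reply: record the size the server actually needed, taken
 * from the received message, not from the client's earlier estimate. */
void rbe_observe(struct rbe *r, unsigned int needed);

/* Before the next request: how large to allocate this reply component. */
unsigned int rbe_suggest(const struct rbe *r);
```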
Since we may not need the maximum-sized buffer for each component on each file (e.g. maximum ACL size + maximum layout size + maximum xattr size), and the total is rounded up to the actual allocation size anyway, there is some room for aggregating these buffers at allocation time. We don't want to simply pin them at the maximum size ever seen, since that may vary dramatically by user or workload over time.
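As a minimal sketch of that aggregation (the function names and the 4 KiB granularity are assumptions for illustration): sum the per-component suggested sizes rather than the per-component maxima, then round the total up once.

```c
#include <stddef.h>

#define REPLY_ALLOC_GRAN 4096u          /* assume page-granular allocations */

/* Round a byte count up to the allocation granularity. */
unsigned int reply_round_up(unsigned int bytes)
{
        return (bytes + REPLY_ALLOC_GRAN - 1) & ~(REPLY_ALLOC_GRAN - 1);
}

/* Size one reply buffer for several variable-sized components at once,
 * using each component's suggested size rather than its maximum. */
unsigned int reply_buf_size(const unsigned int *suggested, size_t nr)
{
        unsigned int total = 0;
        size_t i;

        for (i = 0; i < nr; i++)
                total += suggested[i];

        /* The slack from rounding up is shared, so one component that is
         * occasionally larger than its estimate can borrow from the rest. */
        return reply_round_up(total);
}
```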
One possibility for implementation is to keep a regular decaying average to find the mean buffer size, and additionally store, for a few recent time windows, the buffer sizes that exceed this value along with the count of entries that are within some threshold of the maximum. As long as the count of large replies is above some threshold (e.g. more than 5% of all replies in the window), the maximum value is used; otherwise the decaying average (which excludes these large replies) is used. That avoids the false "benefit" of a larger average buffer size when e.g. 4% of replies are a bit larger than the decaying average and still need a resend, yet 96% of replies are much smaller and do not benefit from the larger allocation.
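A minimal sketch of that decision rule, filling in the hypothetical rbe_* interface from above; the window count, the 5% cutoff, the "near the maximum" threshold, and the decay factor are all illustrative assumptions, not tuned values or existing code.

```c
#include <string.h>

#define RBE_NR_WINDOWS    4     /* recent time windows to remember */
#define RBE_LARGE_PCT     5     /* >5% large replies => size for the maximum */
#define RBE_NEAR_MAX_PCT 90     /* within 90% of the window max counts as large */

struct rbe_window {
        unsigned int rw_max;            /* largest reply seen in this window */
        unsigned int rw_nr_large;       /* large replies near that maximum */
        unsigned int rw_nr_total;       /* all replies in this window */
};

struct rbe {
        unsigned int      rbe_avg;      /* decaying average of the small replies */
        struct rbe_window rbe_win[RBE_NR_WINDOWS];
        unsigned int      rbe_cur;      /* index of the current window */
};

/* Called once per window interval to age out old data. */
void rbe_new_window(struct rbe *r)
{
        r->rbe_cur = (r->rbe_cur + 1) % RBE_NR_WINDOWS;
        memset(&r->rbe_win[r->rbe_cur], 0, sizeof(r->rbe_win[r->rbe_cur]));
}

/* Record the size actually needed by one reply. */
void rbe_observe(struct rbe *r, unsigned int needed)
{
        struct rbe_window *w = &r->rbe_win[r->rbe_cur];

        w->rw_nr_total++;
        if (r->rbe_avg == 0) {
                r->rbe_avg = needed;            /* first sample seeds the average */
        } else if (needed > r->rbe_avg) {
                /* "large" reply: track it per window, keep it out of the average */
                if (needed > w->rw_max)
                        w->rw_max = needed;
                if (needed * 100 >= w->rw_max * RBE_NEAR_MAX_PCT)
                        w->rw_nr_large++;
        } else {
                /* small reply: fold into the decaying average */
                r->rbe_avg = (r->rbe_avg * 7 + needed) / 8;
        }
}

/* Pick the size to allocate for the next reply buffer component. */
unsigned int rbe_suggest(const struct rbe *r)
{
        unsigned int max = 0, nr_large = 0, nr_total = 0;
        int i;

        for (i = 0; i < RBE_NR_WINDOWS; i++) {
                const struct rbe_window *w = &r->rbe_win[i];

                if (w->rw_max > max)
                        max = w->rw_max;
                nr_large += w->rw_nr_large;
                nr_total += w->rw_nr_total;
        }

        /* Enough commonly-seen large replies: size for them.  Otherwise use
         * the decaying average, which excludes those large replies. */
        if (nr_total > 0 && nr_large * 100 > nr_total * RBE_LARGE_PCT)
                return max;
        return r->rbe_avg;
}
```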
We already have code that does most of this on a per-target basis in struct imp_at, using struct adaptive_timeout and at_measured(). This could be given a slightly better name like averaging_table, and the usage routines could be encapsulated a bit better. It would also make sense to avoid direct access to the at_history, at_min, and at_max parameters in the main code (e.g. by adding pointers to them in the data structures), so that different values can be used for the adaptive buffer sizes.
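A rough sketch of what that generalization might look like; the averaging_table name comes from the suggestion above, but the field names and the avt_measured() routine here are assumptions for illustration, not the existing struct adaptive_timeout / at_measured() code.

```c
struct averaging_table {
        unsigned int  avt_current;      /* current estimate */
        unsigned int  avt_worst;        /* worst value seen in the history window */
        /* tuning parameters: timeouts and reply-buffer sizes would point at
         * different variables, so one set of routines serves both users */
        unsigned int *avt_min;
        unsigned int *avt_max;
        unsigned int *avt_history;      /* history window length */
};

/* Record one measurement, clamped by this table's own min/max parameters
 * instead of reading the global at_min/at_max module parameters directly. */
void avt_measured(struct averaging_table *avt, unsigned int val)
{
        if (val < *avt->avt_min)
                val = *avt->avt_min;
        if (val > *avt->avt_max)
                val = *avt->avt_max;
        if (val > avt->avt_worst)
                avt->avt_worst = val;
        /* window/aging bookkeeping over *avt->avt_history is elided here;
         * it would follow what at_measured() already does per target */
}
```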
Still more to do - that patch was low-hanging fruit, limiting a few buffers that only hold certain xattrs to the maximum size for those xattrs. We still have the simplistic "just use the biggest size you've seen" behavior. But it is less serious now that we limit xattr sizes to 64 KiB, whereas before it was 1 MiB for ldiskfs, which was obviously much worse.