By plenty, I meant there is plenty of available memory.
Yes, but how much memory is "plenty"? In the real world, memory is a finite resource. We cannot program under the assumption that free memory is always available. Lustre must behave reasonably by default when memory is under contention.
Unfortunately, the exact amount of extra memory depends heavily on the performance and configuration of the OST.
No, it does not. Client memory is, almost by definition, far faster than disk on a remote OST across the network. Under normal use, client memory is also under contention from actual applications, which are not represented by the naive tests used to produce the graphs in this ticket.
In the real world, people size client memory to fit their applications. No one has the budget to double or triple the RAM in every client just to give Lustre more buffer space.
Memory contention is normal in the real world, and Lustre's defaults should be chosen so that it functions reasonably under real-world usage.
I didn't mean the memory is for the Lustre client to use as buffer space. For write RPCs, clients must hold the written pages until the OST commits the corresponding transaction. Clients therefore need extra memory to pin those pages, and applications cannot use it. With ZFS, the typical txg timeout is 5 seconds, which means a client pins 5 seconds' worth of write data in memory; depending on the client's write throughput, that can be a lot.
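To put a rough number on "can be a lot", here is a back-of-envelope sketch: pinned memory is roughly write throughput times the txg commit interval. The throughput figure below is a hypothetical assumption for illustration, not a measurement from this ticket.

```python
# Rough estimate of client memory pinned by unstable (uncommitted)
# write pages while waiting for the OST to commit the transaction.
# The throughput value is an illustrative assumption.

txg_timeout_s = 5           # typical ZFS txg commit interval (seconds)
write_throughput_gib_s = 2  # hypothetical per-client write rate (GiB/s)

# Pages stay pinned until the txg commits, so roughly:
pinned_gib = write_throughput_gib_s * txg_timeout_s
print(f"~{pinned_gib} GiB pinned until the OST commits")
```

At 2 GiB/s that is already ~10 GiB a single client cannot hand back to applications, which is why memory contention on the client matters here.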
There is really not much we can do on the client side, but we may be able to tune ZFS. The I/O generated by Lustre differs from a generic workload, so we could look into shortening the txg timeout or limiting the size of the write cache.
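As a concrete sketch of what such OSS-side tuning could look like: ZFS exposes both knobs as module parameters. The values below are illustrative starting points, not recommendations from this ticket; verify the defaults on your system and test under your own workload first.

```shell
# Hypothetical OSS-side ZFS tuning sketch (values are illustrative).

# Commit transaction groups more often than the 5 s default, so
# clients can unpin their write pages sooner:
echo 1 > /sys/module/zfs/parameters/zfs_txg_timeout

# Cap the amount of dirty data ZFS buffers before forcing a txg
# commit (bytes), bounding how much clients must keep pinned:
echo $((1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_dirty_data_max
```

The trade-off is that more frequent, smaller txgs reduce client-side pinning at the cost of less efficient batching on the OST, so this needs benchmarking rather than blind application.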