Description
Lustre has server side QoS mechanism based on NRS TBF policy (LU-3558). NRS TBF policy is able to enforce rate limitations based on both NID rules and JOBID rules. However, when using JOBD-based TBF rules, if multiple jobs run on the same client, the RPC rates of those jobs will be affected by each other. More precisely, the job that has high RPC rate limitation might get slow RPC rate actually. The reason of that is, the job that has slower RPC rate limitations might exaust the max-in-flight-RPC-number limitation, or the max-cache-pages limitation.
In order to prevent this from happening, a client side mechanism needs to be added to make the RPC sending chechanism at least more fair for all jobs.