We'd like to be able to perturb the timing of request processing at the PtlRPC layer in order to simulate high server load and to find and expose timing-related problems.
Our initial idea is to create an NRS policy that will delay request handling for some configurable amount of time. When the policy is started and a request arrives, the policy will calculate an offset from the request arrival time, within a user-configurable range, to set a request "start time". We can use the cfs_binheap implementation to store these requests and sort them based on this "start time". Requests are then removed from the binheap for handling only once their start time has been reached or passed. We could also choose to delay only some percentage of requests by allowing the request enqueue to fall back to FIFO (or some other policy).
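To make the enqueue/dequeue behaviour concrete, here is a minimal userspace sketch of the idea. It is not the actual patch and does not use the real cfs_binheap or NRS interfaces: the small array-backed min-heap stands in for cfs_binheap, and every name in it (`struct delay_req`, `delay_enqueue`, `delay_dequeue`, the fixed `DELAY_HEAP_MAX`, times in whole seconds) is a hypothetical chosen for illustration. The partial-delay fallback to FIFO is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request record; NOT the real struct ptlrpc_request. */
struct delay_req {
	int      id;
	uint64_t arrival;    /* arrival time, in seconds */
	uint64_t start_time; /* earliest time the request may be handled */
};

#define DELAY_HEAP_MAX 64 /* arbitrary capacity for this sketch */

/* Array-backed min-heap keyed on start_time (stand-in for cfs_binheap). */
struct delay_heap {
	struct delay_req *slots[DELAY_HEAP_MAX];
	size_t            count;
};

static void heap_swap(struct delay_heap *h, size_t a, size_t b)
{
	struct delay_req *tmp = h->slots[a];

	h->slots[a] = h->slots[b];
	h->slots[b] = tmp;
}

/* Insert a request and sift it up; returns -1 if the heap is full. */
static int delay_heap_insert(struct delay_heap *h, struct delay_req *r)
{
	size_t i;

	if (h->count == DELAY_HEAP_MAX)
		return -1;
	i = h->count++;
	h->slots[i] = r;
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->slots[parent]->start_time <= h->slots[i]->start_time)
			break;
		heap_swap(h, parent, i);
		i = parent;
	}
	return 0;
}

/* Remove and return the request with the smallest start_time. */
static struct delay_req *delay_heap_pop(struct delay_heap *h)
{
	struct delay_req *root;
	size_t i = 0;

	if (h->count == 0)
		return NULL;
	root = h->slots[0];
	h->slots[0] = h->slots[--h->count];
	for (;;) {
		size_t l = 2 * i + 1, r = 2 * i + 2, min = i;

		if (l < h->count &&
		    h->slots[l]->start_time < h->slots[min]->start_time)
			min = l;
		if (r < h->count &&
		    h->slots[r]->start_time < h->slots[min]->start_time)
			min = r;
		if (min == i)
			break;
		heap_swap(h, i, min);
		i = min;
	}
	return root;
}

/* Enqueue: pick a delay in [min_delay, max_delay] seconds past arrival. */
static int delay_enqueue(struct delay_heap *h, struct delay_req *r,
			 uint64_t now, uint32_t min_delay, uint32_t max_delay)
{
	uint64_t span;

	if (max_delay < min_delay)
		return -1;
	span = (uint64_t)max_delay - min_delay + 1; /* always >= 1 */
	r->arrival = now;
	r->start_time = now + min_delay + (uint64_t)rand() % span;
	return delay_heap_insert(h, r);
}

/* Dequeue: hand out the earliest request only once its start time is due. */
static struct delay_req *delay_dequeue(struct delay_heap *h, uint64_t now)
{
	if (h->count == 0 || h->slots[0]->start_time > now)
		return NULL;
	return delay_heap_pop(h);
}
```

A service thread would call `delay_dequeue()` with the current time and get nothing back until the earliest start time has been reached; a real policy would additionally need some way to wake up when that time arrives, which this sketch does not model.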
I have an initial implementation mostly done (I just need to finish up the lprocfs stuff). I'd appreciate any thoughts on this approach.