Shared-directory unlinks using metabench seem significantly slower with Lustre 2.x clients than with Lustre 1.8.x clients
512 processes, 8 ppn, 2.1 server
1.8.6 clients: 18k unlinks/s
2.3 clients: 7k unlinks/s
Creates are comparable at 28k creates/s.
mdtest shows no such regression. Digging into metabench a little: when deleting files, metabench processes (I am told) readdir the directory, select "the next" file, and delete it, so they effectively race each other for the same entries. mdtest instead chooses the files to delete deterministically, based on process ID.
This suggests directory lock contention on the MDT.
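To make the difference in file selection concrete, here is a minimal Python sketch, assuming the behavior described above (metabench rescans the directory and unlinks whichever entry comes first; mdtest unlinks names derived from its own rank). Threads stand in for MPI ranks and a local temporary directory stands in for the shared Lustre directory. The file naming (`file.<rank>.<i>`) and all helper functions are hypothetical and not taken from either benchmark. It only illustrates why the racing pattern makes ranks collide on the same directory entries; it does not model MDT locking.

```python
import os
import tempfile
import threading

# Hypothetical sketch, not metabench or mdtest code: threads emulate ranks
# deleting files from one shared directory using two selection strategies.


def populate(dirpath, nranks, files_per_rank):
    """Create files named file.<rank>.<i> in the shared directory."""
    for rank in range(nranks):
        for i in range(files_per_rank):
            open(os.path.join(dirpath, f"file.{rank}.{i}"), "w").close()


def unlink_deterministic(dirpath, rank, files_per_rank, stats):
    """mdtest-style: each rank removes only the names derived from its rank."""
    for i in range(files_per_rank):
        os.unlink(os.path.join(dirpath, f"file.{rank}.{i}"))
        stats[rank]["deleted"] += 1


def unlink_racing(dirpath, rank, stats):
    """metabench-style (as described): readdir, take the first entry, unlink it.

    Ranks that pick the same entry collide; the loser gets ENOENT and rescans.
    """
    while True:
        with os.scandir(dirpath) as it:
            entry = next(it, None)
        if entry is None:
            return  # directory is empty, nothing left to delete
        try:
            os.unlink(entry.path)
            stats[rank]["deleted"] += 1
        except FileNotFoundError:
            stats[rank]["collisions"] += 1  # another rank won the race


def run(mode, nranks=8, files_per_rank=50):
    """Return (files deleted, collisions, files remaining) for one mode."""
    with tempfile.TemporaryDirectory() as d:
        populate(d, nranks, files_per_rank)
        # One stats dict per rank, so threads never write to shared counters.
        stats = [{"deleted": 0, "collisions": 0} for _ in range(nranks)]
        if mode == "deterministic":
            jobs = [(unlink_deterministic, (d, r, files_per_rank, stats))
                    for r in range(nranks)]
        else:
            jobs = [(unlink_racing, (d, r, stats)) for r in range(nranks)]
        threads = [threading.Thread(target=f, args=a) for f, a in jobs]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        remaining = len(os.listdir(d))
    deleted = sum(s["deleted"] for s in stats)
    collisions = sum(s["collisions"] for s in stats)
    return deleted, collisions, remaining


if __name__ == "__main__":
    print("deterministic:", run("deterministic"))  # (400, 0, 0)
    print("racing:", run("racing"))  # collision count varies per run
```

Both modes delete every file exactly once, but in the racing mode ranks tend to see the same first entry, so `collisions` is usually nonzero and varies from run to run. In the sketch each collision costs an extra readdir of the shared directory; on Lustre that extra traffic would land on the same directory locks, which is consistent with, but does not prove, the contention theory.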
On the clients, most of the difference in time is spent in ptlrpc_queue_wait (not sure of units):
On the MDT, the biggest difference is in mdt_object_find_lock:
Also, using ldlm stats, it seems the 2.3 clients cause twice as many ldlm_bl_callbacks as 1.8.8 clients.
So apparently the 2.3 client is holding directory locks differently than the 1.8 clients. We're still looking into this, but if anyone has thoughts we'd love to hear them.