Details
- Bug
- Resolution: Unresolved
- Major
- None
- Lustre 2.1.6, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.12.0
- 3
- 8195
Description
Shared-directory unlinks using metabench seem significantly slower with Lustre 2.x clients than with Lustre 1.8.x clients.
Test setup: 512 processes, 8 ppn, 2.1 server
1.8.6 clients: 18k unlinks/s
2.3 clients: 7k unlinks/s
Creates are comparable at 28k creates/s.
Mdtest shows no such regression. Digging into metabench a little, it seems (I am told) that when deleting files, each metabench process does a readdir, selects "the next" file, and deletes it, so the processes are effectively racing each other, whereas mdtest deterministically chooses the files to delete based on process id.
This seems to imply that the cause is directory lock contention on the MDT.
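The two deletion patterns described above can be sketched roughly as follows. This is only an illustration of the access patterns as reported second-hand; it is not taken from the metabench or mdtest sources, and all function names and the `fNNNN` file naming are hypothetical. On a real run many processes would execute the racing loop concurrently against one shared directory.

```python
import os
import tempfile


def create_files(directory, count):
    """Create `count` empty files named f0000, f0001, ... in `directory`."""
    for i in range(count):
        open(os.path.join(directory, f"f{i:04d}"), "w").close()


def unlink_racing(directory):
    """Racing pattern: re-read the directory, take the first entry, unlink it.

    Returns (unlinked, lost_races). When several processes run this loop on
    the same directory, another process may remove the entry first, which
    shows up here as FileNotFoundError.
    """
    unlinked = lost = 0
    while True:
        entries = os.listdir(directory)  # readdir on the shared directory
        if not entries:
            return unlinked, lost
        try:
            os.unlink(os.path.join(directory, entries[0]))
            unlinked += 1
        except FileNotFoundError:
            lost += 1  # someone else deleted it between readdir and unlink


def unlink_by_rank(directory, rank, nprocs, count):
    """Deterministic pattern: each rank unlinks only its own files, no readdir."""
    unlinked = 0
    for i in range(rank, count, nprocs):
        os.unlink(os.path.join(directory, f"f{i:04d}"))
        unlinked += 1
    return unlinked


def demo(count=20, nprocs=4):
    """Single-process walk-through of both patterns in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        create_files(d, count)
        racing = unlink_racing(d)
        create_files(d, count)
        per_rank = [unlink_by_rank(d, r, nprocs, count) for r in range(nprocs)]
        return racing, per_rank, os.listdir(d)
```

The point of the contrast is that the racing loop interleaves a directory read with every unlink, while the rank-based loop never reads the directory at all, which would be consistent with the contention hypothesis if readdir and unlink compete for the same directory lock.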
On the clients, most of the difference in time is spent in ptlrpc_queue_wait (not sure of units):
1.8.8: 347
2.3: 923
On the MDT, the big difference is mdt_object_find_lock:
1.8.8: 51us
2.3: 1110us
Also, using ldlm stats, it seems the 2.3 clients cause twice as many ldlm_bl_callbacks as 1.8.8 clients.
So apparently the 2.3 client is holding directory locks differently than the 1.8 clients. We're still looking into this, but if anyone has thoughts we'd love to hear them.
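For scale, the ratios implied by the figures in the description can be worked out as below. The numbers are copied from the report; the variable names are made up for this sketch. Note that the throughput figures are for 1.8.6 clients while the timing figures are for 1.8.8 clients, so the ratios are only roughly comparable.

```python
# Figures as reported in the description.
unlinks_per_s = {"1.8.6": 18_000, "2.3": 7_000}
ptlrpc_queue_wait = {"1.8.8": 347, "2.3": 923}        # units not stated in the report
mdt_object_find_lock_us = {"1.8.8": 51, "2.3": 1110}  # microseconds


def ratio(larger, smaller):
    """Return larger/smaller rounded to one decimal place."""
    return round(larger / smaller, 1)


# How many times faster the 1.8.6 clients unlink than the 2.3 clients.
slowdown = ratio(unlinks_per_s["1.8.6"], unlinks_per_s["2.3"])

# How many times longer the 2.3 clients spend in ptlrpc_queue_wait.
client_wait = ratio(ptlrpc_queue_wait["2.3"], ptlrpc_queue_wait["1.8.8"])

# How many times longer mdt_object_find_lock takes with 2.3 clients.
mdt_lock = ratio(mdt_object_find_lock_us["2.3"], mdt_object_find_lock_us["1.8.8"])

print(slowdown, client_wait, mdt_lock)  # prints: 2.6 2.7 21.8
```

The client-side wait ratio (about 2.7x) is close to the throughput ratio (about 2.6x), while the MDT-side lock time grows by a much larger factor (about 22x).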
Issue Links
- duplicates
  - LU-5232 cache directory contents on file descriptor on lock revocation (Resolved)
- is duplicated by
  - LU-4906 rm -rf triggers too much MDS_READPAGE (Resolved)
  - LU-4096 Do not allocate large buffer for readdir rpc in case of a small directory (Resolved)
- is related to
  - LU-15535 deadlock on lli->lli_lsm_sem (Open)
  - LU-1167 Poor mdtest unlink performance with multiple processes per node (Open)
  - LU-9458 LustreError: 12764:0:(sec_bulk.c:188:enc_pools_release_free_pages()) ASSERTION( npages <= page_pools.epp_free_pages ) failed: (Resolved)
  - LU-10999 Use readdir cache for lookup when available (Open)
  - LU-11000 Retain cached dentries under directory update lock (Open)
  - LU-17493 restore LDLM cancel on blocking callback (Open)
  - LU-8641 speedup run_metabech () : make cleanup optional (Resolved)
- is related to
  - LU-10225 sanity test 1 error: apply rmdir/rm on striped dir failed (Open)
  - LU-1431 Support for larger than 1MB sequential I/O RPCs (Resolved)
  - LU-5232 cache directory contents on file descriptor on lock revocation (Resolved)