[LU-3308] large readdir chunk size slows unlink/"rm -r" performance - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.1.6, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.12.0
Labels:
Severity: 3
Rank (Obsolete): 8195

Description

Shared-directory unlinks using metabench seem significantly slower with Lustre 2.x clients than with Lustre 1.8.x clients.

512 processes, 8 ppn, 2.1 server
1.8.6 clients: 18k unlinks/s
2.3 clients: 7k unlinks/s

Creates are comparable at 28k creates/s.

Mdtest shows no such regression. Digging into metabench a little, it seems that when deleting files, each metabench process (I am told) does a readdir, selects "the next" file, and deletes it, so the processes are effectively racing each other, whereas mdtest deterministically chooses the files to delete based on process id.
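To make the difference between the two deletion patterns concrete, here is a minimal Python sketch of how I understand them. This is not the actual metabench or mdtest code; the function names and the `file.<i>` naming scheme are made up for illustration, and it runs against a local temporary directory rather than a Lustre mount.

```python
import os
import tempfile


def make_files(directory, nfiles):
    """Create nfiles empty files named file.0 .. file.(nfiles-1) (hypothetical naming)."""
    for i in range(nfiles):
        with open(os.path.join(directory, f"file.{i}"), "w"):
            pass


def unlink_racing(directory):
    """Metabench-like pattern as described above: re-read the directory,
    take whatever entry comes first, and try to unlink it. Other workers
    doing the same thing may win the race, so ENOENT is expected."""
    removed = 0
    while True:
        with os.scandir(directory) as it:
            entry = next(it, None)
        if entry is None:
            return removed  # directory is empty, nothing left to delete
        try:
            os.unlink(entry.path)
            removed += 1
        except FileNotFoundError:
            pass  # another worker deleted it first; readdir again and retry


def unlink_partitioned(directory, rank, nprocs, nfiles):
    """Mdtest-like pattern as described above: each worker derives its own
    file names from its rank, so it never needs readdir and never targets
    a file that belongs to another worker."""
    removed = 0
    for i in range(rank, nfiles, nprocs):
        os.unlink(os.path.join(directory, f"file.{i}"))
        removed += 1
    return removed


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    make_files(d, 8)
    print("racing removed:", unlink_racing(d))
    make_files(d, 8)
    print("partitioned removed:",
          sum(unlink_partitioned(d, r, 4, 8) for r in range(4)))
    os.rmdir(d)
```

The point of the sketch is that the racing variant issues a fresh readdir on the shared directory for every unlink attempt, while the partitioned variant never reads the directory at all.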

This seems to imply it's directory locking contention on the MDT.

On the clients, most of the difference in time is spent in ptlrpc_queue_wait (not sure of units):
1.8.8: 347
2.3: 923

On the MDT, the big difference is mdt_object_find_lock:
1.8.8: 51us
2.3: 1110us
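For a sense of scale, the figures quoted above can be turned into ratios with a few lines of Python. The numbers are copied from this description; the dictionary layout and the `factor` helper are just illustrative. Note that the unlink rates compare 1.8.6 with 2.3 clients, while the timings compare 1.8.8 with 2.3.

```python
# (old client value, 2.3 client value), copied from the description above
measurements = {
    "unlink rate (unlinks/s)": (18_000, 7_000),
    "client ptlrpc_queue_wait (units unknown)": (347, 923),
    "MDT mdt_object_find_lock (us)": (51, 1110),
}


def factor(old, new):
    """Return how many times larger the bigger value is than the smaller one."""
    return max(old, new) / min(old, new)


factors = {name: factor(old, new) for name, (old, new) in measurements.items()}

if __name__ == "__main__":
    for name, f in factors.items():
        print(f"{name}: {f:.1f}x")
```

So the unlink rate and the client-side wait time both differ by roughly 2.6x to 2.7x, while the MDT-side mdt_object_find_lock time differs by roughly 22x.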

Also, using ldlm stats, it seems the 2.3 clients cause twice as many ldlm_bl_callbacks as 1.8.8 clients.

So apparently the 2.3 client is holding directory locks differently than the 1.8 clients. We're still looking into this, but if anyone has thoughts we'd love to hear them.

Issue Links

duplicates

LU-5232 cache directory contents on file descriptor on lock revocation

Resolved

is duplicated by

LU-4906 rm -rf triggers too much MDS_READPAGE

Resolved

LU-4096 Do not allocate large buffer for readdir rpc in case of a small directory

Resolved

is related to

LU-15535 deadlock on lli->lli_lsm_sem

Open

LU-1167 Poor mdtest unlink performance with multiple processes per node

Open

LU-9458 LustreError: 12764:0:(sec_bulk.c:188:enc_pools_release_free_pages()) ASSERTION( npages <= page_pools.epp_free_pages ) failed:

Resolved

LU-10999 Use readdir cache for lookup when available

Open

LU-11000 Retain cached dentries under directory update lock

Open

LU-17493 restore LDLM cancel on blocking callback

Open

LU-8641 speedup run_metabech () : make cleanup optional

Resolved

is related to

LU-10225 sanity test 1 error: apply rmdir/rm on striped dir failed

Open

LU-1431 Support for larger than 1MB sequential I/O RPCs

Resolved

LU-5232 cache directory contents on file descriptor on lock revocation

Resolved


People

Assignee: WC Triage

Reporter: Nathan Rutman

Votes: 0

Watchers: 20

Dates

Created: 10/May/13 12:03 AM

Updated: 01/Feb/24 12:31 AM