Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version: Lustre 2.6.0
- Severity: 3
- Rank: 13548
Description
On 2.5.57-79-ge7f99a5 I see that rm -rf on a directory with n regular files causes O(n) MDS_READPAGE requests to be sent. I ran the following on 2.4.3 and master:
export MOUNT_2=y
llmount.sh
cd /mnt/lustre
echo clear | tee /proc/fs/lustre/mdc/*/stats
mkdir d
touch d/{0..255}
cat /proc/fs/lustre/mdc/*/stats
cd /mnt/lustre2
echo clear | tee /proc/fs/lustre/mdc/*/stats
rm -rf d
cat /proc/fs/lustre/mdc/*/stats
On 2.4.3:
## mkdir and touch
req_waittime              773 samples [usec] 65 55988 339933 3295555639
req_active                773 samples [reqs] 1 1 773 773
mds_close                 256 samples [usec] 65 486 58958 16078566
mds_reint                 257 samples [usec] 93 826 74764 29250026
ldlm_enqueue              259 samples [usec] 191 1423 150223 115570903
seq_query                 1 samples [usec] 55988 55988 55988 3134656144
## rm -rf
snapshot_time             1397509895.872324 secs.usecs
req_waittime              258 samples [usec] 50 1054 32830 6165600
req_active                258 samples [reqs] 1 1 258 258
ldlm_cancel               258 samples [usec] 50 1054 32830 6165600
snapshot_time             1397509895.872354 secs.usecs
req_waittime              524 samples [usec] 45 9854 212336 225452482
req_active                524 samples [reqs] 1 4 876 1660
mds_close                 1 samples [usec] 390 390 390 152100
mds_reint                 257 samples [usec] 331 9854 151751 209416267
mds_readpage              3 samples [usec] 271 323 902 272634
ldlm_enqueue              261 samples [usec] 45 736 59120 15595504
ldlm_cancel               2 samples [usec] 64 109 173 15977
On master:
## mkdir and touch
snapshot_time             1397507941.992796 secs.usecs
snapshot_time             1397507941.992828 secs.usecs
req_waittime              1282 samples [usec] 50 2674 364172 203043372
req_active                1282 samples [reqs] 1 1 1282 1282
mds_close                 256 samples [usec] 61 640 50251 15984775
mds_reint                 257 samples [usec] 89 1045 60223 21230981
mds_getxattr              256 samples [usec] 50 658 38276 9345158
ldlm_enqueue              513 samples [usec] 82 2674 215422 156482458
## rm -rf
snapshot_time             1397507954.948995 secs.usecs
req_waittime              991 samples [usec] 31 5949 371017 322109413
req_active                991 samples [reqs] 1 9 2132 6404
mds_close                 1 samples [usec] 126 126 126 15876
mds_reint                 257 samples [usec] 168 5949 173654 221727790
mds_readpage              132 samples [usec] 158 2173 44232 21316906
mds_getxattr              60 samples [usec] 31 345 5769 828911
ldlm_enqueue              423 samples [usec] 44 2496 123479 70146809
ldlm_cancel               118 samples [usec] 65 891 23757 8073121
snapshot_time             1397507954.949096 secs.usecs
req_waittime              1 samples [usec] 108 108 108 11664
req_active                1 samples [reqs] 1 1 1 1
ldlm_cancel               1 samples [usec] 108 108 108 11664
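To compare the two runs without reading the whole table, it is enough to pull the mds_readpage lines out of the MDC stats. This is a minimal sketch, not part of the original report; it assumes the stats format shown above, where the second field of each line is the sample (request) count:
# Print the number of MDS_READPAGE requests recorded in each MDC stats file
# since the last "echo clear"; field 2 is the request count.
awk '/^mds_readpage/ { print FILENAME ": " $2 " readpage requests" }' /proc/fs/lustre/mdc/*/stats
For the 256-file directory above this reports 3 requests on 2.4.3 and 132 on master, which is where the O(n) behaviour shows up.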
(If you noticed that ldlm_enqueue and mds_reint are present, it's because I used http://review.whamcloud.com/#/c/6223/1 which is awesome and has been landable for nearly one year (still on patch set 1), but nobody ever reviews it.)
Attachments
Issue Links
- duplicates
  - LU-3308 large readdir chunk size slows unlink/"rm -r" performance (Reopened)
- is related to
  - LU-4902 Do not require rpc_lock for readdir (Resolved)
  - LU-5232 cache directory contents on file descriptor on lock revocation (Resolved)
- is related to
  - LU-4367 unlink performance regression on lustre-2.5.52 client (Resolved)
The statahead behaviour changed after the directory page cache was moved from LLITE to MDC to support striped directories.
Originally the directory page cache lived in LLITE, and the directory ldlm lock was only held while fetching a directory page from the MDS; after that the lock was released. The statahead thread then traversed the directory page without holding the directory ldlm lock: even if someone was running "rm -rf" and cancelling (or ELC-ing) the directory ldlm lock on the client, statahead still held its reference on the page. Hitting -ENOENT because a name entry had already been removed was not a serious problem, since that is an internal statahead failure, invisible to applications, and the statahead thread could still go ahead and pre-fetch as much as possible. So there was not much directory ldlm lock ping-pong.
The situation changed after the directory pages were moved to MDC. In the current implementation, the statahead thread does not hold a page reference while it traverses the directory; it has to re-verify the directory ldlm lock for every name entry. That means the directory ldlm lock ping-pongs between the statahead thread and the "rm -rf" thread for every name entry.
In summary, the original directory ldlm lock ping-pong happened per page, while the current one happens per name entry. That is why the performance is so bad.
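A quick way to check this analysis from the client side is to take statahead out of the picture and repeat the unlink: if the ping-pong is between the statahead thread and the unlinking thread, mds_readpage should drop back to a few requests per directory. This is only a diagnostic sketch, not part of the fix; it assumes the usual llite.*.statahead_max tunable and the same llmount.sh setup as in the description:
# Re-create the test directory on the first mount.
cd /mnt/lustre
mkdir d
touch d/{0..255}
# Disable statahead, repeat the unlink, and look at the mds_readpage count.
lctl set_param llite.*.statahead_max=0
cd /mnt/lustre2
echo clear | tee /proc/fs/lustre/mdc/*/stats
rm -rf d
cat /proc/fs/lustre/mdc/*/stats
# Restore statahead (32 is the usual default; adjust if your setup differs).
lctl set_param llite.*.statahead_max=32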