Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version: Lustre 2.6.0
- Severity: 3
- 13548
Description
On 2.5.57-79-ge7f99a5 I see that rm -rf on a directory with n regular files causes O(n) MDS_READPAGE requests to be sent. I ran the following on 2.4.3 and master:
export MOUNT_2=y
llmount.sh
cd /mnt/lustre
echo clear | tee /proc/fs/lustre/mdc/*/stats
mkdir d
touch d/{0..255}
cat /proc/fs/lustre/mdc/*/stats
cd /mnt/lustre2
echo clear | tee /proc/fs/lustre/mdc/*/stats
rm -rf d
cat /proc/fs/lustre/mdc/*/stats
On 2.4.3:
## mkdir and touch
req_waittime   773 samples [usec] 65 55988 339933 3295555639
req_active     773 samples [reqs] 1 1 773 773
mds_close      256 samples [usec] 65 486 58958 16078566
mds_reint      257 samples [usec] 93 826 74764 29250026
ldlm_enqueue   259 samples [usec] 191 1423 150223 115570903
seq_query      1 samples [usec] 55988 55988 55988 3134656144
## rm -rf
snapshot_time  1397509895.872324 secs.usecs
req_waittime   258 samples [usec] 50 1054 32830 6165600
req_active     258 samples [reqs] 1 1 258 258
ldlm_cancel    258 samples [usec] 50 1054 32830 6165600
snapshot_time  1397509895.872354 secs.usecs
req_waittime   524 samples [usec] 45 9854 212336 225452482
req_active     524 samples [reqs] 1 4 876 1660
mds_close      1 samples [usec] 390 390 390 152100
mds_reint      257 samples [usec] 331 9854 151751 209416267
mds_readpage   3 samples [usec] 271 323 902 272634
ldlm_enqueue   261 samples [usec] 45 736 59120 15595504
ldlm_cancel    2 samples [usec] 64 109 173 15977
On master:
## mkdir and touch
snapshot_time  1397507941.992796 secs.usecs
snapshot_time  1397507941.992828 secs.usecs
req_waittime   1282 samples [usec] 50 2674 364172 203043372
req_active     1282 samples [reqs] 1 1 1282 1282
mds_close      256 samples [usec] 61 640 50251 15984775
mds_reint      257 samples [usec] 89 1045 60223 21230981
mds_getxattr   256 samples [usec] 50 658 38276 9345158
ldlm_enqueue   513 samples [usec] 82 2674 215422 156482458
## rm -rf
snapshot_time  1397507954.948995 secs.usecs
req_waittime   991 samples [usec] 31 5949 371017 322109413
req_active     991 samples [reqs] 1 9 2132 6404
mds_close      1 samples [usec] 126 126 126 15876
mds_reint      257 samples [usec] 168 5949 173654 221727790
mds_readpage   132 samples [usec] 158 2173 44232 21316906
mds_getxattr   60 samples [usec] 31 345 5769 828911
ldlm_enqueue   423 samples [usec] 44 2496 123479 70146809
ldlm_cancel    118 samples [usec] 65 891 23757 8073121
snapshot_time  1397507954.949096 secs.usecs
req_waittime   1 samples [usec] 108 108 108 11664
req_active     1 samples [reqs] 1 1 1 1
ldlm_cancel    1 samples [usec] 108 108 108 11664
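To compare the two runs without eyeballing the full stats dump, the mds_readpage sample count can be pulled straight out of the mdc stats files. A minimal sketch (the awk field positions assume the standard stats line format shown above; the sample file path is illustrative):

```shell
# Sum mds_readpage RPC counts across one or more mdc stats files.
# Assumed stats line format:
#   <name> <samples> samples [<unit>] <min> <max> <sum> <sumsq>
readpage_rpcs() {
    awk '$1 == "mds_readpage" { n += $2 } END { print n + 0 }' "$@"
}

# Demo against the master "rm -rf" numbers captured above; in practice
# point it at /proc/fs/lustre/mdc/*/stats after clearing and running rm -rf.
cat > /tmp/mdc_stats.sample <<'EOF'
mds_reint 257 samples [usec] 168 5949 173654 221727790
mds_readpage 132 samples [usec] 158 2173 44232 21316906
EOF
readpage_rpcs /tmp/mdc_stats.sample   # prints 132
```

For 256 unlinked files that is roughly one MDS_READPAGE per two files on master, versus 3 total on 2.4.3, which is the O(n) behavior reported here.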
(If you noticed that ldlm_enqueue and mds_reint appear as separate counters, it's because I used http://review.whamcloud.com/#/c/6223/1, which is awesome and has been landable for nearly a year (still on patch set 1), but nobody ever reviews it.)
Issue Links
- duplicates
  - LU-3308 large readdir chunk size slows unlink/"rm -r" performance (Reopened)
- is related to
  - LU-4902 Do not require rpc_lock for readdir (Resolved)
  - LU-5232 cache directory contents on file descriptor on lock revocation (Resolved)
  - LU-4367 unlink performance regression on lustre-2.5.52 client (Resolved)