Massive directory metadata operation performance decrease
(LU-14146)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Gregoire Pichon | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne3, patch, performance, recovery |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This feature is a complement to the It will improve the performance of cross-MDT metadata modification operations while keeping recovery handling (request resend and MDT recovery) correct. |
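As a rough illustration of the idea in the description (several modifying RPCs in flight on one MDT-MDT connection, each still identifiable if it has to be resent), here is a minimal Python sketch. `ModRpcSlots`, `acquire()` and `release()` are hypothetical names, not Lustre code. The sketch assumes that each in-flight modify RPC carries a unique tag, and that the target uses that tag to locate the reply data it saved for the request, so a resent request can be matched to its earlier reply. With `max_in_flight = 1` it degenerates into plain serialization of modifying RPCs, roughly what a single per-connection lock gives.

```python
import threading


class ModRpcSlots:
    """Hypothetical per-connection limiter for modifying RPCs.

    Each in-flight modify RPC holds a unique tag in [1, max_in_flight].
    Assumption: the tag identifies a reply-data slot on the target, so a
    resent request can be matched to the reply already saved for it.
    """

    def __init__(self, max_in_flight: int):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be >= 1")
        self.max_in_flight = max_in_flight
        self._in_use: set[int] = set()
        self._cond = threading.Condition()

    def acquire(self) -> int:
        """Block until a slot is free, then return the lowest free tag."""
        with self._cond:
            while len(self._in_use) >= self.max_in_flight:
                self._cond.wait()
            # A free tag must exist because fewer than max_in_flight are used.
            tag = next(t for t in range(1, self.max_in_flight + 1)
                       if t not in self._in_use)
            self._in_use.add(tag)
            return tag

    def release(self, tag: int) -> None:
        """Free a tag once the reply has arrived (KeyError if not held)."""
        with self._cond:
            self._in_use.remove(tag)
            # Exactly one slot was freed, so waking one waiter is enough.
            self._cond.notify()

    def in_flight(self) -> int:
        with self._cond:
            return len(self._in_use)
```

For example, with `ModRpcSlots(2)` two calls to `acquire()` return tags 1 and 2, a third caller blocks, and after `release(1)` the next `acquire()` reuses tag 1. Reusing the lowest free tag keeps the set of tags, and so the assumed set of server-side reply slots per connection, bounded by `max_in_flight`.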
| Comments |
| Comment by Gregoire Pichon [ 17/Jul/15 ] |
|
I have updated the patch http://review.whamcloud.com/#/c/14375/ to implement this feature. |
| Comment by Peter Jones [ 17/Jul/15 ] |
|
Thanks, Gregoire. This work is queued up for 2.9. |
| Comment by Gerrit Updater [ 15/May/18 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/32412 |
| Comment by Andreas Dilger [ 15/May/18 ] |
|
Note that the above patch is not a replacement for Grégoire's patch. It is a code cleanup that moves the MDC semaphore to the lustre/osp directory; when http://review.whamcloud.com/14375 is updated, that code should probably be removed. In the meantime, it avoids allocating the mdc_rpc_lock for each client->MDT connection. |
| Comment by Gerrit Updater [ 01/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32412/ |
| Comment by James A Simmons [ 06/Jun/21 ] |
|
The 14375 patch has been revived. Currently I'm collecting data to see if it helps with DNE2 scaling issues. |
| Comment by James A Simmons [ 15/Jun/21 ] |
|
https://www.opensfs.org/wp-content/uploads/Evaluation-of-DoM-SNE-scaling_Simmons_revised051821.pdf Comparing this data with my LUG talk, which used the same hardware, we see improvements in file removals and stats. I do see a regression in reads of small files; at this point I don't know whether that is due to recent changes or to the patch itself. These runs use the new default setting of max_mod_rpcs_in_flight = 8. I need to increase it to see whether I can get more out of it. |
| Comment by Andreas Dilger [ 16/Jun/21 ] |
|
James, thanks for testing this out. The graphs would be much easier to compare if the before/after results for each test were on a single graph. As for max_mod_rpcs_in_flight for the MDS, I think it is entirely reasonable to increase it above 8, since we don't want clients to bottleneck on the MDSes when those are busy handling lots of client requests. The MDS has max_rpcs_in_flight=512, so it wouldn't be unreasonable to have the MDS tune this to match mds.MDS.mdt_out.threads_max. |
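The tuning rule suggested above can be written as a small calculation. `suggest_max_mod_rpcs` is a hypothetical helper, not a Lustre function, and its constraints are assumptions to verify against the Lustre manual before applying a value with `lctl set_param`: the result follows mds.MDS.mdt_out.threads_max, stays strictly below max_rpcs_in_flight so non-modifying RPCs always have room, and never exceeds the limit the target is willing to accept.

```python
def suggest_max_mod_rpcs(threads_max: int, max_rpcs_in_flight: int,
                         server_limit: int) -> int:
    """Hypothetical helper: propose max_mod_rpcs_in_flight for an
    MDT-MDT connection.

    Assumed constraints:
      * follow the target's mdt_out.threads_max (threads_max),
      * stay strictly below max_rpcs_in_flight,
      * never exceed the target's advertised limit (server_limit).
    """
    if threads_max < 1 or server_limit < 1 or max_rpcs_in_flight < 2:
        raise ValueError("limits must be positive and "
                         "max_rpcs_in_flight must be >= 2")
    # Every term is >= 1 after the checks above, so the result is >= 1.
    return min(threads_max, max_rpcs_in_flight - 1, server_limit)
```

With max_rpcs_in_flight=512 as mentioned above, a target limit of 8 keeps the result at 8 regardless of threads_max, which matches the default James measured; only raising the target-side limit as well would let the value follow threads_max.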
| Comment by Gerrit Updater [ 08/Mar/22 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46735 |
| Comment by Gerrit Updater [ 06/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/14375/ |
| Comment by Gerrit Updater [ 16/Jun/22 ] |
|
"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47656 |
| Comment by Gerrit Updater [ 20/Jun/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47656/ |