[LU-14459] DNE3: directory auto split during create Created: 19/Feb/21 Updated: 25/Aug/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | dne3 | ||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Directory auto-split should be done when dir_split_count is hit during file creation, rather than afterward, to minimize the number of entries that need to be moved. Otherwise, entries are first created on a single MDT and only migrated on a later directory access. This is sub-optimal for two reasons:
|
| Comments |
| Comment by Gerrit Updater [ 22/Feb/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41713 |
| Comment by Andreas Dilger [ 22/Feb/21 ] |
|
Lai, I started looking into this to see if it was easily implemented. I pushed the above patch with some preliminary changes, but it needs more work to be done properly.

It looks like la_dirent_count is only fetched once at getattr time, but then never updated when entries are added/removed. It needs to be updated during file creation/removal so that the count is at least somewhat accurate as the directory changes. If it simplifies implementation, la_dirent_count doesn't need to be perfect (e.g. it could be checked only when the name hash % 64 == 0, or whatever), since a growing directory should eventually increase it enough to cause a split, even if the split is a bit late. For large directories it will eventually hit the stripe count limit, and then an accurate count no longer matters.

I wasn't sure exactly where to put the check for mdt_should_auto_split() and mdt_auto_split_add(). At first I was thinking of mdt_create(), but that doesn't have la_dirent_count for the parent, and adding a call to fetch it on each create seemed like a lot of added overhead. It might make sense to move la_dirent_count into struct mdt_object rather than struct lu_attr, so that it is kept in memory with the directory instead of having to be fetched from the OSD repeatedly. The la_dirent_count could be fetched once for the mdt_object when it is first loaded, then updated in memory for each create/unlink in the directory (it isn't clear whether splitting the directory is useful for hard links or not).

Minor note: I see mot_auto_split_disabled in struct mdt_object, but it doesn't look like it is used anywhere, so could it be removed? Or is there intended to be a way to force a directory to not be split (e.g. an attribute set on the directory or layout)? |
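To illustrate the idea in the comment above, here is a minimal standalone sketch of an approximate in-memory entry count that is only compared against the threshold on a 1/64 sample of name hashes. It is not Lustre code: struct dir_state, dir_create_should_split(), dir_unlink() and the simple "count >= threshold" rule are hypothetical names and assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory directory state; NOT the real struct mdt_object. */
struct dir_state {
	uint64_t dirent_count;  /* approximate number of entries */
	uint32_t stripe_count;  /* current number of stripes */
	uint32_t max_stripes;   /* assumed stripe count limit */
	uint64_t split_count;   /* assumed threshold, in the spirit of dir_split_count */
};

/* Sample roughly 1 in 64 creates so most creates skip the threshold check. */
static bool should_check(uint32_t name_hash)
{
	return (name_hash % 64) == 0;
}

/* Account for one create; return true if a split should be triggered now. */
static bool dir_create_should_split(struct dir_state *d, uint32_t name_hash)
{
	d->dirent_count++;
	if (!should_check(name_hash))
		return false;
	/* Once the stripe limit is reached, the count no longer matters. */
	if (d->stripe_count >= d->max_stripes)
		return false;
	return d->dirent_count >= d->split_count;
}

/* Account for one unlink, never letting the count underflow. */
static void dir_unlink(struct dir_state *d)
{
	if (d->dirent_count > 0)
		d->dirent_count--;
}
```

With a sampled check like this the split can fire somewhat after the threshold is crossed, which matches the "a bit late" trade-off described in the comment.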
| Comment by Lai Siyao [ 22/Feb/21 ] |
|
The dirent count is maintained in osd_object.oo_dirent_count, which is updated on creation/removal. The reason it is not cached in mdt_object is that if the newly created sub file is a remote object, its parent is remote, so the cached dirent count can't be updated. mot_auto_split_disabled is from legacy code; it was intended to stop retrying the split of a directory (maybe we should say both the split and the sub-file migration) if auto-split hit an error and still failed after several retries. But that is not implemented, and currently the server will retry finishing the split once it notices an unfinished one. |
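The intended (but, per the comment above, unimplemented) behaviour of mot_auto_split_disabled could look roughly like the following sketch. The retry limit, struct split_state and both helper functions are hypothetical and exist only to illustrate the "give up after several failed retries" idea; they are not taken from the Lustre tree.

```c
#include <stdbool.h>

#define SPLIT_MAX_RETRIES 3  /* hypothetical limit, not a Lustre tunable */

/* Hypothetical per-directory split bookkeeping. */
struct split_state {
	unsigned int failures;     /* consecutive failed split attempts */
	bool auto_split_disabled;  /* plays the role of mot_auto_split_disabled */
};

/* Record the outcome of one split attempt; rc == 0 means success. */
static void split_attempt_done(struct split_state *s, int rc)
{
	if (rc == 0) {
		s->failures = 0;
		return;
	}
	if (++s->failures >= SPLIT_MAX_RETRIES)
		s->auto_split_disabled = true;
}

/* The server would consult this before resuming an unfinished split. */
static bool split_may_retry(const struct split_state *s)
{
	return !s->auto_split_disabled;
}
```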
| Comment by Gerrit Updater [ 17/Mar/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/42064 |
| Comment by Gerrit Updater [ 13/Apr/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43289 |
| Comment by Gerrit Updater [ 13/Apr/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43290 |
| Comment by Gerrit Updater [ 13/Apr/21 ] |
|
Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43291 |
| Comment by Gerrit Updater [ 13/May/21 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43684 |
| Comment by Gerrit Updater [ 08/Jun/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43684/ |
| Comment by Gerrit Updater [ 12/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43289/ |
| Comment by Gerrit Updater [ 12/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43290/ |
| Comment by Gerrit Updater [ 12/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43291/ |
| Comment by Andreas Dilger [ 01/Feb/22 ] |
|
The performance impact of auto-split during mdtest is too high, since it moves all of the entries while the benchmark is running, and DNE distributed transactions cause a lot of sync operations. It would be better to delay the entry move, for example by putting migrations for a directory split onto a queue and only moving those entries when the RPC threads are idle (maybe with a "low priority" RPC queue that is only handled when high/normal RPCs are finished?). That would avoid blocking the benchmark RPCs, but still split the directory early so that the metadata IOPS of the directory increase during the test and "most" entries are created on the right MDT, with fewer entries needing to be moved later on. |
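The deferred-move proposal above can be sketched as a bounded FIFO that is only drained when no regular RPCs are pending. This is a standalone illustration, not the Lustre RPC scheduler: struct migrate_queue, the entry IDs, the pending_rpcs argument and the per-call budget are all assumptions introduced here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIGRATE_QUEUE_MAX 1024  /* hypothetical queue capacity */

/* Hypothetical FIFO of entry IDs waiting to be migrated after a split. */
struct migrate_queue {
	unsigned long entries[MIGRATE_QUEUE_MAX];
	size_t head;  /* next slot to dequeue */
	size_t tail;  /* next slot to enqueue */
};

/* Queue an entry for later migration; returns false if the queue is full. */
static bool migrate_enqueue(struct migrate_queue *q, unsigned long id)
{
	if (q->tail - q->head == MIGRATE_QUEUE_MAX)
		return false;
	q->entries[q->tail % MIGRATE_QUEUE_MAX] = id;
	q->tail++;
	return true;
}

/*
 * Dequeue up to 'budget' entries into 'out', but only when no regular
 * RPCs are pending. Returns the number of entries handed back for moving.
 */
static size_t migrate_drain_idle(struct migrate_queue *q, size_t pending_rpcs,
				 unsigned long *out, size_t budget)
{
	size_t n = 0;

	if (pending_rpcs > 0)
		return 0;  /* stay out of the way of normal traffic */
	while (n < budget && q->head != q->tail) {
		out[n++] = q->entries[q->head % MIGRATE_QUEUE_MAX];
		q->head++;
	}
	return n;
}
```

The budget bounds how much migration work is done per idle pass, so newly arriving regular RPCs are not delayed for long.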