[LU-14459] DNE3: directory auto split during create Created: 19/Feb/21  Updated: 25/Aug/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major
Reporter: Andreas Dilger
Assignee: Lai Siyao
Resolution: Unresolved
Votes: 0
Labels: dne3

Issue Links:
Duplicate
duplicates LU-14467 Allow split metadata across MDTs duri... Resolved
Related
is related to LU-14146 Massive directory metadata operation ... Open
is related to LU-11025 DNE3: directory restripe Resolved
is related to LU-14464 Auto restripe triggers with none of t... Open
is related to LU-14466 metadata performance slows if the met... Open
is related to LU-15692 performance regressions for files in ... Resolved
is related to LU-15720 imbalanced file creation in 'crush' s... Resolved
is related to LU-15502 mkdir returns -EBADF if default LMV i... Resolved

 Description   

Directory auto-split should be done during file creation, as soon as dir_split_count is hit, rather than afterward, to minimize the number of entries that need to be moved. Otherwise, the entries are all created on a single MDT first and are only migrated on a later directory access. This is sub-optimal for two reasons:

  • the initial file creation is limited to a single MDT and does not distribute the load across multiple MDTs, and migrating only the entries afterward still leaves all the inodes on the original MDT
  • the post-creation migration needs to move many more files/entries to the other MDTs than if the files were created on the correct MDT to start with
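
To make the second point concrete, the following is a rough model (not Lustre code) that counts how many entries would have to move in each case. The function names, the use of CRC32 as the name hash, and the assumption that stripe 0 stays on the original MDT are all illustrative; Lustre's real directory hash functions behave differently in detail.

```python
import zlib


def stripe_of(name: str, stripe_count: int) -> int:
    # Stand-in name hash (assumption); Lustre's real directory hashes differ.
    return zlib.crc32(name.encode()) % stripe_count


def entries_moved(num_files: int, split_threshold: int, stripe_count: int,
                  split_during_create: bool) -> int:
    """Count entries that must leave the original MDT (stripe 0) at split time."""
    if num_files < split_threshold:
        return 0  # the directory never grows large enough to be split
    names = [f"file.{i}" for i in range(num_files)]
    # Split during create: only entries made before the threshold exist on
    # the original MDT. Split afterward: every entry was created there.
    present = names[:split_threshold] if split_during_create else names
    return sum(1 for n in present if stripe_of(n, stripe_count) != 0)
```

In this model, splitting during create bounds the moved entries by the threshold, while splitting afterward moves a fraction of the whole directory that grows with its final size.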


 Comments   
Comment by Gerrit Updater [ 22/Feb/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41713
Subject: LU-14459 mdt: autosplit directory during create
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bcdd4c972b8e8b3a2f53158740a7ff3999c895f0

Comment by Andreas Dilger [ 22/Feb/21 ]

Lai, I started looking into this to see if it was easily implemented. I pushed the above patch with some preliminary changes, but it needs more work to be done properly.

It looks like la_dirent_count is only fetched once at getattr time, and is never updated afterward when entries are added/removed. It needs to be updated during file creation/removal so that the count is at least somewhat accurate as the directory changes. If it simplifies the implementation, la_dirent_count doesn't need to be perfect (e.g. it could be checked only when the name hash % 64 == 0, or whatever). A growing directory will eventually increase the count enough to cause a split, even if the split is a bit late, and a large directory will eventually hit the stripe count limit, after which an accurate count no longer matters.
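
The sampled check suggested above could look roughly like the sketch below. It is only a model: `create_until_split`, `crc_hash`, and the parameter names are hypothetical, and the refresh is a plain assignment standing in for whatever lookup the server would really do.

```python
import zlib
from typing import Callable, Iterable, Optional


def crc_hash(name: str) -> int:
    # Stand-in for the directory name hash (assumption).
    return zlib.crc32(name.encode())


def create_until_split(names: Iterable[str], split_threshold: int,
                       sample_period: int = 64,
                       name_hash: Callable[[str], int] = crc_hash) -> Optional[int]:
    """Return the true entry count at which a split triggers, or None."""
    true_count = 0    # what the backing store would report if asked
    cached_count = 0  # possibly stale copy, refreshed only on sampled creates
    for name in names:
        true_count += 1
        if name_hash(name) % sample_period == 0:
            cached_count = true_count  # stands in for the costly refresh
            if cached_count >= split_threshold:
                return true_count
    return None
```

With a reasonably uniform hash, the split is expected to trigger within roughly one sample period of the threshold, which matches the "a bit late" behaviour described above.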

I wasn't sure exactly where to put the calls to mdt_should_auto_split() and mdt_auto_split_add(). At first I was thinking of mdt_create(), but that doesn't have la_dirent_count for the parent, and fetching it on every create seemed like a lot of added overhead. It might make sense to move la_dirent_count into struct mdt_object rather than struct lu_attr, so that it is kept in memory with the directory instead of having to be fetched from the OSD repeatedly. The la_dirent_count could be fetched once for the mdt_object when it is first loaded, then updated in memory for each create/unlink in the directory (it isn't clear if splitting the directory is useful for hard links or not?).
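
As a sketch of the "fetch once, then update in memory" idea, assuming a per-directory object that lives as long as the directory is cached: the class `CachedDirCount` and its `fetch_count` callback are hypothetical and only model the bookkeeping, not any actual mdt_object field.

```python
from typing import Callable, Optional


class CachedDirCount:
    """In-memory entry count for one directory (hypothetical helper)."""

    def __init__(self, fetch_count: Callable[[], int], split_threshold: int):
        self._fetch_count = fetch_count  # stand-in for the costly lookup
        self._count: Optional[int] = None  # not loaded yet
        self.split_threshold = split_threshold
        self.fetches = 0  # how many times the costly lookup ran

    def _load(self) -> None:
        if self._count is None:
            self._count = self._fetch_count()
            self.fetches += 1

    def on_create(self) -> bool:
        """Account for one new entry; True once the threshold is reached."""
        self._load()
        self._count += 1
        return self._count >= self.split_threshold

    def on_unlink(self) -> None:
        self._load()
        self._count = max(0, self._count - 1)

    @property
    def count(self) -> int:
        self._load()
        return self._count
```

The costly lookup runs at most once per cached object, and the split check on each create becomes a simple comparison. This model only sees creates and unlinks that pass through the object holding the counter.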

Minor note - I see mot_auto_split_disabled in struct mdt_object, but it doesn't look like this is used anywhere and could be removed? Or is there intended to be a way to force a directory to not be split (e.g. attribute set on the directory or layout)?

Comment by Lai Siyao [ 22/Feb/21 ]

The dirent count is maintained in osd_object.oo_dirent_count, which is updated on creation/removal.

The reason it's not cached in mdt_object is that if the newly created sub file is a remote object, its parent is remote, so the cached dirent count can't be updated.

mot_auto_split_disabled is from legacy code. It was intended to stop retrying the split of a directory (maybe we should say both the split and the sub file migration) if an error occurred during auto-split and it still failed after several retries. But this is not implemented, and currently the server will retry finishing the directory split whenever it notices an unfinished split.
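
The intended (but unimplemented) behaviour could be modelled roughly as below. Everything here is hypothetical: the class `SplitState`, the `max_retries` limit, and the use of `OSError` to represent a failed split attempt are assumptions made for illustration only.

```python
from typing import Callable


class SplitState:
    """Per-directory retry bookkeeping for auto-split (hypothetical model)."""

    def __init__(self, max_retries: int = 3):
        self.max_retries = max_retries
        self.failures = 0
        self.auto_split_disabled = False

    def try_finish_split(self, do_split: Callable[[], None]) -> bool:
        """Attempt to finish a split; give up for good after max_retries failures."""
        if self.auto_split_disabled:
            return False  # stop retrying this directory
        try:
            do_split()
        except OSError:
            self.failures += 1
            if self.failures >= self.max_retries:
                self.auto_split_disabled = True
            return False
        self.failures = 0
        return True
```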

Comment by Gerrit Updater [ 17/Mar/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/42064
Subject: LU-14459 mdt: trigger dir split in create/open
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0edbcc3ce6de3f2c625c5f3be6805565a81fab51

Comment by Gerrit Updater [ 13/Apr/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43289
Subject: LU-14459 llite: init stripe pfid after dir split
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 768bc1b9e93ebc5c3c53612287abd3ca136c622c

Comment by Gerrit Updater [ 13/Apr/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43290
Subject: LU-14459 mdt: restripe parent may be a stripe
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 93711b9e00d28d235c2b10409bf135eb6c6e5944

Comment by Gerrit Updater [ 13/Apr/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43291
Subject: LU-14459 mdt: support fixed directory layout
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0c0995c016c49f21b1dea938cdb71c6ccb7bf658

Comment by Gerrit Updater [ 13/May/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43684
Subject: LU-14459 lmv: change default hash type to crush
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e859f57768b8b7ee2e44924c2b36e39a7f0f323a

Comment by Gerrit Updater [ 08/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43684/
Subject: LU-14459 lmv: change default hash type to crush
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bb60caa1c6e7c14c201916dc0423442d10c86a27

Comment by Gerrit Updater [ 12/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43289/
Subject: LU-14459 llite: reset pfid after dir migration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: abbe545a63b304e803ee62443dd65f1feeed15cd

Comment by Gerrit Updater [ 12/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43290/
Subject: LU-14459 mdt: restripe parent may be a stripe
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a84efc8607ae8057499a8800699f336e821b03d8

Comment by Gerrit Updater [ 12/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43291/
Subject: LU-14459 mdt: support fixed directory layout
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4c2514f4832801374092f3a48c755248af345566

Comment by Andreas Dilger [ 01/Feb/22 ]

The performance impact of the auto-split during mdtest is too high, since it moves all of the entries while the benchmark is running, and DNE distributed transactions cause a lot of sync operations. It would be better to delay the entry move, e.g. by putting the migrations for a directory split onto a queue and only moving those entries when the RPC threads are idle (maybe with a "low priority" RPC queue that is only handled when high/normal RPCs are finished?).

That would avoid blocking the benchmark RPCs, but still split the directory early so that the metadata IOPS of the directory increase during the test and "most" entries are created on the right MDT, with fewer entries needing to be moved later on.
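
A minimal model of the proposed "low priority" queue is sketched below, assuming a single dispatcher that always prefers normal requests. The class `TwoLevelQueue` and its method names are hypothetical and do not correspond to any existing Lustre request scheduler interface.

```python
from collections import deque
from typing import Deque, Optional


class TwoLevelQueue:
    """Serve normal requests first; low-priority work only when otherwise idle."""

    def __init__(self) -> None:
        self.normal: Deque[str] = deque()
        self.low: Deque[str] = deque()

    def submit(self, request: str, low_priority: bool = False) -> None:
        (self.low if low_priority else self.normal).append(request)

    def next_request(self) -> Optional[str]:
        if self.normal:
            return self.normal.popleft()
        if self.low:
            return self.low.popleft()  # e.g. a deferred entry migration
        return None
```

In this model a steady stream of normal requests can starve the migration queue indefinitely, so a real design would probably need some aging or rate limit; that trade-off is not addressed here.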

Generated at Sat Feb 10 03:09:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.