[LU-11025] DNE3: directory restripe Created: 16/May/18  Updated: 21/Nov/23  Due: 16/Dec/18  Resolved: 17/Feb/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.14.0

Type: New Feature Priority: Major
Reporter: Lai Siyao Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: dne3

Issue Links:
Gantt End to Start
has to be done after LU-10329 DNE3: REMOTE_PARENT_DIR scalability Open
Related
is related to LU-17307 osd_dirent_count() keeps multiple thr... Resolved
is related to LU-13481 sanity test_33h: MDT index mismatch 5... Resolved
is related to LU-7607 Preserve inode number after MDT migra... Open
is related to LU-4684 DNE3: allow migrating DNE striped dir... Resolved
is related to LU-7749 DNE3: migrated orphan survive till ne... Open
is related to LU-15692 performance regressions for files in ... Resolved
is related to LU-15720 imbalanced file creation in 'crush' s... Resolved
is related to LU-13424 unable to migrate mirrored files Resolved
is related to LU-13406 set_param: param_path 'lod/*/mdt_hash... Resolved
is related to LU-8698 DNE3: check unknown hash type properl... Resolved
is related to LU-13522 mdt.mdt_mds_mds_conns is not counted ... Resolved
is related to LU-14459 DNE3: directory auto split during create Open
is related to LU-13691 Allow for lfs migrate between MDTs to... Open
is related to LUDOC-462 Add feature documentation for directo... Resolved
is related to LU-13417 DNE3: mkdir() automatically create re... Resolved
is related to LU-12867 DNE3: new DNE2 hash function to handl... Resolved
Epic/Theme: dne3
Rank (Obsolete): 9223372036854775807

 Description   

In a DNE system, MDTs are likely to become imbalanced over time, and users may also add or remove MDTs. So there is a need to move load from one MDT to another, which is what directory restripe does.

Directory restripe should meet the following requirements:

  • After directory restripe, the load (disk usage, incoming requests) should be shared fairly between MDTs.
  • While a directory is being restriped, it should remain accessible without problems.
  • Move as little data as possible, to keep restripe quick and to minimize system impact.
  • Keep object FIDs unchanged to support NFS on Lustre.

Functional spec:

  • A new hash type will be introduced. For a directory with this hash type, each restripe should move as little data as possible. To achieve this, names are not hashed as hash(name) mod stripe_count, but modulo a fixed number such as the maximum MDT count, so that a restripe doesn't change any file's hash value. And unlike an old striped directory, each stripe covers a range of values within [0, maximum_MDT_count - 1], e.g. [0, 99], and each restripe splits a stripe into two, each with half of the range, i.e. [0, 49] and [50, 99], which means only half of the files in that stripe need to be moved.
  • Directory restripe will be done automatically: when an MDT finds that a directory or stripe is growing fast, it puts this directory or stripe into a global list, and a dedicated thread scans this list and splits the directories or stripes in it. Stripe merge works similarly: when a stripe shrinks below a size limit, it is merged into its previous stripe.
  • Unlike directory migration, restripe doesn't move file inodes, only dirents. So after a restripe, half of the files under the split stripe become remote objects.
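The hash-range scheme in the functional spec can be sketched as a small model. This is a minimal Python illustration under stated assumptions, not Lustre code: the 256-bucket hash space (borrowed from the LMV_MAX_STRIPE_COUNT = 256 example in the comments), the SHA-256-based name_hash(), and the helpers find_stripe(), split_stripe(), placed_by() and moved_fraction() are all hypothetical. It compares how many entries change stripe when one stripe is split versus when plain hash(name) % stripe_count gains a stripe.

```python
import hashlib

# Assumption: a fixed hash space of 256 buckets, borrowed from the
# LMV_MAX_STRIPE_COUNT = 256 example; the spec calls it "maximum MDT count".
HASH_SPACE = 256


def name_hash(name):
    """Illustrative 32-bit name hash (SHA-256 prefix), not Lustre's hash."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "little")


def find_stripe(ranges, name):
    """Return the index of the stripe whose inclusive [lo, hi] range holds
    the name's bucket. The bucket never depends on the stripe count."""
    bucket = name_hash(name) % HASH_SPACE
    for i, (lo, hi) in enumerate(ranges):
        if lo <= bucket <= hi:
            return i
    raise ValueError(f"bucket {bucket} is not covered by any stripe")


def split_stripe(ranges, idx):
    """Split stripe idx in half: the lower half keeps index idx and the
    upper half is appended as a new stripe, so no stripe is renumbered."""
    lo, hi = ranges[idx]
    mid = lo + (hi - lo) // 2
    return ranges[:idx] + [(lo, mid)] + ranges[idx + 1:] + [(mid + 1, hi)]


def placed_by(layout):
    """Placement function for a given list of stripe ranges."""
    return lambda n: find_stripe(layout, n)


def moved_fraction(old_place, new_place, names):
    """Fraction of names whose stripe index differs between two placements."""
    return sum(old_place(n) != new_place(n) for n in names) / len(names)


names = [f"file{i}" for i in range(10_000)]
one = [(0, HASH_SPACE - 1)]
two = split_stripe(one, 0)      # [(0, 127), (128, 255)]
three = split_stripe(two, 1)    # [(0, 127), (128, 191), (192, 255)]
four = split_stripe(three, 0)   # [(0, 63), (128, 191), (192, 255), (64, 127)]

ranged_1to2 = moved_fraction(placed_by(one), placed_by(two), names)
ranged_3to4 = moved_fraction(placed_by(three), placed_by(four), names)
# For contrast: plain hash(name) % stripe_count going from 3 to 4 stripes.
modulo_3to4 = moved_fraction(lambda n: name_hash(n) % 3,
                             lambda n: name_hash(n) % 4, names)
print(f"range 1->2: {ranged_1to2:.2f}, range 3->4: {ranged_3to4:.2f}, "
      f"modulo 3->4: {modulo_3to4:.2f}")
```

Under these assumptions, a range split only relocates entries from the stripe being split (about half of them), so going from three to four stripes moves roughly a quarter of the directory, whereas the modulo placement moves roughly three quarters. Appending the new stripe instead of renumbering keeps every untouched stripe's entries in place.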


 Comments   
Comment by Andreas Dilger [ 16/May/18 ]

A new hash type will be introduced. For a directory with this hash type, each restripe should move as little data as possible. To achieve this, names are not hashed as hash(name) % stripe_count, but modulo a fixed number such as the maximum MDT count, so that a restripe doesn't change any file's hash value. And unlike an old striped directory, each stripe covers a range of values within [0, maximum_MDT_count - 1], e.g. [0, 99], and each restripe splits a stripe into two, each with half of the range, i.e. [0, 49] and [50, 99], which means only half of the files in that stripe need to be moved.

Could you please describe this approach more completely? Let's say we pick an arbitrary upper limit of LMV_MAX_STRIPE_COUNT = 256 stripes per directory (which seems quite reasonable). The client computes idx = hash(name) % LMV_MAX_STRIPE_COUNT to get an idx value in [0, 255], and what does it do with idx after this? If there is a single shard, idx is irrelevant, and the file goes on the master MDT. If the directory doubles in size and there are now 1+1=2 shards, then the hash is split into 2 parts [0, 127] and [128, 255] and put onto the 2 shards, moving 1/2 of the existing entries to the new shard? If the directory doubles in size again and there are 2+2=4 shards, then the hash is split into 4 parts [0, 63], [64, 127], [128, 191], [192, 255] and put onto the 4 shards, again moving 1/2 of each existing shard onto the new shards? If we assume that a growing directory will average about 3/4 full shards at any time (i.e. halfway between the previous split and the next split), then it will have on average 2/3 remote entries (migrated to the shard when it was split) and 1/3 local entries (created locally after the shard was split).
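The 2/3 estimate above can be written out as a short derivation. It assumes, as the comment does, that a shard splits when it reaches some capacity C (a symbol introduced here only for illustration), that a newly split shard starts with C/2 entries migrated from its parent, and that a shard is on average halfway to its next split, i.e. 3C/4 full, with the remaining C/4 entries created locally.

```latex
\[
f_{\mathrm{remote}} = \frac{C/2}{3C/4} = \frac{2}{3},
\qquad
f_{\mathrm{local}} = \frac{C/4}{3C/4} = \frac{1}{3},
\qquad
f_{\mathrm{remote}} + f_{\mathrm{local}} = 1 .
\]
```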

Comment by Lai Siyao [ 17/May/18 ]

Yes, it's just as you described. So we track the usage of the directory; if it's growing fast, we split stripes quickly so that it reaches the maximum stripe count (the MDT count, or a limit we set at the beginning) soon. This may avoid too many remote objects, but if a directory is not growing fast and becomes large over a long time, it will contain 2/3 remote objects as you described.
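The auto-split and merge policy described in the functional spec and in the comment above can be sketched as a periodic check over per-stripe entry counts. This is a hypothetical Python model, not the Lustre implementation: GROWTH_TRIGGER, MERGE_LIMIT, MAX_STRIPES and plan_restripe() are illustrative names and values rather than real Lustre tunables, and "growing fast" is approximated as a minimum number of new entries between two samples.

```python
# Hypothetical thresholds for illustration; these are not Lustre tunables.
GROWTH_TRIGGER = 10_000  # new entries between two samples that mark "growing fast"
MERGE_LIMIT = 1_000      # a shrinking stripe below this size is merged
MAX_STRIPES = 8          # e.g. the MDT count, or a limit chosen up front


def plan_restripe(prev_counts, curr_counts, max_stripes=MAX_STRIPES):
    """Compare two samples of per-stripe entry counts and return a list of
    ("split", i) and ("merge", i) actions, standing in for the global list
    that a dedicated thread would drain.

    A split is planned for a stripe that gained at least GROWTH_TRIGGER
    entries, as long as the directory stays within max_stripes. A merge
    folds a shrinking stripe i (i > 0) into its previous stripe i - 1.
    """
    actions = []
    stripes = len(curr_counts)
    for i, (before, now) in enumerate(zip(prev_counts, curr_counts)):
        if now - before >= GROWTH_TRIGGER and stripes < max_stripes:
            actions.append(("split", i))
            stripes += 1  # each planned split adds one stripe toward the cap
        elif i > 0 and now < before and now < MERGE_LIMIT:
            actions.append(("merge", i))
    return actions


print(plan_restripe([0, 0], [20_000, 500]))         # [('split', 0)]
print(plan_restripe([5_000, 3_000], [5_000, 200]))  # [('merge', 1)]
```

In this model a quickly growing directory crosses the growth trigger on most samples and reaches the stripe cap sooner, while a slowly growing one splits rarely, which matches the trade-off described in the comment.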

Comment by Lai Siyao [ 17/May/18 ]

And if the system doesn't need to support NFS on Lustre, there should be an option to allow restripe to move both the inode and the dirent upon stripe split.

Comment by Gerrit Updater [ 18/Nov/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36774
Subject: LU-11025 uapi: introduce OBD_CONNECT2_CONSISTENT_HASH
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c7e3a81b51a37dbb07e167f517ff6989f8f91f2e

Comment by Gerrit Updater [ 18/Nov/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36775
Subject: LU-11025 dne: introduce consistent hashed striped directory
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 89425b7ae5540456744892c5174c5efe7a338172

Comment by Gerrit Updater [ 26/Nov/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36864
Subject: LU-11025 dne: refactor dir migration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 08039913d01bde64e57c47cc80c9225c0adf6e13

Comment by Gerrit Updater [ 26/Nov/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36865
Subject: LU-11025 lod: refactor lod_mdt_alloc_qos/rr()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6ed0371345132c2b002b564fe7ac09c6b1ec9955

Comment by Gerrit Updater [ 30/Nov/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36898
Subject: LU-11025 dne: support directory split/merge
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a7b8a3ddb03ad57da875c0e58d49c0df9b8a9a6f

Comment by Gerrit Updater [ 14/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36774/
Subject: LU-11025 uapi: introduce OBD_CONNECT2_CRUSH
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dbafa9df0f8f72ff8849af9066eac46a2c980e9f

Comment by Gerrit Updater [ 20/Jan/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37281
Subject: LU-11025 mdt: don't save remote lock if op not from client
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5463bdb73f5b70c201f619f60c79c512e22d9f00

Comment by Gerrit Updater [ 20/Jan/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37282
Subject: LU-11025 obd: add MDD/LOD o_fid_alloc
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 116ad977447d4c72821ce48d6ebd3cd3f60525cf

Comment by Gerrit Updater [ 20/Jan/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37283
Subject: LU-11025 mdt: normalize temp XATTR buffer usage
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 76214bf426cc7b431e52418f24f1edb1dcb6fd49

Comment by Gerrit Updater [ 20/Jan/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37284
Subject: LU-11025 dne: directory auto split
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b6bfdc9ebef3e1d534d7751f52cac96a5d2946c1

Comment by Gerrit Updater [ 20/Feb/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37634
Subject: LU-11025 dne: change dir layout via dt_layout_change
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9f7dfcd99a9d7f759aece9c9f4ff4fca9f5f092f

Comment by Gerrit Updater [ 25/Feb/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37711
Subject: LU-11025 lmv: simplify name to stripe mapping
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c2932477544c2c170a786abce41c0571e7a91ad7

Comment by Gerrit Updater [ 25/Feb/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37712
Subject: LU-11025 dne: refactor dir migration code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 04c7ae1b6d155deb487ccf0de11f42b7225f5c55

Comment by Gerrit Updater [ 25/Feb/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37713
Subject: LU-11025 dne: support non-recursive mode migration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 462c429409b3bc306fd029028fff48cbe8c3b61d

Comment by Gerrit Updater [ 27/Feb/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37752
Subject: LU-11025 dne: introduce new directory hash type: "crush"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3615e347a6dd95b7494dcebc3e3a576f6f9fd105

Comment by Gerrit Updater [ 29/Mar/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38097
Subject: LU-11025 osd: osd_attr_get() returns dirent count
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c2bd214e1b813db2131d3b05241b6312d9d3711d

Comment by Gerrit Updater [ 31/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36775/
Subject: LU-11025 dne: introduce new directory hash type: "crush"
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0a1cf8da806962d663f23ed813764e4011a36ee7

Comment by Gerrit Updater [ 31/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37634/
Subject: LU-11025 dne: change dir layout via dt_layout_change
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b2eef3dd2f042b629b8bb25dc963ad5c7da86f22

Comment by Gerrit Updater [ 31/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37711/
Subject: LU-11025 lmv: simplify name to stripe mapping
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 703afd15fc939ad4f425364ddc033080630c0127

Comment by Gerrit Updater [ 01/Apr/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38107
Subject: LU-11025 tests: only set crush for newer builds
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 856ac5e390b8b94edc933d98f108aae20f881de5

Comment by Gerrit Updater [ 03/Apr/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38135
Subject: LU-11025 dne: add FID mapping interfaces
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 13c24fc21cae66a8351d9522d4f6f2a4737dd350

Comment by Gerrit Updater [ 14/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38107/
Subject: LU-11025 tests: only set crush for newer builds
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 95211684d4bde4c4009ec9184956aff61658b735

Comment by Gerrit Updater [ 15/Apr/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38232
Subject: LU-11025 uapi: add OBD_CONNECT2_FIDMAP
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 923d34249dd27d84036a8a0529f504d0af9f2428

Comment by Gerrit Updater [ 15/Apr/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38233
Subject: LU-11025 dne: support FID map
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6d811adae029c6348664acfcd92396f4de57d1a8

Comment by Gerrit Updater [ 20/Apr/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38285
Subject: LU-11025 mdd: add fidmap reclaim thread
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2eb4307f3925128e8fd8210d147bb72ee0e26de9

Comment by Gerrit Updater [ 23/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37712/
Subject: LU-11025 dne: refactor dir migration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3f608461b387df056c9563d4c2879b05fb54a5a5

Comment by Gerrit Updater [ 07/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36865/
Subject: LU-11025 lod: refactor lod_mdt_alloc_qos/rr()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e284bfbe78d008ebfaae14878893bc7200338f83

Comment by Gerrit Updater [ 07/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37281/
Subject: LU-11025 mdt: don't save remote lock if op not from client
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e496dbf7ed9eb64d347ac203873713531a3fbe59

Comment by Gerrit Updater [ 07/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38232/
Subject: LU-11025 uapi: add OBD_CONNECT2_FIDMAP
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c96fa612d5b0f3218642052c8ae1918883267c61

Comment by Gerrit Updater [ 20/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37282/
Subject: LU-11025 obdclass: add lu_device_operations::ldo_fid_alloc()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 736d2d62ab1f00926000f0c3aa31fcb6aa53050f

Comment by Gerrit Updater [ 20/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38097/
Subject: LU-11025 osd: osd_attr_get() returns dirent count
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 03a4431dac1c59fa2b98501fc7dfb8451a0a2af8

Comment by Gerrit Updater [ 20/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36898/
Subject: LU-11025 dne: support directory restripe
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2e2b16c28bcf4048ba4f34129b7fb91c36b55a71

Comment by Andreas Dilger [ 20/May/20 ]

Lai, I think it makes sense to move the FIDMAP patches over to their own LU ticket. I don't think that this functionality is critical for 2.14, and a separate ticket will simplify tracking them. I don't expect DNE directory auto-split to move all of the inodes by default, only the directory entries, and FIDMAP is only needed in the case of inode migration (which should not be the default).

Comment by Gerrit Updater [ 02/Jun/20 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38801
Subject: LU-11025 mdt: remove unused code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a35065a10025b6f41e2ae6f8336f1ab984b4e0f6

Comment by Gerrit Updater [ 02/Jun/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37284/
Subject: LU-11025 dne: directory restripe and auto split
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a336d7c7c1cd62a5a5213835aa85b8eaa87b076a

Comment by Gerrit Updater [ 11/Jun/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38801/
Subject: LU-11025 mdt: remove unused code
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f3fef81c7d7656160d6db6f542dd5178974fab78

Generated at Sat Feb 10 02:40:17 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.