[LU-4684] DNE3: allow migrating DNE striped directory Created: 28/Feb/14  Updated: 16/May/22  Resolved: 01/Apr/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: Lustre 2.12.0, Lustre 2.14.0

Type: Improvement Priority: Major
Reporter: Andreas Dilger Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: dne3

Issue Links:
Blocker
is blocked by LU-7357 Add layout lock for striped directories. Resolved
Duplicate
is duplicated by LU-11191 Striped DIR accross 2 MDTs very slow Resolved
is duplicated by LU-4029 DNE (distributed namespace) and legac... Resolved
Related
is related to LU-4876 LFSCK remove entry from /REMOTE_PAREN... Open
is related to LU-5550 LFSCK 5: Migrate the name entry from ... Open
is related to LU-11611 Incorrect return value in mdt_reint_u... Resolved
is related to LU-6515 Always true check in mdd_migrate_create Resolved
is related to LU-11025 DNE3: directory restripe Resolved
is related to LU-10329 DNE3: REMOTE_PARENT_DIR scalability Open
is related to LUDOC-395 Add documentation for DNE striped dir... Resolved
is related to LU-3531 DNE2: striped directory Resolved
Rank (Obsolete): 12872

 Description   

The lfs migrate patch http://review.whamcloud.com/6662 will only allow migrating from a 1-stripe directory to another 1-stripe directory. That is unfortunate given that DNE2 is implementing striped directories. At a minimum this limitation needs to be documented for lfs migrate in the 2.6.0 release.

I think that implementing migration from one striped directory to another striped directory (restripe) is not much more complex. Essentially, walk the source directory (or directories), hash each name, and migrate the entry to the correct target directory stripe. For migration, the "right" directory will never be the source directory, but when restriping a single directory to multiple stripes, some of the entries will already be in the right place (stripe 0) and others would need to move to stripe N based on the hash. If changing from M to N stripes or changing the hash function, 1/M entries would already be on the right MDT and wouldn't need to be moved.
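
The walk-hash-move idea above can be sketched roughly as follows. This is only an illustration, not Lustre code: CRC32 stands in for the real directory hash, stripe i is assumed to live on the same MDT in the old and new layouts, and the names stripe_index and plan_restripe are hypothetical. How many entries stay in place depends on M, N and the hash function in use.

```python
import zlib


def stripe_index(name: str, stripe_count: int) -> int:
    """Map a name to a stripe index using a stand-in hash (CRC32).

    Assumption: this is NOT the hash Lustre uses; it only illustrates
    the "hash the name, take it modulo the stripe count" idea.
    """
    return zlib.crc32(name.encode("utf-8")) % stripe_count


def plan_restripe(names, old_count, new_count):
    """Split names into those that can stay and those that must move.

    Returns (stay, move), where move holds (name, old_index, new_index).
    """
    stay, move = [], []
    for name in names:
        old_idx = stripe_index(name, old_count)
        new_idx = stripe_index(name, new_count)
        if old_idx == new_idx:
            stay.append(name)
        else:
            move.append((name, old_idx, new_idx))
    return stay, move


# Restriping a single directory (1 stripe) to 4 stripes: entries that
# hash to stripe 0 stay, everything else moves to stripes 1..3.
names = [f"file{i}" for i in range(1000)]
stay, move = plan_restripe(names, old_count=1, new_count=4)
print(f"{len(stay)} entries stay, {len(move)} entries move")
```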



 Comments   
Comment by Di Wang [ 14/Jul/15 ]

I probably cannot fix this in 2.8, so should we move it to 2.9?

Comment by Andreas Dilger [ 09/Sep/16 ]

Assigning to Lai for now; you can work out with Fan Yong whether he has more time for this one (after the ZFS OI Scrub is done?).

Comment by Gerrit Updater [ 01/Apr/17 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/26297
Subject: LU-4684 dne: support migrating striped directory
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 19779a48fe647aa1a420e396ce2720dc33219a4b

Comment by Andreas Dilger [ 11/Apr/17 ]

Directory migration should be implemented very similarly to directory restriping, so that they can share the same code. Essentially, create a new layout for the directory that includes both old and new shards (directory stripes), then iterate over all of the entries and move each one to the shard that it belongs to in the new layout. Because the entries always remain within the one directory layout even while being moved, all of them can still be found in either the old or the new location if the (long and non-atomic) operation is interrupted in the middle. This is the reason for LMV_HASH_FLAG_MIGRATION: it makes the client try all shards to find an entry. It would even allow migration to happen while the directory is being accessed, since we only need to lock one entry exclusively at a time.
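
A toy model of the combined old+new layout may make the lookup behaviour clearer. Everything here is an assumption for illustration: the MigratingDir class, its method names, the CRC32 hash, and the boolean "migrating" attribute (which only plays the role that LMV_HASH_FLAG_MIGRATION is described as playing above). It is not how the client or MDT code is written.

```python
import zlib


def shard_for(name: str, shard_count: int) -> int:
    """Stand-in hash (CRC32 modulo count); not Lustre's real hash."""
    return zlib.crc32(name.encode("utf-8")) % shard_count


class MigratingDir:
    """One layout holding the old shards followed by the new shards."""

    def __init__(self, old_count: int, new_count: int):
        self.old_count = old_count
        self.new_count = new_count
        self.shards = [dict() for _ in range(old_count + new_count)]
        self.migrating = True  # hypothetical stand-in for the migration flag

    def old_shard(self, name: str) -> int:
        return shard_for(name, self.old_count)

    def new_shard(self, name: str) -> int:
        return self.old_count + shard_for(name, self.new_count)

    def lookup(self, name: str):
        """Try the new location first, then every shard while migrating."""
        target = self.shards[self.new_shard(name)]
        if name in target:
            return target[name]
        if self.migrating:
            for shard in self.shards:
                if name in shard:
                    return shard[name]
        raise KeyError(name)

    def migrate_one(self, name: str) -> None:
        """Move a single entry; only this entry is touched at a time."""
        src = self.shards[self.old_shard(name)]
        dst = self.shards[self.new_shard(name)]
        dst[name] = src.pop(name)


d = MigratingDir(old_count=2, new_count=3)
for n in ("a", "b", "c", "d"):
    d.shards[d.old_shard(n)][n] = f"fid-{n}"
d.migrate_one("a")  # an interruption here leaves "b".."d" in the old shards
print([d.lookup(n) for n in ("a", "b", "c", "d")])
```

The point of the sketch is that a lookup succeeds whether or not a given entry has been moved yet, so an interrupted migration leaves the directory usable.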

With directory restriping, only the existing name entries (not inodes) would be moved to the new shards, and newly-created entries would be created on the correct MDTs. When moving from a non-striped directory to a striped directory, the non-striped directory would become the master entry, and it would gain new shards to hold all of the name entries. When moving from one stripe count to another (or changing hash functions), new shards would be added at the start of the migration operation and all entries would be migrated according to the new hash function/modulus; if the stripe count is being reduced, the (verified-to-be-empty) old shard(s) would then be removed.
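
The three restripe steps described above (add shards, rehash entries, drop emptied shards) could look roughly like this in a toy model. Shards are plain dicts, CRC32 is a stand-in hash, and restripe and shard_for are hypothetical names, so treat it as a sketch of the sequence rather than of the actual implementation.

```python
import zlib


def shard_for(name: str, shard_count: int) -> int:
    """Stand-in hash (CRC32 modulo count); not Lustre's real hash."""
    return zlib.crc32(name.encode("utf-8")) % shard_count


def restripe(shards, new_count):
    """Restripe a list of shard dicts to new_count shards."""
    old_count = len(shards)

    # Step 1: add any new shards before moving entries.
    shards = shards + [dict() for _ in range(max(0, new_count - old_count))]

    # Step 2: move each existing name entry to the shard chosen by the
    # new modulus. Entries already in the right shard are left alone.
    for idx in range(old_count):
        for name in list(shards[idx]):
            target = shard_for(name, new_count)
            if target != idx:
                shards[target][name] = shards[idx].pop(name)

    # Step 3: when shrinking, verify the surplus shards are empty
    # before removing them from the layout.
    surplus = shards[new_count:]
    if any(surplus_shard for surplus_shard in surplus):
        raise RuntimeError("old shard not empty, refusing to remove it")
    return shards[:new_count]


# Non-striped directory (one shard) gaining three more shards.
single = [{f"file{i}": i for i in range(100)}]
striped = restripe(single, 4)
print([len(s) for s in striped])
```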

With directory migration the hash function would need to be enhanced to add new shard(s) at the end and store an "offset" or similar parameter in the LMV EA, or otherwise flag the old shards as "do not use", and restripe from the source shards to the target shards.

All of these operations would essentially be handled by introducing a new hash function/flags that properly maps entries from one set of shards to another, either redistributing entries evenly across shards in the restriping case, or leaving one or more shards empty in the migration case.

We might consider optimizing the case of removing only a single shard (e.g. MDT removal) during migration, where the new shard is added at the end, all entries are moved from the old shard to the new one, and then the new shard replaces the old one in the layout. During this time, the hash function should keep names mapped to the other existing shards unchanged so that they do not all need to be migrated. Otherwise, all name entries in this directory will become remote and hurt access performance.
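
A small sketch of the single-shard replacement case, under the same toy assumptions as before (dicts as shards, CRC32 as a stand-in hash, hypothetical function names). Because the new shard ends up in the old shard's slot, the name-to-slot mapping is unchanged and only the entries of the removed shard are moved.

```python
import zlib


def shard_for(name: str, shard_count: int) -> int:
    """Stand-in hash (CRC32 modulo count); not Lustre's real hash."""
    return zlib.crc32(name.encode("utf-8")) % shard_count


def replace_shard(shards, victim):
    """Drain shards[victim] into a new shard, then swap it into place.

    Returns the number of entries that had to be moved.
    """
    shards.append({})            # new shard is added at the end
    new_shard = shards[-1]
    moved = 0
    for name in list(shards[victim]):
        new_shard[name] = shards[victim].pop(name)
        moved += 1
    # The new shard takes over the victim's slot, so hash % count still
    # points every name at the same slot index as before.
    shards[victim] = shards.pop()
    return moved


names = [f"file{i}" for i in range(400)]
shards = [dict() for _ in range(4)]
for n in names:
    shards[shard_for(n, 4)][n] = True

moved = replace_shard(shards, victim=2)
print(f"moved {moved} of {len(names)} entries")
```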

The inodes themselves would also need to be moved during migration (not necessarily during restriping, if we do this only when there are a relatively small number of entries in the directory, say < 32000), but inode migration is largely independent of how the directory layout causes name entries to be hashed. Essentially, the DNE code would be told whether only the name is being migrated, or if the whole inode needs to be migrated also. The name migration can happen in one step, and the inode migration in a second step (either interleaved with name migration, or after all names have been moved).

Comment by Gerrit Updater [ 27/Oct/17 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/29822
Subject: LU-4684 lod: store dir stripe fids in lmm
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 07af57e02f1dfaf502daa03fb26652a546dc4ed1

Comment by Gerrit Updater [ 27/Feb/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/31424
Subject: LU-4684 dne: pack lmv ea in migrate rpc
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 38179cf0f839d13196afbe2cf03a1d87d7d19098

Comment by Gerrit Updater [ 27/Feb/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/31425
Subject: LU-4684 mdt: improve directory stripe lock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1d060d3815496b9006aaa24d74e7f55f12c42292

Comment by Gerrit Updater [ 27/Feb/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/31426
Subject: LU-4684 xattr: add list support for remote object
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aa371d40fe401cbffaf3829ba6d3c72162214528

Comment by Gerrit Updater [ 27/Feb/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/31427
Subject: LU-4684 dne: migrate striped directory
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4b39b37e9e0cb2ce9b4f0653f1658f271066f2ae

Comment by Gerrit Updater [ 03/Mar/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/31504
Subject: LU-4684 lmv: support accessing migrating directory
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 39b623fb03ee32fa416b63707e2236aa063dbf39

Comment by Gerrit Updater [ 13/Mar/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/31626
Subject: LU-4684 migrate: shrink dir layout after migration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 299e2f67b285fbec41c8d6745a0cf8a32af5690e

Comment by Gerrit Updater [ 09/Apr/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/31914
Subject: LU-4684 ptlrpc: add dir migration connect flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 89459fc76b140ec6910a7bf46f1401a7c06c13dc

Comment by Gerrit Updater [ 06/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31914/
Subject: LU-4684 ptlrpc: add dir migration connect flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 14b98596fa2433b6c8a8cce3ee5d0ddefab5982e

Comment by Gerrit Updater [ 11/May/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/32356
Subject: LU-4684 test: improve dir migrate test in racer
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2a1e530bb5a03bd17a2899ca824060d7799b1a50

Comment by Gerrit Updater [ 17/May/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/32445
Subject: LU-4684 debug: dump objs in mdt_fini()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7af336e896416ee0a6492fe8fadab31998578538

Comment by Gerrit Updater [ 29/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31425/
Subject: LU-4684 mdt: improve directory stripe lock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 18aee6838907192c03c5f70e88624686c1c074da

Comment by Gerrit Updater [ 29/May/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31426/
Subject: LU-4684 xattr: add list support for remote object
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4325b1e456647f519a0eca32204554e0c358646f

Comment by Gerrit Updater [ 31/May/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/32589
Subject: LU-4684 lod: missing lock in lod_index_try()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 81e60c65f7ed8b4faef2a56ef65657b1a340b3de

Comment by Gerrit Updater [ 12/Jun/18 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/32701
Subject: LU-4684 mdt: rename may cause deadlock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: eff8d5bc1be10418fbcb056cef4bdc2f993d5285

Comment by Gerrit Updater [ 07/Aug/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32946
Subject: LU-4684 llite: add lock for dir layout data
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3af345d83520bc9e485f44d1e6c00262df700c4e

Comment by Gerrit Updater [ 07/Aug/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32947
Subject: LU-4684 migrate: small fixes
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a0a8e19de507d9ee90d0b6c0798e9014d8e1368a

Comment by Gerrit Updater [ 09/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31424/
Subject: LU-4684 migrate: pack lmv ea in migrate rpc
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 470bdeec6ca5b4c68f456a10d68511653e67b378

Comment by Gerrit Updater [ 17/Sep/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31427/
Subject: LU-4684 migrate: migrate striped directory
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 169738e30a7e0b57f27a517d78d2c928b3bb0f5c

Comment by Gerrit Updater [ 17/Sep/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31626/
Subject: LU-4684 migrate: shrink dir layout after migration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0a83d948f37bec7fca6e9aa30f59f26354273b23

Comment by Gerrit Updater [ 01/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/31504/
Subject: LU-4684 lmv: support accessing migrating directory
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 976b609abcdf5adc29f55b742ff6b1307b2b6484

Comment by Gerrit Updater [ 09/Oct/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33324
Subject: LU-4684 migrate: replace PFID via source
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9a42c69380a905c7b9aad85dc70f9afad0ec2ac6

Comment by Gerrit Updater [ 09/Oct/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33325
Subject: LU-4684 migrate: link parents lock may deadlock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a410f20e682267e20e30eaab6aee299548322837

Comment by Gerrit Updater [ 23/Oct/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33456
Subject: LU-4684 lod: parse layout for migrating directory
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 515953a6ece1deb01720e63145e07e62add5f553

Comment by Gerrit Updater [ 29/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33324/
Subject: LU-4684 migrate: replace PFID via source
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bd596fbe449f4fbab18ed184ccce1e141928b116

Comment by Gerrit Updater [ 02/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33456/
Subject: LU-4684 lod: parse layout for migrating directory
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 83aa8ffcf2e8a5b7da9ead941ed0b22a435d6790

Comment by Gerrit Updater [ 02/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32946/
Subject: LU-4684 llite: add lock for dir layout data
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ae828cd3b092a38adbc86a4da320dd9d3a0fc80c

Comment by Peter Jones [ 02/Nov/18 ]

Landed for 2.12

Comment by Andreas Dilger [ 27/Aug/19 ]

This ticket is still listed as the reason racer.sh is skipping directory migration.

Either directory migration is working, in which case it can be enabled in racer.sh and LU-4684 removed from the comment, or a new ticket should be filed (and marked always_except) describing the issue(s), with a patch submitted to change racer.sh to list that ticket as the reason it is disabled, and the always_except label removed from this ticket.

Comment by Gerrit Updater [ 28/Jan/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41359
Subject: LU-4684 tests: enable racer directory migration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 51f4f706aa94538af7d1703a4e46b47579b0cad6

Comment by Gerrit Updater [ 28/Jan/21 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41360
Subject: LU-4684 tests: enable racer directory migration
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1c412393a42d454aa949fb0d87e788056d771c75

Comment by Gerrit Updater [ 08/Jul/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41359/
Subject: LU-4684 tests: enable racer directory migration
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3070ca9b18206025d9fd55817bf4da1ec486b6be
