[LU-10329] DNE3: REMOTE_PARENT_DIR scalability Created: 05/Dec/17 Updated: 10/Aug/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Improvement | Priority: | Critical |
| Reporter: | Andreas Dilger | Assignee: | Lai Siyao |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | LMR, dne3, medium |
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Epic Link: | MDT rebalance v3 |
| Description |
|
For DNE filesystems where there are large numbers of remote entries, for example once In order to limit contention and scaling issues in REMOTE_PARENT_DIR it makes sense to have multiple such directories. As a starting point, one REMOTE_PARENT_DIR_MDTxxxx for each remote MDT would be useful, but it may be necessary to have a tree of directories similar to the O/<seq>/dN object directories. Having a separate REMOTE_PARENT_DIR_MDTxxxx per MDT would also allow LFSCK to efficiently scan remote entries for a given MDT, if there was a problem (e.g. MDT was marked offline and returned into the namespace later). |
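The naming scheme described above could be sketched roughly as follows. This is only an illustration: the helper names, the 4-digit hex formatting of the MDT index, and the 32-way `dN` fan-out are assumptions made for the example, not taken from the Lustre source or from any patch on this ticket.

```python
# Hypothetical helpers; names and constants are illustrative assumptions,
# not taken from the Lustre source tree.

LEGACY_DIR = "REMOTE_PARENT_DIR"
SUBDIR_COUNT = 32  # assumed fan-out for the optional second level


def remote_parent_dir_name(mdt_index: int) -> str:
    """Return the per-MDT directory name for a remote MDT index."""
    if not 0 <= mdt_index <= 0xFFFF:
        raise ValueError(f"MDT index out of range: {mdt_index}")
    # Assumption: "xxxx" is the MDT index as four hex digits.
    return f"{LEGACY_DIR}_MDT{mdt_index:04x}"


def remote_parent_path(mdt_index: int, oid: int) -> str:
    """Return a two-level path: per-MDT directory, then a dN bucket."""
    bucket = oid % SUBDIR_COUNT
    return f"{remote_parent_dir_name(mdt_index)}/d{bucket}"
```

With one directory per remote MDT, a scan for a single MDT only needs to walk the directory returned by `remote_parent_dir_name()` for that index; the second level would only matter if a single per-MDT directory still grew too large.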
| Comments |
| Comment by Andreas Dilger [ 18/Oct/18 ] |
|
This is needed before |
| Comment by Andreas Dilger [ 30/Jul/20 ] |
|
It seems that the large directory patches have landed, so the need for this is reduced. However, it would still be useful to have a per-MDT REMOTE_PARENT_DIR_MDTxxxx directory, since that would simplify things like LFSCK checking if one MDT was offline/corrupted/removed. It also provides a point of scaling so that there can be different updates between pairs of MDTs that do not contend for the same locks. |
| Comment by Andreas Dilger [ 02/Dec/21 ] |
|
In addition to improving performance by having multiple REMOTE_PARENT_DIR_MDTxxxx directories, another significant benefit would be reduced risk from filesystem corruption. An MDT with 60M entries in REMOTE_PARENT_DIR is currently running e2fsck because of a problem with that directory, with an unknown ETA, so having multiple independent directories would reduce risk and the number of entries that need to be repaired significantly, and potentially improve the scalability of parallel pass2/pass3 operations as well. |
| Comment by Andreas Dilger [ 03/Apr/23 ] |
|
For upgrading MDTs that are currently using a single REMOTE_PARENT_DIR, I think a two-stage approach could be used. Initially, add "read-only" support for such directories into 2.16.0 to allow for downgrade:
Then in 2.17+ start using those directories (which would be incompatible for pre-2.16 servers, or servers not patched as above):
Having a two-stage update process like this, and minimizing the "read-only" patches to simplify backport (e.g. to b2_15) will allow upgrade/downgrade without breaking the whole filesystem. One minor drawback of having separate directories would be that renaming files/subdirs from one remote MDT directory to another would also mean renaming the agent locally from one REMOTE_PARENT_DIR_MDTxxxx to another, but that is not worse than a local cross-directory rename, and only a small fraction of the overhead of the distributed rename itself (BFL, multiple MDT transactions, etc.). That local filesystem overhead is probably offset by not having a single huge REMOTE_PARENT_DIR to hold all of the remote links. |
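A toy model may help show why the two-stage approach keeps downgrade working. The individual steps of each stage are not spelled out in this comment, so everything below is an assumption for illustration: the `AgentStore` class, its `write_per_mdt` switch, and the lookup order are hypothetical and only model directories as in-memory sets.

```python
# Toy in-memory model of the two-stage rollout; all names are hypothetical.

LEGACY_DIR = "REMOTE_PARENT_DIR"


def per_mdt_dir(mdt_index: int) -> str:
    # Assumption: "xxxx" is the MDT index as four hex digits.
    return f"{LEGACY_DIR}_MDT{mdt_index:04x}"


class AgentStore:
    """Maps directory name -> set of entry names."""

    def __init__(self, dirs: dict, write_per_mdt: bool):
        self.dirs = dirs                    # shared "on-disk" state
        self.write_per_mdt = write_per_mdt  # False: stage 1, True: stage 2

    def insert(self, mdt_index: int, name: str) -> str:
        # Stage 1 keeps writing to the legacy directory; stage 2 switches.
        target = per_mdt_dir(mdt_index) if self.write_per_mdt else LEGACY_DIR
        self.dirs.setdefault(target, set()).add(name)
        return target

    def lookup(self, mdt_index: int, name: str):
        # Both stages can read either layout, which is what permits downgrade.
        for d in (per_mdt_dir(mdt_index), LEGACY_DIR):
            if name in self.dirs.get(d, ()):
                return d
        return None


disk = {LEGACY_DIR: set()}
stage1 = AgentStore(disk, write_per_mdt=False)
stage2 = AgentStore(disk, write_per_mdt=True)
stage1.insert(1, "a")        # lands in the legacy directory
stage2.insert(1, "b")        # lands in the per-MDT directory
found = stage1.lookup(1, "b")  # a stage-1 server still finds the new entry
```

In this model a server without even the read-only support would only consult `LEGACY_DIR` and miss entry `"b"`, which corresponds to the incompatibility noted above for unpatched servers.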
| Comment by Lai Siyao [ 08/May/23 ] |
|
REMOTE_PARENT_DIR is used in the OSD layer only, which means it doesn't know which MDT the parent is located on; it only knows whether the parent is on the local MDT. So we can't create a subdir for each MDT. Do you think it's okay to create 64 subdirs for all remote MDTs? |
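If the OSD cannot see the parent's MDT index, a fixed fan-out would have to be keyed on something it does have. A minimal sketch, assuming the key is a FID with `seq`, `oid`, and `ver` fields; the mixing function and the `d<N>` subdirectory naming are hypothetical choices for this example, not what any patch here implements.

```python
from typing import NamedTuple

NUM_SUBDIRS = 64  # the count suggested in the comment above


class Fid(NamedTuple):
    seq: int
    oid: int
    ver: int = 0


def remote_subdir_index(fid: Fid) -> int:
    """Map a FID to one of NUM_SUBDIRS buckets (illustrative hash only)."""
    # Multiply by a 64-bit odd constant and keep the high bits so that
    # consecutive sequence numbers do not all land in the same bucket.
    mixed = (fid.seq * 0x9E3779B97F4A7C15 + fid.oid) & 0xFFFFFFFFFFFFFFFF
    return (mixed >> 32) % NUM_SUBDIRS


def remote_subdir_name(fid: Fid) -> str:
    """Hypothetical subdirectory path under REMOTE_PARENT_DIR."""
    return f"REMOTE_PARENT_DIR/d{remote_subdir_index(fid)}"
```

This spreads entries and lock contention across 64 directories, but unlike a per-MDT split it does not group entries by remote MDT, so finding all links for one MDT would still mean scanning every subdirectory.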
| Comment by Andreas Dilger [ 08/May/23 ] |
|
There are definitely benefits to splitting the entries up by remote MDT. This could be used to quickly find/check/remove remote links for a specific MDT if needed (e.g. MDT removed/lost with metadata redundancy). Does it make sense to lift REMOTE_PARENT_DIR out of the OSD and into MDD? |
| Comment by Gerrit Updater [ 10/Aug/23 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51910 |
| Comment by Gerrit Updater [ 10/Aug/23 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51911 |
| Comment by Gerrit Updater [ 10/Aug/23 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51912 |