[LU-15383] DNE directories not connected to REMOTE_PARENT_DIR Created: 17/Dec/21  Updated: 03/Nov/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Andreas Dilger Assignee: Lai Siyao
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Cloners
Clones LU-15330 ext2fs_get_pathname() very slow for l... Open
Related
is related to LU-15388 Wrong dotdot FID parameter for osd_ad... Resolved
is related to LU-10329 DNE3: REMOTE_PARENT_DIR scalability Open
is related to LU-4876 LFSCK remove entry from /REMOTE_PAREN... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running e2fsck on an MDT with a large number of striped directories reports many disconnected directory entries. This has been seen on several different systems, with e2fsck finding tens or hundreds of thousands of entries that need to be connected to lost+found:

Unconnected directory inode 2102494 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
Unconnected directory inode 2102510 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
Unconnected directory inode 2102514 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
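With tens or hundreds of thousands of such messages, it helps to count them and see which parent path they report before digging further. The following is a minimal Python sketch that assumes the e2fsck output is captured to text and uses exactly the message format shown above; the function name summarize_unconnected is hypothetical and not part of e2fsprogs or Lustre.

```python
import re
from collections import Counter

# Assumed message format, taken from the e2fsck output in this report:
#   Unconnected directory inode 2102494 (/REMOTE_PARENT_DIR/???)
UNCONNECTED_RE = re.compile(r"^Unconnected directory inode (\d+) \((.*)\)$")


def summarize_unconnected(lines):
    """Return (inode numbers, Counter of reported parent paths)."""
    inodes = []
    parents = Counter()
    for line in lines:
        m = UNCONNECTED_RE.match(line.strip())
        if not m:
            continue  # skip "Connect to /lost+found? yes" and other lines
        inodes.append(int(m.group(1)))
        # Drop the last component ("???" when the name is unknown)
        # so that entries are grouped by the directory they were under.
        parent = m.group(2).rsplit("/", 1)[0] or "/"
        parents[parent] += 1
    return inodes, parents


if __name__ == "__main__":
    sample = """\
Unconnected directory inode 2102494 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
Unconnected directory inode 2102510 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
Unconnected directory inode 2102514 (/REMOTE_PARENT_DIR/???)
Connect to /lost+found? yes
"""
    inodes, parents = summarize_unconnected(sample.splitlines())
    print(len(inodes), parents.most_common())
```

On the excerpt above this reports three inodes, all under /REMOTE_PARENT_DIR. On a full e2fsck log, the grouping shows whether the problem is confined to REMOTE_PARENT_DIR.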

This doesn't appear to be caused by random filesystem corruption, since it has been seen repeatedly. It is more likely to be a bug in the code, e.g. rename doesn't connect striped directories properly, perhaps when the rename source is on a remote MDT and the target is on the local MDT (or the reverse), or during "lfs migrate -m" or similar.



 Comments   
Comment by Lai Siyao [ 21/Dec/21 ]

It looks like LU-15388 explains this, and the patch there can fix this issue.

Comment by Andreas Dilger [ 21/Dec/21 ]

I don't know if that will help. The ".." FID is not checked by e2fsck, so there must still be some other inconsistency at the ldiskfs level that is causing e2fsck to report a problem.
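To illustrate why a wrong ".." FID alone would not produce these messages, the following is a simplified Python model of a directory connectivity check. It walks only inode-number name entries from the root and ignores ".." and any FIDs. This is an assumed model for illustration, not e2fsck's actual implementation, and every inode number other than the root (2) is hypothetical.

```python
from collections import deque

ROOT_INO = 2  # root directory inode number on ext2/3/4 (and ldiskfs)


def find_unconnected(dir_inodes, entries, root=ROOT_INO):
    """Return directory inodes that no name entry reaches from root.

    dir_inodes: set of all directory inode numbers in the inode table.
    entries: dict mapping a directory inode to the child directory inode
             numbers named in its entries (excluding "." and "..").
    Only name entries are followed; ".." and FIDs are never consulted.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in entries.get(parent, ()):
            if child in dir_inodes and child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(dir_inodes - seen)


if __name__ == "__main__":
    # Hypothetical layout: 11 = lost+found, 100 = REMOTE_PARENT_DIR.
    dirs = {2, 11, 100, 2102494, 2102510}
    names = {2: [11, 100], 100: [2102494]}  # no entry names 2102510
    print(find_unconnected(dirs, names))
```

In this model, 2102510 is reported because no directory has a name entry for it, whatever its ".." or FID says. This matches the point above that some inconsistency at the ldiskfs directory-entry level would still be needed to explain the e2fsck output.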

Comment by Gerrit Updater [ 04/Jan/22 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45961
Subject: LU-15383 osd-ldiskfs: check .. upon object destroy
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aacc3718e81314c4157874dfbf438a8b696bbf56

Comment by Lai Siyao [ 11/Jan/22 ]

The test results show this only happens in sanity test 300p, which tests striped directories with -ENOSPC.

Comment by Andreas Dilger [ 31/Jan/22 ]

This might be related to LU-15388, since REMOTE_PARENT_DIR entries have the wrong ".." entry, so updates to that directory will be incorrect.

Generated at Sat Feb 10 03:17:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.