[LU-5626] Corruption of MDT “..” entry in non-HTree ldiskfs directories Created: 15/Sep/14 Updated: 02/Dec/15 Resolved: 07/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.4.3, Lustre 2.5.3 |
| Fix Version/s: | Lustre 2.7.0, Lustre 2.5.4 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | e2fsck, e2fsprogs, patch | ||
| Environment: |
Lustre 2.4 or newer, file system upgraded from 1.8 |
||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 15741 | ||||||||||||
| Description |
|
This is essentially exactly the same problem, but it manifests itself slightly differently in non-HTree directories. The “..” entry must remain the second entry in the directory block (fsck demands this), and when a directory created under 1.8 (now on a 2.4+ server with dirdata enabled) is moved to a new parent, the “..” entry is updated. Exactly as happened in
In the lucky case where space is already available in the second entry in a directory, the “..” entry is recreated in the same place, with the FID attached. If not, it is created in the next available space of sufficient size. This causes complaints from fsck, and when fsck repairs this, it places the updated “..” immediately after “.” again, which causes it to overlap the next entry in the directory block. This entry, which is for a real user-created file rather than “.” or “..”, is moved to lost+found.
This is because add_dirent_to_buf (used when not in a dx directory) has the same bug as “ldiskfs_update_dotdot”, which was fixed in
I don’t have time at the moment to commit and update the new ldiskfs patch file to Gerrit, but I will do so shortly. In the meantime, I’m attaching the new patch file and the resulting namei.c to this bug. The patch is a bit ugly and could probably use improvement, but in my testing, it does fix the bug. I'll share replication details in a comment. One ‘technical debt’ problem with this patch: |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 15/Sep/14 ] |
|
This problem can be reproduced as follows: format a file system under 1.8 (or, probably, earlier versions of 2.x); create a directory with at least one file in it; stop the file system and add the dirdata attribute to the MDT; then start the same file system with 2.4 or newer (the bug exists in master as well) and move that directory to a new location. Running fsck will show errors similar to those reported in
This is an example of a directory block AFTER the problem has occurred. Note the presence of the first entry, ".". Its length (24 decimal) includes the old ".." entry, which is seen in the second 12 bytes, but is not read because it's inside the rec_len of the first entry. Looking further down the directory block, we see a normal file dentry, then after that, we see the new ".." entry (look for '2e2e'), which includes the FID of the new parent. (It is followed by other file dentries.) |
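To make the layout described above easier to picture, here is a minimal Python sketch that builds a synthetic directory block in the damaged state and walks it by rec_len, the way a directory reader would. It is an illustration only: the inode numbers, the file names, the 4096-byte block size, and the encoding of the FID payload (a NUL, a length byte, and 16 placeholder bytes) are assumptions, not values taken from the block in this ticket.

```python
import struct

# Classic ext4 dirent header: inode (u32), rec_len (u16), name_len (u8), file_type (u8).
HEADER = struct.Struct("<IHBB")
BLOCK_SIZE = 4096  # assumed block size for this sketch


def pack_entry(inode, rec_len, name, file_type=0, extra=b""):
    """Serialize one entry and zero-pad it to rec_len bytes."""
    body = HEADER.pack(inode, rec_len, len(name), file_type) + name + extra
    if len(body) > rec_len:
        raise ValueError("entry does not fit in rec_len")
    return body.ljust(rec_len, b"\0")


def walk_block(block):
    """Follow rec_len from entry to entry, as a directory reader would."""
    entries = []
    offset = 0
    while offset < len(block):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            raise ValueError(f"bad rec_len {rec_len} at offset {offset}")
        start = offset + HEADER.size
        name = block[start:start + name_len]
        if inode:  # inode 0 marks an unused slot
            entries.append((offset, rec_len, name))
        offset += rec_len
    return entries


def build_damaged_block():
    """Hypothetical block in the post-rename state described above."""
    # "." claims 24 bytes, so the stale ".." at offset 12 sits inside it.
    dot = HEADER.pack(100, 24, 1, 2) + b".".ljust(4, b"\0")
    stale_dotdot = pack_entry(2, 12, b"..", 2)
    file1 = pack_entry(101, 16, b"file1", 1)
    # Assumed dirdata payload: NUL, length byte, 16 placeholder FID bytes.
    fid_payload = b"\0" + bytes([17]) + bytes(16)
    new_dotdot = pack_entry(50, 28, b"..", 0x12, fid_payload)
    used = len(dot) + len(stale_dotdot) + len(file1) + len(new_dotdot)
    file2 = pack_entry(102, BLOCK_SIZE - used, b"file2", 1)
    return dot + stale_dotdot + file1 + new_dotdot + file2


entries = walk_block(build_damaged_block())
names = [name for _offset, _rec_len, name in entries]
print(names)  # ".." shows up third, after a regular file entry
```

The walker never visits offset 12, so the stale ".." is invisible, and the only reachable ".." is no longer the second entry in the block, which is the condition fsck complains about.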
| Comment by Patrick Farrell (Inactive) [ 15/Sep/14 ] |
|
Patch file & resulting namei.c from ldiskfs "make" of current master+this patch. |
| Comment by Patrick Farrell (Inactive) [ 15/Sep/14 ] |
|
The attached patch attempts to resolve the issue by special-casing "..". A special, alternate length for ".." is calculated, which does not include the data section. When a dotdot entry is identified, the space-checking code first checks whether there is sufficient space for the entry including the data section; if there is not, it then checks for space for the special alternate length. This guarantees ".." will be placed on top of the pre-existing ".." entry, even when there is no additional space for the FID. The result of this space check is recorded and is used to determine whether or not to write the data section. |
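The space check described in this comment can be sketched roughly as follows. This is a simplified Python model, not the patch itself: the function names, the tuple representation of directory entries, and the 18-byte data section used in the usage lines are all assumptions made for illustration.

```python
HEADER_SIZE = 8  # inode (4) + rec_len (2) + name_len (1) + file_type (1)


def rec_len_for(name_len, data_len=0):
    """Bytes needed for an entry, rounded up to a multiple of 4."""
    return (HEADER_SIZE + name_len + data_len + 3) & ~3


def find_slot(entries, name, data_len, special_case_dotdot=True):
    """Pick the entry whose free tail will hold the new name.

    entries is a list of (inode, rec_len, used_len) tuples in block order
    (a hypothetical model of the directory block). Returns
    (index, write_data), or None if nothing fits in this block.
    """
    full_len = rec_len_for(len(name), data_len)  # name plus data section
    alt_len = rec_len_for(len(name))             # name only, no data section
    allow_alt = special_case_dotdot and name == b".."
    for index, (inode, rec_len, used_len) in enumerate(entries):
        free = rec_len - (used_len if inode else 0)
        if free >= full_len:
            return index, True   # room for the data section as well
        if allow_alt and free >= alt_len:
            return index, False  # ".." stays in place, data section skipped
    return None


# "." owns 24 bytes but uses only 12, leaving the old 12-byte ".." slot free.
block = [(100, 24, 12), (101, 16, 16), (102, 4056, 16)]
patched = find_slot(block, b"..", data_len=18)
unpatched = find_slot(block, b"..", data_len=18, special_case_dotdot=False)
print(patched, unpatched)
```

In this model the special case keeps ".." directly after "." at the cost of dropping the data section, while the unpatched search skips ahead to the first entry with enough room for the full length.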
| Comment by Patrick Farrell (Inactive) [ 16/Sep/14 ] |
|
Patch is here: Local testing suggests this resolves the issue. |
| Comment by Patrick Farrell (Inactive) [ 17/Sep/14 ] |
|
We also saw file systems taking errors and getting remounted read-only, and we were initially unable to figure out why. It turns out that when an incorrect/damaged (i.e., ".." entry in the wrong place) non-HTree directory is converted to an HTree directory, the conversion goes badly wrong, and the resulting directory is badly corrupt. With the patch for this bug, it's no longer possible to get into the bad state. I thought I'd share the errors here so others who hit this bug have a better chance of finding this JIRA ticket. Here's what the resulting errors look like. The key thing is "rec_len=2049", which we've always seen in this situation: |
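As a rough illustration of why a value like rec_len=2049 stands out: ext4-style directory entries are padded to 4-byte boundaries, so a valid rec_len is always a multiple of 4. The Python sketch below applies a few such sanity checks. The function name and the wording of the returned strings are made up for this example and are not the kernel's actual messages.

```python
def rec_len_problems(rec_len, name_len, offset, block_size=4096):
    """Return a list of reasons why this rec_len looks invalid (empty if fine)."""
    min_len = (8 + 1 + 3) & ~3        # smallest entry: header plus 1-byte name
    needed = (8 + name_len + 3) & ~3  # header plus this name, rounded up to 4
    problems = []
    if rec_len < min_len:
        problems.append("rec_len is smaller than the minimum entry size")
    if rec_len % 4 != 0:
        problems.append("rec_len is not a multiple of 4")
    if rec_len < needed:
        problems.append("rec_len is too small for name_len")
    if offset + rec_len > block_size:
        problems.append("entry runs past the end of the block")
    return problems


print(rec_len_problems(2049, name_len=5, offset=0))
```

An odd rec_len can never come from a well-formed entry, which fits the report that it consistently marks the corrupted directories.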
| Comment by James A Simmons [ 22/Oct/14 ] |
|
Once this lands the ldiskfs patches for SLES12 and RHEL7 will need to be updated. |
| Comment by nasf (Inactive) [ 30/Oct/14 ] |
|
James, do you mean that ldiskfs/kernel_patches/patches/sles11sp2/ext4-data-in-dirent.patch should be updated, or something else? |
| Comment by James A Simmons [ 30/Oct/14 ] |
|
Now these changes need to be integrated into the upcoming SLES12 and RHEL7 ldiskfs work. |
| Comment by James A Simmons [ 31/Oct/14 ] |
|
To clarify, RHEL7 and SLES12 server-side support are both 2.8 features, so this ticket is safe to close. We just need to integrate this work into those distros for 2.8. |
| Comment by James A Simmons [ 03/Nov/14 ] |
|
Looks like we will need this for SLES11 SP3 as well. |
| Comment by Bob Glossman (Inactive) [ 05/Nov/14 ] |
|
for sles11sp3: |
| Comment by Bob Glossman (Inactive) [ 06/Nov/14 ] |
|
Patched namei.c from sles11sp3, as requested by a comment in http://review.whamcloud.com/#/c/12585. To be absolutely clear, this is the ldiskfs/namei.c after the full, updated patch series is applied, not just the one refreshed data-in-dirent patch. |
| Comment by Andreas Dilger [ 07/Nov/14 ] |
|
SLES 11 SP3 patch landed, need one for RHEL 7 ldiskfs series in |
| Comment by Bob Glossman (Inactive) [ 07/Nov/14 ] |
|
Already talked to Yang Sheng about el7. The latest refresh of http://review.whamcloud.com/#/c/10249 includes changes that are supposed to address the issue. Hopefully the el7 ldiskfs patches will be landing soon. |
| Comment by Peter Jones [ 07/Nov/14 ] |
|
ok then it sounds like the recent landing to master of the SLES fix means that this can be marked as resolved for 2.7 |
| Comment by James A Simmons [ 07/Nov/14 ] |
|
We should wait until 2.8 to land the ldiskfs series for RHEL7. |
| Comment by Gerrit Updater [ 03/Dec/14 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/12926 |
| Comment by Gerrit Updater [ 03/Dec/14 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/12927 |
| Comment by Gerrit Updater [ 09/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12926/ |
| Comment by Gerrit Updater [ 09/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12927/ |
| Comment by Andreas Dilger [ 04/Nov/15 ] |
|
Attached is a script that can be used to recover the files from lost+found in this case, when the output from e2fsck is available. The script itself is not ready for distribution because it hard-codes pathnames to log files and such, but it was useful in recovering problematic files. This could theoretically also be handled within e2fsck by moving the second entry (if otherwise valid) to another spot in the directory before clobbering it with "..", but I haven't looked at e2fsck closely enough to know how complex this would be (it would have to know where to put the entry to avoid making it unreachable in htree directories, and what to do if the directory doesn't have enough space). |