LU-9836: Issues with 2.10 upgrade and files missing LMAC_FID_ON_OST flag

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.10.0
    • Labels: None
    • Environment: 3.10.0-514.21.1.el7_lustre.x86_64
    • Severity: 3
    • 9223372036854775807

    Description

      Last weekend we upgraded our Lustre filesystem from 2.7 to 2.10. After the upgrade, we were missing about 36M objects. After a bunch of troubleshooting, we ended up running e2fsck (which recovered the objects to lost+found) and ll_recover_lost_found_objs (which moved them back to the proper place in the ldiskfs filesystem). It's worth noting that lfsck couldn't recover the objects from lost+found (because of some kind of incompatibility between the objects' EA and lfsck; details follow).

      A couple of remarks:
      1. it looks like the functionality from ll_recover_lost_found_objs has been moved into the initial OI scrub process, but it did not work for us. We noticed 2 things:
      a. in the lab, we recreated the issue (by manually moving objects to lost+found) and osd_initial_OI_scrub() would recover only the first 255 objects. We couldn't figure out why it stopped at 255, and restarting the OST would not recover any more than the initial 255.
      b. in prod, osd_initial_OI_scrub() would run but not fix anything. The trace would come back with osd_ios_lf_fill() returning -EINVAL. After troubleshooting this further, it turns out all the objects in lost+found have no compat flags (in particular no LMAC_FID_ON_OST) in the LMA extended attribute, and eventually we end up with osd_get_idif() returning -EINVAL (because __ost_xattr_get() returned 24); see the LMA-check sketch after these remarks. We believe all those files were created with Lustre 2.7.
      -> this is how far we got troubleshooting these 2 issues. They sound like bugs; we are happy to give more details and/or file a bug report if that helps.
      2. our Lustre filesystem has 96 OSTs (id 0 to 97). All of the bad objects were located on 24 of them (ids 48 to 71) – about 1.5M bad inodes out of 3M per OST. What's special about ids 48 to 71 is that those OSTs were reformatted about 6 months ago (with the same ids, but at creation we forgot to add --replace to mkfs or do a writeconf). At the time, we saw some "precreate FID 0x0:3164581 is over 100000 larger than the LAST_ID 0x0:0, only precreating the last 10000 objects." messages in the logs. This sounds like the potential root cause of our issue last week, but we really can't figure out how it would have caused half of the inodes to not get LMAC_FID_ON_OST and get lost in ldiskfs.
      3. after fixing everything, when we run lfsck -t scrub, all the bad objects are checked and reported as failed in oi_scrub (example below). After digging, this comes down to the same osd_get_idif() function returning -EINVAL. We can fix this by copying the files.
      checked: 3383278
      updated: 0
      failed: 1469776
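
      For reference, a minimal sketch of how such an LMA check can be done from userspace, run against an object on an OST mounted read-only as ldiskfs. The 24-byte struct layout (lma_compat, lma_incompat, lma_self_fid) and the LMAC_FID_ON_OST value are redeclared below for illustration, assuming the definitions in lustre_idl.h as of 2.10; verify both against the actual tree:

      #include <stdio.h>
      #include <stdint.h>
      #include <sys/xattr.h>

      /* On-disk LMA layout, redeclared for illustration only; the
       * authoritative definitions live in lustre_idl.h. */
      struct lu_fid {                      /* 16 bytes */
              uint64_t f_seq;
              uint32_t f_oid;
              uint32_t f_ver;
      };

      struct lustre_mdt_attrs {            /* 24 bytes, matching the "returned 24" above */
              uint32_t      lma_compat;
              uint32_t      lma_incompat;
              struct lu_fid lma_self_fid;
      };

      #define LMAC_FID_ON_OST 0x00000008   /* assumed flag value (2.10); check lustre_idl.h */

      int main(int argc, char **argv)
      {
              struct lustre_mdt_attrs lma;
              ssize_t rc;

              if (argc != 2) {
                      fprintf(stderr, "usage: %s <object path on ldiskfs-mounted OST>\n", argv[0]);
                      return 1;
              }

              /* LMA is stored little-endian on disk; no swabbing needed on x86_64 */
              rc = getxattr(argv[1], "trusted.lma", &lma, sizeof(lma));
              if (rc < 0) {
                      perror("getxattr trusted.lma");
                      return 1;
              }

              printf("%s: lma size %zd, compat %#x, incompat %#x, LMAC_FID_ON_OST %s\n",
                     argv[1], rc, lma.lma_compat, lma.lma_incompat,
                     (lma.lma_compat & LMAC_FID_ON_OST) ? "set" : "MISSING");
              return 0;
      }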

      Overall, we just wanted to report this on the mailing list in case someone else runs into this issue, and to see if we should open bugs about 1.a and 1.b. We were also curious whether anybody has an explanation of how we got here and whether 2. could explain it.

      This is pretty dense, but it boils down to 3 issues:

      • osd_initial_OI_scrub() seems to recover at most 255 files from lost+found, never more
      • somehow we got some files without compat flags; they don't get processed by osd_initial_OI_scrub() and are reported as failed by lfsck -t scrub
      • 24 of our 96 OSTs somehow ended up with about 1.5M unattached inodes each, and we're curious how this could have happened.

      Attachments

        1. 800a_mount_patched.log.0.gz
          262 kB
        2. 800a_mount.log.gz
          251 kB
        3. debugfs_mount_logs.tar.gz
          232 kB
        4. l210_loop_4g.tar.xz
          2.30 MB
        5. ls_output.gz
          2 kB


          Activity


            mcmult Tim McMullan (Inactive) added a comment -
            This is on a real block device and is too large to upload reasonably. I will try to recreate it on a loopback device that is small enough to upload here.

            yong.fan nasf (Inactive) added a comment -
            mcmult, are you using a loop device or a real block device for the test? If it is a loop device, how large is it, and is it possible to upload the 'bad' image?

            mcmult Tim McMullan (Inactive) added a comment -
            I applied the patch and mounted it up; here is the log from the patched mount: 800a_mount_patched.log.0.gz. Thank you!

            yong.fan nasf (Inactive) added a comment -
            mcmult, honestly, I am not sure why readdir() cannot return name entries from the non-empty lost+found directory. I made a debug patch, 28757. Would you please try it on your current OST image to see whether the items under lost+found can be recovered? Please re-collect the -1 level debug logs on the OST with the patch applied. Thanks!

            gerrit Gerrit Updater added a comment -
            Fan Yong (fan.yong@intel.com) uploaded a new patch: https://review.whamcloud.com/28757
            Subject: LU-9836 scrub: 32bit hash for scanning lost+found
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 762d699d30f6023e7b239ed7fd138ea849c766bd

            mcmult Tim McMullan (Inactive) added a comment -
            I just mounted it up as ldiskfs, ran ls, and, to make things more strange, I am seeing the same set of objects (ls_output.gz). I see that 8314 is different though: a 0-length file. I can try manually restoring it and see if it continues as expected?

            Thanks!

            yong.fan nasf (Inactive) added a comment -
            It is strange that debugfs shows that /lost+found is not empty, but the readdir() during mount only found the "." and ".." entries. Currently, I am not sure what caused such strange behavior, but since debugfs parses the directory with its own logic rather than the general readdir(), I would suggest mounting the device as "ldiskfs" and then double-checking the /lost+found directory. Thanks!
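
            A minimal userspace cross-check of that readdir() behavior, assuming the OST is mounted read-only as ldiskfs at a hypothetical /mnt/ost, could look like the sketch below; it should list the same entries debugfs reports, or only "." and ".." if the kernel-side scan really sees the directory as empty:

            #include <stdio.h>
            #include <dirent.h>

            int main(int argc, char **argv)
            {
                    /* the default path is hypothetical; pass the real mount point */
                    const char *path = argc > 1 ? argv[1] : "/mnt/ost/lost+found";
                    struct dirent *de;
                    unsigned long n = 0;
                    DIR *dir = opendir(path);

                    if (!dir) {
                            perror(path);
                            return 1;
                    }

                    while ((de = readdir(dir)) != NULL) {
                            printf("ino %lu  name %s\n",
                                   (unsigned long)de->d_ino, de->d_name);
                            n++;
                    }
                    closedir(dir);
                    printf("%lu entries total\n", n);
                    return 0;
            }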

            mcmult Tim McMullan (Inactive) added a comment -
            I've uploaded debugfs_mount_logs.tar.gz, which contains the output of ls through debugfs and the log of a mount performed just after running it.

            Thanks!

            yong.fan nasf (Inactive) added a comment (edited) -
            00100000:00000001:7.0:1502990563.221005:0:18533:0:(osd_scrub.c:2331:osd_ios_general_scan()) Process entered
            00100000:00000001:7.0:1502990563.221042:0:18533:0:(osd_scrub.c:2105:osd_ios_lf_fill()) Process entered
            00100000:00000001:7.0:1502990563.221042:0:18533:0:(osd_scrub.c:2109:osd_ios_lf_fill()) Process leaving (rc=0 : 0 : 0)
            00100000:00000001:7.0:1502990563.221043:0:18533:0:(osd_scrub.c:2105:osd_ios_lf_fill()) Process entered
            00100000:00000001:7.0:1502990563.221044:0:18533:0:(osd_scrub.c:2109:osd_ios_lf_fill()) Process leaving (rc=0 : 0 : 0)
            00100000:00000001:7.0:1502990563.221005:0:18533:0:(osd_scrub.c:2331:osd_ios_general_scan()) Process entered
            

            That means /lost+found only contains the '.' and '..' entries, i.e. it is empty (see the small illustration below). So if you still have the environment of the partly recovered system, would you please show me the output of:

            debugfs -c -R "ls /lost+found/" $device
            

            If the /lost+found is not empty, then please re-collect the Lustre kernel debug logs as you did in the comment https://jira.hpdd.intel.com/browse/LU-9836?focusedCommentId=205627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-205627

            Thanks!
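
            (Why two quick rc=0 returns indicate an empty directory: osd_ios_lf_fill() appears to be invoked once per directory entry, and "." / ".." are skipped with an immediate rc=0. A simplified, self-contained illustration of that pattern, not the actual Lustre code, follows:)

            #include <stdio.h>

            /* Simplified illustration only, not the actual osd_ios_lf_fill():
             * a filldir-style callback that does nothing for "." and "..",
             * so scanning an empty lost+found yields exactly two quick
             * enter/leave rc=0 pairs, as in the trace above. */
            static int lf_fill_sketch(const char *name, int namelen)
            {
                    if (namelen <= 2 && name[0] == '.' &&
                        (namelen == 1 || name[1] == '.'))
                            return 0;        /* dot entries: nothing to recover */

                    printf("would try to recover: %.*s\n", namelen, name);
                    return 0;
            }

            int main(void)
            {
                    /* an "empty" directory as readdir() sees it: only "." and ".." */
                    lf_fill_sketch(".", 1);
                    lf_fill_sketch("..", 2);
                    return 0;
            }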


            mcmult Tim McMullan (Inactive) added a comment -
            Hey @nasf, just wondering if this log has been helpful. I can also generate a log on a fresh instance of Lustre and get the log while it recovers files and stops partway through, if you would like it!

            Thanks!

            mcmult Tim McMullan (Inactive) added a comment -
            I've attached the log (800a_mount.log.gz) to the ticket. It should cover from when I issued the mount to about half a second after the mount returned. Thanks!

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: jwallior Julien Wallior
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: