[LU-13392] FID-in-LMA does not match the object self-fid after upgrade from 2.10 to 2.12
| Created: | 26/Mar/20 | Updated: | 03/Feb/21 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Knut Franke | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None |
| Environment: | Lustre on ZFS, CentOS 7.7, 3.10.0-1062.9.1.el7_lustre.x86_64 |
| Severity: | 3 |
| Description |
After upgrading a filesystem from Lustre 2.10.8 to 2.12.4 (following the major release upgrade procedure in chapter 17.2 of the manual), lstat() would hang on some of the files. After disabling auto_scrub on all OSTs, lstat() instead returns -1 with EREMCHG (Remote address changed). This appears to be related to the following errors in the OSS syslogs:

2020-03-26T10:50:21.222726+01:00 oss1 kernel: [249279.579945] LustreError: 32828:0:(osd_object.c:481:osd_check_lma()) aeromdo-OST0001: FID-in-LMA [0x100000000:0x0:0x0] does not match the object self-fid [0x100010000:0x0:0x0]
2020-03-26T10:50:21.222757+01:00 oss1 kernel: [249279.656311] LustreError: 32828:0:(osd_object.c:481:osd_check_lma()) Skipped 600 previous similar messages
2020-03-26T10:50:22.438285+01:00 oss1 kernel: [249280.818924] LustreError: 32828:0:(ofd_dev.c:1507:ofd_create_hdl()) aeromdo-OST0001: Can't find FID Sequence 0x0: rc = -78
2020-03-26T10:50:22.438306+01:00 oss1 kernel: [249280.872078] LustreError: 32828:0:(ofd_dev.c:1507:ofd_create_hdl()) Skipped 599 previous similar messages

Running lctl lfsck_start -A -o did not resolve the issue; according to the OI_scrub info, 258 out of 4659445 objects failed to be repaired on OST0000, and 322 out of 4661773 on OST0001. The issue appears to affect old files (created around 2015) rather than recently modified ones.
| Comments |
| Comment by Knut Franke [ 26/Mar/20 ] |
This might be related to LU-12278, which also mentions the "FID-in-LMA does not match the object self-fid" error, although we do not experience the "can't get bonus" error or the crashes reported there.
| Comment by Knut Franke [ 01/Apr/20 ] |
Digging deeper, I've identified two small text files (small enough to fit into one OST object), one affected by the issue and the other not. I've tracked down the following difference in the on-disk data structures of the two.

Working file:

$ lfs getstripe --verbose tutorial/pre
tutorial/pre
lmm_magic: 0x0BD10BD0
lmm_seq: 0x300004280
lmm_object_id: 0x2edc
lmm_fid: [0x300004280:0x2edc:0x0]
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
obdidx objid objid group
0 707339712 0x2a2925c0 0
1 707431336 0x2a2a8ba8 0
# ll_decode_filter_fid O/0/d$((707339712 % 32))/707339712
O/0/d0/707339712: warning: ffid size is unexpected (44 bytes), recompile?
O/0/d0/707339712: parent=[0x300004280:0x2edc:0x0] stripe=0
# ll_decode_filter_fid O/0/d$((707431336 % 32))/707431336
O/0/d8/707431336: warning: ffid size is unexpected (44 bytes), recompile?
O/0/d8/707431336: parent=[0x300004280:0x2edc:0x0] stripe=1
So everything works out correctly, even though ll_decode_filter_fid is apparently unhappy about the size of the trusted.fid attribute.

Inaccessible file (stat() fails):

$ lfs getstripe --verbose watchdog.log
watchdog.log
lmm_magic: 0x0BD10BD0
lmm_seq: 0x300004280
lmm_object_id: 0x2eee
lmm_fid: [0x300004280:0x2eee:0x0]
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 0
obdidx objid objid group
0 498981674 0x1dbddb2a 0
1 498980471 0x1dbdd677 0
# ll_decode_filter_fid O/0/d$((498981674 % 32))/498981674
O/0/d10/498981674: parent=[0x300004280:0x2eee:0x0] stripe=0 stripe_size=1048576 stripe_count=2 layout_version=0 range=0
# ll_decode_filter_fid O/0/d$((498980471 % 32))/498980471
O/0/d23/498980471: error reading fid: No data available
Indeed, I could verify using zdb that object 498980471 on OST 1 is missing the trusted.fid and trusted.version attributes (though trusted.lma is present).
Given that everything was working with Lustre 2.10, it looks to me as if 2.12 no longer supports the on-disk format used by some or all of the older files (for the example above, the ZFS object was created in May 2016; I'd have to do more research to find out which Lustre version we were using back then and whether a migration from an older ldiskfs installation was involved).
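The object paths in the commands above follow the pattern O/<group>/d<objid % 32>/<objid>. As a minimal sketch, assuming the 32-way directory fan-out used in those shell expressions (taken from the commands, not verified against the Lustre sources), the mapping can be reproduced in Python; the helper name ost_object_path is hypothetical:

```python
def ost_object_path(objid: int, group: int = 0, fanout: int = 32) -> str:
    """Hypothetical helper: OST-relative path O/<group>/d<objid % fanout>/<objid>.

    Mirrors the shell expressions used with ll_decode_filter_fid above; the
    fan-out of 32 is an assumption taken from those commands.
    """
    return f"O/{group}/d{objid % fanout}/{objid}"


# Object IDs from the lfs getstripe output above.
for objid in (707339712, 707431336, 498981674, 498980471):
    print(ost_object_path(objid))
```

Running it prints the four paths that ll_decode_filter_fid was pointed at above, in subdirectories d0, d8, d10 and d23 respectively.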
| Comment by Knut Franke [ 16/Apr/20 ] |
So after digging through the source code and the filesystem some more, the puzzle pieces are slowly coming together. I'm still not sure whether or not the (empty) object on OST1 is supposed to have a trusted.fid EA, but apparently trusted.lma is inconsistent with something else:

oss0 # getfattr -n trusted.lma --only-values O/0/d$((498981674 % 32))/498981674 | xxd
0000000: 0000 0000 0000 0000 0000 0000 0100 0000  ................
0000010: 2adb bd1d 0000 0000                      *.......

oss1 # getfattr -n trusted.lma --only-values O/0/d$((498980471 % 32))/498980471 | xxd
0000000: 0000 0000 0000 0000 0000 0000 0100 0000  ................
0000010: 77d6 bd1d 0000 0000                      w.......

which (I think) translates into FID-in-LMA values of [0x100000000:0x1dbddb2a:0x0] and [0x100000000:0x1dbdd677:0x0], respectively. After the failed attempt at accessing the file, searching the lctl debug_kernel output for these yields the following on oss1:

00080000:00020000:18.0:1587047810.568980:0:1951:0:(osd_object.c:481:osd_check_lma()) aeromdo-OST0001: FID-in-LMA [0x100000000:0x1dbdd677:0x0] does not match the object self-fid [0x100010000:0x1dbdd677:0x0]

and no results on oss0. This suggests to me that the sequence number of the object self-fid is off on OST1, but unfortunately I have no idea how it is derived during lookup.
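The translation of the xxd dumps into FIDs can be checked mechanically. The sketch below assumes trusted.lma starts with two little-endian 32-bit flag words followed by the self-FID (64-bit sequence, 32-bit object ID, 32-bit version); that layout is inferred from the dumps above rather than quoted from the Lustre headers, and decode_lma and format_fid are hypothetical helper names:

```python
import struct
from typing import Tuple

# Assumed trusted.lma layout (little-endian), inferred from the dumps above:
#   u32 compat flags, u32 incompat flags, u64 f_seq, u32 f_oid, u32 f_ver
LMA_FORMAT = "<IIQII"


def decode_lma(raw: bytes) -> Tuple[int, int, int]:
    """Return the self-FID (seq, oid, ver) stored at the start of a trusted.lma value."""
    _compat, _incompat, seq, oid, ver = struct.unpack_from(LMA_FORMAT, raw)
    return seq, oid, ver


def format_fid(fid: Tuple[int, int, int]) -> str:
    """Format a FID the way the Lustre log messages print it: [seq:oid:ver]."""
    seq, oid, ver = fid
    return f"[{seq:#x}:{oid:#x}:{ver:#x}]"


# The 24 bytes shown by xxd for the two objects of watchdog.log.
dumps = {
    "oss0": bytes.fromhex("00000000 00000000 00000000 01000000 2adbbd1d 00000000"),
    "oss1": bytes.fromhex("00000000 00000000 00000000 01000000 77d6bd1d 00000000"),
}
for host, raw in dumps.items():
    print(host, format_fid(decode_lma(raw)))
```

For both dumps this yields the FIDs derived by hand, with sequence 0x100000000 on both OSTs.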
| Comment by Knut Franke [ 23/Apr/20 ] |
After manually setting the FID-in-LMA on ost1 to [0x100010000:0x1dbdd677:0x0], stat on the file succeeds, but reports incorrect ownership (root.root) and timestamps (too recent), so clearly something else is amiss here.
| Comment by Knut Franke [ 23/Apr/20 ] |
In lustre/osp/osp_internal.h, I found the following comment:
Looking at the inaccessible files (and the OSS logs), it seems that the entire issue can be traced to lookup failures of objects on OST 1 whose FID-in-LMA sequence number is 0x100000000 (i.e. written by Lustre 2.4/2.5, which is a reasonable assumption for the filesystem and files in question), where Lustre erroneously adds the OST index to the self-fid during the comparison. If this is true, the error should occur for basically all files written by Lustre 2.4/2.5, except those with a stripe count of 1 that reside on OST 0.
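The hypothesis can be illustrated with a model of the IDIF encoding, assuming it works as follows: the sequence of a legacy object's FID is FID_SEQ_IDIF (0x100000000) with the OST index in bits 16 to 31 and bits 32 to 47 of the object ID in the low 16 bits. Treat the constant and bit positions as assumptions; idif_fid is a hypothetical helper, not a Lustre function.

```python
from typing import Tuple

FID_SEQ_IDIF = 0x100000000  # assumed base of the IDIF sequence range


def idif_fid(objid: int, ost_idx: int) -> Tuple[int, int, int]:
    """Hypothetical helper: IDIF-style (seq, oid, ver) for a legacy object ID.

    Assumed encoding: OST index in bits 16..31 of the sequence, bits 32..47 of
    the object ID in its low 16 bits, low 32 bits of the object ID as the oid.
    """
    seq = FID_SEQ_IDIF | ((ost_idx & 0xffff) << 16) | ((objid >> 32) & 0xffff)
    return seq, objid & 0xffffffff, 0


objid = 0x1dbdd677  # object 498980471 from the watchdog.log example
for ost_idx in (0, 1):
    written = idif_fid(objid, 0)          # per the hypothesis, old LMAs encode index 0
    expected = idif_fid(objid, ost_idx)   # self-fid computed for this OST
    print(f"OST{ost_idx:04x}: LMA seq {written[0]:#x}, self-fid seq {expected[0]:#x}, "
          f"match={written == expected}")
```

On OST0001 the two sequences differ (0x100000000 vs 0x100010000, as in the log above), while for OST index 0 both calls return the same FID, which is consistent with objects on OST 0 being unaffected.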
| Comment by Knut Franke [ 27/Apr/20 ] |
During further testing with other affected files, I could not reproduce the incorrect ownership/timestamps after manually updating the FID-in-LMA to include the OST index. I assume that issue was caused by some other tampering of mine while debugging with that particular file.
| Comment by Knut Franke [ 29/Apr/20 ] |
I've updated all affected objects in the filesystem (script attached). So far everything looks fine: no more hangs, stat() failures, or errors in the Lustre logs.
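The attached script is not reproduced here. As a hypothetical illustration of the byte-level change described in the 23/Apr and 27/Apr comments, the following sketch rewrites a trusted.lma value so that its IDIF sequence includes the OST index. It assumes a little-endian layout of two 32-bit flag words followed by a 64-bit sequence, 32-bit object ID and 32-bit version, and an IDIF base of 0x100000000 with the OST index in bits 16 to 31; fix_lma is a hypothetical name, and reading or writing the xattr itself is not shown.

```python
import struct

LMA_FORMAT = "<IIQII"       # assumed layout: compat, incompat, f_seq, f_oid, f_ver
FID_SEQ_IDIF = 0x100000000  # assumed IDIF sequence base


def fix_lma(raw: bytes, ost_idx: int) -> bytes:
    """Hypothetical sketch: fold ost_idx into the IDIF sequence of a trusted.lma value.

    Values whose sequence is outside the IDIF range, or that already carry an
    OST index, are returned unchanged, as is everything when ost_idx is 0.
    """
    compat, incompat, seq, oid, ver = struct.unpack_from(LMA_FORMAT, raw)
    if ost_idx == 0 or (seq & ~0xffff) != FID_SEQ_IDIF:
        return raw
    new_seq = seq | ((ost_idx & 0xffff) << 16)
    header = struct.pack(LMA_FORMAT, compat, incompat, new_seq, oid, ver)
    # Keep any bytes that follow the fixed-size header untouched.
    return header + raw[struct.calcsize(LMA_FORMAT):]


raw = bytes.fromhex("00000000 00000000 00000000 01000000 77d6bd1d 00000000")
fixed = fix_lma(raw, 1)
print(hex(struct.unpack_from(LMA_FORMAT, fixed)[2]))
```

For the oss1 example this produces the sequence 0x100010000, i.e. the FID-in-LMA [0x100010000:0x1dbdd677:0x0] that was set manually. Because values that already carry an OST index are left alone, applying the transformation twice is harmless.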