[LU-13392] FID-in-LMA does not match the object self-fid after upgrade from 2.10 to 2.12 Created: 26/Mar/20  Updated: 03/Feb/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Knut Franke Assignee: WC Triage
Resolution: Unresolved Votes: 1
Labels: None
Environment:

Lustre on ZFS, CentOS 7.7, 3.10.0-1062.9.1.el7_lustre.x86_64


Attachments: HTML File update_25_objects    
Issue Links:
Related
is related to LU-14119 FID-in-LMA [fid1] does not match the ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

After upgrading a filesystem from Lustre 2.10.8 to 2.12.4 (following the major release upgrade procedure from chapter 17.2 of the manual), lstat() would hang on some of the files. After disabling auto_scrub on all OSTs, lstat() instead returns -1 with EREMCHG (Remote address changed). This appears to be related to the following errors in the OSS syslogs:

2020-03-26T10:50:21.222726+01:00 oss1 kernel: [249279.579945] LustreError: 32828:0:(osd_object.c:481:osd_check_lma()) aeromdo-OST0001: FID-in-LMA [0x100000000:0x0:0x0] does not match the object self-fid [0x100010000:0x0:0x0]
2020-03-26T10:50:21.222757+01:00 oss1 kernel: [249279.656311] LustreError: 32828:0:(osd_object.c:481:osd_check_lma()) Skipped 600 previous similar messages
2020-03-26T10:50:22.438285+01:00 oss1 kernel: [249280.818924] LustreError: 32828:0:(ofd_dev.c:1507:ofd_create_hdl()) aeromdo-OST0001: Can't find FID Sequence 0x0: rc = -78
2020-03-26T10:50:22.438306+01:00 oss1 kernel: [249280.872078] LustreError: 32828:0:(ofd_dev.c:1507:ofd_create_hdl()) Skipped 599 previous similar messages

lctl lfsck_start -A -o did not resolve the issue; according to OI_scrub info, 258 out of 4659445 failed to be repaired on OST0000, as well as 322 out of 4661773 on OST0001.

The issue appears to affect old files (created around 2015) rather than recently modified ones.



 Comments   
Comment by Knut Franke [ 26/Mar/20 ]

This might be related to LU-12278, which also mentions the "FID-in-LMA does not match the object self-fid" error, although we experience neither the "can't get bonus" error nor the crashes reported there.

Comment by Knut Franke [ 01/Apr/20 ]

Digging deeper, I've identified two small text files (small enough to fit into one OST object); one affected by the issue, the other not. I've tracked down the following difference in the on-disk data structures of the two:

working 

$ lfs getstripe --verbose tutorial/pre
tutorial/pre
lmm_magic:         0x0BD10BD0
lmm_seq:           0x300004280
lmm_object_id:     0x2edc
lmm_fid:           [0x300004280:0x2edc:0x0]
lmm_stripe_count:  2
lmm_stripe_size:   1048576
lmm_pattern:       1
lmm_layout_gen:    0
lmm_stripe_offset: 0
    obdidx       objid       objid       group
         0       707339712     0x2a2925c0                0
         1       707431336     0x2a2a8ba8                0

# ll_decode_filter_fid O/0/d$((707339712 % 32))/707339712
O/0/d0/707339712: warning: ffid size is unexpected (44 bytes), recompile?
O/0/d0/707339712: parent=[0x300004280:0x2edc:0x0] stripe=0
# ll_decode_filter_fid O/0/d$((707431336 % 32))/707431336
O/0/d8/707431336: warning: ffid size is unexpected (44 bytes), recompile?
O/0/d8/707431336: parent=[0x300004280:0x2edc:0x0] stripe=1

 

So everything works out correctly, despite the fact that ll_decode_filter_fid is apparently unhappy about the size of the trusted.fid attribute.
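The object paths passed to ll_decode_filter_fid follow the pattern O/<group>/d<objid % 32>/<objid>, with objid and group taken from the lfs getstripe output. A minimal Python sketch of that arithmetic follows; the function name ost_object_path is made up for illustration, and the 32 subdirectories are assumed from the shell commands in this comment, not from any Lustre API.

```python
def ost_object_path(objid: int, group: int = 0, subdirs: int = 32) -> str:
    """Build the OST-relative path of an object, as used in this comment.

    Assumption: objects are spread over `subdirs` directories by
    objid modulo subdirs, mirroring d$((objid % 32)) in the shell commands.
    """
    return f"O/{group}/d{objid % subdirs}/{objid}"


# The two objects of the working file tutorial/pre
for objid in (707339712, 707431336):
    print(ost_object_path(objid))
```

This reproduces the paths that ll_decode_filter_fid reports for the working file (O/0/d0/707339712 and O/0/d8/707431336).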

inaccessible (stat() fails)

$ lfs getstripe --verbose watchdog.log
watchdog.log
lmm_magic:         0x0BD10BD0
lmm_seq:           0x300004280
lmm_object_id:     0x2eee
lmm_fid:           [0x300004280:0x2eee:0x0]
lmm_stripe_count:  2
lmm_stripe_size:   1048576
lmm_pattern:       1
lmm_layout_gen:    0
lmm_stripe_offset: 0
    obdidx       objid       objid       group
         0       498981674     0x1dbddb2a                0
         1       498980471     0x1dbdd677                0

# ll_decode_filter_fid O/0/d$((498981674 % 32))/498981674
O/0/d10/498981674: parent=[0x300004280:0x2eee:0x0] stripe=0 stripe_size=1048576 stripe_count=2 layout_version=0 range=0
# ll_decode_filter_fid O/0/d$((498980471 % 32))/498980471
O/0/d23/498980471: error reading fid: No data available

Indeed, I could verify using zdb that object 498980471 on OST 1 is missing the trusted.fid and trusted.version attributes (though trusted.lma is present).

 

Given that everything was working with Lustre 2.10, this looks to me as if 2.12 no longer supports the on-disk format used by some/all of the older files (for the example above, the ZFS object was created in May 2016; I'd have to do more research to find out what Lustre version we were using back then and whether a migration from an older ldiskfs installation was involved).

 

Comment by Knut Franke [ 16/Apr/20 ]

So after digging through the source code and the filesystem some more, the puzzle pieces are slowly coming together. I'm still not sure whether or not the (empty) object on OST1 is supposed to have a trusted.fid EA, but apparently trusted.lma is inconsistent with something else:

 

oss0 # getfattr -n trusted.lma --only-values O/0/d$((498981674 % 32))/498981674 | xxd
0000000: 0000 0000 0000 0000 0000 0000 0100 0000  ................
0000010: 2adb bd1d 0000 0000                      *.......
oss1 # getfattr -n trusted.lma --only-values O/0/d$((498980471 % 32))/498980471 | xxd
0000000: 0000 0000 0000 0000 0000 0000 0100 0000  ................
0000010: 77d6 bd1d 0000 0000                      w.......

which (I think) translates into FID-in-LMA values of [0x100000000:0x1dbddb2a:0x0] and [0x100000000:0x1dbdd677:0x0], respectively. After the failed attempt at accessing the file, searching the lctl debug_kernel output for these yields the following on oss1:
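To double-check that translation, here is a minimal Python sketch that decodes the two hexdumps. It assumes trusted.lma starts with two little-endian 32-bit flag words (compat, incompat) followed by the self-FID as a 64-bit sequence, a 32-bit object ID and a 32-bit version; this layout is an assumption that happens to fit the 24 bytes shown, and the helper names are made up for illustration.

```python
import struct

# Assumed layout: u32 compat, u32 incompat, u64 seq, u32 oid, u32 ver (little-endian)
LMA_FORMAT = "<IIQII"


def decode_lma(raw: bytes):
    """Return (compat, incompat, (seq, oid, ver)) from a raw trusted.lma value."""
    compat, incompat, seq, oid, ver = struct.unpack_from(LMA_FORMAT, raw)
    return compat, incompat, (seq, oid, ver)


def format_fid(fid) -> str:
    """Render a FID tuple in the bracketed form used in the Lustre logs."""
    seq, oid, ver = fid
    return f"[{seq:#x}:{oid:#x}:{ver:#x}]"


# Bytes copied from the xxd output for oss0 and oss1
lma_oss0 = bytes.fromhex("0000 0000 0000 0000 0000 0000 0100 0000 2adb bd1d 0000 0000")
lma_oss1 = bytes.fromhex("0000 0000 0000 0000 0000 0000 0100 0000 77d6 bd1d 0000 0000")

for raw in (lma_oss0, lma_oss1):
    print(format_fid(decode_lma(raw)[2]))
```

Under that assumed layout, both dumps decode to sequence 0x100000000 with object IDs 0x1dbddb2a and 0x1dbdd677.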

00080000:00020000:18.0:1587047810.568980:0:1951:0:(osd_object.c:481:osd_check_lma()) aeromdo-OST0001: FID-in-LMA [0x100000000:0x1dbdd677:0x0] does not match the object self-fid [0x100010000:0x1dbdd677:0x0] 

and no results on oss0. This suggests to me that the sequence number of the object self-fid is off on OST1, but unfortunately I have no idea how this is derived during lookup.

 

Comment by Knut Franke [ 23/Apr/20 ]

After manually setting the FID-in-LMA on ost1 to [0x100010000:0x1dbdd677:0x0], stat on the file succeeds, but reports incorrect ownership (root.root) and timestamps (too recent), so clearly something else is amiss here.

Comment by Knut Franke [ 23/Apr/20 ]

In lustre/osp/osp_internal.h, I found the following comment:

In 2.6+ ost_idx is packed into IDIF FID, while in 2.4 and 2.5 IDIF is always FID_SEQ_IDIF(0x100000000ULL), which does not include OST index in the seq.

Looking at the inaccessible files (and the OSS logs), it seems that the entire issue can be traced to lookup failures of objects on OST 1 with FID-in-LMA sequence number 0x100000000 (i.e. written by Lustre 2.4/2.5, which is a reasonable assumption for the filesystem and files in question), where Lustre erroneously adds the OST index to the self-fid during comparison. If this is true, this error should occur for basically all files written by Lustre 2.4/2.5 (except if they have a stripe count of 1 and only reside on OST 0).
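The suspected mismatch can be sketched in a few lines of Python. The bit position of the OST index (shifted left by 16) is an assumption inferred from the logged values 0x100000000 and 0x100010000 on OST0001, not taken from the Lustre source, and the function names are made up for illustration.

```python
FID_SEQ_IDIF = 0x100000000  # base IDIF sequence quoted in the source comment


def idif_seq_with_index(ost_idx: int) -> int:
    """Sequence that 2.6+ is assumed to expect for an IDIF object on ost_idx.

    Assumption: the OST index sits at bit 16 of the sequence.
    """
    return FID_SEQ_IDIF | (ost_idx << 16)


def lma_matches_self_fid(lma_seq: int, ost_idx: int) -> bool:
    """True if the stored FID-in-LMA sequence equals the expected self-fid sequence."""
    return lma_seq == idif_seq_with_index(ost_idx)


# A 2.4/2.5-era object always carries the bare IDIF sequence in its LMA.
for ost_idx in (0, 1):
    ok = lma_matches_self_fid(FID_SEQ_IDIF, ost_idx)
    print(f"OST{ost_idx:04d}: expected {idif_seq_with_index(ost_idx):#x}, match={ok}")
```

With these assumptions, legacy objects still match on OST 0 but fail on OST 1, consistent with the errors appearing only in the oss1 logs.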

Comment by Knut Franke [ 27/Apr/20 ]

During further testing with other affected files, I could not reproduce the issue with incorrect ownership/timestamp after manually updating the FID-in-LMA to include the OST index. I'm assuming that issue was due to some other tampering of mine while debugging with that particular file.

Comment by Knut Franke [ 29/Apr/20 ]

I've updated all affected objects in the filesystem (script attached). So far everything looks fine, no more hangs or stat() failures or errors in the Lustre logs.
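For reference, the byte-level change applied to each object can be sketched as follows. This is not the attached script, only a Python illustration under two assumptions: the 24-byte trusted.lma layout described above (two 32-bit flag words, then 64-bit sequence, 32-bit object ID, 32-bit version, all little-endian), and the OST index being placed at bit 16 of the sequence. The function name is made up, and writing the value back to the object is deliberately left out.

```python
import struct

LMA_FORMAT = "<IIQII"        # assumed: compat, incompat, seq, oid, ver
FID_SEQ_IDIF = 0x100000000   # legacy IDIF sequence without OST index


def lma_with_ost_index(raw: bytes, ost_idx: int) -> bytes:
    """Return a copy of a trusted.lma value with the OST index added to the sequence.

    Values whose sequence is not the bare legacy IDIF sequence are returned unchanged.
    """
    compat, incompat, seq, oid, ver = struct.unpack_from(LMA_FORMAT, raw)
    if seq != FID_SEQ_IDIF:
        return raw
    new_seq = seq | (ost_idx << 16)
    head = struct.pack(LMA_FORMAT, compat, incompat, new_seq, oid, ver)
    # Preserve any trailing bytes beyond the fields decoded here
    return head + raw[struct.calcsize(LMA_FORMAT):]


# trusted.lma of object 498980471 on OST 1, as dumped earlier in this ticket
old = bytes.fromhex("0000 0000 0000 0000 0000 0000 0100 0000 77d6 bd1d 0000 0000")
print(lma_with_ost_index(old, 1).hex())
```

The resulting value decodes to [0x100010000:0x1dbdd677:0x0], the FID that was set manually on ost1.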

Generated at Sat Feb 10 03:00:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.