Lustre / LU-13392

FID-in-LMA does not match the object self-fid after upgrade from 2.10 to 2.12

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.4
    • Component/s: None
    • Environment: Lustre on ZFS, CentOS 7.7, 3.10.0-1062.9.1.el7_lustre.x86_64
    • Severity: 3

    Description

      After upgrading a filesystem from Lustre 2.10.8 to 2.12.4 (following the major release upgrade procedure from chapter 17.2 of the manual), lstat() would hang on some of the files. After disabling auto_scrub on all OSTs, lstat() instead fails with -1 and errno EREMCHG (Remote address changed). This appears to be related to the following errors in the OSS syslogs:

      2020-03-26T10:50:21.222726+01:00 oss1 kernel: [249279.579945] LustreError: 32828:0:(osd_object.c:481:osd_check_lma()) aeromdo-OST0001: FID-in-LMA [0x100000000:0x0:0x0] does not match the object self-fid [0x100010000:0x0:0x0]
      2020-03-26T10:50:21.222757+01:00 oss1 kernel: [249279.656311] LustreError: 32828:0:(osd_object.c:481:osd_check_lma()) Skipped 600 previous similar messages
      2020-03-26T10:50:22.438285+01:00 oss1 kernel: [249280.818924] LustreError: 32828:0:(ofd_dev.c:1507:ofd_create_hdl()) aeromdo-OST0001: Can't find FID Sequence 0x0: rc = -78
      2020-03-26T10:50:22.438306+01:00 oss1 kernel: [249280.872078] LustreError: 32828:0:(ofd_dev.c:1507:ofd_create_hdl()) Skipped 599 previous similar messages
      
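The rc = -78 in these messages is the kernel's -EREMCHG, the same errno that lstat() surfaces to userspace. A quick way to confirm the mapping on Linux:

```python
import errno

# On Linux, errno 78 is EREMCHG ("Remote address changed"),
# matching the "rc = -78" in the OSS logs and the lstat() failure.
print(errno.EREMCHG)        # 78
print(errno.errorcode[78])  # 'EREMCHG'
```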

      lctl lfsck_start -A -o did not resolve the issue; according to the OI_scrub info, 258 out of 4659445 objects on OST0000 failed to be repaired, as did 322 out of 4661773 on OST0001.

      The issue appears to affect old files (created around 2015) rather than recently modified ones.

      Attachments

        Issue Links

          Activity

            [LU-13392] FID-in-LMA does not match the object self-fid after upgrade from 2.10 to 2.12
            knut.franke Knut Franke added a comment -

            I've updated all affected objects in the filesystem (script attached). So far everything looks fine, no more hangs or stat() failures or errors in the Lustre logs.
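The attached script itself is not reproduced in this export. Purely as an illustration (not the attached script), rewriting the FID-in-LMA of a 2.4/2.5-era object into the 2.6+ IDIF form could look like this, assuming the little-endian struct lustre_mdt_attrs layout (two __u32 flag words, then a lu_fid) and the fid_idif_seq() packing discussed below:

```python
import struct

FID_SEQ_IDIF = 0x100000000

def fixed_lma(objid: int, ost_idx: int) -> bytes:
    # New IDIF seq with the OST index packed in (as 2.6+ expects):
    # low 32 bits of the object id go to f_oid, f_ver stays 0.
    seq = FID_SEQ_IDIF | ((ost_idx & 0xffff) << 16) | ((objid >> 32) & 0xffff)
    # lma_compat = 0, lma_incompat = 0, then the self-fid, little-endian
    return struct.pack('<IIQII', 0, 0, seq, objid & 0xffffffff, 0)

blob = fixed_lma(0x1dbdd677, 1)
# could be applied with e.g.: setfattr -n trusted.lma -v 0x<hex> <object>
print('0x' + blob.hex())
```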

            knut.franke Knut Franke added a comment -

            During further testing with other affected files, I could not reproduce the issue with incorrect ownership/timestamp after manually updating the FID-in-LMA to include the OST index. I'm assuming that issue was due to some other tampering of mine while debugging with that particular file.

            knut.franke Knut Franke added a comment -

            In lustre/osp/osp_internal.h, I found the following comment:

            In 2.6+ ost_idx is packed into IDIF FID, while in 2.4 and 2.5 IDIF is always FID_SEQ_IDIF(0x100000000ULL), which does not include OST index in the seq.

            Looking at the inaccessible files (and the OSS logs), it seems that the entire issue can be traced to lookup failures of objects on OST 1 with FID-in-LMA sequence number 0x100000000 (i.e. written by Lustre 2.4/2.5, which is a reasonable assumption for the filesystem and files in question), where Lustre erroneously adds the OST index to the self-fid during comparison. If this is true, this error should occur for basically all files written by Lustre 2.4/2.5 (except if they have a stripe count of 1 and only reside on OST 0).
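The packing described in that comment can be sketched in a few lines (a minimal Python rendition of Lustre's fid_idif_seq(); the function name and structure here are mine):

```python
FID_SEQ_IDIF = 0x100000000

def idif_seq(objid: int, ost_idx: int) -> int:
    """Sketch of fid_idif_seq(): 2.6+ packs the OST index (and the top
    16 bits of the 48-bit object id) into the IDIF sequence number."""
    return FID_SEQ_IDIF | ((ost_idx & 0xffff) << 16) | ((objid >> 32) & 0xffff)

# A 2.4/2.5 object carries the bare FID_SEQ_IDIF in its LMA, while the
# 2.6+ self-fid computed for the same object on OST 1 is 0x100010000 --
# exactly the mismatch shown in the syslog above.
print(hex(idif_seq(0x1dbdd677, 1)))  # 0x100010000
print(hex(idif_seq(0x1dbdd677, 0)))  # 0x100000000
```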

            knut.franke Knut Franke added a comment -

            After manually setting the FID-in-LMA on ost1 to [0x100010000:0x1dbdd677:0x0], stat on the file succeeds, but reports incorrect ownership (root.root) and timestamps (too recent), so clearly something else is amiss here.

            knut.franke Knut Franke added a comment -

            So after digging through the source code and the filesystem some more, the puzzle pieces are slowly coming together. I'm still not sure whether or not the (empty) object on OST1 is supposed to have a trusted.fid EA, but apparently trusted.lma is inconsistent with something else:

             

            oss0 # getfattr -n trusted.lma --only-values O/0/d$((498981674 % 32))/498981674 | xxd
            0000000: 0000 0000 0000 0000 0000 0000 0100 0000  ................
            0000010: 2adb bd1d 0000 0000                      *.......
            oss1 # getfattr -n trusted.lma --only-values O/0/d$((498980471 % 32))/498980471 | xxd
            0000000: 0000 0000 0000 0000 0000 0000 0100 0000  ................
            0000010: 77d6 bd1d 0000 0000                      w.......
            

            which (I think) translates into FID-in-LMA values of [0x100000000:0x1dbddb2a:0x0] and [0x100000000:0x1dbdd677:0x0], respectively. After the failed attempt at accessing the file, searching the lctl debug_kernel output for these yields the following on oss1:

            00080000:00020000:18.0:1587047810.568980:0:1951:0:(osd_object.c:481:osd_check_lma()) aeromdo-OST0001: FID-in-LMA [0x100000000:0x1dbdd677:0x0] does not match the object self-fid [0x100010000:0x1dbdd677:0x0] 

            and no results on oss0. This suggests to me that the sequence number of the object self-fid is off on OST1, but unfortunately I have no idea how this is derived during lookup.
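For reference, the hex dumps above can be decoded with a short script (a sketch assuming the little-endian struct lustre_mdt_attrs layout: two __u32 flag words followed by a lu_fid of __u64 f_seq, __u32 f_oid, __u32 f_ver):

```python
import struct

def parse_lma(blob: bytes) -> str:
    # lma_compat, lma_incompat, then the self-fid: f_seq, f_oid, f_ver
    _compat, _incompat, seq, oid, ver = struct.unpack('<IIQII', blob[:24])
    return '[0x%x:0x%x:0x%x]' % (seq, oid, ver)

# trusted.lma of object 498980471 on oss1, as dumped by xxd above
oss1_lma = bytes.fromhex('00000000000000000000000001000000'
                         '77d6bd1d00000000')
print(parse_lma(oss1_lma))  # [0x100000000:0x1dbdd677:0x0]
```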

             

            knut.franke Knut Franke added a comment -

            Digging deeper, I've identified two small text files (small enough to fit into one OST object); one affected by the issue, the other not. I've tracked down the following difference in the on-disk data structures of the two:

            working 

            $ lfs getstripe --verbose tutorial/pre
            tutorial/pre
            lmm_magic:         0x0BD10BD0
            lmm_seq:           0x300004280
            lmm_object_id:     0x2edc
            lmm_fid:           [0x300004280:0x2edc:0x0]
            lmm_stripe_count:  2
            lmm_stripe_size:   1048576
            lmm_pattern:       1
            lmm_layout_gen:    0
            lmm_stripe_offset: 0
                obdidx       objid       objid       group
                     0       707339712     0x2a2925c0                0
                     1       707431336     0x2a2a8ba8                0
            
            # ll_decode_filter_fid O/0/d$((707339712 % 32))/707339712
            O/0/d0/707339712: warning: ffid size is unexpected (44 bytes), recompile?
            O/0/d0/707339712: parent=[0x300004280:0x2edc:0x0] stripe=0
            # ll_decode_filter_fid O/0/d$((707431336 % 32))/707431336
            O/0/d8/707431336: warning: ffid size is unexpected (44 bytes), recompile?
            O/0/d8/707431336: parent=[0x300004280:0x2edc:0x0] stripe=1
            

             

            So everything works out correctly, despite the fact that ll_decode_filter_fid is apparently unhappy about the size of the trusted.fid attribute.

            inaccessible (stat() fails)

            $ lfs getstripe --verbose watchdog.log
            watchdog.log
            lmm_magic:         0x0BD10BD0
            lmm_seq:           0x300004280
            lmm_object_id:     0x2eee
            lmm_fid:           [0x300004280:0x2eee:0x0]
            lmm_stripe_count:  2
            lmm_stripe_size:   1048576
            lmm_pattern:       1
            lmm_layout_gen:    0
            lmm_stripe_offset: 0
                obdidx       objid       objid       group
                     0       498981674     0x1dbddb2a                0
                     1       498980471     0x1dbdd677                0
            
            # ll_decode_filter_fid O/0/d$((498981674 % 32))/498981674
            O/0/d10/498981674: parent=[0x300004280:0x2eee:0x0] stripe=0 stripe_size=1048576 stripe_count=2 layout_version=0 range=0
            # ll_decode_filter_fid O/0/d$((498980471 % 32))/498980471
            O/0/d23/498980471: error reading fid: No data available
            

            Indeed, I could verify using zdb that object 498980471 on OST 1 is missing the trusted.fid and trusted.version attributes (though trusted.lma is present).
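As I understand it, the trusted.fid EA begins with the parent lu_fid, with older filter_fid layouts carrying the stripe index in the f_ver slot and newer ones appending the stripe/layout fields that ll_decode_filter_fid prints for the working object. A sketch of decoding just that leading parent FID, assuming the same little-endian lu_fid layout as in trusted.lma (the example EA bytes are constructed here, not read from disk):

```python
import struct

def parse_filter_fid(blob: bytes) -> str:
    """Decode the leading parent lu_fid of a trusted.fid EA (sketch;
    in older layouts the f_ver slot holds the stripe index)."""
    seq, oid, stripe = struct.unpack('<QII', blob[:16])
    return '[0x%x:0x%x:0x0] stripe=%d' % (seq, oid, stripe)

# Hypothetical EA contents matching the working stripe-1 object above
ea = struct.pack('<QII', 0x300004280, 0x2edc, 1)
print(parse_filter_fid(ea))  # [0x300004280:0x2edc:0x0] stripe=1
```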

             

            Given that everything was working with Lustre 2.10, this looks to me as if 2.12 no longer supports the on-disk format used by some or all of the older files. (For the example above, the ZFS object was created in May 2016; I'd have to do more research to find out which Lustre version we were using back then and whether a migration from an older ldiskfs installation was involved.)

             

            knut.franke Knut Franke added a comment -

            This might be related to LU-12278, which also mentions the "FID-in-LMA does not match the object self-fid" error, although we do not experience the "can't get bonus" error, nor the crashes reported there.


            People

              Assignee: wc-triage WC Triage
              Reporter: knut.franke Knut Franke
              Votes: 1
              Watchers: 2
