
OI scrubber causing performance issues

Details

    • Question/Request
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.5.5
    • None

    Description

      OI scrubber causing performance issues.

      • There are orphaned objects that are causing the OI scrubber to start.
      • Is there an online or offline way of running lfsck at the moment?

          Activity

            [LU-7867] OI scrubber causing performance issues

            yong.fan nasf (Inactive) added a comment -

            Any further information about the issue? Have we ever run e2fsck as discussed above?

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/18999/
            Subject: LU-7867 debugfs: fix check for out-of-bound xattr value
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set:
            Commit: 595b51179eaafdc1c50ab8348cb83d9429aa2bfa

            adilger Andreas Dilger added a comment -

            I suspect that e2fsck will just consider these files as hard linked, so you will find them by looking for regular files under O/0/ with nlink=2.
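
The search described in this comment could be sketched as a small scan. This is only a sketch: it assumes the OST backing device has been mounted read-only as ldiskfs at some mount point (for example a hypothetical /mnt/ost), and the function name is made up for illustration. It lists candidates and does not unlink or modify anything.

```python
import os
import stat


def find_multiply_linked(root, min_links=2):
    """Yield (path, nlink) for regular files under root whose link count is >= min_links."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_nlink >= min_links:
                yield path, st.st_nlink


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 scan_nlink.py /mnt/ost/O/0
    for path, nlink in find_multiply_linked(sys.argv[1]):
        print(nlink, path)
```

Each hard-linked object shows up once per name, so the names that share an inode can be paired afterwards by comparing their inode numbers.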
            ezell Matt Ezell added a comment -

            I think the strange PFID EA is coming from fid_flatten. It takes the FID and does

            ino = (seq << 24) + ((seq >> 24) & 0xffffff0000ULL) + fid_oid(fid);

            to make the 128-bit FID fit in a 64-bit inode. I didn't specifically see this in the log, but I believe this is what the client is setting as PFID. Since the 1.8 clients are going away, it's not worth looking into it further. We'll make sure to test LFSCK in 2.8 against objects created under 1.8 before upgrading the server.
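
To make the arithmetic concrete, here is a minimal sketch that evaluates only the expression quoted above, with 64-bit unsigned wraparound. The function name is hypothetical and any special cases in the real fid_flatten are not modelled. The input is the FID that `lfs path2fid test` reports elsewhere in this ticket.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic


def flatten_expr(seq, oid):
    """Evaluate (seq << 24) + ((seq >> 24) & 0xffffff0000) + oid, modulo 2**64."""
    return ((seq << 24) + ((seq >> 24) & 0xFFFFFF0000) + oid) & MASK64


# FID [0x200000410:0x2254:0x0] from the `lfs path2fid test` output in this ticket
seq, oid = 0x200000410, 0x2254
print(hex(flatten_expr(seq, oid)))  # 0x200000410002254
```

The result equals the value debugfs prints as the parent sequence in the `fid` EA of that file's first object (parent=[0x200000410002254:0x0:0x0]), which is consistent with the flattened FID being what ends up stored as the PFID.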

            So, back to the original issue at hand. I guess we need to take a downtime to run e2fsck. There could be two outcomes:

            • e2fsck identifies these objects as damaged and repairs or unlinks them
            • It doesn't notice the issue and runs cleanly

            If situation #2 happens, should we manually unlink them? Then, how do we ensure there aren't more damaged objects on this OST?


            yong.fan nasf (Inactive) added a comment -

            It is the client that sends the PFID information to the OST on write/setattr. According to your description, the 1.8 client may have sent an "ostid"-formatted PFID to the OST, which then caused the strange PFID EA. Usually, as long as the OST-object is not a real orphan, then after upgrading your system to Lustre-2.6 or newer, the layout LFSCK will handle it as an unmatched MDT-OST object pair and will correct the PFID EA with the right MDT-object's FID.
            ezell Matt Ezell added a comment -

            For this file system, we have a mixture of clients - most are 2.5, but we also have some 1.8 and 2.7. I just ran some tests, and it seems that new files created under 2.5 or 2.7 are OK. But the file I created on the 1.8 client shows the strange PFID EA.

            The 1.8 clients are going away soon, so no need to debug that client, but we do have a ton of files on disk with the strange PFID EA. My main concern is that a future lfsck run (under 2.8 servers, eventually) might mark these files as damaged and/or orphaned.

            I'll run some tests with debugging later today and upload relevant logs. Thanks.


            yong.fan nasf (Inactive) added a comment -

            Matt,

            Is "test" an existing file (upgraded from an old device) or a newly created one? If it is the latter, how did you generate it? And would you please retest with -1 level debug collected on the client and the related OST?

            Your system is Lustre-2.5.5 based (both client and server), right? Are there any additional patches on top of that?

            (BTW, I have tried Lustre-2.5.5 in my local VM environment, but cannot reproduce your trouble, the strange PFID EA. So I need more logs from you.)

            Thanks!
            adilger Andreas Dilger added a comment - - edited

            No, the parent doesn't look correct. It should exactly match the output from lfs fid2path, but it looks offset by 3 bytes for some reason. We can add the fix into the same patch, no need for a new ticket.

            ezell Matt Ezell added a comment -

            Andreas - is the parent fid being printed correctly? I ran a quick test on a non-corrupted OST:

            # lfs getstripe test
            test
            lmm_stripe_count:   4
            lmm_stripe_size:    1048576
            lmm_pattern:        1
            lmm_layout_gen:     0
            lmm_stripe_offset:  123
                    obdidx           objid           objid           group
                       123          120572        0x1d6fc                0
                       124          120522        0x1d6ca                0
                       125          120541        0x1d6dd                0
                       126          120607        0x1d71f                0
            # lfs path2fid test
            [0x200000410:0x2254:0x0]
            
            # debugfs -c -R 'stat O/0/d28/120572' /dev/mapper/f1-ddn1a-l24
            ...
            Extended attributes stored in inode body: 
              lma = "00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 fc d6 01 00 00 00 00 00 " (24)
              lma: fid=[0x100000000:0x1d6fc:0x0] compat=0 incompat=0
              fid = "54 22 00 10 04 00 00 02 00 00 00 00 00 00 00 00 " (16)
              fid: parent=[0x200000410002254:0x0:0x0] stripe=0
            

            Should I open a separate ticket for that?
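
For checking what debugfs prints against the raw bytes, the hex dumps above can be decoded by hand. This is a sketch under an assumed layout inferred from the output in this ticket rather than taken from the Lustre headers: all fields little-endian, `lma` as two 32-bit words (compat, incompat) followed by a FID (64-bit seq, 32-bit oid, 32-bit ver), and `fid` as a 64-bit seq, a 32-bit oid and a 32-bit stripe index. The function names are hypothetical.

```python
import struct


def decode_lma(hex_dump):
    """Decode a 24-byte lma dump: compat, incompat, then FID seq/oid/ver (assumed layout)."""
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", bytes.fromhex(hex_dump))
    return {"compat": compat, "incompat": incompat, "fid": (seq, oid, ver)}


def decode_fid_ea(hex_dump):
    """Decode a 16-byte fid dump: parent seq, parent oid, stripe index (assumed layout)."""
    seq, oid, stripe = struct.unpack("<QII", bytes.fromhex(hex_dump))
    return {"parent": (seq, oid), "stripe": stripe}


lma = decode_lma("00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 fc d6 01 00 00 00 00 00")
fid = decode_fid_ea("54 22 00 10 04 00 00 02 00 00 00 00 00 00 00 00")
print([hex(v) for v in lma["fid"]])     # ['0x100000000', '0x1d6fc', '0x0']
print([hex(v) for v in fid["parent"]])  # ['0x200000410002254', '0x0']
```

Under that assumed layout the decoded values agree with the `lma:` and `fid:` summary lines debugfs printed, so the 64-bit parent value reflects the bytes stored on disk.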

            ezell Matt Ezell added a comment -

            I used the build from Jenkins and it now correctly shows the EAs for /O/0/d8/54354408:

              lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 e4 20 03 00 00 00 00 " (24)
              lma: fid=[0x100000000:0x320e401:0x0] compat=8 incompat=0
              fid = "f8 95 00 22 29 03 00 02 00 00 00 00 02 00 00 00 " (16)
              fid: parent=[0x2000329220095f8:0x0:0x0] stripe=2
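
One quick sanity check on output like this is to compare the object's file name with the oid recorded in its LMA. The sketch below assumes that objects live under O/<seq>/d<objid % 32>/<objid>, which matches both paths quoted in this ticket, and that for these objects the LMA oid should equal the object ID, as it does for O/0/d28/120572. The helper names are hypothetical.

```python
def expected_object_path(objid, seq=0, subdirs=32):
    """Relative path of an OST object, assuming the d<objid % 32> directory layout."""
    return "O/{}/d{}/{}".format(seq, objid % subdirs, objid)


def name_matches_lma(path, lma_oid):
    """True if the file name (decimal object ID) equals the oid stored in the LMA."""
    return int(path.rstrip("/").rsplit("/", 1)[-1]) == lma_oid


# (path, LMA oid) pairs taken from the debugfs output in this ticket
for path, lma_oid in [("O/0/d28/120572", 0x1D6FC), ("O/0/d8/54354408", 0x320E401)]:
    objid = int(path.rsplit("/", 1)[-1])
    print(path, expected_object_path(objid) == path, name_matches_lma(path, lma_oid))
```

For O/0/d8/54354408 the name is 0x33d61e8 in hex while the LMA records oid 0x320e401 (52487169 decimal), so the name and the LMA disagree. The ticket does not say so explicitly, but this would be consistent with one inode being reachable under two object names.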
            

            adilger Andreas Dilger added a comment -

            Updated patch http://review.whamcloud.com/18999 should now resolve the problem introduced in the upstream code.

            People

              yong.fan nasf (Inactive)
              dustb100 Dustin Leverman