  Lustre / LU-7867

OI scrubber causing performance issues

Details

    • Type: Question/Request
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.5.5

    Description

      OI scrubber causing performance issues.

      • There are orphaned objects that are causing the OI scrubber to start.
      • Is there an online or offline way of running lfsck at the moment?

    Attachments

    Issue Links

    Activity

            [LU-7867] OI scrubber causing performance issues
            ezell Matt Ezell added a comment -

            I used the build from Jenkins and it now correctly shows the EAs for /O/0/d8/54354408:

              lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 e4 20 03 00 00 00 00 " (24)
              lma: fid=[0x100000000:0x320e401:0x0] compat=8 incompat=0
              fid = "f8 95 00 22 29 03 00 02 00 00 00 00 02 00 00 00 " (16)
              fid: parent=[0x2000329220095f8:0x0:0x0] stripe=2
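
            For reference, a minimal sketch (not the OSD or e2fsprogs code, just an illustration assuming the little-endian u32 compat / u32 incompat / u64 seq / u32 oid / u32 ver layout implied by the decoded line above) of how those 24 lma bytes break down into the fields debugfs printed:

              #include <stdint.h>
              #include <stdio.h>
              #include <string.h>

              /* Decode the raw trusted.lma value dumped by debugfs above.
               * Assumed layout: u32 compat, u32 incompat, u64 fid seq, u32 fid oid,
               * u32 fid ver, all little-endian (a little-endian host is assumed). */
              int main(void)
              {
                  const uint8_t lma[24] = {
                      0x08, 0x00, 0x00, 0x00,                          /* compat = 8   */
                      0x00, 0x00, 0x00, 0x00,                          /* incompat = 0 */
                      0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,  /* fid seq      */
                      0x01, 0xe4, 0x20, 0x03,                          /* fid oid      */
                      0x00, 0x00, 0x00, 0x00,                          /* fid ver      */
                  };
                  uint32_t compat, incompat, oid, ver;
                  uint64_t seq;

                  memcpy(&compat,   lma,      sizeof(compat));
                  memcpy(&incompat, lma + 4,  sizeof(incompat));
                  memcpy(&seq,      lma + 8,  sizeof(seq));
                  memcpy(&oid,      lma + 16, sizeof(oid));
                  memcpy(&ver,      lma + 20, sizeof(ver));

                  /* Prints: lma: fid=[0x100000000:0x320e401:0x0] compat=8 incompat=0 */
                  printf("lma: fid=[0x%llx:0x%x:0x%x] compat=%u incompat=%u\n",
                         (unsigned long long)seq, oid, ver, compat, incompat);
                  return 0;
              }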
            

            adilger Andreas Dilger added a comment -

            Updated patch http://review.whamcloud.com/18999 should now resolve the problem introduced in the upstream code.
            adilger Andreas Dilger added a comment - edited

            Sorry, committed this comment too quickly.

            Something strange is going on here. I'm just checking my local MDT and OST filesystems with debugfs 1.42.12.wc1 and it is printing the xattr values properly. Checking with debugfs 1.42.13.wc4 does show the same problem; I'm looking into it more closely.


            yong.fan nasf (Inactive) added a comment -

            Matt,

            Would you please apply the above patch, which fixes the debugfs issue, to your e2fsprogs, and then run the patched debugfs on your system?

            Thanks!

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/18999
            Subject: LU-7867 debugfs: locate EA value correctly
            Project: tools/e2fsprogs
            Branch: master-lustre
            Current Patch Set: 1
            Commit: 4f64aa8135d0865fba88c3ee49981362ec0cb994
            ezell Matt Ezell added a comment -

            Note from my previous comment that I manually set the value length to 23 to avoid the 'invalid EA' message:

            I ran debugfs under gdb and manually set entry->e_value_size = 23.

            We are using the latest version available from Intel: e2fsprogs-1.42.13.wc4-7.el6.x86_64.rpm

            [root@f1-oss1d8 ~]# rpm -qi e2fsprogs
            Name        : e2fsprogs                    Relocations: (not relocatable)
            Version     : 1.42.13.wc4                       Vendor: (none)
            Release     : 7.el6                         Build Date: Fri Dec 11 19:51:16 2015
            Install Date: Wed Feb 24 17:44:21 2016         Build Host: onyx-8-sde1-el6-x8664.onyx.hpdd.intel.com
            Group       : System Environment/Base       Source RPM: e2fsprogs-1.42.13.wc4-7.el6.src.rpm
            Size        : 3170704                          License: GPLv2
            Signature   : (none)
            URL         : https://downloads.hpdd.intel.com/public/e2fsprogs/
            Summary     : Utilities for managing ext2, ext3, and ext4 filesystems
            Description :
            The e2fsprogs package contains a number of utilities for creating,
            checking, modifying, and correcting any inconsistencies in second,
            third and fourth extended (ext2/ext3/ext4) filesystems. E2fsprogs
            contains e2fsck (used to repair filesystem inconsistencies after an
            unclean shutdown), mke2fs (used to initialize a partition to contain
            an empty ext2 filesystem), debugfs (used to examine the internal
            structure of a filesystem, to manually repair a corrupted
            filesystem, or to create test cases for e2fsck), tune2fs (used to
            modify filesystem parameters), and most of the other core ext2fs
            filesystem utilities.
            
            You should install the e2fsprogs package if you need to manage the
            performance of an ext2, ext3, or ext4 filesystem.
            

            yong.fan nasf (Inactive) added a comment -

            The output of debugfs seems wrong: the OSD got the self FID as [0x100000000:0x320e401:0x0], but the debugfs output is lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 e4 20 03 00 00 00 " (23). So the value offset is wrong.

            Which version of e2fsprogs are you using? Is it possible to try a newer e2fsprogs?
            Before we resolve the debugfs issue, please do NOT remove these OST-objects manually, because we are not sure whether they are really the targets to be destroyed.
            ezell Matt Ezell added a comment -

            The check that is causing debugfs to mark the EA data as invalid is

            (!ea_inode && value + entry->e_value_size >= end)
            

            because for LMA, entry->e_value_size = 24 and value + entry->e_value_size = end. I ran debugfs under gdb and manually set entry->e_value_size = 23. This resulted in the following output:

            Inode: 562889   Type: regular    Mode:  0666   Flags: 0x80000
            Generation: 833716884    Version: 0x0000000e:00c7ec1c
            User:   800   Group:   502   Size: 1048576
            File ACL: 0    Directory ACL: 0
            Links: 1   Blockcount: 2048
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
             atime: 0x566c289a:c40af440 -- Sat Dec 12 14:00:58 2015
             mtime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
            crtime: 0x552a3831:79d980f4 -- Sun Apr 12 09:17:37 2015
            Size of extra inode fields: 28
            Extended attributes stored in inode body: 
              lma = "08 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 e4 20 03 00 00 00 " (23)
              fid = "f8 95 00 22 29 03 00 02 00 00 00 00 02 00 00 00 " (16)
              fid: parent=[0x2000329220095f8:0x0:0x0] stripe=2
            EXTENTS:
            (0-254):17259-17513, (255):17514
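
            As an aside, a minimal sketch of the boundary condition described above (simplified names, not the actual e2fsprogs source): when an in-inode EA value ends exactly at the end of the xattr area, value + e_value_size equals end, so a >= comparison flags the entry as invalid even though the value is still fully inside the buffer:

              #include <stdint.h>
              #include <stdio.h>

              /* Simplified model of the in-inode EA bounds check quoted above;
               * the real debugfs code and structures differ. */
              int main(void)
              {
                  char  ibody[256];                    /* stand-in for the inode xattr area   */
                  char *end   = ibody + sizeof(ibody);
                  char *value = end - 24;              /* a 24-byte LMA value ending at 'end' */
                  uint32_t e_value_size = 24;
                  int ea_inode = 0;                    /* value not stored in an EA inode     */

                  /* Mirrors the quoted check: it trips when value + size == end. */
                  if (!ea_inode && value + e_value_size >= end)
                      printf("marked as invalid EA entry (value + e_value_size == end)\n");

                  /* A strict '>' comparison would accept this particular entry. */
                  if (!(value + e_value_size > end))
                      printf("a '>' comparison would have accepted it\n");

                  return 0;
              }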
            

            I don't think the formatting for the parent fid is correct, and feeding it directly into fid2path results in 'invalid argument'. I think the sequence provided is actually the sequence+objid packed together (is this a bug? should this print a 'valid' fid?), so I tried:

            # lfs fid2path /lustre/f1 0x200032922:0x95f8:0x0
            fid2path: error on FID 0x200032922:0x95f8:0x0: No such file or directory
            

            This is expected, since the RPC that causes this OI lookup is trying to destroy the object anyway. My guess is that the MDS deleted its inode and added all the objects to the llog for removal; llog processing then sent an unlink RPC to the OST, but due to this damaged object it returned -115. I'm not familiar with llog processing, but I would hazard a guess that it retries the unlink (since we are seeing this RPC every ~600 seconds).
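
            To make the "sequence+objid" reading concrete, here is a small sketch (the split is only the interpretation being tested above, not a documented format) that decodes those 16 fid bytes as a little-endian u64 plus two u32s and then applies the split behind the FID tried with fid2path, i.e. the low 24 bits of the raw field as the object id and the remaining bits as the sequence:

              #include <stdint.h>
              #include <stdio.h>
              #include <string.h>

              /* Decode the 16-byte "fid" (PFID) value dumped by debugfs above,
               * assuming a little-endian u64 followed by two u32s, which matches
               * the parent=[...] stripe=2 line it printed. */
              int main(void)
              {
                  const uint8_t fidbuf[16] = {
                      0xf8, 0x95, 0x00, 0x22, 0x29, 0x03, 0x00, 0x02,   /* raw "seq" field */
                      0x00, 0x00, 0x00, 0x00,                           /* oid             */
                      0x02, 0x00, 0x00, 0x00,                           /* stripe index    */
                  };
                  uint64_t raw_seq;
                  uint32_t oid, stripe;

                  memcpy(&raw_seq, fidbuf,      sizeof(raw_seq));   /* little-endian host assumed */
                  memcpy(&oid,     fidbuf + 8,  sizeof(oid));
                  memcpy(&stripe,  fidbuf + 12, sizeof(stripe));

                  /* As printed by debugfs: parent=[0x2000329220095f8:0x0:0x0] stripe=2 */
                  printf("raw parent=[0x%llx:0x%x:0x0] stripe=%u\n",
                         (unsigned long long)raw_seq, oid, stripe);

                  /* The guess tried with fid2path: low 24 bits = object id, rest = sequence,
                   * giving [0x200032922:0x95f8:0x0]. */
                  printf("split parent=[0x%llx:0x%llx:0x0]\n",
                         (unsigned long long)(raw_seq >> 24),
                         (unsigned long long)(raw_seq & 0xffffff));
                  return 0;
              }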

            So I guess the options at this point are:

            • Run the newer e2fsck, hope it finds and fixes or unlinks these problematic objects
            • Just unlink these objects, since that's what the MDS is trying to do anyway

            Since this has happened before (see LU-7378), we are worried that this might keep happening. Is there any way to scan the objects and determine if any more are in this state?


            yong.fan nasf (Inactive) added a comment -

            [root@f1-oss1d8 ~]# debugfs -c -R 'stat O/0/d8/54354408' /dev/mapper/f1-ddn1d-l53
            debugfs 1.42.13.wc4 (28-Nov-2015)
            /dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
            Inode: 562889 Type: regular Mode: 0666 Flags: 0x80000
            Generation: 833716884 Version: 0x0000000e:00c7ec1c
            User: 800 Group: 502 Size: 1048576
            File ACL: 0 Directory ACL: 0
            Links: 1 Blockcount: 2048
            Fragment: Address: 0 Number: 0 Size: 0
            ctime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
            atime: 0x566c289a:c40af440 -- Sat Dec 12 14:00:58 2015
            mtime: 0x552a389a:00000000 -- Sun Apr 12 09:19:22 2015
            crtime: 0x552a3831:79d980f4 -- Sun Apr 12 09:17:37 2015
            Size of extra inode fields: 28
            Extended attributes stored in inode body:
            invalid EA entry in inode
            EXTENTS:
            (0-254):17259-17513, (255):17514

            That means the OI maps the FID [0x100000000:0x33d61e8:0x0] to inode 562889 via the name entry "/O/0/d8/54354408", but that name entry is wrong. The inode is actually referenced by the name entry "/O/0/d1/52487169", which has been verified via the FID-in-LMA. The message "invalid EA entry in inode" does not mean the inode is corrupted, because the OSD can still read the LMA EA. I would suggest you update e2fsprogs, which may help, and if we can get the PFID EA from inode 562889, then we can know whether the related MDT-object is still there or not.

            A Lustre OST-object is always singly referenced, so the unrecognised mapping entry "/O/0/d8/54354408" is wrong. I am not sure how the corruption was generated, but your site is not the first one to hit this trouble. In any case, running e2fsck to fix the disk-level corruption is the first step.
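
            For readers following the two name entries above: both match the conventional ldiskfs OST object layout O/<group>/d<objid mod 32>/<objid> (an assumption here, but consistent with both paths in this ticket, with group 0), so the expected name entry can be derived from the object id carried in the LMA, as in this small sketch:

              #include <stdint.h>
              #include <stdio.h>

              /* Derive the expected ldiskfs name entry for an OST object from its
               * object id, assuming the O/<group>/d<objid % 32>/<objid> layout with
               * group 0. The two ids below are the ones discussed in this ticket. */
              static void print_expected_path(uint64_t objid)
              {
                  printf("objid %llu (0x%llx) -> /O/0/d%llu/%llu\n",
                         (unsigned long long)objid, (unsigned long long)objid,
                         (unsigned long long)(objid % 32), (unsigned long long)objid);
              }

              int main(void)
              {
                  print_expected_path(54354408ULL);  /* 0x33d61e8 -> /O/0/d8/54354408 */
                  print_expected_path(52487169ULL);  /* 0x320e401 -> /O/0/d1/52487169 */
                  /* The LMA self FID of inode 562889 has oid 0x320e401 = 52487169, so its
                   * correct name entry is /O/0/d1/52487169; the OI mapping that reaches it
                   * via /O/0/d8/54354408 is the wrong one. */
                  return 0;
              }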


            dustb100 Dustin Leverman added a comment -

            Andreas,
            This LUN has had corruption issues in the past (last April, to be exact). We did run e2fsck on it until it came back clean; however, that was with an older version of e2fsprogs. Perhaps the new version will find issues that the old version could not, and we can check that. It will take a week or two to schedule a downtime with the customer to perform maintenance.

            I don't know if this is telling or not, but $(debugfs -c -R 'stats' <LUN>) is telling us that the LUN is clean. Also, we did not open a Jira ticket on the last corruption issue; it was handled with DDN.

            Thanks,
            Dustin


            adilger Andreas Dilger added a comment -

            Depending on what kind of corruption is detected by e2fsck, it might be best to just let "e2fsck -fy" run and fix the problems, then run ll_recover_lost_found_objs to recover anything left in lost+found.

            Has there been any corruption or other disk errors reported on this system?


            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: dustb100 Dustin Leverman
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: