LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed:

    Description

      This is related to LU-11584.

      1. Unmounted the targets
      2. Mounted the MDT as ldiskfs
      3. Deleted all the quarantined files
      4. Rebooted and remounted the targets

      Hit LBUG.

      [ 1781.324368] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed:
      [ 1781.351312] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) LBUG

      I tried mounting with noscrub but still hit the LBUG.

      The filesystem keeps crashing with the LBUG.

          Activity

            [LU-11737] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed:

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33827/
            Subject: LU-11737 lfsck: do not ignore dryrun
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 231649a4e93e5a630af8c0b5715da51a92cfb679

            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33826/
            Subject: LU-11737 lfsck: do not ignore dryrun
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 875f3fc03aa15049892fe19d6a4fc1132848fced

            gerrit Gerrit Updater added a comment -

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33827
            Subject: LU-11737 lfsck: do not ignore dryrun
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 5ed85a4a03eb39f04d7b7379acc8b52f046eeb28

            gerrit Gerrit Updater added a comment -

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33826
            Subject: LU-11737 lfsck: do not ignore dryrun
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e7a7d9836e8b3d51b917e84e425195130db7d2c5

            bzzz Alex Zhuravlev added a comment - - edited

            I found one possible cause and am trying to reproduce it locally.


            mhanafi Mahmoud Hanafi added a comment -

            Any update on getting a patch to deal with the LBUG?

            mhanafi Mahmoud Hanafi added a comment -

            These are probably the objects of the corrupted files that I deleted.

            adilger Andreas Dilger added a comment - - edited

            Mahmoud, I also filed LUDOC-421 to add documentation for the files in .lustre/lost+found. In this case, the "-R-" means "The orphan OST-object knows its parent MDT-object FID, but does not know the position (the file name) in the layout."

            Just to confirm that the file is what we expect, you could run "lfs getstripe -F /nobackupp13/.lustre/lost+found/MDT0000/[0x20000205c:0x15dc1:0x0]-R-0" to get the FID from that file (it should probably be [0x20000205c:0x15dc1:0x0]). Given that this object has no data in it (size=0 and blocks=0) and no "filename" exists for it, you can probably just delete it.
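
The naming convention described in this comment can be checked mechanically. The following is a minimal Python sketch, not part of Lustre, that splits a lost+found entry name such as the one in this ticket into its FID and its infix. The regular expression, the `LostFoundName` type, and both helper names are hypothetical. The meaning of the trailing number is not stated in this ticket, so the sketch only records it.

```python
import re
from typing import NamedTuple


class LostFoundName(NamedTuple):
    """Parts of a .lustre/lost+found entry name (hypothetical helper type)."""
    seq: int      # FID sequence
    oid: int      # FID object id
    ver: int      # FID version
    infix: str    # e.g. "R" in "-R-"
    suffix: int   # trailing number; its meaning is not given in this ticket


# Assumed shape, based only on the name seen in this ticket:
#   [0xSEQ:0xOID:0xVER]-INFIX-NUMBER
_NAME_RE = re.compile(
    r"^\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]-([A-Za-z]+)-(\d+)$"
)


def parse_lost_found_name(name: str) -> LostFoundName:
    """Split a lost+found entry name into FID fields, infix, and suffix."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized lost+found name: {name!r}")
    seq, oid, ver, infix, suffix = match.groups()
    return LostFoundName(int(seq, 16), int(oid, 16), int(ver, 16),
                         infix, int(suffix))


def format_fid(entry: LostFoundName) -> str:
    """Render the FID in the bracketed form used throughout this ticket."""
    return f"[{entry.seq:#x}:{entry.oid:#x}:{entry.ver:#x}]"


if __name__ == "__main__":
    entry = parse_lost_found_name("[0x20000205c:0x15dc1:0x0]-R-0")
    print(format_fid(entry), entry.infix, entry.suffix)
```

The FID recovered this way is the value to compare against what "lfs getstripe -F" reports for the same file.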

            mhanafi Mahmoud Hanafi added a comment - - edited

            We can lower the priority to 2.

            Found 0x20000205c:0x15dc1:0x0; it was in lost+found:
            .lustre/lost+found/MDT0000/'[0x20000205c:0x15dc1:0x0]-R-0'

            tpfe2 /nobackupp13/.lustre/lost+found/MDT0000 # stat '[0x20000205c:0x15dc1:0x0]-R-0' 
              File: '[0x20000205c:0x15dc1:0x0]-R-0'
              Size: 0               Blocks: 0          IO Block: 4194304 regular empty file
            Device: d7d58d2eh/3621096750d   Inode: 144115327058402753  Links: 1
            Access: (0400/-r--------)  Uid: (30757/ spocops)   Gid: (41548/   s1548)
            Access: 2018-12-06 12:09:17.000000000 -0800
            Modify: 2018-12-06 12:09:17.000000000 -0800
            Change: 2018-12-06 12:09:17.000000000 -0800
             Birth: -
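
As a cross-check that this lost+found entry corresponds to the FID in its name, the inode number reported by stat can be recomputed from the FID. The sketch below assumes the client flattens a FID into a 64-bit inode number as (seq << 24) + ((seq >> 24) & 0xffffff0000) + oid. That formula is an assumption that is not confirmed anywhere in this ticket, the function name is hypothetical, and special cases such as IGIF FIDs are ignored.

```python
MASK64 = (1 << 64) - 1


def fid_to_ino64(seq: int, oid: int) -> int:
    """Flatten a FID (seq, oid) into a 64-bit inode number.

    Assumed formula (not confirmed in this ticket):
        (seq << 24) + ((seq >> 24) & 0xffffff0000) + oid
    """
    return ((seq << 24) + ((seq >> 24) & 0xFFFFFF0000) + oid) & MASK64


if __name__ == "__main__":
    # FID taken from the lost+found entry name in this ticket.
    ino = fid_to_ino64(0x20000205C, 0x15DC1)
    # Compare this value with the "Inode:" field of the stat output.
    print(ino)
```

If the printed value equals the Inode field in the stat output, the entry and the FID in its name refer to the same object under the stated assumption.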
            
            mhanafi Mahmoud Hanafi added a comment - - edited

            Here is the object info

            Inode: 1762289   Type: regular    Mode:  07666   Flags: 0x80000
            Generation: 2205332265    Version: 0x00000000:00000000
            User: 30757   Group: 41548   Project:     0   Size: 0
            File ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5bd00854:00000000 -- Tue Oct 23 22:51:16 2018
             atime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
             mtime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
            crtime: 0x5bd00837:b8168050 -- Tue Oct 23 22:50:47 2018
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 08 00 00 00 00 00 00 00 00 00 14 00 01 00 00 00 60 1b 21 00 00 00 00 00 
              lma: fid=[0x100140000:0x211b60:0x0] compat=8 incompat=0
              trusted.fid (44)
              fid: parent=[0x20000205c:0x15dc1:0x0] stripe=0 stripe_size=1048576 stripe_count=8 component_id=3 component_start=17179869184 component_end=68719476736
            EXTENTS:
            

            I scanned the MDT; 0x20000205c:0x15dc1:0x0 doesn't exist there.
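
The raw trusted.lma bytes and the decoded "lma:" line in the object info can be related with a short sketch. It assumes the 24-byte attribute is laid out little-endian as two 32-bit flag words (compat, incompat) followed by a FID (64-bit sequence, 32-bit object id, 32-bit version). That layout is an assumption chosen because it reproduces the decoded line shown in this ticket; the function name is hypothetical.

```python
import struct

# Raw bytes copied from the "trusted.lma (24)" line in the object info.
LMA_HEX = ("08 00 00 00 00 00 00 00 00 00 14 00 01 00 00 00 "
           "60 1b 21 00 00 00 00 00")


def decode_lma(raw: bytes) -> dict:
    """Decode a trusted.lma value under the assumed 24-byte layout."""
    if len(raw) < 24:
        raise ValueError(f"expected at least 24 bytes, got {len(raw)}")
    # "<" = little-endian without padding; I = u32, Q = u64.
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw[:24])
    return {
        "compat": compat,
        "incompat": incompat,
        "fid": f"[{seq:#x}:{oid:#x}:{ver:#x}]",
    }


if __name__ == "__main__":
    print(decode_lma(bytes.fromhex(LMA_HEX)))
```

The decoded fields can be compared directly with the "lma: fid=... compat=... incompat=..." line printed beneath the raw bytes.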


            People

              bzzz Alex Zhuravlev
              mhanafi Mahmoud Hanafi