
LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed:


    Description

      This is related to LU-11584.

      1. Unmounted the targets
      2. Mounted the MDT as ldiskfs
      3. Deleted all the quarantined files
      4. Rebooted and remounted the targets

      Hit LBUG.

      [ 1781.324368] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed:
      [ 1781.351312] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) LBUG

       

      I tried mounting with noscrub but still hit the LBUG.

      The filesystem keeps crashing with the LBUG.

       


          Activity


            gerrit Gerrit Updater added a comment

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33826/
            Subject: LU-11737 lfsck: do not ignore dryrun
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 875f3fc03aa15049892fe19d6a4fc1132848fced

            gerrit Gerrit Updater added a comment

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33827
            Subject: LU-11737 lfsck: do not ignore dryrun
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 5ed85a4a03eb39f04d7b7379acc8b52f046eeb28

            gerrit Gerrit Updater added a comment

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33826
            Subject: LU-11737 lfsck: do not ignore dryrun
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e7a7d9836e8b3d51b917e84e425195130db7d2c5

            bzzz Alex Zhuravlev added a comment - - edited

            I found one possible cause.. trying to reproduce locally.


            mhanafi Mahmoud Hanafi added a comment

            Any update on getting a patch to deal with the LBUG?

            mhanafi Mahmoud Hanafi added a comment

            These are probably the objects of the corrupted files that I deleted.

            adilger Andreas Dilger added a comment - - edited

            Mahmoud, I also filed LUDOC-421 to add documentation for the files in .lustre/lost+found. In this case, the "-R-" means "The orphan OST-object knows its parent MDT-object FID, but does not know the position (the file name) in the layout."

            Just to confirm that the file is what we expect, you could run "lfs getstripe -F /nobackupp13/.lustre/lost+found/MDT0000/[0x20000205c:0x15dc1:0x0]-R-0" to get the FID from that file (it should probably be [0x20000205c:0x15dc1:0x0]). Given that this object has no data in it (size=0 and blocks=0) and no "filename" exists for it, you can probably just delete it.
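
As a rough illustration of the naming described above, the sketch below splits a lost+found entry name into its FID, flag, and trailing number. It assumes the name always has the shape `[seq:oid:ver]-<flag>-<number>` seen in this ticket. The function name `parse_orphan_name` is hypothetical, and only the "R" flag is described, because that is the only one explained here.

```python
import re

# Assumed shape, taken from the example in this ticket:
#   [0x20000205c:0x15dc1:0x0]-R-0
ORPHAN_NAME = re.compile(
    r"^\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]-([A-Z])-(\d+)$"
)

# Only the flag explained in the comment above is listed.
FLAG_MEANING = {
    "R": "orphan OST-object knows its parent MDT-object FID, "
         "but not its position in the layout",
}


def parse_orphan_name(name):
    """Split a lost+found entry name into FID fields, flag, and number."""
    match = ORPHAN_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected lost+found name: {name!r}")
    seq, oid, ver = (int(field, 16) for field in match.group(1, 2, 3))
    flag = match.group(4)
    return {
        "seq": seq,
        "oid": oid,
        "ver": ver,
        "flag": flag,
        "number": int(match.group(5)),
        "meaning": FLAG_MEANING.get(flag, "not covered in this sketch"),
    }


if __name__ == "__main__":
    info = parse_orphan_name("[0x20000205c:0x15dc1:0x0]-R-0")
    print(f"parent FID [{info['seq']:#x}:{info['oid']:#x}:{info['ver']:#x}]")
    print(f"flag {info['flag']}: {info['meaning']}")
```

This only inspects the name; `lfs getstripe -F` on the file itself remains the authoritative check.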

            mhanafi Mahmoud Hanafi added a comment - - edited

            We can lower the priority to 2.

            Found [0x20000205c:0x15dc1:0x0]; it was in lost+found:
            .lustre/lost+found/MDT0000/'[0x20000205c:0x15dc1:0x0]-R-0'

            tpfe2 /nobackupp13/.lustre/lost+found/MDT0000 # stat '[0x20000205c:0x15dc1:0x0]-R-0' 
              File: '[0x20000205c:0x15dc1:0x0]-R-0'
              Size: 0               Blocks: 0          IO Block: 4194304 regular empty file
            Device: d7d58d2eh/3621096750d   Inode: 144115327058402753  Links: 1
            Access: (0400/-r--------)  Uid: (30757/ spocops)   Gid: (41548/   s1548)
            Access: 2018-12-06 12:09:17.000000000 -0800
            Modify: 2018-12-06 12:09:17.000000000 -0800
            Change: 2018-12-06 12:09:17.000000000 -0800
             Birth: -
            
            mhanafi Mahmoud Hanafi added a comment - - edited

            Here is the object info

            Inode: 1762289   Type: regular    Mode:  07666   Flags: 0x80000
            Generation: 2205332265    Version: 0x00000000:00000000
            User: 30757   Group: 41548   Project:     0   Size: 0
            File ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5bd00854:00000000 -- Tue Oct 23 22:51:16 2018
             atime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
             mtime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
            crtime: 0x5bd00837:b8168050 -- Tue Oct 23 22:50:47 2018
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 08 00 00 00 00 00 00 00 00 00 14 00 01 00 00 00 60 1b 21 00 00 00 00 00 
              lma: fid=[0x100140000:0x211b60:0x0] compat=8 incompat=0
              trusted.fid (44)
              fid: parent=[0x20000205c:0x15dc1:0x0] stripe=0 stripe_size=1048576 stripe_count=8 component_id=3 component_start=17179869184 component_end=68719476736
            EXTENTS:
            

            I scanned the MDT, and [0x20000205c:0x15dc1:0x0] doesn't exist there.
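
The raw `trusted.lma` bytes in the object dump can be cross-checked against the decoded `lma:` line. The sketch below assumes a little-endian layout of two 32-bit fields (compat, incompat) followed by a FID made of a 64-bit sequence, a 32-bit object ID, and a 32-bit version. That layout is an assumption inferred from the 24-byte dump and its decoded line, not taken from the Lustre headers. The helper name `decode_lma` is hypothetical.

```python
import struct


def decode_lma(hex_dump):
    """Decode a 24-byte trusted.lma hex dump (assumed little-endian layout)."""
    raw = bytes.fromhex(hex_dump)
    if len(raw) < 24:
        raise ValueError(f"expected at least 24 bytes, got {len(raw)}")
    # Assumed field order: compat (u32), incompat (u32),
    # then FID seq (u64), oid (u32), ver (u32).
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw[:24])
    return {
        "compat": compat,
        "incompat": incompat,
        "fid": f"[{seq:#x}:{oid:#x}:{ver:#x}]",
    }


if __name__ == "__main__":
    dump = ("08 00 00 00 00 00 00 00 00 00 14 00 01 00 00 00 "
            "60 1b 21 00 00 00 00 00")
    print(decode_lma(dump))
```

With the bytes from the dump, this yields the same FID and compat/incompat values as the decoded `lma:` line, which suggests the xattr itself is intact and the problem lies with the missing parent.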


            bzzz Alex Zhuravlev added a comment

            This is definitely useful, thanks.

            mhanafi Mahmoud Hanafi added a comment - - edited

            OK, will do.

            Some additional info. I'm not sure whether the delete record is relevant.

            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.906822] Lustre: 16159:0:(lfsck_layout.c:1618:lfsck_layout_del_dangling_rec()) nbp13-MDT0000-osd: delete the dangling record for [0x20000205c:0x15dc1:0x0], comp_id = 4, ea_off = 15 from the trace file: rc = -2
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.906827] LustreError: 16159:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed: 
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.933775] LustreError: 16159:0:(osd_handler.c:3985:osd_xattr_set()) LBUG
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954442] Pid: 16159, comm: lfsck_layout 3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1 SMP Mon Aug 27 23:04:41 UTC 2018
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954442] Call Trace:
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954458]  [<ffffffff8103a1f2>] save_stack_trace_tsk+0x22/0x40
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954467]  [<ffffffffa06d17cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954470]  [<ffffffffa06d187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954479]  [<ffffffffa110acdf>] osd_xattr_set+0x95f/0xc10 [osd_ldiskfs]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954492]  [<ffffffffa12093c2>] lfsck_layout_refill_lovea.isra.63+0x1c2/0x480 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954499]  [<ffffffffa121d0db>] lfsck_layout_recreate_lovea+0xd8b/0x2370 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954505]  [<ffffffffa121fada>] lfsck_layout_assistant_handler_p2+0x141a/0x1650 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954509]  [<ffffffffa11e27ce>] lfsck_assistant_engine+0xfce/0x20b0 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954511]  [<ffffffff810b1131>] kthread+0xd1/0xe0
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954513]  [<ffffffff816a14dd>] ret_from_fork+0x5d/0xb0
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954528]  [<ffffffffffffffff>] 0xffffffffffffffff
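
To read the trace above more quickly, a small script can pull out the call chain and translate the `rc` value. This is a minimal sketch: the regular expressions assume the kernel log format shown in this ticket, `call_chain` and `rc_name` are hypothetical helper names, and frames without a module tag are labelled "kernel" purely as a convention of this sketch.

```python
import errno
import re

# Matches frames such as:
#   [<ffffffffa110acdf>] osd_xattr_set+0x95f/0xc10 [osd_ldiskfs]
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)
RC = re.compile(r"rc = (-?\d+)")

SAMPLE = """\
[ 5780.954479]  [<ffffffffa110acdf>] osd_xattr_set+0x95f/0xc10 [osd_ldiskfs]
[ 5780.954492]  [<ffffffffa12093c2>] lfsck_layout_refill_lovea.isra.63+0x1c2/0x480 [lfsck]
[ 5780.954511]  [<ffffffff810b1131>] kthread+0xd1/0xe0
[ 5780.954528]  [<ffffffffffffffff>] 0xffffffffffffffff
"""


def call_chain(log_text):
    """Return (function, module) pairs, innermost frame first."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME.search(line)
        if match:
            frames.append((match.group(1), match.group(2) or "kernel"))
    return frames


def rc_name(line):
    """Map 'rc = -N' in a log line to its errno name, or None if absent."""
    match = RC.search(line)
    if match is None:
        return None
    return errno.errorcode.get(abs(int(match.group(1))))


if __name__ == "__main__":
    for func, module in call_chain(SAMPLE):
        print(f"{module}: {func}")
    print(rc_name("delete the dangling record ... from the trace file: rc = -2"))
```

Run over the full trace, this shows the assertion being reached from the lfsck layout assistant through `lfsck_layout_recreate_lovea` and `lfsck_layout_refill_lovea`. The `rc = -2` on the preceding line corresponds to ENOENT.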
            

            People

              bzzz Alex Zhuravlev
              mhanafi Mahmoud Hanafi