[LU-11737] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed:


    Description

      This is related to LU-11584.

      1. Unmounted the targets.
      2. Mounted the MDT as ldiskfs.
      3. Deleted all the quarantined files.
      4. Rebooted and remounted the targets.

      Hit an LBUG:

      [ 1781.324368] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed:
      [ 1781.351312] LustreError: 11060:0:(osd_handler.c:3985:osd_xattr_set()) LBUG

       

      I tried mounting with the noscrub option and am still hitting the LBUG.

      The filesystem keeps crashing with this LBUG.

       

          Activity

            bzzz Alex Zhuravlev added a comment - this is definitely useful, thanks.
            mhanafi Mahmoud Hanafi added a comment (edited):

            OK, will do.

            Some additional info. I'm not sure whether the "delete the dangling record" message is relevant.

            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.906822] Lustre: 16159:0:(lfsck_layout.c:1618:lfsck_layout_del_dangling_rec()) nbp13-MDT0000-osd: delete the dangling record for [0x20000205c:0x15dc1:0x0], comp_id = 4, ea_off = 15 from the trace file: rc = -2
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.906827] LustreError: 16159:0:(osd_handler.c:3985:osd_xattr_set()) ASSERTION( handle ) failed: 
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.933775] LustreError: 16159:0:(osd_handler.c:3985:osd_xattr_set()) LBUG
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954442] Pid: 16159, comm: lfsck_layout 3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1 SMP Mon Aug 27 23:04:41 UTC 2018
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954442] Call Trace:
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954458]  [<ffffffff8103a1f2>] save_stack_trace_tsk+0x22/0x40
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954467]  [<ffffffffa06d17cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954470]  [<ffffffffa06d187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954479]  [<ffffffffa110acdf>] osd_xattr_set+0x95f/0xc10 [osd_ldiskfs]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954492]  [<ffffffffa12093c2>] lfsck_layout_refill_lovea.isra.63+0x1c2/0x480 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954499]  [<ffffffffa121d0db>] lfsck_layout_recreate_lovea+0xd8b/0x2370 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954505]  [<ffffffffa121fada>] lfsck_layout_assistant_handler_p2+0x141a/0x1650 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954509]  [<ffffffffa11e27ce>] lfsck_assistant_engine+0xfce/0x20b0 [lfsck]
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954511]  [<ffffffff810b1131>] kthread+0xd1/0xe0
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954513]  [<ffffffff816a14dd>] ret_from_fork+0x5d/0xb0
            Dec  6 12:09:17 nbp13-srv1 kernel: [ 5780.954528]  [<ffffffffffffffff>] 0xffffffffffffffff
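The dangling-record message above identifies the object by its FID and ends with rc = -2. Below is a minimal C sketch for decoding those two values. It assumes the usual Lustre conventions: a FID is printed as [sequence:object id:version] in hex, and return codes are negative errno values, so -2 is -ENOENT on Linux. The struct fid_sketch type and both helpers are hypothetical stand-ins, not the real struct lu_fid or any Lustre API.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of a Lustre FID: sequence, object id, version.
 * The real definition is struct lu_fid in the Lustre headers. */
struct fid_sketch {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Parse the "[0xSEQ:0xOID:0xVER]" form printed in console messages.
 * Returns 0 on success or -EINVAL if the text does not match. */
static int fid_sketch_parse(const char *text, struct fid_sketch *fid)
{
	/* %x conversions accept an optional 0x prefix, as strtoul() does. */
	if (sscanf(text, "[%" SCNx64 ":%" SCNx32 ":%" SCNx32 "]",
		   &fid->seq, &fid->oid, &fid->ver) != 3)
		return -EINVAL;
	return 0;
}

/* Lustre reports failures as negative errno values ("rc = -2");
 * map such a code to the C library's description. */
static const char *rc_sketch_describe(int rc)
{
	return strerror(rc < 0 ? -rc : rc);
}
```

For the record above, fid_sketch_parse("[0x20000205c:0x15dc1:0x0]", &fid) yields seq 0x20000205c, oid 0x15dc1 and ver 0x0, and rc_sketch_describe(-2) returns the ENOENT text ("No such file or directory" with glibc).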
            

            bzzz Alex Zhuravlev added a comment - My understanding is that we want to get the filesystem back first? In the meantime I'll try to get a fix for this issue. It seems part of the code ignores the read-only option.
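The traces show lfsck_layout_refill_lovea() calling osd_xattr_set() with no transaction handle, and the reporter was running LFSCK with -n (dry run). Below is one plausible reading of the read-only remark, sketched in C under stated assumptions. If a repair path skips starting a transaction in dry-run mode, it must also skip the write; otherwise the NULL handle trips the assertion. Every type and function here is hypothetical. The real check is an LASSERT(), which triggers an LBUG rather than returning an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a transaction handle (struct thandle in Lustre). */
struct thandle_sketch {
	int id;
};

/* Mimics the invariant in osd_xattr_set(): every write needs a handle.
 * The real check LBUGs; here we print a similar message and return -1
 * so the sketch stays runnable. */
static int xattr_set_sketch(struct thandle_sketch *handle, const char *name)
{
	if (handle == NULL) {
		fprintf(stderr, "ASSERTION( handle ) failed: setting %s\n", name);
		return -1;
	}
	printf("set %s under transaction %d\n", name, handle->id);
	return 0;
}

/* Hypothetical layout-repair step. In dry-run mode no transaction is
 * started, so honour_dryrun decides whether the write is skipped too;
 * passing false reproduces the pattern behind this LBUG.
 * "trusted.lov" is the layout xattr the repair would rewrite. */
static int refill_layout_sketch(bool dryrun, bool honour_dryrun)
{
	struct thandle_sketch th = { .id = 1 };
	struct thandle_sketch *handle = dryrun ? NULL : &th;

	if (dryrun && honour_dryrun) {
		printf("dry run: would set trusted.lov\n");
		return 0;
	}
	return xattr_set_sketch(handle, "trusted.lov");
}
```

Here refill_layout_sketch(true, true) only reports what it would do, while refill_layout_sketch(true, false) reaches the write with a NULL handle and returns -1 after printing the assertion-style message. Under this reading, the dry-run check has to happen before any OSD write is attempted.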

            mhanafi Mahmoud Hanafi added a comment - Do you want me to try re-running lfsck after remounting?

            bzzz Alex Zhuravlev added a comment - Please try to mount the MDS with the skip_lfsck mount option.

            bzzz Alex Zhuravlev added a comment - OK, thanks. Working on this.
            mhanafi Mahmoud Hanafi added a comment (edited):

            I was running LFSCK in dry-run mode:

            lctl lfsck_start -r -n -o -t layout

             PID: 11060  TASK: ffff882f2ed10fd0  CPU: 5   COMMAND: "lfsck_layout"
             #0 [ffff882bcca3f868] machine_kexec at ffffffff8105b64b
             #1 [ffff882bcca3f8c8] __crash_kexec at ffffffff81105342
             #2 [ffff882bcca3f998] panic at ffffffff81689aad
             #3 [ffff882bcca3fa18] lbug_with_loc at ffffffffa06e08cb [libcfs]
             #4 [ffff882bcca3fa38] osd_xattr_set at ffffffffa10dfcdf [osd_ldiskfs]
             #5 [ffff882bcca3fab8] lfsck_layout_refill_lovea at ffffffffa11de3c2 [lfsck]
             #6 [ffff882bcca3fb40] lfsck_layout_recreate_lovea at ffffffffa11f20db [lfsck]
             #7 [ffff882bcca3fc88] lfsck_layout_assistant_handler_p2 at ffffffffa11f4ada [lfsck]
             #8 [ffff882bcca3fd80] lfsck_assistant_engine at ffffffffa11b77ce [lfsck]
             #9 [ffff882bcca3fec8] kthread at ffffffff810b1131
            #10 [ffff882bcca3ff50] ret_from_fork at ffffffff816a14dd
            

             

            bzzz Alex Zhuravlev added a comment - Please show the full stack trace.

            People

              bzzz Alex Zhuravlev
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 6
