Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7021

osd_trans_exec_check()) LBUG for ofd_write_attr_set

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.17.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      Running a current master snapshot +;

      pick 496b566 LU-6903 lov: call lov_object_find_cbdata() inside lock
      pick fd5b2da LU-6727 osd: auto locate BH_DXLock bit
      pick 174ca9c LU-6119 test: Add testing for Lustre exported via Samba
      pick 7ccf098 LU-6356 ptlrpc: do not sleep if encpool reached max capacity
      pick db3981e LU-6215 llite: handle backing_dev_info removal
      pick c0aafaa LU-6846 llog: combine cancel rec and destroy
      

      I just hit this LBUG:

      <4>[ 5532.419038] Lustre: 5319:0:(osd_internal.h:1087:osd_trans_exec_check()) op 0: used 8, used now 8, reserved 1
      <4>[ 5532.419630] Lustre: 5319:0:(osd_handler.c:902:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
      <4>[ 5532.420063] Lustre: 5319:0:(osd_handler.c:909:osd_trans_dump_creds())   attr_set: 1/1/8, xattr_set: 1/216/0
      <4>[ 5532.420419] Lustre: 5319:0:(osd_handler.c:919:osd_trans_dump_creds())   write: 0/0/0, punch: 0/0/0, quota 2/2/0
      <4>[ 5532.420804] Lustre: 5319:0:(osd_handler.c:926:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
      <4>[ 5532.421728] Lustre: 5319:0:(osd_handler.c:933:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
      <0>[ 5532.422069] LustreError: 5319:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
      <4>[ 5532.436922] Pid: 5319, comm: ll_ost_io01_004
      <4>[ 5532.437212] 
      <4>[ 5532.437213] Call Trace:
      <4>[ 5532.437656]  [<ffffffffa12d9885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4>[ 5532.442062]  [<ffffffffa12d9e87>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4>[ 5532.442360]  [<ffffffffa078b6c4>] osd_attr_set+0x284/0x620 [osd_ldiskfs]
      <4>[ 5532.442653]  [<ffffffffa0adad4d>] dt_attr_set+0x8d/0x1a0 [ofd]
      <4>[ 5532.442938]  [<ffffffffa0adb379>] ofd_write_attr_set+0x519/0x8b0 [ofd]
      <4>[ 5532.446791]  [<ffffffffa0adb966>] ofd_commitrw_write+0x256/0xfc0 [ofd]
      <4>[ 5532.447128]  [<ffffffffa0ae0f3d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
      <4>[ 5532.447427]  [<ffffffffa0adcc82>] ofd_commitrw+0x5b2/0xba0 [ofd]
      <4>[ 5532.447736]  [<ffffffffa13cfac9>] ? lprocfs_counter_add+0x119/0x1c0 [obdclass]
      <4>[ 5532.448233]  [<ffffffffa1676504>] obd_commitrw+0x114/0x380 [ptlrpc]
      <4>[ 5532.448553]  [<ffffffffa167f8b0>] tgt_brw_write+0xc80/0x1550 [ptlrpc]
      <4>[ 5532.448842]  [<ffffffff812918f0>] ? string+0x40/0x100
      <4>[ 5532.449151]  [<ffffffffa15d2ce0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
      <4>[ 5532.449627]  [<ffffffffa167e442>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
      <4>[ 5532.450133]  [<ffffffffa1629ab4>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
      <4>[ 5532.450687]  [<ffffffffa1628d40>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
      <4>[ 5532.453814]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
      <4>[ 5532.454167]  [<ffffffff8100c24a>] child_rip+0xa/0x20
      <4>[ 5532.454431]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
      <4>[ 5532.454711]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
      <4>[ 5532.454901] 
      <0>[ 5532.687417] Kernel panic - not syncing: LBUG
      <4>[ 5532.687691] Pid: 5319, comm: ll_ost_io01_004 Tainted: P           ---------------    2.6.32-rhe6.6-debug #1
      <4>[ 5532.688234] Call Trace:
      <4>[ 5532.688461]  [<ffffffff8151dcd9>] ? panic+0xa7/0x16f
      <4>[ 5532.688757]  [<ffffffffa12d9edb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
      <4>[ 5532.689072]  [<ffffffffa078b6c4>] ? osd_attr_set+0x284/0x620 [osd_ldiskfs]
      <4>[ 5532.689376]  [<ffffffffa0adad4d>] ? dt_attr_set+0x8d/0x1a0 [ofd]
      <4>[ 5532.689686]  [<ffffffffa0adb379>] ? ofd_write_attr_set+0x519/0x8b0 [ofd]
      <4>[ 5532.690040]  [<ffffffffa0adb966>] ? ofd_commitrw_write+0x256/0xfc0 [ofd]
      <4>[ 5532.690365]  [<ffffffffa0ae0f3d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
      <4>[ 5532.691261]  [<ffffffffa0adcc82>] ? ofd_commitrw+0x5b2/0xba0 [ofd]
      <4>[ 5532.691261]  [<ffffffffa13cfac9>] ? lprocfs_counter_add+0x119/0x1c0 [obdclass]
      <4>[ 5532.691261]  [<ffffffffa1676504>] ? obd_commitrw+0x114/0x380 [ptlrpc]
      <4>[ 5532.700994]  [<ffffffffa167f8b0>] ? tgt_brw_write+0xc80/0x1550 [ptlrpc]
      <4>[ 5532.700994]  [<ffffffff812918f0>] ? string+0x40/0x100
      <4>[ 5532.700994]  [<ffffffffa15d2ce0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
      <4>[ 5532.700994]  [<ffffffffa167e442>] ? tgt_request_handle+0xa42/0x1230 [ptlrpc]
      <4>[ 5532.700994]  [<ffffffffa1629ab4>] ? ptlrpc_main+0xd74/0x1850 [ptlrpc]
      <4>[ 5532.700994]  [<ffffffffa1628d40>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
      <4>[ 5532.700994]  [<ffffffff8109ce4e>] ? kthread+0x9e/0xc0
      <4>[ 5532.700994]  [<ffffffff8100c24a>] ? child_rip+0xa/0x20
      <4>[ 5532.700994]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
      <4>[ 5532.700994]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
      

      Attachments

        Issue Links

          Activity

            [LU-7021] osd_trans_exec_check()) LBUG for ofd_write_attr_set
            pjones Peter Jones added a comment -

            Merged for 2.17

            pjones Peter Jones added a comment - Merged for 2.17

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57570/
            Subject: LU-7021 osd: fix credits optimization
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2e525833a6df942ab55757edfa84bbcfd093a9e3

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57570/ Subject: LU-7021 osd: fix credits optimization Project: fs/lustre-release Branch: master Current Patch Set: Commit: 2e525833a6df942ab55757edfa84bbcfd093a9e3

            "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57570
            Subject: LU-7021 osd: fix credits optimization
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f375a5832b22d42e1fa7fb1a47d3aadbaf91a0fd

            gerrit Gerrit Updater added a comment - "Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57570 Subject: LU-7021 osd: fix credits optimization Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: f375a5832b22d42e1fa7fb1a47d3aadbaf91a0fd

            I'm still hitting this from time to time, with additional debugging found that:

            decl: 
            0->0
            0->0
            0->0
            
            exec: 
            3202->0
            29997->0
            0->0
            

            i.e. at declaration no ID was going to change, but UID/GID got changed by execution.

            bzzz Alex Zhuravlev added a comment - I'm still hitting this from time to time, with additional debugging found that: decl: 0->0 0->0 0->0 exec: 3202->0 29997->0 0->0 i.e. at declaration no ID was going to change, but UID/GID got changed by execution.

            Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/23043
            Subject: LU-7021 osd: remove credits optimization from osd_declare_qid()
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e332b64edc725db035ca5a3991274100bcda41c1

            gerrit Gerrit Updater added a comment - Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/23043 Subject: LU-7021 osd: remove credits optimization from osd_declare_qid() Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: e332b64edc725db035ca5a3991274100bcda41c1
            green Oleg Drokin added a comment -

            I think I hit this once again (in racer):

            [202560.034938] Lustre: 21256:0:(osd_internal.h:1136:osd_trans_exec_check()) lustre-OST0000: opcode 0: used 5, used now 5, reserved 1
            [202560.036170] Lustre: 21256:0:(osd_handler.c:1639:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
            [202560.037325] Lustre: 21256:0:(osd_handler.c:1646:osd_trans_dump_creds())   attr_set: 1/1/5, xattr_set: 1/14/0
            [202560.040563] Lustre: 21256:0:(osd_handler.c:1656:osd_trans_dump_creds())   write: 0/0/0, punch: 0/0/0, quota 2/2/0
            [202560.041736] Lustre: 21256:0:(osd_handler.c:1663:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
            [202560.042886] Lustre: 21256:0:(osd_handler.c:1670:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
            [202560.044024] LustreError: 21256:0:(osd_internal.h:1139:osd_trans_exec_check()) LBUG
            [202560.055301] Pid: 21256, comm: ll_ost_io00_005
            [202560.055873] 
            Call Trace:
            [202560.056930]  [<ffffffffa02227b3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
            [202560.057572]  [<ffffffffa0222d55>] lbug_with_loc+0x45/0xc0 [libcfs]
            [202560.058198]  [<ffffffffa0af750b>] osd_it_ea_rec.part.94+0x0/0x36 [osd_ldiskfs]
            [202560.080192]  [<ffffffffa0ac640d>] osd_attr_set+0x55d/0x9b0 [osd_ldiskfs]
            [202560.080832]  [<ffffffffa0d59a6d>] ofd_commitrw_write+0x128d/0x1bf0 [ofd]
            [202560.081469]  [<ffffffffa0d5cf4d>] ofd_commitrw+0x51d/0xa30 [ofd]
            [202560.082111]  [<ffffffffa04fb185>] ? lprocfs_counter_add+0x195/0x1e0 [obdclass]
            [202560.083575]  [<ffffffffa098efb3>] obd_commitrw+0x2ec/0x32f [ptlrpc]
            [202560.084236]  [<ffffffffa09686ea>] tgt_brw_write+0xeaa/0x1650 [ptlrpc]
            [202560.084881]  [<ffffffffa095f194>] ? req_can_reconstruct+0x44/0x110 [ptlrpc]
            [202560.085540]  [<ffffffffa08c0220>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
            [202560.100228]  [<ffffffffa0964965>] tgt_request_handle+0x925/0x1330 [ptlrpc]
            [202560.100824]  [<ffffffffa0911be1>] ptlrpc_server_handle_request+0x231/0xac0 [ptlrpc]
            [202560.101838]  [<ffffffffa090f798>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
            [202560.102422]  [<ffffffffa09159f8>] ptlrpc_main+0xa58/0x1dd0 [ptlrpc]
            [202560.102977]  [<ffffffffa0914fa0>] ? ptlrpc_main+0x0/0x1dd0 [ptlrpc]
            [202560.103785]  [<ffffffff810a2eda>] kthread+0xea/0xf0
            [202560.104304]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
            [202560.104829]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
            [202560.105356]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
            [202560.106077] 
            [202560.137168] Kernel panic - not syncing: LBUG
            
            green Oleg Drokin added a comment - I think I hit this once again (in racer): [202560.034938] Lustre: 21256:0:(osd_internal.h:1136:osd_trans_exec_check()) lustre-OST0000: opcode 0: used 5, used now 5, reserved 1 [202560.036170] Lustre: 21256:0:(osd_handler.c:1639:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 [202560.037325] Lustre: 21256:0:(osd_handler.c:1646:osd_trans_dump_creds()) attr_set: 1/1/5, xattr_set: 1/14/0 [202560.040563] Lustre: 21256:0:(osd_handler.c:1656:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 2/2/0 [202560.041736] Lustre: 21256:0:(osd_handler.c:1663:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 [202560.042886] Lustre: 21256:0:(osd_handler.c:1670:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 [202560.044024] LustreError: 21256:0:(osd_internal.h:1139:osd_trans_exec_check()) LBUG [202560.055301] Pid: 21256, comm: ll_ost_io00_005 [202560.055873] Call Trace: [202560.056930] [<ffffffffa02227b3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs] [202560.057572] [<ffffffffa0222d55>] lbug_with_loc+0x45/0xc0 [libcfs] [202560.058198] [<ffffffffa0af750b>] osd_it_ea_rec.part.94+0x0/0x36 [osd_ldiskfs] [202560.080192] [<ffffffffa0ac640d>] osd_attr_set+0x55d/0x9b0 [osd_ldiskfs] [202560.080832] [<ffffffffa0d59a6d>] ofd_commitrw_write+0x128d/0x1bf0 [ofd] [202560.081469] [<ffffffffa0d5cf4d>] ofd_commitrw+0x51d/0xa30 [ofd] [202560.082111] [<ffffffffa04fb185>] ? lprocfs_counter_add+0x195/0x1e0 [obdclass] [202560.083575] [<ffffffffa098efb3>] obd_commitrw+0x2ec/0x32f [ptlrpc] [202560.084236] [<ffffffffa09686ea>] tgt_brw_write+0xeaa/0x1650 [ptlrpc] [202560.084881] [<ffffffffa095f194>] ? req_can_reconstruct+0x44/0x110 [ptlrpc] [202560.085540] [<ffffffffa08c0220>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc] [202560.100228] [<ffffffffa0964965>] tgt_request_handle+0x925/0x1330 [ptlrpc] [202560.100824] [<ffffffffa0911be1>] ptlrpc_server_handle_request+0x231/0xac0 [ptlrpc] [202560.101838] [<ffffffffa090f798>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc] [202560.102422] [<ffffffffa09159f8>] ptlrpc_main+0xa58/0x1dd0 [ptlrpc] [202560.102977] [<ffffffffa0914fa0>] ? ptlrpc_main+0x0/0x1dd0 [ptlrpc] [202560.103785] [<ffffffff810a2eda>] kthread+0xea/0xf0 [202560.104304] [<ffffffff810a2df0>] ? kthread+0x0/0xf0 [202560.104829] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90 [202560.105356] [<ffffffff810a2df0>] ? kthread+0x0/0xf0 [202560.106077] [202560.137168] Kernel panic - not syncing: LBUG

            HI Alex,
            Can you look into this?
            Thanks.
            Joe

            jgmitter Joseph Gmitter (Inactive) added a comment - HI Alex, Can you look into this? Thanks. Joe

            People

              bzzz Alex Zhuravlev
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: