[LU-7021] osd_trans_exec_check()) LBUG for ofd_write_attr_set Created: 18/Aug/15  Updated: 06/May/21

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Alex Zhuravlev
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6969 osd_internal.h:1090:osd_trans_exec_ch... Resolved
is related to LU-13093 osd_trans_exec_check()) lustre-OST000... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running a current master snapshot plus the following patches:

pick 496b566 LU-6903 lov: call lov_object_find_cbdata() inside lock
pick fd5b2da LU-6727 osd: auto locate BH_DXLock bit
pick 174ca9c LU-6119 test: Add testing for Lustre exported via Samba
pick 7ccf098 LU-6356 ptlrpc: do not sleep if encpool reached max capacity
pick db3981e LU-6215 llite: handle backing_dev_info removal
pick c0aafaa LU-6846 llog: combine cancel rec and destroy

I just hit this LBUG:

<4>[ 5532.419038] Lustre: 5319:0:(osd_internal.h:1087:osd_trans_exec_check()) op 0: used 8, used now 8, reserved 1
<4>[ 5532.419630] Lustre: 5319:0:(osd_handler.c:902:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
<4>[ 5532.420063] Lustre: 5319:0:(osd_handler.c:909:osd_trans_dump_creds())   attr_set: 1/1/8, xattr_set: 1/216/0
<4>[ 5532.420419] Lustre: 5319:0:(osd_handler.c:919:osd_trans_dump_creds())   write: 0/0/0, punch: 0/0/0, quota 2/2/0
<4>[ 5532.420804] Lustre: 5319:0:(osd_handler.c:926:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
<4>[ 5532.421728] Lustre: 5319:0:(osd_handler.c:933:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
<0>[ 5532.422069] LustreError: 5319:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG
<4>[ 5532.436922] Pid: 5319, comm: ll_ost_io01_004
<4>[ 5532.437212] 
<4>[ 5532.437213] Call Trace:
<4>[ 5532.437656]  [<ffffffffa12d9885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[ 5532.442062]  [<ffffffffa12d9e87>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[ 5532.442360]  [<ffffffffa078b6c4>] osd_attr_set+0x284/0x620 [osd_ldiskfs]
<4>[ 5532.442653]  [<ffffffffa0adad4d>] dt_attr_set+0x8d/0x1a0 [ofd]
<4>[ 5532.442938]  [<ffffffffa0adb379>] ofd_write_attr_set+0x519/0x8b0 [ofd]
<4>[ 5532.446791]  [<ffffffffa0adb966>] ofd_commitrw_write+0x256/0xfc0 [ofd]
<4>[ 5532.447128]  [<ffffffffa0ae0f3d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
<4>[ 5532.447427]  [<ffffffffa0adcc82>] ofd_commitrw+0x5b2/0xba0 [ofd]
<4>[ 5532.447736]  [<ffffffffa13cfac9>] ? lprocfs_counter_add+0x119/0x1c0 [obdclass]
<4>[ 5532.448233]  [<ffffffffa1676504>] obd_commitrw+0x114/0x380 [ptlrpc]
<4>[ 5532.448553]  [<ffffffffa167f8b0>] tgt_brw_write+0xc80/0x1550 [ptlrpc]
<4>[ 5532.448842]  [<ffffffff812918f0>] ? string+0x40/0x100
<4>[ 5532.449151]  [<ffffffffa15d2ce0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
<4>[ 5532.449627]  [<ffffffffa167e442>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
<4>[ 5532.450133]  [<ffffffffa1629ab4>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
<4>[ 5532.450687]  [<ffffffffa1628d40>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
<4>[ 5532.453814]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
<4>[ 5532.454167]  [<ffffffff8100c24a>] child_rip+0xa/0x20
<4>[ 5532.454431]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
<4>[ 5532.454711]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
<4>[ 5532.454901] 
<0>[ 5532.687417] Kernel panic - not syncing: LBUG
<4>[ 5532.687691] Pid: 5319, comm: ll_ost_io01_004 Tainted: P           ---------------    2.6.32-rhe6.6-debug #1
<4>[ 5532.688234] Call Trace:
<4>[ 5532.688461]  [<ffffffff8151dcd9>] ? panic+0xa7/0x16f
<4>[ 5532.688757]  [<ffffffffa12d9edb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4>[ 5532.689072]  [<ffffffffa078b6c4>] ? osd_attr_set+0x284/0x620 [osd_ldiskfs]
<4>[ 5532.689376]  [<ffffffffa0adad4d>] ? dt_attr_set+0x8d/0x1a0 [ofd]
<4>[ 5532.689686]  [<ffffffffa0adb379>] ? ofd_write_attr_set+0x519/0x8b0 [ofd]
<4>[ 5532.690040]  [<ffffffffa0adb966>] ? ofd_commitrw_write+0x256/0xfc0 [ofd]
<4>[ 5532.690365]  [<ffffffffa0ae0f3d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd]
<4>[ 5532.691261]  [<ffffffffa0adcc82>] ? ofd_commitrw+0x5b2/0xba0 [ofd]
<4>[ 5532.691261]  [<ffffffffa13cfac9>] ? lprocfs_counter_add+0x119/0x1c0 [obdclass]
<4>[ 5532.691261]  [<ffffffffa1676504>] ? obd_commitrw+0x114/0x380 [ptlrpc]
<4>[ 5532.700994]  [<ffffffffa167f8b0>] ? tgt_brw_write+0xc80/0x1550 [ptlrpc]
<4>[ 5532.700994]  [<ffffffff812918f0>] ? string+0x40/0x100
<4>[ 5532.700994]  [<ffffffffa15d2ce0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
<4>[ 5532.700994]  [<ffffffffa167e442>] ? tgt_request_handle+0xa42/0x1230 [ptlrpc]
<4>[ 5532.700994]  [<ffffffffa1629ab4>] ? ptlrpc_main+0xd74/0x1850 [ptlrpc]
<4>[ 5532.700994]  [<ffffffffa1628d40>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
<4>[ 5532.700994]  [<ffffffff8109ce4e>] ? kthread+0x9e/0xc0
<4>[ 5532.700994]  [<ffffffff8100c24a>] ? child_rip+0xa/0x20
<4>[ 5532.700994]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
<4>[ 5532.700994]  [<ffffffff8100c240>] ? child_rip+0x0/0x20


 Comments   
Comment by Joseph Gmitter (Inactive) [ 19/Aug/15 ]

Hi Alex,
Can you look into this?
Thanks.
Joe

Comment by Oleg Drokin [ 03/Oct/16 ]

I think I hit this once again (in racer):

[202560.034938] Lustre: 21256:0:(osd_internal.h:1136:osd_trans_exec_check()) lustre-OST0000: opcode 0: used 5, used now 5, reserved 1
[202560.036170] Lustre: 21256:0:(osd_handler.c:1639:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
[202560.037325] Lustre: 21256:0:(osd_handler.c:1646:osd_trans_dump_creds())   attr_set: 1/1/5, xattr_set: 1/14/0
[202560.040563] Lustre: 21256:0:(osd_handler.c:1656:osd_trans_dump_creds())   write: 0/0/0, punch: 0/0/0, quota 2/2/0
[202560.041736] Lustre: 21256:0:(osd_handler.c:1663:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
[202560.042886] Lustre: 21256:0:(osd_handler.c:1670:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
[202560.044024] LustreError: 21256:0:(osd_internal.h:1139:osd_trans_exec_check()) LBUG
[202560.055301] Pid: 21256, comm: ll_ost_io00_005
[202560.055873] 
Call Trace:
[202560.056930]  [<ffffffffa02227b3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[202560.057572]  [<ffffffffa0222d55>] lbug_with_loc+0x45/0xc0 [libcfs]
[202560.058198]  [<ffffffffa0af750b>] osd_it_ea_rec.part.94+0x0/0x36 [osd_ldiskfs]
[202560.080192]  [<ffffffffa0ac640d>] osd_attr_set+0x55d/0x9b0 [osd_ldiskfs]
[202560.080832]  [<ffffffffa0d59a6d>] ofd_commitrw_write+0x128d/0x1bf0 [ofd]
[202560.081469]  [<ffffffffa0d5cf4d>] ofd_commitrw+0x51d/0xa30 [ofd]
[202560.082111]  [<ffffffffa04fb185>] ? lprocfs_counter_add+0x195/0x1e0 [obdclass]
[202560.083575]  [<ffffffffa098efb3>] obd_commitrw+0x2ec/0x32f [ptlrpc]
[202560.084236]  [<ffffffffa09686ea>] tgt_brw_write+0xeaa/0x1650 [ptlrpc]
[202560.084881]  [<ffffffffa095f194>] ? req_can_reconstruct+0x44/0x110 [ptlrpc]
[202560.085540]  [<ffffffffa08c0220>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[202560.100228]  [<ffffffffa0964965>] tgt_request_handle+0x925/0x1330 [ptlrpc]
[202560.100824]  [<ffffffffa0911be1>] ptlrpc_server_handle_request+0x231/0xac0 [ptlrpc]
[202560.101838]  [<ffffffffa090f798>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc]
[202560.102422]  [<ffffffffa09159f8>] ptlrpc_main+0xa58/0x1dd0 [ptlrpc]
[202560.102977]  [<ffffffffa0914fa0>] ? ptlrpc_main+0x0/0x1dd0 [ptlrpc]
[202560.103785]  [<ffffffff810a2eda>] kthread+0xea/0xf0
[202560.104304]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
[202560.104829]  [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90
[202560.105356]  [<ffffffff810a2df0>] ? kthread+0x0/0xf0
[202560.106077] 
[202560.137168] Kernel panic - not syncing: LBUG
Comment by Gerrit Updater [ 10/Oct/16 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/23043
Subject: LU-7021 osd: remove credits optimization from osd_declare_qid()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e332b64edc725db035ca5a3991274100bcda41c1

Comment by Alex Zhuravlev [ 06/May/21 ]

I'm still hitting this from time to time. With additional debugging I found:

decl: 
0->0
0->0
0->0

exec: 
3202->0
29997->0
0->0

i.e. at declaration time no ID was going to change, but the UID/GID did change by the time of execution, so the quota/credit reservation made at declaration was too small.

Generated at Sat Feb 10 02:05:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.