[LU-7021] osd_trans_exec_check()) LBUG for ofd_write_attr_set Created: 18/Aug/15 Updated: 06/May/21 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | Alex Zhuravlev |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
Running a current master snapshot +; pick 496b566 LU-6903 lov: call lov_object_find_cbdata() inside lock pick fd5b2da LU-6727 osd: auto locate BH_DXLock bit pick 174ca9c LU-6119 test: Add testing for Lustre exported via Samba pick 7ccf098 LU-6356 ptlrpc: do not sleep if encpool reached max capacity pick db3981e LU-6215 llite: handle backing_dev_info removal pick c0aafaa LU-6846 llog: combine cancel rec and destroy I just hit this LBUG: <4>[ 5532.419038] Lustre: 5319:0:(osd_internal.h:1087:osd_trans_exec_check()) op 0: used 8, used now 8, reserved 1 <4>[ 5532.419630] Lustre: 5319:0:(osd_handler.c:902:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 <4>[ 5532.420063] Lustre: 5319:0:(osd_handler.c:909:osd_trans_dump_creds()) attr_set: 1/1/8, xattr_set: 1/216/0 <4>[ 5532.420419] Lustre: 5319:0:(osd_handler.c:919:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 2/2/0 <4>[ 5532.420804] Lustre: 5319:0:(osd_handler.c:926:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 <4>[ 5532.421728] Lustre: 5319:0:(osd_handler.c:933:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 <0>[ 5532.422069] LustreError: 5319:0:(osd_internal.h:1090:osd_trans_exec_check()) LBUG <4>[ 5532.436922] Pid: 5319, comm: ll_ost_io01_004 <4>[ 5532.437212] <4>[ 5532.437213] Call Trace: <4>[ 5532.437656] [<ffffffffa12d9885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] <4>[ 5532.442062] [<ffffffffa12d9e87>] lbug_with_loc+0x47/0xb0 [libcfs] <4>[ 5532.442360] [<ffffffffa078b6c4>] osd_attr_set+0x284/0x620 [osd_ldiskfs] <4>[ 5532.442653] [<ffffffffa0adad4d>] dt_attr_set+0x8d/0x1a0 [ofd] <4>[ 5532.442938] [<ffffffffa0adb379>] ofd_write_attr_set+0x519/0x8b0 [ofd] <4>[ 5532.446791] [<ffffffffa0adb966>] ofd_commitrw_write+0x256/0xfc0 [ofd] <4>[ 5532.447128] [<ffffffffa0ae0f3d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd] <4>[ 5532.447427] [<ffffffffa0adcc82>] ofd_commitrw+0x5b2/0xba0 [ofd] <4>[ 5532.447736] [<ffffffffa13cfac9>] ? lprocfs_counter_add+0x119/0x1c0 [obdclass] <4>[ 5532.448233] [<ffffffffa1676504>] obd_commitrw+0x114/0x380 [ptlrpc] <4>[ 5532.448553] [<ffffffffa167f8b0>] tgt_brw_write+0xc80/0x1550 [ptlrpc] <4>[ 5532.448842] [<ffffffff812918f0>] ? string+0x40/0x100 <4>[ 5532.449151] [<ffffffffa15d2ce0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc] <4>[ 5532.449627] [<ffffffffa167e442>] tgt_request_handle+0xa42/0x1230 [ptlrpc] <4>[ 5532.450133] [<ffffffffa1629ab4>] ptlrpc_main+0xd74/0x1850 [ptlrpc] <4>[ 5532.450687] [<ffffffffa1628d40>] ? ptlrpc_main+0x0/0x1850 [ptlrpc] <4>[ 5532.453814] [<ffffffff8109ce4e>] kthread+0x9e/0xc0 <4>[ 5532.454167] [<ffffffff8100c24a>] child_rip+0xa/0x20 <4>[ 5532.454431] [<ffffffff8109cdb0>] ? kthread+0x0/0xc0 <4>[ 5532.454711] [<ffffffff8100c240>] ? child_rip+0x0/0x20 <4>[ 5532.454901] <0>[ 5532.687417] Kernel panic - not syncing: LBUG <4>[ 5532.687691] Pid: 5319, comm: ll_ost_io01_004 Tainted: P --------------- 2.6.32-rhe6.6-debug #1 <4>[ 5532.688234] Call Trace: <4>[ 5532.688461] [<ffffffff8151dcd9>] ? panic+0xa7/0x16f <4>[ 5532.688757] [<ffffffffa12d9edb>] ? lbug_with_loc+0x9b/0xb0 [libcfs] <4>[ 5532.689072] [<ffffffffa078b6c4>] ? osd_attr_set+0x284/0x620 [osd_ldiskfs] <4>[ 5532.689376] [<ffffffffa0adad4d>] ? dt_attr_set+0x8d/0x1a0 [ofd] <4>[ 5532.689686] [<ffffffffa0adb379>] ? ofd_write_attr_set+0x519/0x8b0 [ofd] <4>[ 5532.690040] [<ffffffffa0adb966>] ? ofd_commitrw_write+0x256/0xfc0 [ofd] <4>[ 5532.690365] [<ffffffffa0ae0f3d>] ? ofd_fmd_find_nolock+0xad/0xd0 [ofd] <4>[ 5532.691261] [<ffffffffa0adcc82>] ? ofd_commitrw+0x5b2/0xba0 [ofd] <4>[ 5532.691261] [<ffffffffa13cfac9>] ? lprocfs_counter_add+0x119/0x1c0 [obdclass] <4>[ 5532.691261] [<ffffffffa1676504>] ? obd_commitrw+0x114/0x380 [ptlrpc] <4>[ 5532.700994] [<ffffffffa167f8b0>] ? tgt_brw_write+0xc80/0x1550 [ptlrpc] <4>[ 5532.700994] [<ffffffff812918f0>] ? string+0x40/0x100 <4>[ 5532.700994] [<ffffffffa15d2ce0>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc] <4>[ 5532.700994] [<ffffffffa167e442>] ? tgt_request_handle+0xa42/0x1230 [ptlrpc] <4>[ 5532.700994] [<ffffffffa1629ab4>] ? ptlrpc_main+0xd74/0x1850 [ptlrpc] <4>[ 5532.700994] [<ffffffffa1628d40>] ? ptlrpc_main+0x0/0x1850 [ptlrpc] <4>[ 5532.700994] [<ffffffff8109ce4e>] ? kthread+0x9e/0xc0 <4>[ 5532.700994] [<ffffffff8100c24a>] ? child_rip+0xa/0x20 <4>[ 5532.700994] [<ffffffff8109cdb0>] ? kthread+0x0/0xc0 <4>[ 5532.700994] [<ffffffff8100c240>] ? child_rip+0x0/0x20 |
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 19/Aug/15 ] |
|
HI Alex, |
| Comment by Oleg Drokin [ 03/Oct/16 ] |
|
I think I hit this once again (in racer): [202560.034938] Lustre: 21256:0:(osd_internal.h:1136:osd_trans_exec_check()) lustre-OST0000: opcode 0: used 5, used now 5, reserved 1 [202560.036170] Lustre: 21256:0:(osd_handler.c:1639:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0 [202560.037325] Lustre: 21256:0:(osd_handler.c:1646:osd_trans_dump_creds()) attr_set: 1/1/5, xattr_set: 1/14/0 [202560.040563] Lustre: 21256:0:(osd_handler.c:1656:osd_trans_dump_creds()) write: 0/0/0, punch: 0/0/0, quota 2/2/0 [202560.041736] Lustre: 21256:0:(osd_handler.c:1663:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0 [202560.042886] Lustre: 21256:0:(osd_handler.c:1670:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0 [202560.044024] LustreError: 21256:0:(osd_internal.h:1139:osd_trans_exec_check()) LBUG [202560.055301] Pid: 21256, comm: ll_ost_io00_005 [202560.055873] Call Trace: [202560.056930] [<ffffffffa02227b3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs] [202560.057572] [<ffffffffa0222d55>] lbug_with_loc+0x45/0xc0 [libcfs] [202560.058198] [<ffffffffa0af750b>] osd_it_ea_rec.part.94+0x0/0x36 [osd_ldiskfs] [202560.080192] [<ffffffffa0ac640d>] osd_attr_set+0x55d/0x9b0 [osd_ldiskfs] [202560.080832] [<ffffffffa0d59a6d>] ofd_commitrw_write+0x128d/0x1bf0 [ofd] [202560.081469] [<ffffffffa0d5cf4d>] ofd_commitrw+0x51d/0xa30 [ofd] [202560.082111] [<ffffffffa04fb185>] ? lprocfs_counter_add+0x195/0x1e0 [obdclass] [202560.083575] [<ffffffffa098efb3>] obd_commitrw+0x2ec/0x32f [ptlrpc] [202560.084236] [<ffffffffa09686ea>] tgt_brw_write+0xeaa/0x1650 [ptlrpc] [202560.084881] [<ffffffffa095f194>] ? req_can_reconstruct+0x44/0x110 [ptlrpc] [202560.085540] [<ffffffffa08c0220>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc] [202560.100228] [<ffffffffa0964965>] tgt_request_handle+0x925/0x1330 [ptlrpc] [202560.100824] [<ffffffffa0911be1>] ptlrpc_server_handle_request+0x231/0xac0 [ptlrpc] [202560.101838] [<ffffffffa090f798>] ? ptlrpc_wait_event+0xb8/0x370 [ptlrpc] [202560.102422] [<ffffffffa09159f8>] ptlrpc_main+0xa58/0x1dd0 [ptlrpc] [202560.102977] [<ffffffffa0914fa0>] ? ptlrpc_main+0x0/0x1dd0 [ptlrpc] [202560.103785] [<ffffffff810a2eda>] kthread+0xea/0xf0 [202560.104304] [<ffffffff810a2df0>] ? kthread+0x0/0xf0 [202560.104829] [<ffffffff8170fbd8>] ret_from_fork+0x58/0x90 [202560.105356] [<ffffffff810a2df0>] ? kthread+0x0/0xf0 [202560.106077] [202560.137168] Kernel panic - not syncing: LBUG |
| Comment by Gerrit Updater [ 10/Oct/16 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/23043 |
| Comment by Alex Zhuravlev [ 06/May/21 ] |
|
I'm still hitting this from time to time, with additional debugging found that: decl: 0->0 0->0 0->0 exec: 3202->0 29997->0 0->0 i.e. at declaration no ID was going to change, but UID/GID got changed by execution. |