[LU-6967] racer test_1: (osd_internal.h:1036:osd_trans_exec_op()) LBUG Created: 06/Aug/15  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Ashish Purkar (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

<lustre-server>
Version : 2.7.56
Release : 2.6.32_431.29.2.el6_lustremaster_master__49_g87a28e4

<lustre-client>
Version : 2.7.56
Release : 2.6.32_431.29.2.el6.x86_64_g87a28e4


Attachments: File racer_LBUG.tar    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

<cmd>
SLOW=YES NAME=ncli NETTYPE=tcp
mds1_HOST=fre0101 MDSDEV1=/dev/vdb mds_HOST=fre0101 MDSDEV=/dev/vdb MDSCOUNT=1
ost1_HOST=fre0102 OSTDEV1=/dev/vdb ost2_HOST=fre0102 OSTDEV2=/dev/vdc OSTCOUNT=2
CLIENTS=fre0103 RCLIENTS="fre0104" DIR=/mnt/lustre
ONLY=1
MDS_MOUNT_OPTS="-o rw,user_xattr"
OST_MOUNT_OPTS="-o user_xattr"
ENABLE_QUOTA="yes" MAXFREE="1400000" MDSJOURNALSIZE="22" /usr/lib64/lustre/tests/racer.sh

<stdout>
== racer test 1: racer on clients: fre0103,fre0104 DURATION=900 == 19:01:31 (1438282891)
racers pids: 5604 5605
fre0103: ./file_exec.sh: line 12: 11096 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0104: ./file_exec.sh: line 12: 6621 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0104: ./file_exec.sh: line 12: 8187 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0103: ./file_exec.sh: line 12: 22080 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0103: ./file_exec.sh: line 12: 22082 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0104: ./file_exec.sh: line 12: 22389 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0104: ./file_exec.sh: line 12: 29217 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0104: ./file_exec.sh: line 12: 31092 Segmentation fault $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
fre0103: ./file_exec.sh: line 12: 11679 Bus error $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null

Lustre: DEBUG MARKER: == racer test 1: racer on clients: fre0103,fre0104 DURATION=900 == 19:01:31 (1438282891)
Lustre: 7205:0:(osd_handler.c:902:osd_trans_dump_creds()) create: 0/0/0, destroy: 0/0/0
Lustre: 7205:0:(osd_handler.c:909:osd_trans_dump_creds()) attr_set: 1/1/0, xattr_set: 1/1/0
Lustre: 7205:0:(osd_handler.c:919:osd_trans_dump_creds()) write: 2/12/0, punch: 1/4/0, quota 2/2/0
Lustre: 7205:0:(osd_handler.c:926:osd_trans_dump_creds()) insert: 0/0/0, delete: 0/0/0
Lustre: 7205:0:(osd_handler.c:933:osd_trans_dump_creds()) ref_add: 0/0/0, ref_del: 0/0/0
LustreError: 7205:0:(osd_internal.h:1028:osd_trans_exec_op()) lustre-OST0001-osd: op = 7, rb = 7
LustreError: 7205:0:(osd_internal.h:1036:osd_trans_exec_op()) LBUG
Pid: 7205, comm: ll_ost_io00_005

Call Trace:
[<ffffffffa032a875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa032ae77>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0b8557e>] osd_write+0x41e/0x5b0 [osd_ldiskfs]
[<ffffffffa047d8ed>] dt_record_write+0x3d/0x130 [obdclass]
[<ffffffffa06fe9e5>] tgt_client_data_write+0x165/0x1b0 [ptlrpc]
[<ffffffffa06ff98d>] tgt_txn_stop_cb+0x8bd/0x1340 [ptlrpc]
[<ffffffffa047d7be>] dt_txn_hook_stop+0x5e/0x90 [obdclass]
[<ffffffffa0b625de>] osd_trans_stop+0x18e/0x5c0 [osd_ldiskfs]
[<ffffffffa0cefa2f>] ofd_trans_stop+0x1f/0x60 [ofd]
[<ffffffffa0cf145a>] ofd_object_punch+0x35a/0xa30 [ofd]
[<ffffffffa0cdd6fe>] ofd_punch_hdl+0x36e/0xb20 [ofd]
[<ffffffffa070e872>] tgt_request_handle+0xa42/0x1230 [ptlrpc]
[<ffffffffa06b64c1>] ptlrpc_main+0xe41/0x1920 [ptlrpc]
[<ffffffffa06b5680>] ? ptlrpc_main+0x0/0x1920 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20



 Comments   
Comment by Vinayak (Inactive) [ 06/Oct/15 ]

We hit this issue much frequently if we multi run racer, test_1

Lustre: DEBUG MARKER: == racer test 1: racer on clients: fre0311,fre0312 DURATION=900 == 17:57:32 (1443722252)
Lustre: 7274:0:(osd_handler.c:912:osd_trans_dump_creds())   create: 0/0/0, destroy: 0/0/0
Lustre: 7274:0:(osd_handler.c:919:osd_trans_dump_creds())   attr_set: 1/1/0, xattr_set: 1/1/0
Lustre: 7274:0:(osd_handler.c:929:osd_trans_dump_creds())   write: 2/12/0, punch: 1/4/0, quota 2/2/0
Lustre: 7274:0:(osd_handler.c:936:osd_trans_dump_creds())   insert: 0/0/0, delete: 0/0/0
Lustre: 7274:0:(osd_handler.c:943:osd_trans_dump_creds())   ref_add: 0/0/0, ref_del: 0/0/0
LustreError: 7274:0:(osd_internal.h:1040:osd_trans_exec_op()) lustre-OST0001-osd: op = 7, rb = 7
LustreError: 7274:0:(osd_internal.h:1048:osd_trans_exec_op()) LBUG
Pid: 7274, comm: ll_ost_io00_020

Call Trace:
 [<ffffffffa0328875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0328e77>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0b8b0be>] osd_write+0x41e/0x5b0 [osd_ldiskfs]
 [<ffffffffa0475c4d>] dt_record_write+0x3d/0x130 [obdclass]
 [<ffffffffa06f6545>] tgt_client_data_write+0x165/0x1b0 [ptlrpc]
 [<ffffffffa06f7517>] tgt_txn_stop_cb+0x477/0x1110 [ptlrpc]
 [<ffffffffa0475b1e>] dt_txn_hook_stop+0x5e/0x90 [obdclass]
 [<ffffffffa0b680ce>] osd_trans_stop+0x1ae/0x990 [osd_ldiskfs]
 [<ffffffffa0b78d58>] ? osd_attr_set+0x148/0x620 [osd_ldiskfs]
 [<ffffffffa0cf4a7f>] ofd_trans_stop+0x1f/0x60 [ofd]
 [<ffffffffa0cf64aa>] ofd_object_punch+0x35a/0xa30 [ofd]
 [<ffffffffa0ce273e>] ofd_punch_hdl+0x36e/0xb20 [ofd]
 [<ffffffffa07064bc>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
 [<ffffffffa06adb41>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
 [<ffffffffa06acd00>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
 [<ffffffff8109abf6>] kthread+0x96/0xa0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109ab60>] ? kthread+0x0/0xa0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20

Correct me If I am wrong.

Generated at Sat Feb 10 02:04:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.