Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.15.0
- None
- 3
- 9223372036854775807
Description
Seen on v2_15_0-RC2-25-g9884f37985. The failed assertion of LU-14618 is still reachable. To reproduce:
$LUSTRE/tests/llmount.sh
lfs setstripe \
  -E 1M -c 4 \
  -E 2M -c 4 \
  -E 3M -c 4 \
  -E 4M -c 4 \
  -E 5M -c 4 \
  -E 6M -c 4 \
  -E 7M -c 4 \
  -E 8M -c 4 \
  -E 9M -c 4 \
  -E 10M -c 4 \
  -E 11M -c 4 \
  -E 12M -c 4 \
  -E 13M -c 4 \
  -E 14M -c 4 \
  -E 15M -c 4 \
  -E eof -c 4 \
  /mnt/lustre
$LUSTRE/tests/racer.sh
[ 248.230101] LustreError: 50541:0:(lov_lock.c:206:lov_lock_sub_init()) LBUG
[ 248.230626] Pid: 50541, comm: truncate 4.18.0-348.7.1.el8.x86_64+debug #1 SMP Mon Mar 7 11:31:18 CST 2022
[ 248.231401] Call Trace TBD:
[ 248.231636] [<0>] libcfs_call_trace+0x99/0x140 [libcfs]
[ 248.232050] [<0>] lbug_with_loc+0x8f/0x100 [libcfs]
[ 248.232664] [<0>] lov_lock_sub_init+0x2115/0x2330 [lov]
[ 248.233081] [<0>] lov_lock_init_composite+0xf1/0x1f0 [lov]
[ 248.233560] [<0>] cl_lock_init+0x23d/0x410 [obdclass]
[ 248.234029] [<0>] cl_lock_request+0x11f/0x370 [obdclass]
[ 248.234619] [<0>] cl_io_lock+0x36e/0xa20 [obdclass]
[ 248.235045] [<0>] cl_io_loop+0x16d/0x480 [obdclass]
[ 248.235471] [<0>] cl_setattr_ost+0x77e/0xaa0 [lustre]
[ 248.235876] [<0>] ll_setattr_raw+0x15fe/0x2d30 [lustre]
[ 248.236290] [<0>] notify_change+0x743/0xd78
[ 248.236613] [<0>] do_truncate+0xe2/0x180
[ 248.236912] [<0>] vfs_truncate+0x368/0x400
[ 248.237217] [<0>] do_sys_truncate.part.10+0xe0/0x100
[ 248.237576] [<0>] do_syscall_64+0xa5/0x430
[ 248.237877] [<0>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 248.238279] Kernel panic - not syncing: LBUG
[ 248.238596] CPU: 2 PID: 50541 Comm: truncate Kdump: loaded Tainted: G W OE
This is caused by a data race on the lo_trunc_stripeno member of struct lov_layout_raid0.
It is also possible to cause slab-out-of-bounds errors (and hence memory corruption) in the same way:
[ 1494.352844] ==================================================================
[ 1494.353591] BUG: KASAN: slab-out-of-bounds in lov_lock_sub_init+0x201b/0x2640 [lov]
[ 1494.354310] Read of size 8 at addr ffff888059512700 by task truncate/28602
[ 1494.354928]
[ 1494.355058] CPU: 3 PID: 28602 Comm: truncate Kdump: loaded Tainted: G W OE --------- - - 4.18.0-348.7.1.el8.x86_64+debug #1
[ 1494.356192] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 1494.356990] Call Trace:
[ 1494.357253] dump_stack+0x8e/0xd0
[ 1494.357549] ? lov_lock_sub_init+0x201b/0x2640 [lov]
[ 1494.357984] print_address_description.constprop.5+0x1e/0x230
[ 1494.358521] ? kmsg_dump_rewind_nolock+0xd9/0xd9
[ 1494.359006] ? lov_lock_sub_init+0x201b/0x2640 [lov]
[ 1494.359455] ? lov_lock_sub_init+0x201b/0x2640 [lov]
[ 1494.359917] ? lov_lock_sub_init+0x201b/0x2640 [lov]
[ 1494.360389] __kasan_report.cold.7+0x37/0x86
[ 1494.360757] ? lov_lock_sub_init+0x201b/0x2640 [lov]
[ 1494.361211] kasan_report+0x37/0x50
[ 1494.361493] lov_lock_sub_init+0x201b/0x2640 [lov]
[ 1494.361895] ? lov_lock_fini+0x810/0x810 [lov]
[ 1494.362269] ? lov_io_iter_init+0x901/0x1960 [lov]
[ 1494.362684] ? libcfs_debug_msg+0x9d/0xd0 [libcfs]
[ 1494.362866] LustreError: 27720:0:(lov_lock.c:209:lov_lock_sub_init()) result=0, nr=28, lls_nr=27
[ 1494.363141] lov_lock_init_composite+0xf5/0x200 [lov]
[ 1494.363183] cl_lock_init+0x23d/0x3f0 [obdclass]
[ 1494.364977] cl_lock_request+0x120/0x350 [obdclass]
[ 1494.365397] cl_io_lock+0x85c/0x1240 [obdclass]
[ 1494.365795] ? vvp_io_init+0x5aa/0xb90 [lustre]
[ 1494.366261] cl_io_loop+0x16d/0x490 [obdclass]
[ 1494.366718] cl_setattr_ost+0x76a/0xa90 [lustre]
[ 1494.367136] ? cl_glimpse_size0+0x680/0x680 [lustre]
[ 1494.367621] ? _raw_spin_unlock+0x1f/0x30
[ 1494.367987] ? __ptlrpc_req_finished+0x50f/0x1320 [ptlrpc]
[ 1494.368483] ? up_write+0x15c/0x490
[ 1494.368795] ll_setattr_raw+0x1a1d/0x2ee0 [lustre]
[ 1494.369265] ? ll_finish_md_op_data+0x210/0x210 [lustre]
[ 1494.369687] ? ktime_get_coarse_real_ts64+0x127/0x1b0
[ 1494.370119] notify_change+0x743/0xd78
[ 1494.370421] ? libcfs_debug_vmsg2+0x23e0/0x23e0 [libcfs]
[ 1494.370866] do_truncate+0xe2/0x180
[ 1494.371187] ? file_open_root+0x1b0/0x1b0
[ 1494.371507] ? inode_permission+0x25b/0x390
[ 1494.371847] vfs_truncate+0x368/0x400
[ 1494.372193] ? do_truncate+0x180/0x180
[ 1494.372499] do_sys_truncate.part.10+0xe0/0x100
[ 1494.372863] ? vfs_truncate+0x400/0x400
[ 1494.373224] ? do_syscall_64+0x22/0x430
[ 1494.373520] do_syscall_64+0xa5/0x430
[ 1494.373814] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 1494.374257] RIP: 0033:0x7f87d92a5e2b
[ 1494.374550] Code: 8b 15 61 90 2c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 d6 e9 f0 fe ff ff f3 0f 1e fa b8 4c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 29 90 2c 00 f7
This information (lo_trunc_stripeno) must be moved out of the object and into the io. No locking is available to protect it between its initialization in lov_io_iter_init() and its use in lov_lock_sub_init().
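To illustrate why per-io storage avoids the race, here is a minimal userspace sketch. It is not the Lustre code: every name prefixed with sketch_ is hypothetical, the rule used to count sub-locks is invented for the example, and the concurrent truncate is simulated deterministically by a callback that runs between a counting pass and a filling pass. It only assumes that the lock setup reads the truncate stripe more than once, which is consistent with the nr/lls_nr mismatch logged above.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real Lustre structures. */
struct sketch_object {
	int so_stripe_count;
	int so_trunc_stripeno;	/* shared by every io on the object: racy */
};

struct sketch_io {
	struct sketch_object *si_obj;
	int si_trunc_stripeno;	/* private to one io: needs no lock */
};

typedef void (*sketch_hook_t)(void *arg);

/* Invented rule: each stripe at or past the truncate stripe needs a sub-lock. */
static int sketch_count(int stripe_count, int trunc_stripeno)
{
	int nr = 0;

	for (int i = 0; i < stripe_count; i++)
		if (i >= trunc_stripeno)
			nr++;
	return nr;
}

/* io setup: store the truncate stripe in both places so they can be compared. */
static void sketch_io_init(struct sketch_io *io, struct sketch_object *obj,
			   int stripeno)
{
	io->si_obj = obj;
	io->si_trunc_stripeno = stripeno;
	obj->so_trunc_stripeno = stripeno;
}

/* Lock setup reading the shared field. Returns 1 if both passes agree. */
static int sketch_lock_init_shared(struct sketch_io *io,
				   sketch_hook_t between, void *arg)
{
	struct sketch_object *obj = io->si_obj;
	int nr = sketch_count(obj->so_stripe_count, obj->so_trunc_stripeno);

	if (between != NULL)
		between(arg);	/* stands in for a concurrent truncate */

	int filled = sketch_count(obj->so_stripe_count, obj->so_trunc_stripeno);

	return nr == filled;
}

/* Lock setup reading the per-io copy. Returns 1 if both passes agree. */
static int sketch_lock_init_per_io(struct sketch_io *io,
				   sketch_hook_t between, void *arg)
{
	int count = io->si_obj->so_stripe_count;
	int nr = sketch_count(count, io->si_trunc_stripeno);

	if (between != NULL)
		between(arg);

	int filled = sketch_count(count, io->si_trunc_stripeno);

	return nr == filled;
}

/* A second truncate on the same object, used as the "between" callback. */
struct sketch_racer {
	struct sketch_io sr_io;
	struct sketch_object *sr_obj;
	int sr_stripeno;
};

static void sketch_second_truncate(void *arg)
{
	struct sketch_racer *racer = arg;

	sketch_io_init(&racer->sr_io, racer->sr_obj, racer->sr_stripeno);
}
```

With a 4-stripe object, a first truncate at stripe 1 and a second at stripe 3, the shared variant counts 3 sub-locks and then fills 1, while the per-io variant sees 3 in both passes because the second truncate only touches its own io.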
Issue Links
- is related to
- LU-14618 when lov sub-lock initialization fails, LBUG: ASSERTION( (!(result == 0) || (nr == lovlck->lls_nr)) ) ensues (Closed)