Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.16.0
- None
- 3
- 9223372036854775807
Description
In ldlm_namespace_new(), the allocation of ns->ns_bucket_bits, an array
of 'struct ldlm_ns_bucket' objects, is not known to
cfs_hash_buckets_realloc(), so it is not included in the reallocation.
Furthermore, the use of cfs_hash_bd_extra_get() in ldlm_reclaim_lock_cb()
to access struct ldlm_ns_bucket suggests that struct ldlm_ns_bucket
should be allocated as part of the 'extra bits' rather than handled
as a separate array, ns->ns_bucket_bits.
The KASAN report below, from sanity test 134a, shows the resulting
out-of-bounds read in ldlm_reclaim_lock_cb():
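The suggestion above can be illustrated with a toy user-space model. This is a minimal sketch, not libcfs or ldlm code: every toy_* name is hypothetical, and the payload struct only stands in for struct ldlm_ns_bucket. It assumes a table whose buckets each carry their per-bucket payload in trailing 'extra' bytes, so that whatever routine grows the table also sizes the payload, and an accessor in the spirit of cfs_hash_bd_extra_get() always returns memory that belongs to the bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-bucket payload, standing in for struct ldlm_ns_bucket. */
struct toy_ns_bucket {
	int nsb_reclaim_start;
};

/* Hypothetical bucket header; the payload lives in the trailing bytes. */
struct toy_bucket {
	unsigned int b_index;
	max_align_t  b_extra[];	/* 'extra' bytes co-allocated with the bucket */
};

struct toy_hash {
	unsigned int        h_nbkt;		/* current number of buckets */
	size_t              h_extra_bytes;	/* payload size per bucket */
	struct toy_bucket **h_buckets;
};

/* Return the payload that was allocated together with this bucket. */
static void *toy_bucket_extra(struct toy_bucket *bkt)
{
	return bkt->b_extra;
}

/*
 * Create or grow the table to new_nbkt buckets. Existing buckets keep
 * their payload; new buckets get zeroed payload of the same size.
 * Returns 0 on success, -1 on failure (shrinking is not modelled).
 */
static int toy_hash_resize(struct toy_hash *h, unsigned int new_nbkt)
{
	struct toy_bucket **nb;
	unsigned int i;

	if (new_nbkt == 0 || new_nbkt < h->h_nbkt)
		return -1;

	nb = realloc(h->h_buckets, new_nbkt * sizeof(*nb));
	if (nb == NULL)
		return -1;
	h->h_buckets = nb;

	for (i = h->h_nbkt; i < new_nbkt; i++) {
		nb[i] = calloc(1, sizeof(**nb) + h->h_extra_bytes);
		if (nb[i] == NULL) {
			h->h_nbkt = i;	/* keep the table consistent */
			return -1;
		}
		nb[i]->b_index = i;
	}
	h->h_nbkt = new_nbkt;
	return 0;
}

/* Release every bucket (payload included) and the bucket array. */
static void toy_hash_free(struct toy_hash *h)
{
	unsigned int i;

	for (i = 0; i < h->h_nbkt; i++)
		free(h->h_buckets[i]);
	free(h->h_buckets);
	h->h_buckets = NULL;
	h->h_nbkt = 0;
}
```

In this model there is no side array indexed by bucket number, so a callback that visits a bucket created by a later resize cannot index past an allocation made for the original bucket count.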
[24635.842309] Lustre: DEBUG MARKER: == sanity test 134a: Server reclaims locks when reaching lock_reclaim_threshold ========================================================== 22:22:04 (1724944924)
[24637.529022] Lustre: DEBUG MARKER: /sbin/lctl get_param -n debug
[24639.842070] Lustre: DEBUG MARKER: /sbin/lctl set_param -n debug=0
[24651.685867] Lustre: DEBUG MARKER: /sbin/lctl set_param -n debug=trace+inode+super+iotrace+malloc+cache+info+ioctl+neterror+net+warning+buffs+other+dentry+nettrace+page+dlmtrace+error+emerg+ha+rpctrace+vfstrace+reada+mmap+config+console+quota+sec+lfsck+hsm+snapshot+layout
[24653.380300] systemd-journald[534]: Data hash table of /run/log/journal/6444c384ecd94ca7835367c9a510ccb1/system.journal has a fill level at 75.0 (10267 of 13688 items, 7884800 file size, 767 bytes per hash table item), suggesting rotation.
[24653.392432] systemd-journald[534]: /run/log/journal/6444c384ecd94ca7835367c9a510ccb1/system.journal: Journal header limits reached or header out-of-date, rotating.
[24654.426243] Lustre: DEBUG MARKER: /sbin/lctl set_param fail_loc=0x327
[24656.376443] Lustre: DEBUG MARKER: /sbin/lctl set_param fail_val=500
[24657.241314] Lustre: *** cfs_fail_loc=327, val=500***
[24657.244887] ==================================================================
[24657.248093] BUG: KASAN: vmalloc-out-of-bounds in ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
[24657.252137] Read of size 4 at addr ffffc90001c79120 by task mdt00_003/126704
[24657.259627] CPU: 2 PID: 126704 Comm: mdt00_003 Kdump: loaded Tainted: G W OE 6.10.6-1.ldiskfs.el9.x86_64 #1
[24657.263709] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[24657.267368] Call Trace:
[24657.271333] <TASK>
[24657.276588] dump_stack_lvl+0x75/0xb0
[24657.281782] print_address_description.constprop.0+0x2c/0x390
[24657.285386] ? ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
[24657.288734] print_report+0xb4/0x270
[24657.291603] ? ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
[24657.294797] ? kasan_addr_to_slab+0x9/0xa0
[24657.297719] kasan_report+0x89/0xc0
[24657.300556] ? ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
[24657.303703] ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
[24657.307347] ? rcu_is_watching+0x11/0xb0
[24657.309941] cfs_hash_for_each_relax+0x708/0xf10 [libcfs]
[24657.312902] ? __pfx_ldlm_reclaim_lock_cb+0x10/0x10 [ptlrpc]
[24657.316063] ? __pfx_cfs_hash_for_each_relax+0x10/0x10 [libcfs]
[24657.318706] ? __pfx_ldlm_reclaim_lock_cb+0x10/0x10 [ptlrpc]
[24657.321435] cfs_hash_for_each_nolock+0x33d/0x590 [libcfs]
[24657.323761] ? __pfx_cfs_hash_for_each_nolock+0x10/0x10 [libcfs]
[24657.326158] ? __pfx_server_name2index+0x10/0x10 [obdclass]
[24657.328986] ? __mutex_lock+0x261/0x1660
[24657.331676] ldlm_reclaim_res+0x45b/0xa00 [ptlrpc]
[24657.334171] ? __pfx_ldlm_reclaim_res+0x10/0x10 [ptlrpc]
[24657.336653] ? __pfx___mutex_unlock_slowpath+0x10/0x10
[24657.338626] ldlm_reclaim_ns+0x213/0x5a0 [ptlrpc]
[24657.340860] ? __pfx_ldlm_reclaim_ns+0x10/0x10 [ptlrpc]
[24657.343053] ? _raw_spin_unlock_irqrestore+0x3d/0x60
[24657.344828] ? __percpu_counter_sum+0x145/0x1e0
[24657.346729] ldlm_reclaim_full+0x150/0x350 [ptlrpc]
[24657.348931] ldlm_handle_enqueue+0x4bf/0x4190 [ptlrpc]
[24657.351133] ? __pfx_ldlm_handle_enqueue+0x10/0x10 [ptlrpc]
[24657.353287] ? __req_capsule_get+0x249/0x7a0 [ptlrpc]
[24657.355495] tgt_enqueue+0x17d/0x610 [ptlrpc]
[24657.357735] tgt_handle_request0+0x2d4/0x1390 [ptlrpc]
[24657.359832] tgt_request_handle+0x714/0x1e70 [ptlrpc]
[24657.361767] ? __pfx_tgt_request_handle+0x10/0x10 [ptlrpc]
[24657.363892] ptlrpc_server_handle_request.isra.0+0xa87/0x2270 [ptlrpc]
[24657.365775] ptlrpc_main+0x1ae7/0x2df0 [ptlrpc]
[24657.367831] ? __kthread_parkme+0xc4/0x200
[24657.369288] ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
[24657.371245] kthread+0x2f3/0x3e0
[24657.372633] ? trace_irq_enable.constprop.0+0xd2/0x110
[24657.374173] ? __pfx_kthread+0x10/0x10
[24657.375693] ret_from_fork+0x2d/0x70
[24657.377288] ? __pfx_kthread+0x10/0x10
[24657.378641] ret_from_fork_asm+0x1a/0x30
[24657.379996] </TASK>
[24657.382888] The buggy address belongs to the virtual mapping at [ffffc90001c39000, ffffc90001c7b000) created by: cfs_hash_buckets_realloc.part.0+0x840/0x1050 [libcfs]
[24657.388752] The buggy address belongs to the physical page:
[24657.390063] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88811873d600 pfn:0x11873c
[24657.391829] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[24657.393267] raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000
[24657.394683] raw: ffff88811873d600 0000000000000000 00000001ffffffff 0000000000000000
[24657.396367] page dumped because: kasan: bad access detected
[24657.398947] Memory state around the buggy address:
[24657.400247] ffffc90001c79000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[24657.401551] ffffc90001c79080: 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[24657.403276] >ffffc90001c79100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[24657.404545] ^
[24657.405820] ffffc90001c79180: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[24657.407267] ffffc90001c79200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[24657.408567] ==================================================================
[24658.406808] Lustre: *** cfs_fail_loc=327, val=500***
Issue Links
- is related to: LU-8130 Migrate from libcfs hash to rhashtable (Open)