Lustre / LU-18187

BUG: KASAN: vmalloc-out-of-bounds in ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3

    Description

      In ldlm_namespace_new(), the allocation of ns->ns_bucket_bits, an array
      of 'struct ldlm_ns_bucket' objects, is not known to
      cfs_hash_buckets_realloc(), so the array is not included in the
      reallocation.

      Further, the use of cfs_hash_bd_extra_get() in ldlm_reclaim_lock_cb()
      to access struct ldlm_ns_bucket suggests that struct ldlm_ns_bucket
      should be allocated as part of the 'extra bits' rather than handled
      as a separate array, ns->ns_bucket_bits.
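
The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the libcfs or ldlm code: every `toy_*` name is hypothetical, and the layout (per-bucket state stored in the trailing "extra" bytes of each bucket allocation) is an assumption about how an extra-area accessor would behave. It contrasts a side array sized once at creation with per-bucket extra bytes that are allocated together with each bucket when the table grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BKT_HEAD 16	/* hypothetical size of the bucket's own header */

/* Hypothetical per-bucket state, standing in for struct ldlm_ns_bucket. */
struct toy_ns_bucket {
	int nsb_count;
};

/* Hypothetical table: each bucket is one allocation of bkt_size bytes,
 * and its last extra_bytes bytes are reserved for the caller. */
struct toy_hash {
	size_t nbkts;
	size_t bkt_size;
	size_t extra_bytes;
	unsigned char **bkts;
};

/* Side array sized once at creation time; it never learns about growth. */
struct toy_side_array {
	size_t len;
	struct toy_ns_bucket *arr;
};

/* Grow the table; every new bucket is allocated with its extra area. */
static int toy_hash_grow(struct toy_hash *hs, size_t new_nbkts)
{
	unsigned char **bkts;
	size_t i;

	if (new_nbkts <= hs->nbkts)
		return 0;
	bkts = realloc(hs->bkts, new_nbkts * sizeof(*bkts));
	if (bkts == NULL)
		return -1;
	hs->bkts = bkts;
	for (i = hs->nbkts; i < new_nbkts; i++) {
		bkts[i] = calloc(1, hs->bkt_size);
		if (bkts[i] == NULL)
			return -1;
		hs->nbkts = i + 1;
	}
	return 0;
}

static int toy_hash_init(struct toy_hash *hs, size_t nbkts, size_t extra_bytes)
{
	hs->nbkts = 0;
	hs->extra_bytes = extra_bytes;
	hs->bkt_size = TOY_BKT_HEAD + extra_bytes;
	hs->bkts = NULL;
	return toy_hash_grow(hs, nbkts);
}

static void toy_hash_fini(struct toy_hash *hs)
{
	size_t i;

	for (i = 0; i < hs->nbkts; i++)
		free(hs->bkts[i]);
	free(hs->bkts);
}

/* Per-bucket state lives in the trailing extra bytes of bucket idx. */
static struct toy_ns_bucket *toy_extra_get(const struct toy_hash *hs, size_t idx)
{
	assert(idx < hs->nbkts);
	return (struct toy_ns_bucket *)(hs->bkts[idx] + hs->bkt_size -
					hs->extra_bytes);
}

/* Side-array lookup; NULL means idx is beyond what was allocated. */
static struct toy_ns_bucket *toy_side_get(const struct toy_side_array *sa, size_t idx)
{
	return idx < sa->len ? &sa->arr[idx] : NULL;
}

/* Returns 1 if, after growing from 4 to 8 buckets, bucket 6 is reachable
 * through its extra area but not through the side array; -1 on failure. */
static int toy_demo(void)
{
	struct toy_hash hs = { 0, 0, 0, NULL };
	struct toy_side_array sa = { 4, NULL };
	int rc = -1;

	sa.arr = calloc(sa.len, sizeof(*sa.arr));
	if (sa.arr != NULL &&
	    toy_hash_init(&hs, 4, sizeof(struct toy_ns_bucket)) == 0 &&
	    toy_hash_grow(&hs, 8) == 0) {
		toy_extra_get(&hs, 6)->nsb_count = 42;
		rc = toy_side_get(&sa, 6) == NULL &&
		     toy_extra_get(&hs, 6)->nsb_count == 42;
	}
	toy_hash_fini(&hs);
	free(sa.arr);
	return rc;
}
```

Calling `toy_demo()` from a test harness exercises both paths. In this model the side array has no entry for a bucket created by the grow step, while the extra area exists for every bucket because it is part of the bucket allocation itself.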

      [24635.842309] Lustre: DEBUG MARKER: == sanity test 134a: Server reclaims locks when reaching lock_reclaim_threshold ========================================================== 22:22:04 (1724944924)
      [24637.529022] Lustre: DEBUG MARKER: /sbin/lctl get_param -n debug
      [24639.842070] Lustre: DEBUG MARKER: /sbin/lctl set_param -n debug=0
      [24651.685867] Lustre: DEBUG MARKER: /sbin/lctl set_param -n debug=trace+inode+super+iotrace+malloc+cache+info+ioctl+neterror+net+warning+buffs+other+dentry+nettrace+page+dlmtrace+error+emerg+ha+rpctrace+vfstrace+reada+mmap+config+console+quota+sec+lfsck+hsm+snapshot+layout
      [24653.380300] systemd-journald[534]: Data hash table of /run/log/journal/6444c384ecd94ca7835367c9a510ccb1/system.journal has a fill level at 75.0 (10267 of 13688 items, 7884800 file size, 767 bytes per hash table item), suggesting rotation.
      [24653.392432] systemd-journald[534]: /run/log/journal/6444c384ecd94ca7835367c9a510ccb1/system.journal: Journal header limits reached or header out-of-date, rotating.
      [24654.426243] Lustre: DEBUG MARKER: /sbin/lctl set_param fail_loc=0x327
      [24656.376443] Lustre: DEBUG MARKER: /sbin/lctl set_param fail_val=500
      [24657.241314] Lustre: *** cfs_fail_loc=327, val=500***
      [24657.244887] ==================================================================
      [24657.248093] BUG: KASAN: vmalloc-out-of-bounds in ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
      [24657.252137] Read of size 4 at addr ffffc90001c79120 by task mdt00_003/126704
      
      [24657.259627] CPU: 2 PID: 126704 Comm: mdt00_003 Kdump: loaded Tainted: G        W  OE      6.10.6-1.ldiskfs.el9.x86_64 #1
      [24657.263709] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      [24657.267368] Call Trace:
      [24657.271333]  <TASK>
      [24657.276588]  dump_stack_lvl+0x75/0xb0
      [24657.281782]  print_address_description.constprop.0+0x2c/0x390
      [24657.285386]  ? ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
      [24657.288734]  print_report+0xb4/0x270
      [24657.291603]  ? ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
      [24657.294797]  ? kasan_addr_to_slab+0x9/0xa0
      [24657.297719]  kasan_report+0x89/0xc0
      [24657.300556]  ? ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
      [24657.303703]  ldlm_reclaim_lock_cb+0xa46/0xa50 [ptlrpc]
      [24657.307347]  ? rcu_is_watching+0x11/0xb0
      [24657.309941]  cfs_hash_for_each_relax+0x708/0xf10 [libcfs]
      [24657.312902]  ? __pfx_ldlm_reclaim_lock_cb+0x10/0x10 [ptlrpc]
      [24657.316063]  ? __pfx_cfs_hash_for_each_relax+0x10/0x10 [libcfs]
      [24657.318706]  ? __pfx_ldlm_reclaim_lock_cb+0x10/0x10 [ptlrpc]
      [24657.321435]  cfs_hash_for_each_nolock+0x33d/0x590 [libcfs]
      [24657.323761]  ? __pfx_cfs_hash_for_each_nolock+0x10/0x10 [libcfs]
      [24657.326158]  ? __pfx_server_name2index+0x10/0x10 [obdclass]
      [24657.328986]  ? __mutex_lock+0x261/0x1660
      [24657.331676]  ldlm_reclaim_res+0x45b/0xa00 [ptlrpc]
      [24657.334171]  ? __pfx_ldlm_reclaim_res+0x10/0x10 [ptlrpc]
      [24657.336653]  ? __pfx___mutex_unlock_slowpath+0x10/0x10
      [24657.338626]  ldlm_reclaim_ns+0x213/0x5a0 [ptlrpc]
      [24657.340860]  ? __pfx_ldlm_reclaim_ns+0x10/0x10 [ptlrpc]
      [24657.343053]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
      [24657.344828]  ? __percpu_counter_sum+0x145/0x1e0
      [24657.346729]  ldlm_reclaim_full+0x150/0x350 [ptlrpc]
      [24657.348931]  ldlm_handle_enqueue+0x4bf/0x4190 [ptlrpc]
      [24657.351133]  ? __pfx_ldlm_handle_enqueue+0x10/0x10 [ptlrpc]
      [24657.353287]  ? __req_capsule_get+0x249/0x7a0 [ptlrpc]
      [24657.355495]  tgt_enqueue+0x17d/0x610 [ptlrpc]
      [24657.357735]  tgt_handle_request0+0x2d4/0x1390 [ptlrpc]
      [24657.359832]  tgt_request_handle+0x714/0x1e70 [ptlrpc]
      [24657.361767]  ? __pfx_tgt_request_handle+0x10/0x10 [ptlrpc]
      [24657.363892]  ptlrpc_server_handle_request.isra.0+0xa87/0x2270 [ptlrpc]
      [24657.365775]  ptlrpc_main+0x1ae7/0x2df0 [ptlrpc]
      [24657.367831]  ? __kthread_parkme+0xc4/0x200
      [24657.369288]  ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
      [24657.371245]  kthread+0x2f3/0x3e0
      [24657.372633]  ? trace_irq_enable.constprop.0+0xd2/0x110
      [24657.374173]  ? __pfx_kthread+0x10/0x10
      [24657.375693]  ret_from_fork+0x2d/0x70
      [24657.377288]  ? __pfx_kthread+0x10/0x10
      [24657.378641]  ret_from_fork_asm+0x1a/0x30
      [24657.379996]  </TASK>
      
      [24657.382888] The buggy address belongs to the virtual mapping at
                      [ffffc90001c39000, ffffc90001c7b000) created by:
                      cfs_hash_buckets_realloc.part.0+0x840/0x1050 [libcfs]
      
      [24657.388752] The buggy address belongs to the physical page:
      [24657.390063] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88811873d600 pfn:0x11873c
      [24657.391829] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
      [24657.393267] raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000
      [24657.394683] raw: ffff88811873d600 0000000000000000 00000001ffffffff 0000000000000000
      [24657.396367] page dumped because: kasan: bad access detected
      
      [24657.398947] Memory state around the buggy address:
      [24657.400247]  ffffc90001c79000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [24657.401551]  ffffc90001c79080: 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
      [24657.403276] >ffffc90001c79100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
      [24657.404545]                                ^
      [24657.405820]  ffffc90001c79180: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
      [24657.407267]  ffffc90001c79200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
      [24657.408567] ==================================================================
      [24658.406808] Lustre: *** cfs_fail_loc=327, val=500***
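
The "Memory state around the buggy address" rows in the report above can be decoded by hand. The following is a minimal sketch, assuming generic KASAN's layout in which one shadow byte covers 8 bytes of memory, 00 marks a fully accessible granule, and f8 marks invalid vmalloc space. The `toy_*` helper names are hypothetical; the addresses and shadow bytes are copied from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_KASAN_GRANULE 8u	/* assumed: bytes covered by one shadow byte */

/* Values copied from the KASAN report. */
static const uint64_t toy_map_start = 0xffffc90001c39000ULL;
static const uint64_t toy_bad_addr  = 0xffffc90001c79120ULL;
static const uint64_t toy_row_base  = 0xffffc90001c79080ULL;
static const uint8_t toy_row_shadow[16] = {
	0x00, 0x00, 0x00, 0x00, 0x00, 0xf8, 0xf8, 0xf8,
	0xf8, 0xf8, 0xf8, 0xf8, 0xf8, 0xf8, 0xf8, 0xf8,
};

/* Address of the first granule in the row whose shadow byte is non-zero. */
static uint64_t toy_first_poisoned(uint64_t row_base, const uint8_t *shadow,
				   size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (shadow[i] != 0)
			break;
	return row_base + (uint64_t)i * TOY_KASAN_GRANULE;
}

/* Number of accessible bytes counted from the start of the mapping. */
static uint64_t toy_valid_len(void)
{
	return toy_first_poisoned(toy_row_base, toy_row_shadow, 16) -
	       toy_map_start;
}

/* Distance from the end of the accessible bytes to the faulting read. */
static uint64_t toy_overrun(void)
{
	uint64_t end = toy_first_poisoned(toy_row_base, toy_row_shadow, 16);

	assert(toy_bad_addr >= end);
	return toy_bad_addr - end;
}
```

Under these assumptions the accessible bytes end at ffffc90001c790a8, which is 0x400a8 bytes from the start of the mapping. The 4-byte read at ffffc90001c79120 therefore begins 0x78 (120) bytes past the end of the accessible region, while still inside the virtual mapping created by cfs_hash_buckets_realloc().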
      
      

            People

              Assignee: Shaun Tancheff (stancheff)
              Reporter: Shaun Tancheff (stancheff)