LU-17650

KASAN: slab-out-of-bounds in unix_find_other on RHEL9.3

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • None
    • 3

    Description

      I am trying to add RHEL 9.3 support to janitor, and right from the start I am hitting a problem flagged by KASAN:

      [  111.603361] BUG: KASAN: slab-out-of-bounds in unix_find_other+0x41e/0x630
      [  111.603367] Write of size 1 at addr ffff88810fc70e6e by task insmod/2783
      [  111.603369] 
      [  111.603371] CPU: 2 PID: 2783 Comm: insmod Kdump: loaded Tainted: G           OE     -------  ---  5.14.0rocky93-debug #4
      [  111.603375] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
      [  111.603376] Call Trace:
      [  111.603378]  <TASK>
      [  111.603380]  ? unix_find_other+0x41e/0x630
      [  111.603383]  dump_stack_lvl+0x57/0x7d
      [  111.603388]  print_address_description.constprop.0+0x1f/0x1e0
      [  111.603394]  ? unix_find_other+0x41e/0x630
      [  111.603396]  print_report.cold+0x55/0x240
      [  111.603401]  kasan_report+0xc8/0x200
      [  111.603405]  ? unix_find_other+0x41e/0x630
      [  111.603409]  unix_find_other+0x41e/0x630
      [  111.603411]  ? unix_create1+0x5e0/0x870
      [  111.603414]  ? unix_stream_sendpage+0xac0/0xac0
      [  111.603416]  ? do_raw_spin_unlock+0x149/0x1f0
      [  111.603421]  ? skb_set_owner_w+0x1d2/0x300
      [  111.603427]  unix_stream_connect+0x26b/0x11b0
      [  111.603434]  check_gssd_socket+0x292/0x41b [ptlrpc_gss]
      [  111.603455]  ? ctx_init_pack_request.cold+0x22/0x22 [ptlrpc_gss]
      [  111.603479]  gss_init_svc_upcall+0xc6/0x129 [ptlrpc_gss]
      [  111.603497]  sptlrpc_gss_init+0x7d/0x1ec [ptlrpc_gss]
      [  111.603515]  ? 0xffffffffc0b70000
      [  111.603529]  ? 0xffffffffc0b70000
      [  111.603531]  do_one_initcall+0xf9/0x550
      [  111.603535]  ? perf_trace_initcall_level+0x3f0/0x3f0
      [  111.603540]  ? rcu_read_lock_sched_held+0x3f/0x70
      [  111.603544]  ? trace_kmalloc+0x38/0x100
      [  111.603546]  ? kmem_cache_alloc_trace+0x221/0x430
      [  111.603549]  ? kasan_unpoison+0x23/0x50
      [  111.603553]  do_init_module+0x1c8/0x7a0
      [  111.603558]  load_module+0x1ac4/0x1ee0
      [  111.603563]  ? post_relocation+0x390/0x390
      [  111.603565]  ? __lock_release+0x4bd/0x9f0
      [  111.603570]  ? kernel_read_file_from_fd+0x86/0xe0
      [  111.603575]  __do_sys_finit_module+0x110/0x1a0
      [  111.603577]  ? __ia32_sys_init_module+0xa0/0xa0
      [  111.603581]  ? vm_mmap_pgoff+0x188/0x210
      [  111.603589]  ? lockdep_hardirqs_on_prepare.part.0+0x18c/0x370
      [  111.603592]  ? syscall_enter_from_user_mode+0x22/0xb0
      [  111.603601]  ? lockdep_hardirqs_on+0x79/0x100
      [  111.603605]  do_syscall_64+0x56/0x80
      [  111.603609]  ? asm_exc_page_fault+0x22/0x30
      [  111.603613]  ? lockdep_hardirqs_on+0x79/0x100
      [  111.603616]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  111.603619] RIP: 0033:0x7fc9cea3ee5d
      [  111.603623] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
      [  111.603625] RSP: 002b:00007fffa893b188 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [  111.603629] RAX: ffffffffffffffda RBX: 0000560b2baab810 RCX: 00007fc9cea3ee5d
      [  111.603631] RDX: 0000000000000000 RSI: 0000560b29f0a962 RDI: 0000000000000003
      [  111.603632] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      [  111.603633] R10: 0000000000000003 R11: 0000000000000246 R12: 0000560b29f0a962
      [  111.603635] R13: 0000560b2baab7c0 R14: 0000560b29f09550 R15: 0000560b2baab920
      [  111.603641]  </TASK>
      [  111.603642] 
      [  111.603643] Allocated by task 2783:
      [  111.603644]  kasan_save_stack+0x1e/0x40
      [  111.603647]  __kasan_kmalloc+0x81/0xa0
      [  111.603649]  check_gssd_socket+0x167/0x41b [ptlrpc_gss]
      [  111.603668]  gss_init_svc_upcall+0xc6/0x129 [ptlrpc_gss]
      [  111.603685]  sptlrpc_gss_init+0x7d/0x1ec [ptlrpc_gss]
      [  111.603701]  do_one_initcall+0xf9/0x550
      [  111.603703]  do_init_module+0x1c8/0x7a0
      [  111.603705]  load_module+0x1ac4/0x1ee0
      [  111.603706]  __do_sys_finit_module+0x110/0x1a0
      [  111.603708]  do_syscall_64+0x56/0x80
      [  111.603710]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  111.603712] 
      [  111.603713] The buggy address belongs to the object at ffff88810fc70e00
      [  111.603713]  which belongs to the cache kmalloc-128 of size 128
      [  111.603715] The buggy address is located 110 bytes inside of
      [  111.603715]  128-byte region [ffff88810fc70e00, ffff88810fc70e80)
      [  111.603717] 
      [  111.603717] The buggy address belongs to the physical page:
      [  111.603719] page:ffffea00043f1c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88810fc70380 pfn:0x10fc70
      [  111.603721] head:ffffea00043f1c00 order:1 compound_mapcount:0 compound_pincount:0
      [  111.603723] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
      [  111.603728] raw: 0017ffffc0010200 ffffea0001fc6988 ffff888100040b50 ffff888100042c80
      [  111.603730] raw: ffff88810fc70380 000000000015000d 00000001ffffffff 0000000000000000
      [  111.603731] page dumped because: kasan: bad access detected
      [  111.603732] 
      [  111.603732] Memory state around the buggy address:
      [  111.603734]  ffff88810fc70d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  111.603735]  ffff88810fc70d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  111.603736] >ffff88810fc70e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 06 fc fc
      [  111.603737]                                                           ^
      [  111.603738]  ffff88810fc70e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  111.603740]  ffff88810fc70f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      

      So the actual hit is in this code:

      static void unix_mkname_bsd(struct sockaddr_un *sunaddr, int addr_len)
      {
              /* This may look like an off by one error but it is a bit more
               * subtle.  108 is the longest valid AF_UNIX path for a binding.
               * sun_path[108] doesn't as such exist.  However in kernel space
               * we are guaranteed that it is a valid memory location in our
               * kernel address buffer because syscall functions always pass
               * a pointer of struct sockaddr_storage which has a bigger buffer
               * than 108.
               */
              ((char *)sunaddr)[addr_len] = 0;
      }
      

      The Lustre side is this:

      static int check_gssd_socket(void)
      {
              struct sockaddr_un *sun;
      ...
              OBD_ALLOC(sun, sizeof(*sun));
              strncpy(sun->sun_path, GSS_SOCKET_PATH, sizeof(sun->sun_path));
              /* Try to connect to the socket */
              while (tries++ < 6) {
                      err = kernel_connect(sock, (struct sockaddr *)sun,
                                           sizeof(*sun), 0);
      

      So based on the comment in unix_mkname_bsd, it sounds like we might need to allocate a sockaddr_storage here?

            People

              green Oleg Drokin