[LU-17651] KASAN: use-after-free in lnet_net_remove_cpts

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.17.0

    Description

      Trying out RHEL 9.3 with KASAN, I hit this use-after-free in sanity-lnet test 301 on the client:

      [ 2173.880150] Lustre: DEBUG MARKER: == sanity-lnet test 301: Check for dynamic adds of same/wrong interface (memory leak) ========================================================== 04:31:20 (1710837080)
      [ 2173.990320] Lustre: DEBUG MARKER: /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet unconfigure
      [ 2174.026387] Lustre: DEBUG MARKER: /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl lnet configure
      [ 2174.118312] Lustre: DEBUG MARKER: /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp --if ens2
      [ 2174.141167] LNet: Added LNI 192.168.205.1@tcp [8/256/0/180]
      [ 2174.141947] LNet: Accept secure, port 988
      [ 2175.112023] Lustre: DEBUG MARKER: /home/green/git/lustre-release/lustre/../lnet/utils/lnetctl net add --net tcp --if ens2
      [ 2175.135198] ==================================================================
      [ 2175.135206] BUG: KASAN: use-after-free in lnet_net_remove_cpts.constprop.0+0x774/0x7f0 [lnet]
      [ 2175.135286] Read of size 8 at addr ffff888005be7a50 by task lnetctl/79304
      [ 2175.135290] 
      [ 2175.135294] CPU: 1 PID: 79304 Comm: lnetctl Kdump: loaded Tainted: G        W  OE     -------  ---  5.14.0rocky93-debug #4
      [ 2175.135299] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
      [ 2175.135303] Call Trace:
      [ 2175.135306]  <TASK>
      [ 2175.135310]  ? lnet_net_remove_cpts.constprop.0+0x774/0x7f0 [lnet]
      [ 2175.135384]  dump_stack_lvl+0x57/0x7d
      [ 2175.135397]  print_address_description.constprop.0+0x1f/0x1e0
      [ 2175.135408]  ? lnet_net_remove_cpts.constprop.0+0x774/0x7f0 [lnet]
      [ 2175.135463]  print_report.cold+0x55/0x240
      [ 2175.135473]  kasan_report+0xc8/0x200
      [ 2175.135484]  ? lnet_net_remove_cpts.constprop.0+0x774/0x7f0 [lnet]
      [ 2175.135542]  lnet_net_remove_cpts.constprop.0+0x774/0x7f0 [lnet]
      [ 2175.135602]  lnet_ni_free+0x6a/0x620 [lnet]
      [ 2175.135660]  lnet_dyn_add_ni+0x29d/0x370 [lnet]
      [ 2175.135718]  lnet_genl_parse_local_ni+0x6ee/0x32c0 [lnet]
      [ 2175.135777]  ? kernel_text_address+0x116/0x130
      [ 2175.135786]  ? lnet_dyn_del_ni+0x980/0x980 [lnet]
      [ 2175.135841]  ? cfs_ip_addr_match+0xb0/0xb0 [lnet]
      [ 2175.135897]  ? arch_stack_walk+0x98/0xf0
      [ 2175.135909]  ? libcfs_str2net+0x5f/0x90 [lnet]
      [ 2175.135965]  ? libcfs_str2net_internal+0x2e0/0x2e0 [lnet]
      [ 2175.136021]  ? nla_strcmp+0x1c/0xe0
      [ 2175.136032]  lnet_net_cmd+0x7a8/0x1150 [lnet]
      [ 2175.136091]  ? lnet_dyn_del_net+0x410/0x410 [lnet]
      [ 2175.136181]  ? avc_has_extended_perms+0xe30/0xe30
      [ 2175.136187]  ? rcu_read_unlock+0x60/0x60
      [ 2175.136195]  ? unwind_next_frame+0xc6d/0x1e30
      [ 2175.136202]  ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [ 2175.136213]  ? cred_has_capability.isra.0+0xfe/0x200
      [ 2175.136219]  ? is_bpf_text_address+0x6a/0xe0
      [ 2175.136229]  genl_family_rcv_msg_doit.isra.0+0x1be/0x290
      [ 2175.136237]  ? genl_validate_assign_mc_groups+0x650/0x650
      [ 2175.136248]  ? security_capable+0x50/0x90
      [ 2175.136257]  genl_family_rcv_msg+0x335/0x530
      [ 2175.136263]  ? genl_family_rcv_msg_doit.isra.0+0x290/0x290
      [ 2175.136269]  ? lnet_dyn_del_net+0x410/0x410 [lnet]
      [ 2175.136325]  ? __alloc_skb+0x10b/0x2b0
      [ 2175.136343]  ? netlink_sendmsg+0x817/0xc90
      [ 2175.136347]  ? sock_sendmsg+0xb2/0xe0
      [ 2175.136352]  ? ____sys_sendmsg+0x5d3/0x7c0
      [ 2175.136356]  ? ___sys_sendmsg+0xee/0x170
      [ 2175.136361]  ? __sys_sendmsg+0xc9/0x160
      [ 2175.136364]  ? do_syscall_64+0x56/0x80
      [ 2175.136371]  ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [ 2175.136379]  genl_rcv_msg+0x9f/0x130
      [ 2175.136386]  netlink_rcv_skb+0x12b/0x390
      [ 2175.136390]  ? genl_family_rcv_msg+0x530/0x530
      [ 2175.136397]  ? netlink_ack+0x750/0x750
      [ 2175.136401]  ? rhashtable_rehash_table+0x4a0/0x4a0
      [ 2175.136415]  ? netlink_lookup+0x1c5/0x330
      [ 2175.136424]  genl_rcv+0x24/0x40
      [ 2175.136429]  netlink_unicast+0x430/0x710
      [ 2175.136436]  ? netlink_attachskb+0x740/0x740
      [ 2175.136440]  ? check_heap_object+0xee/0x480
      [ 2175.136452]  netlink_sendmsg+0x73c/0xc90
      [ 2175.136460]  ? netlink_unicast+0x710/0x710
      [ 2175.136466]  ? __import_iovec+0x69/0x690
      [ 2175.136474]  ? netlink_unicast+0x710/0x710
      [ 2175.136480]  sock_sendmsg+0xb2/0xe0
      [ 2175.136485]  ____sys_sendmsg+0x5d3/0x7c0
      [ 2175.136491]  ? kernel_sendmsg+0x30/0x30
      [ 2175.136495]  ? __copy_msghdr+0x3c0/0x3c0
      [ 2175.136502]  ? filemap_map_pages+0x6b0/0xf80
      [ 2175.136513]  ___sys_sendmsg+0xee/0x170
      [ 2175.136519]  ? __ia32_sys_recvmmsg+0x210/0x210
      [ 2175.136526]  ? netlink_setsockopt+0x2df/0x990
      [ 2175.136531]  ? genl_validate_ops+0x620/0x620
      [ 2175.136537]  ? filemap_map_pmd+0x850/0x850
      [ 2175.136542]  ? do_read_fault+0x23c/0x4e0
      [ 2175.136548]  ? netlink_realloc_groups+0x2c0/0x2c0
      [ 2175.136555]  ? do_fault+0x204/0x850
      [ 2175.136562]  ? __handle_mm_fault+0xa1f/0xe60
      [ 2175.136569]  ? __fget_light+0x51/0x230
      [ 2175.136577]  ? sockfd_lookup_light+0x1a/0x140
      [ 2175.136583]  __sys_sendmsg+0xc9/0x160
      [ 2175.136588]  ? __sys_sendmsg_sock+0x20/0x20
      [ 2175.136601]  ? rcu_read_lock_sched_held+0x12/0x70
      [ 2175.136606]  ? syscall_enter_from_user_mode+0x1d/0xb0
      [ 2175.136611]  ? trace_hardirqs_on+0x2d/0x160
      [ 2175.136618]  do_syscall_64+0x56/0x80
      [ 2175.136624]  ? do_user_addr_fault+0x367/0xde0
      [ 2175.136629]  ? rcu_read_lock_sched_held+0x12/0x70
      [ 2175.136634]  ? rcu_read_lock_sched_held+0x12/0x70
      [ 2175.136638]  ? irqentry_exit_to_user_mode+0xa/0x40
      [ 2175.136643]  ? trace_hardirqs_on_prepare+0xb5/0x210
      [ 2175.136649]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [ 2175.136654] RIP: 0033:0x7ff6e7f4f9a7
      [ 2175.136676] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [ 2175.136681] RSP: 002b:00007ffd10923898 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 2175.136688] RAX: ffffffffffffffda RBX: 00000000004b1430 RCX: 00007ff6e7f4f9a7
      [ 2175.136691] RDX: 0000000000000000 RSI: 00007ffd109238d0 RDI: 0000000000000003
      [ 2175.136693] RBP: 00000000004b1340 R08: 00000000e3fbfff8 R09: 0000000000000000
      [ 2175.136696] R10: 00007ff6e80c70c0 R11: 0000000000000246 R12: 00000000004df1f0
      [ 2175.136698] R13: 00007ffd109238d0 R14: 000000000042a1d4 R15: 00000000004deb1a
      [ 2175.136708]  </TASK>
      [ 2175.136710] 
      [ 2175.136712] Allocated by task 79304:
      [ 2175.136715]  kasan_save_stack+0x1e/0x40
      [ 2175.136720]  __kasan_kmalloc+0x81/0xa0
      [ 2175.136723]  lnet_net_alloc+0x1b9/0x940 [lnet]
      [ 2175.136779]  lnet_dyn_add_ni+0x70/0x370 [lnet]
      [ 2175.136832]  lnet_genl_parse_local_ni+0x6ee/0x32c0 [lnet]
      [ 2175.136885]  lnet_net_cmd+0x7a8/0x1150 [lnet]
      [ 2175.136938]  genl_family_rcv_msg_doit.isra.0+0x1be/0x290
      [ 2175.136943]  genl_family_rcv_msg+0x335/0x530
      [ 2175.136947]  genl_rcv_msg+0x9f/0x130
      [ 2175.136950]  netlink_rcv_skb+0x12b/0x390
      [ 2175.136953]  genl_rcv+0x24/0x40
      [ 2175.136957]  netlink_unicast+0x430/0x710
      [ 2175.136960]  netlink_sendmsg+0x73c/0xc90
      [ 2175.136963]  sock_sendmsg+0xb2/0xe0
      [ 2175.136966]  ____sys_sendmsg+0x5d3/0x7c0
      [ 2175.136969]  ___sys_sendmsg+0xee/0x170
      [ 2175.136973]  __sys_sendmsg+0xc9/0x160
      [ 2175.136976]  do_syscall_64+0x56/0x80
      [ 2175.136979]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [ 2175.136984] 
      [ 2175.136985] Freed by task 79304:
      [ 2175.136987]  kasan_save_stack+0x1e/0x40
      [ 2175.136991]  kasan_set_track+0x21/0x30
      [ 2175.136994]  kasan_set_free_info+0x20/0x30
      [ 2175.136999]  ____kasan_slab_free+0x14a/0x1a0
      [ 2175.137003]  slab_free_freelist_hook+0x11d/0x1d0
      [ 2175.137006]  kfree+0xec/0x4a0
      [ 2175.137009]  lnet_startup_lndnet+0x531/0xa60 [lnet]
      [ 2175.137061]  lnet_add_net_common+0x115/0x7c0 [lnet]
      [ 2175.137114]  lnet_dyn_add_ni+0x27a/0x370 [lnet]
      [ 2175.137167]  lnet_genl_parse_local_ni+0x6ee/0x32c0 [lnet]
      [ 2175.137220]  lnet_net_cmd+0x7a8/0x1150 [lnet]
      [ 2175.137273]  genl_family_rcv_msg_doit.isra.0+0x1be/0x290
      [ 2175.137277]  genl_family_rcv_msg+0x335/0x530
      [ 2175.137281]  genl_rcv_msg+0x9f/0x130
      [ 2175.137285]  netlink_rcv_skb+0x12b/0x390
      [ 2175.137288]  genl_rcv+0x24/0x40
      [ 2175.137292]  netlink_unicast+0x430/0x710
      [ 2175.137295]  netlink_sendmsg+0x73c/0xc90
      [ 2175.137298]  sock_sendmsg+0xb2/0xe0
      [ 2175.137301]  ____sys_sendmsg+0x5d3/0x7c0
      [ 2175.137304]  ___sys_sendmsg+0xee/0x170
      [ 2175.137308]  __sys_sendmsg+0xc9/0x160
      [ 2175.137311]  do_syscall_64+0x56/0x80
      [ 2175.137314]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [ 2175.137318] 
      [ 2175.137319] Last potentially related work creation:
      [ 2175.137321]  kasan_save_stack+0x1e/0x40
      [ 2175.137325]  __kasan_record_aux_stack+0x96/0xa0
      [ 2175.137341]  kvfree_call_rcu+0x79/0x7a0
      [ 2175.137346]  drop_sysctl_table+0x334/0x460
      [ 2175.137350]  unregister_sysctl_table+0x9c/0x170
      [ 2175.137353]  neigh_sysctl_unregister+0x56/0x80
      [ 2175.137358]  addrconf_ifdown.isra.0+0xf3d/0x1370
      [ 2175.137364]  addrconf_notify+0x1f0/0x1000
      [ 2175.137367]  notifier_call_chain+0x99/0x170
      [ 2175.137371]  unregister_netdevice_many+0x580/0x11a0
      [ 2175.137374]  default_device_exit_batch+0x2ad/0x360
      [ 2175.137377]  cleanup_net+0x428/0x990
      [ 2175.137382]  process_one_work+0x8e2/0x1510
      [ 2175.137387]  worker_thread+0x598/0xf70
      [ 2175.137391]  kthread+0x2a4/0x340
      [ 2175.137395]  ret_from_fork+0x1f/0x30
      [ 2175.137401] 
      [ 2175.137402] Second to last potentially related work creation:
      [ 2175.137404]  kasan_save_stack+0x1e/0x40
      [ 2175.137407]  __kasan_record_aux_stack+0x96/0xa0
      [ 2175.137412]  kvfree_call_rcu+0x79/0x7a0
      [ 2175.137415]  drop_sysctl_table+0x334/0x460
      [ 2175.137418]  unregister_sysctl_table+0x9c/0x170
      [ 2175.137420]  addrconf_sysctl_unregister+0xe9/0x1b0
      [ 2175.137424]  addrconf_ifdown.isra.0+0xf3d/0x1370
      [ 2175.137427]  addrconf_notify+0x1f0/0x1000
      [ 2175.137431]  notifier_call_chain+0x99/0x170
      [ 2175.137433]  unregister_netdevice_many+0x580/0x11a0
      [ 2175.137437]  default_device_exit_batch+0x2ad/0x360
      [ 2175.137440]  cleanup_net+0x428/0x990
      [ 2175.137443]  process_one_work+0x8e2/0x1510
      [ 2175.137447]  worker_thread+0x598/0xf70
      [ 2175.137451]  kthread+0x2a4/0x340
      [ 2175.137455]  ret_from_fork+0x1f/0x30
      [ 2175.137459] 
      [ 2175.137460] The buggy address belongs to the object at ffff888005be7a00
      [ 2175.137460]  which belongs to the cache kmalloc-256 of size 256
      [ 2175.137463] The buggy address is located 80 bytes inside of
      [ 2175.137463]  256-byte region [ffff888005be7a00, ffff888005be7b00)
      [ 2175.137467] 
      [ 2175.137468] The buggy address belongs to the physical page:
      [ 2175.137470] page:ffffea000016f900 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888005be7700 pfn:0x5be4
      [ 2175.137475] head:ffffea000016f900 order:2 compound_mapcount:0 compound_pincount:0
      [ 2175.137478] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
      [ 2175.137488] raw: 000fffffc0010200 ffffea000436af08 ffff888100040e50 ffff888100043000
      [ 2175.137492] raw: ffff888005be7700 0000000000150013 00000001ffffffff 0000000000000000
      [ 2175.137494] page dumped because: kasan: bad access detected
      [ 2175.137495] 
      [ 2175.137496] Memory state around the buggy address:
      [ 2175.137498]  ffff888005be7900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 2175.137500]  ffff888005be7980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 2175.137503] >ffff888005be7a00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 2175.137505]                                                  ^
      [ 2175.137507]  ffff888005be7a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 2175.137509]  ffff888005be7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 2175.137511] ==================================================================
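The address arithmetic in the report can be checked by hand. The following is a minimal sketch, assuming the generic KASAN layout in which one shadow byte covers 8 bytes of memory and the report prints 16 shadow bytes per row; the helper names are hypothetical and are not part of the kernel or of Lustre.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN maps 8 bytes of memory to one shadow byte. */
#define SHADOW_GRANULE 8u
/* Assumption: the report prints 16 shadow bytes per row (128 bytes of memory). */
#define SHADOW_BYTES_PER_ROW 16u

/* Byte offset of the faulting address inside its slab object. */
static uint64_t offset_in_object(uint64_t addr, uint64_t obj_start)
{
	assert(addr >= obj_start);
	return addr - obj_start;
}

/* Index of the shadow byte that covers addr within the printed row
 * starting at row_start. */
static unsigned shadow_index_in_row(uint64_t addr, uint64_t row_start)
{
	assert(addr >= row_start);
	uint64_t idx = (addr - row_start) / SHADOW_GRANULE;
	assert(idx < SHADOW_BYTES_PER_ROW);
	return (unsigned)idx;
}
```

With the values from the report, `offset_in_object(0xffff888005be7a50, 0xffff888005be7a00)` gives 80, matching "located 80 bytes inside of" the 256-byte region, and `shadow_index_in_row` gives 10 for the row marked `>`. The shadow byte at that index is `fb`, the value KASAN uses for freed slab memory, while the `fc` bytes in the neighbouring rows are slab redzones.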

          Activity

            Gerrit Updater (gerrit) added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57861/
            Subject: LU-17651 lnet: Fix KASAN use-after-free in lnet_net_remove_cpts
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 239c03e7157b801f0050ff13636aaed292263616

            Arshad Hussain (arshad512) added a comment:

            James has a working patch for this issue. (Putting it in James's name.)

            Gerrit Updater (gerrit) added a comment:

            "James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57861
            Subject: LU-17651 lnet: set ni->ni_ncpts to zero after cpts are freed
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c4a877b1e276214d09e60623b5da62679f119e29
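The patch subject above says only that ni->ni_ncpts is set to zero after the cpts are freed. The general idea of clearing a count when its backing array is released can be sketched in userspace C as below. This is an illustration under stated assumptions, not the LNet code: `struct demo_ni` and every `demo_*` function are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interface that owns an array of CPT ids. */
struct demo_ni {
	int *cpts;  /* heap array of partition ids, or NULL */
	int  ncpts; /* number of valid entries in cpts */
};

/* Allocate n ids (1..n); returns 0 on success, -1 on allocation failure. */
static int demo_ni_init(struct demo_ni *ni, int n)
{
	assert(n > 0);
	ni->cpts = malloc((size_t)n * sizeof(*ni->cpts));
	if (!ni->cpts) {
		ni->ncpts = 0;
		return -1;
	}
	for (int i = 0; i < n; i++)
		ni->cpts[i] = i + 1;
	ni->ncpts = n;
	return 0;
}

/* Release the array and reset the count, so later walkers see an
 * empty set instead of a stale length over freed memory. */
static void demo_ni_free_cpts(struct demo_ni *ni)
{
	free(ni->cpts);
	ni->cpts = NULL;
	ni->ncpts = 0;
}

/* Walks ncpts entries; touches nothing once ncpts is zero. */
static int demo_ni_sum_cpts(const struct demo_ni *ni)
{
	int sum = 0;

	for (int i = 0; i < ni->ncpts; i++)
		sum += ni->cpts[i];
	return sum;
}
```

After `demo_ni_init(&ni, 3)` the sum is 6; after `demo_ni_free_cpts(&ni)` a second call to `demo_ni_sum_cpts(&ni)` returns 0 without dereferencing the freed array. If the count were left at 3, the same loop would read freed memory, which is the class of access KASAN reports as use-after-free.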

            Gerrit Updater (gerrit) added a comment:

            "Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54961
            Subject: LU-17651 lnet: Debug1 KASAN use-after-free
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d9185b4833f13663ebc58f26a799e224d51a0b3b

            Gerrit Updater (gerrit) added a comment:

            "Arshad Hussain <arshad.hussain@aeoncomputing.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54960
            Subject: LU-17651 lnet: Debug KASAN use-after-free
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 605c664d134487a72d85bf9d37fbbc210d5930ed

            Arshad Hussain (arshad512) added a comment:

            Andreas, sure. I am looking into this.

            Andreas Dilger (adilger) added a comment:

            arshad512, I'm not sure what your availability looks like, but Oleg has found a few KASAN (Kernel Address Sanitizer) bugs as he is bringing up el9.3 Janitor testing.

            I think these would be quite useful to fix, so that we can get "clean" runs on Janitor and make it easy to see new regressions, and hopefully they will not be too time-consuming.

            People

              Assignee: James A Simmons (simmonsja)
              Reporter: Oleg Drokin (green)