Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-20102

“BUG: scheduling while atomic: …” due to rtnl_lock() being called from a RCU read-side critical section

    XMLWordPrintable

Details

    • 3
    • 9223372036854775807

    Description

      Here are the messages/stacks that we can get during Lustre/LNet startup on some nodes of our cluster running with Kernel 6.17 and Lustre 2.14.0-ddn242 :

      [Wed Apr  1 10:48:18 2026] BUG: scheduling while atomic: modprobe/57648/0x00000002
      [Wed Apr  1 10:48:18 2026] Modules linked in: ko2iblnd(OE) ptlrpc(OE+) obdclass(OE) lnet(OE) crc32_cryptoapi libcfs(OE) nvidia_uvm(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs 8021q garp mrp stp llc rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) nvidia_modeset(OE) video aes_ce_blk aes_ce_cipher polyval_ce ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sha3_ce nvidia_cspmu dax_hmem sunrpc arm_smccc_trng ipmi_ssif binfmt_misc acpi_power_meter coresight_trbe cxl_acpi arm_smmuv3_pmu coresight arm_cspmu_module arm_spe_pmu cxl_port acpi_ipmi bonding ipmi_devintf mlx5_ib(OE) mlx5_fwctl(OE) cxl_core xhci_pci_renesas ast macsec fwctl nvidia_t410_cmem_latency_pmu i2c_tegra nvidia_t410_c2c_pmu ipmi_msghandler einj cfg80211 cppc_cpufreq sch_fq_codel nvidia_peermem(OE) nvidia(OE) ib_uverbs(OE) ib_core(OE) nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 virtiofs cuse overlay igb i2c_algo_bit gnss libie cdc_subset r8153_ecm cdc_ether usbnet r8152 ixgbe mdio xfrm_algo
      [Wed Apr  1 10:48:18 2026]  raid0 mlx5_core(OE) mlxfw(OE) tls nvme pci_hyperv_intf nvme_core psample nvme_keyring mlxdevm(OE) nvme_auth mlx_compat(OE) [last unloaded: mst_pci(OE)]
      [Wed Apr  1 10:48:18 2026] CPU: 189 UID: 0 PID: 57648 Comm: modprobe Kdump: loaded Tainted: G           OE       6.17.0-1012-nvidia-64k #12-Ubuntu PREEMPT(none) 
      [Wed Apr  1 10:48:18 2026] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
      [Wed Apr  1 10:48:18 2026] Hardware name: NVIDIA VR NVL72/P3809-BMC, BIOS NV_SBIOS: 4.7.0.0, OEM_SBIOS: 4.7.0.0 Sun Mar 15 09:16:18 PM UTC 2026
      [Wed Apr  1 10:48:18 2026] Call trace:
      [Wed Apr  1 10:48:18 2026]  show_stack+0x24/0x50 (C)
      [Wed Apr  1 10:48:18 2026]  dump_stack_lvl+0x80/0x140
      [Wed Apr  1 10:48:18 2026]  dump_stack+0x1c/0x38
      [Wed Apr  1 10:48:18 2026]  __schedule_bug+0x80/0xc8
      [Wed Apr  1 10:48:18 2026]  schedule_debug.isra.0+0x14c/0x170
      [Wed Apr  1 10:48:18 2026]  __schedule+0x94/0x8c0
      [Wed Apr  1 10:48:18 2026]  schedule+0x40/0xd8
      [Wed Apr  1 10:48:18 2026]  schedule_preempt_disabled+0x30/0x68
      [Wed Apr  1 10:48:18 2026]  __mutex_lock.constprop.0+0x4b0/0xc58
      [Wed Apr  1 10:48:18 2026]  __mutex_lock_slowpath+0x20/0x48
      [Wed Apr  1 10:48:18 2026]  mutex_lock+0x8c/0xc0
      [Wed Apr  1 10:48:18 2026]  rtnl_lock+0x24/0x48
      [Wed Apr  1 10:48:18 2026]  lnet_get_link_status+0x2c/0x100 [lnet]
      [Wed Apr  1 10:48:18 2026]  kiblnd_startup+0x1708/0x2590 [ko2iblnd]
      [Wed Apr  1 10:48:18 2026]  lnet_startup_lndnet+0x12c/0xcb0 [lnet]
      [Wed Apr  1 10:48:18 2026]  LNetNIInit+0xa18/0x10e0 [lnet]
      [Wed Apr  1 10:48:18 2026]  ptlrpc_init_portals+0x6c/0x4e8 [ptlrpc]
      [Wed Apr  1 10:48:18 2026]  ptlrpc_init+0x290/0xfff8 [ptlrpc]
      [Wed Apr  1 10:48:18 2026]  do_one_initcall+0x64/0x380
      [Wed Apr  1 10:48:18 2026]  do_init_module+0xa0/0x2e0
      [Wed Apr  1 10:48:18 2026]  load_module+0x7c8/0x8e0
      [Wed Apr  1 10:48:18 2026]  init_module_from_file+0x98/0x118
      [Wed Apr  1 10:48:18 2026]  idempotent_init_module+0x1fc/0x300
      [Wed Apr  1 10:48:18 2026]  __arm64_sys_finit_module+0x74/0x108
      [Wed Apr  1 10:48:18 2026]  invoke_syscall.constprop.0+0x7c/0xf8
      [Wed Apr  1 10:48:18 2026]  do_el0_svc+0x4c/0x100
      [Wed Apr  1 10:48:18 2026]  el0_svc+0x48/0x200
      [Wed Apr  1 10:48:18 2026]  el0t_64_sync_handler+0xc0/0x108
      [Wed Apr  1 10:48:18 2026]  el0t_64_sync+0x1b8/0x1c0
      [Wed Apr  1 10:48:18 2026] ------------[ cut here ]------------
      [Wed Apr  1 10:48:18 2026] Voluntary context switch within RCU read-side critical section!
      [Wed Apr  1 10:48:18 2026] WARNING: CPU: 189 PID: 57648 at kernel/rcu/tree_plugin.h:332 rcu_note_context_switch+0x2f4/0x330
      [Wed Apr  1 10:48:18 2026] Modules linked in: ko2iblnd(OE) ptlrpc(OE+) obdclass(OE) lnet(OE) crc32_cryptoapi libcfs(OE) nvidia_uvm(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs 8021q garp mrp stp llc rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) nvidia_modeset(OE) video aes_ce_blk aes_ce_cipher polyval_ce ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher sm4 sm3_ce sha3_ce nvidia_cspmu dax_hmem sunrpc arm_smccc_trng ipmi_ssif binfmt_misc acpi_power_meter coresight_trbe cxl_acpi arm_smmuv3_pmu coresight arm_cspmu_module arm_spe_pmu cxl_port acpi_ipmi bonding ipmi_devintf mlx5_ib(OE) mlx5_fwctl(OE) cxl_core xhci_pci_renesas ast macsec fwctl nvidia_t410_cmem_latency_pmu i2c_tegra nvidia_t410_c2c_pmu ipmi_msghandler einj cfg80211 cppc_cpufreq sch_fq_codel nvidia_peermem(OE) nvidia(OE) ib_uverbs(OE) ib_core(OE) nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 virtiofs cuse overlay igb i2c_algo_bit gnss libie cdc_subset r8153_ecm cdc_ether usbnet r8152 ixgbe mdio xfrm_algo
      [Wed Apr  1 10:48:18 2026]  raid0 mlx5_core(OE) mlxfw(OE) tls nvme pci_hyperv_intf nvme_core psample nvme_keyring mlxdevm(OE) nvme_auth mlx_compat(OE) [last unloaded: mst_pci(OE)]
      [Wed Apr  1 10:48:18 2026] CPU: 189 UID: 0 PID: 57648 Comm: modprobe Kdump: loaded Tainted: G        W  OE       6.17.0-1012-nvidia-64k #12-Ubuntu PREEMPT(none) 
      [Wed Apr  1 10:48:18 2026] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
      [Wed Apr  1 10:48:18 2026] Hardware name: NVIDIA VR NVL72/P3809-BMC, BIOS NV_SBIOS: 4.7.0.0, OEM_SBIOS: 4.7.0.0 Sun Mar 15 09:16:18 PM UTC 2026
      [Wed Apr  1 10:48:18 2026] pstate: 614010c9 (nZCv daIF +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
      [Wed Apr  1 10:48:18 2026] pc : rcu_note_context_switch+0x2f4/0x330
      [Wed Apr  1 10:48:18 2026] lr : rcu_note_context_switch+0x2f4/0x330
      [Wed Apr  1 10:48:18 2026] sp : ffff8002868cf470
      [Wed Apr  1 10:48:18 2026] x29: ffff8002868cf470 x28: ffffbff1b99341e0 x27: ffff000908cdaf80
      [Wed Apr  1 10:48:18 2026] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
      [Wed Apr  1 10:48:18 2026] x23: ffffbff1b52134d7 x22: ffff000908cdaf80 x21: ffffbff1b5267bb0
      [Wed Apr  1 10:48:18 2026] x20: ffff01540b1c9d40 x19: ffffbff1b40f9d40 x18: ffff8002868e00c0
      [Wed Apr  1 10:48:18 2026] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
      [Wed Apr  1 10:48:18 2026] x14: 0000000000000000 x13: 216e6f6974636573 x12: 206c616369746972
      [Wed Apr  1 10:48:18 2026] x11: 6320656469732d64 x10: 6165722055435220 x9 : 0000000000000000
      [Wed Apr  1 10:48:18 2026] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
      [Wed Apr  1 10:48:18 2026] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
      [Wed Apr  1 10:48:18 2026] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
      [Wed Apr  1 10:48:18 2026] Call trace:
      [Wed Apr  1 10:48:18 2026]  rcu_note_context_switch+0x2f4/0x330 (P)
      [Wed Apr  1 10:48:18 2026]  __schedule+0xb0/0x8c0
      [Wed Apr  1 10:48:18 2026]  schedule+0x40/0xd8
      [Wed Apr  1 10:48:18 2026]  schedule_preempt_disabled+0x30/0x68
      [Wed Apr  1 10:48:18 2026]  __mutex_lock.constprop.0+0x4b0/0xc58
      [Wed Apr  1 10:48:18 2026]  __mutex_lock_slowpath+0x20/0x48
      [Wed Apr  1 10:48:18 2026]  mutex_lock+0x8c/0xc0
      [Wed Apr  1 10:48:18 2026]  rtnl_lock+0x24/0x48
      [Wed Apr  1 10:48:18 2026]  lnet_get_link_status+0x2c/0x100 [lnet]
      [Wed Apr  1 10:48:18 2026]  kiblnd_startup+0x1708/0x2590 [ko2iblnd]
      [Wed Apr  1 10:48:18 2026]  lnet_startup_lndnet+0x12c/0xcb0 [lnet]
      [Wed Apr  1 10:48:18 2026]  LNetNIInit+0xa18/0x10e0 [lnet]
      [Wed Apr  1 10:48:18 2026]  ptlrpc_init_portals+0x6c/0x4e8 [ptlrpc]
      [Wed Apr  1 10:48:18 2026]  ptlrpc_init+0x290/0xfff8 [ptlrpc]
      [Wed Apr  1 10:48:18 2026]  do_one_initcall+0x64/0x380
      [Wed Apr  1 10:48:18 2026]  do_init_module+0xa0/0x2e0
      [Wed Apr  1 10:48:18 2026]  load_module+0x7c8/0x8e0
      [Wed Apr  1 10:48:18 2026]  init_module_from_file+0x98/0x118
      [Wed Apr  1 10:48:18 2026]  idempotent_init_module+0x1fc/0x300
      [Wed Apr  1 10:48:18 2026]  __arm64_sys_finit_module+0x74/0x108
      [Wed Apr  1 10:48:18 2026]  invoke_syscall.constprop.0+0x7c/0xf8
      [Wed Apr  1 10:48:18 2026]  do_el0_svc+0x4c/0x100
      [Wed Apr  1 10:48:18 2026]  el0_svc+0x48/0x200
      [Wed Apr  1 10:48:18 2026]  el0t_64_sync_handler+0xc0/0x108
      [Wed Apr  1 10:48:18 2026]  el0t_64_sync+0x1b8/0x1c0
      [Wed Apr  1 10:48:18 2026] ---[ end trace 0000000000000000 ]--- 

       

      Digging into concerned source code, we have found that kiblnd_startup() calls lnet_get_link_status() from inside a RCU read-side critical section, and then rtnl_lock() will be called and will try to _{}schedule() due to _rtnl_mutex being already busy, thus triggering the two conditions causing messages/stacks to be printed respectively in schedule_debug() and rcu_note_context_switch() during the context switch caused by the blocking mutex.

      The fact that lnet_get_link_status() now calls rtnl_lock() comes from a recent change (https://review.whamcloud.com/c/fs/lustre-release/+/63038/ , "aebedbe750 LU-19707 lnet: fix RTNL assert in LNet on v6.15") for LU-19707 .

      I am presently cooking a patch and will try to push it soon in order to fix this regression.

      Attachments

        Issue Links

          Activity

            People

              bfaccini-nvda Bruno Faccini
              bfaccini-nvda Bruno Faccini
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: