Details
Type: Bug
Resolution: Fixed
Priority: Major
None
None
Environment: Rocky Linux 8.7 (4.18.0-425.13.1.el8_7.x86_64)
OFED 5.8-1.1.2.1
master (commit:d7d1644)
Severity: 3
Description
The kernel crashes when Multi-Rail (MR) is enabled with logical interfaces, e.g.:
options lnet networks="o2ib12(ib0,ib0:1)"
# modprobe lustre
[ 143.167995] LNet: Using FastReg for registration
[ 143.648939] LNet: Added LNI 10.0.0.1@o2ib12 [8/512/0/180]
[ 144.128240] BUG: unable to handle kernel NULL pointer dereference at 00000000000004f0
[ 144.136091] PGD 0 P4D 0
[ 144.138631] Oops: 0000 [#1] SMP NOPTI
[ 144.142299] CPU: 7 PID: 2739 Comm: modprobe Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.13.1.el8_7.x86_64 #1
[ 144.154133] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0015.032120220358 03/21/2022
[ 144.164573] RIP: 0010:kiblnd_startup+0x1194/0x1720 [ko2iblnd]
[ 144.170338] Code: 44 24 08 4c 8b a8 50 01 00 00 41 8b 4d 68 85 c9 0f 84 7d 02 00 00 49 8b 47 38 48 8b bb a0 01 00 00 48 8d 70 24 e8 9c e1 d5 e7 <80> b8 f0 04 00 00 02 74 0d 80 b8 34 02 00 00 06 0f 84 63 04 00 00
[ 144.189114] RSP: 0018:ffffac2a0952fb30 EFLAGS: 00010046
[ 144.194352] RAX: 0000000000000000 RBX: ffff9f37ac0a3400 RCX: 0000000000000028
[ 144.201494] RDX: ffff9f37582aa800 RSI: ffff9f3775d1d224 RDI: 00000000cbaad2c9
[ 144.208638] RBP: ffff9f3775d1d340 R08: a7c5921741031163 R09: 0000000000000005
[ 144.215776] R10: ffff9f3775e3fd80 R11: ffff9f37553718f0 R12: ffff9f3775d1d340
[ 144.222919] R13: ffff9f376d402a00 R14: 0000000000000007 R15: ffff9f3759f07600
[ 144.230061] FS: 00007fbb91ba0740(0000) GS:ffff9f4e20bc0000(0000) knlGS:0000000000000000
[ 144.238156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 144.243906] CR2: 00000000000004f0 CR3: 0000000135c02005 CR4: 00000000007706e0
[ 144.251049] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 144.258189] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 144.265592] PKRU: 55555554
[ 144.268521] Call Trace:
[ 144.271144] lnet_startup_lndnet+0x14f/0x7e0 [lnet]
[ 144.276207] LNetNIInit+0x6e1/0xd70 [lnet]
[ 144.280486] ? 0xffffffffc0c58000
[ 144.283964] ptlrpc_init_portals+0x27/0x250 [ptlrpc]
[ 144.289160] ? 0xffffffffc0c58000
[ 144.292646] ptlrpc_init+0x196/0x1000 [ptlrpc]
[ 144.297307] do_one_initcall+0x46/0x1d0
[ 144.301306] ? do_init_module+0x22/0x230
[ 144.305386] ? kmem_cache_alloc_trace+0x142/0x280
[ 144.310246] do_init_module+0x5a/0x230
[ 144.314149] load_module+0x14bf/0x17f0
[ 144.318053] ? __do_sys_finit_module+0xb1/0x110
[ 144.322745] __do_sys_finit_module+0xb1/0x110
[ 144.327259] do_syscall_64+0x5b/0x1b0
[ 144.331081] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ 144.336289] RIP: 0033:0x7fbb90ab59bd
[ 144.340020] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9b 64 38 00 f7 d8 64 89 01 48
[ 144.359098] RSP: 002b:00007ffe07f8ec88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 144.366826] RAX: ffffffffffffffda RBX: 000055c022b91800 RCX: 00007fbb90ab59bd
[ 144.374125] RDX: 0000000000000000 RSI: 000055c021ebd8b6 RDI: 0000000000000006
[ 144.381423] RBP: 000055c021ebd8b6 R08: 0000000000000000 R09: 0000000000000000
[ 144.388718] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[ 144.396011] R13: 000055c022b917a0 R14: 0000000000040000 R15: 0000000000000000
[ 144.403304] Modules linked in: ko2iblnd(OE) ptlrpc(OE+) obdclass(OE) lnet(OE) libcfs(OE) beegfs(OE) uio_pci_generic uio vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass cuse rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) sunrpc vfat fat intel_rapl_msr intel_rapl_common isst_if_common ipmi_ssif skx_edac iTCO_wdt nfit iTCO_vendor_support libnvdimm ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm x86_pkg_temp_thermal intel_powerclamp coretemp drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect rapl sysimgblt acpi_ipmi intel_cstate fb_sys_fops ipmi_si mei_me drm joydev pcspkr ipmi_devintf mei intel_uncore ioatdma i2c_i801 lpc_ich wmi ipmi_msghandler acpi_power_meter acpi_pad binfmt_misc knem(OE) ext4 mbcache jbd2 mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) sd_mod t10_pi sg mlx5_core(OE) mlxfw(OE) pci_hyperv_intf ixgbe tls ahci libahci psample mlxdevm(OE) mdio libata crc32c_intel mlx_compat(OE) dca xpmem(OE) fuse
[ 144.403358] [last unloaded: libcfs]
[ 144.494056] CR2: 00000000000004f0
Without logical interfaces, e.g. options lnet networks="o2ib12(ib0)", LNet starts up normally.
Issue Links
is related to:
- LU-16836 LNet: initial ni status is "up" if starting with link disconnected (Resolved)