[LU-17235] kernel panic on kiblnd_startup with logical interfaces Created: 28/Oct/23  Updated: 10/Nov/23  Resolved: 10/Nov/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Major
Reporter: Shuichi Ihara Assignee: Serguei Smirnov
Resolution: Fixed Votes: 0
Labels: None
Environment:

Rockylinux8.7 (4.18.0-425.13.1.el8_7.x86_64)
OFED 5.8-1.1.2.1
master (commit:d7d1644)


Issue Links:
Related
is related to LU-16836 LNet: initial ni status is "up" if st... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The kernel crashes when Multi-Rail (MR) is enabled with logical (alias) interfaces.

options lnet networks="o2ib12(ib0,ib0:1)"
# modprobe lustre
[  143.167995] LNet: Using FastReg for registration
[  143.648939] LNet: Added LNI 10.0.0.1@o2ib12 [8/512/0/180]
[  144.128240] BUG: unable to handle kernel NULL pointer dereference at 00000000000004f0
[  144.136091] PGD 0 P4D 0
[  144.138631] Oops: 0000 [#1] SMP NOPTI
[  144.142299] CPU: 7 PID: 2739 Comm: modprobe Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-425.13.1.el8_7.x86_64 #1
[  144.154133] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0015.032120220358 03/21/2022
[  144.164573] RIP: 0010:kiblnd_startup+0x1194/0x1720 [ko2iblnd]
[  144.170338] Code: 44 24 08 4c 8b a8 50 01 00 00 41 8b 4d 68 85 c9 0f 84 7d 02 00 00 49 8b 47 38 48 8b bb a0 01 00 00 48 8d 70 24 e8 9c e1 d5 e7 <80> b8 f0 04 00 00 02 74 0d 80 b8 34 02 00 00 06 0f 84 63 04 00 00
[  144.189114] RSP: 0018:ffffac2a0952fb30 EFLAGS: 00010046
[  144.194352] RAX: 0000000000000000 RBX: ffff9f37ac0a3400 RCX: 0000000000000028
[  144.201494] RDX: ffff9f37582aa800 RSI: ffff9f3775d1d224 RDI: 00000000cbaad2c9
[  144.208638] RBP: ffff9f3775d1d340 R08: a7c5921741031163 R09: 0000000000000005
[  144.215776] R10: ffff9f3775e3fd80 R11: ffff9f37553718f0 R12: ffff9f3775d1d340
[  144.222919] R13: ffff9f376d402a00 R14: 0000000000000007 R15: ffff9f3759f07600
[  144.230061] FS:  00007fbb91ba0740(0000) GS:ffff9f4e20bc0000(0000) knlGS:0000000000000000
[  144.238156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  144.243906] CR2: 00000000000004f0 CR3: 0000000135c02005 CR4: 00000000007706e0
[  144.251049] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  144.258189] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  144.265592] PKRU: 55555554
[  144.268521] Call Trace:
[  144.271144]  lnet_startup_lndnet+0x14f/0x7e0 [lnet]
[  144.276207]  LNetNIInit+0x6e1/0xd70 [lnet]
[  144.280486]  ? 0xffffffffc0c58000
[  144.283964]  ptlrpc_init_portals+0x27/0x250 [ptlrpc]
[  144.289160]  ? 0xffffffffc0c58000
[  144.292646]  ptlrpc_init+0x196/0x1000 [ptlrpc]
[  144.297307]  do_one_initcall+0x46/0x1d0
[  144.301306]  ? do_init_module+0x22/0x230
[  144.305386]  ? kmem_cache_alloc_trace+0x142/0x280
[  144.310246]  do_init_module+0x5a/0x230
[  144.314149]  load_module+0x14bf/0x17f0
[  144.318053]  ? __do_sys_finit_module+0xb1/0x110
[  144.322745]  __do_sys_finit_module+0xb1/0x110
[  144.327259]  do_syscall_64+0x5b/0x1b0
[  144.331081]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[  144.336289] RIP: 0033:0x7fbb90ab59bd
[  144.340020] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9b 64 38 00 f7 d8 64 89 01 48
[  144.359098] RSP: 002b:00007ffe07f8ec88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  144.366826] RAX: ffffffffffffffda RBX: 000055c022b91800 RCX: 00007fbb90ab59bd
[  144.374125] RDX: 0000000000000000 RSI: 000055c021ebd8b6 RDI: 0000000000000006
[  144.381423] RBP: 000055c021ebd8b6 R08: 0000000000000000 R09: 0000000000000000
[  144.388718] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[  144.396011] R13: 000055c022b917a0 R14: 0000000000040000 R15: 0000000000000000
[  144.403304] Modules linked in: ko2iblnd(OE) ptlrpc(OE+) obdclass(OE) lnet(OE) libcfs(OE) beegfs(OE) uio_pci_generic uio vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass cuse rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) sunrpc vfat fat intel_rapl_msr intel_rapl_common isst_if_common ipmi_ssif skx_edac iTCO_wdt nfit iTCO_vendor_support libnvdimm ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm x86_pkg_temp_thermal intel_powerclamp coretemp drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect rapl sysimgblt acpi_ipmi intel_cstate fb_sys_fops ipmi_si mei_me drm joydev pcspkr ipmi_devintf mei intel_uncore ioatdma i2c_i801 lpc_ich wmi ipmi_msghandler acpi_power_meter acpi_pad binfmt_misc knem(OE) ext4 mbcache jbd2 mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) sd_mod t10_pi sg mlx5_core(OE) mlxfw(OE) pci_hyperv_intf ixgbe tls ahci libahci psample mlxdevm(OE) mdio libata crc32c_intel mlx_compat(OE) dca xpmem(OE) fuse
[  144.403358]  [last unloaded: libcfs]
[  144.494056] CR2: 00000000000004f0
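
As a reading aid, the faulting access can be cross-checked against the register dump. The sketch below assumes that the bytes starting at the marked position in the Code: line (`<80> b8 f0 04 00 00 02`) form a single x86-64 instruction with no prefix, which would decode as `cmpb $0x2, 0x4f0(%rax)`. The helper name `decode_cmp_disp32` is hypothetical and only handles this one encoding. With RAX = 0, the computed address matches CR2, which is consistent with a NULL pointer being dereferenced at offset 0x4f0 right after the preceding call returned.

```python
# Bytes from the "Code:" line, starting at the byte marked <80>
# (the position RIP pointed to when the fault was raised).
FAULT_BYTES = bytes.fromhex("80b8f004000002")


def decode_cmp_disp32(code: bytes):
    """Decode `80 /7 ib` (CMP r/m8, imm8) with ModRM mod=10 (disp32).

    Returns (rm, disp, imm). Hypothetical helper: it rejects anything
    other than this single encoding, including SIB-based addressing.
    """
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x80 or mod != 0b10 or reg != 7 or rm == 4:
        raise ValueError("not a CMP r/m8, imm8 with disp32 addressing")
    disp = int.from_bytes(code[2:6], "little", signed=True)
    imm = code[6]
    return rm, disp, imm


rm, disp, imm = decode_cmp_disp32(FAULT_BYTES)

RAX = 0x0000000000000000  # from the register dump (rm == 0 selects RAX)
CR2 = 0x00000000000004F0  # faulting address reported by the oops

fault_addr = RAX + disp
print(f"cmpb ${imm:#x}, {disp:#x}(%rax) -> address {fault_addr:#x}")
print("matches CR2:", fault_addr == CR2)
```

The decode says nothing about which structure the pointer was supposed to reference; that has to come from the ko2iblnd source and debug symbols.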

With a normal LNet setup that has no logical interfaces, such as 'options lnet networks="o2ib12(ib0)"', it works.



 Comments   
Comment by Serguei Smirnov [ 28/Oct/23 ]

Hi,

Could you please add details on how the two interfaces are configured (for example, the "ip a" output), along with any other steps to reproduce?

Did o2iblnd use to work with logical interfaces before? Crashing is definitely bad in this case, but reusing the same device does appear problematic without special configuration in IB layer, e.g. PKEYs.

Thanks,

Serguei.

Comment by Shuichi Ihara [ 28/Oct/23 ]
[root@ec01 io500.git]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp179s0f0np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether b8:59:9f:f6:89:98 brd ff:ff:ff:ff:ff:ff
    altname ens801f0np0
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a4:bf:01:5d:e3:d0 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f0
    inet 10.128.11.1/21 brd 10.128.15.255 scope global noprefixroute eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::a6bf:1ff:fe5d:e3d0/64 scope link 
       valid_lft forever preferred_lft forever
4: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a4:bf:01:5d:e3:d1 brd ff:ff:ff:ff:ff:ff
    altname enp23s0f1
5: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:11:49:fe:80:00:00:00:00:00:00:b8:59:9f:03:00:f6:89:99 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 10.0.0.1/12 brd 10.15.255.255 scope global noprefixroute ib0
       valid_lft forever preferred_lft forever
    inet 10.0.2.1/12 brd 10.15.255.255 scope global secondary noprefixroute ib0:2
       valid_lft forever preferred_lft forever
    inet 10.0.4.1/12 brd 10.15.255.255 scope global secondary noprefixroute ib0:4
       valid_lft forever preferred_lft forever
    inet 10.0.1.1/12 brd 10.15.255.255 scope global secondary noprefixroute ib0:1
       valid_lft forever preferred_lft forever
    inet 10.0.7.1/12 brd 10.15.255.255 scope global secondary noprefixroute ib0:7
       valid_lft forever preferred_lft forever
    inet 10.0.5.1/12 brd 10.15.255.255 scope global secondary noprefixroute ib0:5
       valid_lft forever preferred_lft forever
    inet 10.0.3.1/12 brd 10.15.255.255 scope global secondary noprefixroute ib0:3
       valid_lft forever preferred_lft forever
    inet 10.0.6.1/12 brd 10.15.255.255 scope global secondary noprefixroute ib0:6
       valid_lft forever preferred_lft forever
    inet6 fe80::ba59:9f03:f6:8999/64 scope link 
       valid_lft forever preferred_lft forever

Did o2iblnd use to work with logical interfaces before? Crashing is definitely bad in this case, but reusing the same device does appear problematic without special configuration in IB layer, e.g. PKEYs.

Yes, this configuration has been working for more than 3 years, ever since I started using it.

The reason for this setting is to improve metadata performance. I thought increasing conns_per_peer would help, but it didn't. Using many NIDs still gives better metadata performance.

However, we need to investigate how to get the same performance improvements without this workaround.

Comment by Shuichi Ihara [ 29/Oct/23 ]

I found that "commit: 09c6e2b872 LU-16836 lnet: ensure dev notification on lnd startup" is the first commit that causes this problem.
Before this commit landed, the configuration with logical interfaces worked well.

 

Comment by Shuichi Ihara [ 29/Oct/23 ]

In fact, this is not a problem specific to MR with logical interfaces: it also crashes when LNet is started on a single logical interface.
So the configuration below hits the problem too.

options lnet networks="o2ib12(ib0:1)" in modprobe.conf
# modprobe lustre
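
To check a node for the affected pattern before loading the modules, a small script can flag alias-style interface names in a `networks` string. This is only a sketch: the function names `parse_networks` and `alias_interfaces` are hypothetical, and the parser handles only the simple `net(if1,if2)` form used in this ticket, not the full LNet `networks` syntax (for example, per-interface CPT lists are not supported).

```python
import re


def parse_networks(spec: str):
    """Split a simple LNet `networks` string into (net, [interfaces]) pairs.

    Assumption: only the `net(if1,if2,...)` form is handled.
    """
    return [
        (net, [name.strip() for name in ifaces.split(",") if name.strip()])
        for net, ifaces in re.findall(r"(\w+)\(([^)]*)\)", spec)
    ]


def alias_interfaces(spec: str):
    """Return the interface names that carry an alias label (contain ':')."""
    return [
        name
        for _net, ifaces in parse_networks(spec)
        for name in ifaces
        if ":" in name
    ]


# The two crashing configurations from this ticket, then the working one.
for spec in ("o2ib12(ib0,ib0:1)", "o2ib12(ib0:1)", "o2ib12(ib0)"):
    print(spec, "->", alias_interfaces(spec))
```

A non-empty result corresponds to the configurations reported here as crashing on master after commit 09c6e2b872; an empty result corresponds to the plain `ib0` setup that works.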

 

Comment by Gerrit Updater [ 30/Oct/23 ]

"Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52894
Subject: LU-17235 o2iblnd: adding alias ib interface causes crash
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4dc9164b23eb275b4050f9c013d13a469fd662fb

Comment by Gerrit Updater [ 08/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52894/
Subject: LU-17235 o2iblnd: adding alias ib interface causes crash
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 02b22df6431a764c00ed0fbbc3286c2ed4dfbab0

Comment by Peter Jones [ 10/Nov/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:33:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.