Details
-
Bug
-
Resolution: Unresolved
-
Medium
Description
Environment:
- Lustre version: v2_17_52
- Kernel: Rocky 9.7: 5.14.0-611.27.1.el9_7.x86_64
While testing the addition of large numbers of NIDs to a server, I hit a crash triggered by 'lnetctl net show' once the NID count reached a certain threshold.
I can reproduce the crash consistently with the attached script, which creates a number of LNET NIDs on a server and then calls 'lnetctl net show -v 4' after each addition.
[servers] [root@server1 ~]# ./reproducer.sh --nids 150 --verbosity 4
=================================================
Reproducer
Max NIDs: 150
Verbosity: 4
=================================================
[Pre-flight] Checking modules and limits...
[Test] Beginning incremental interface addition...
- 1 NIDs : PASS (Binary: 9176 bytes | YAML: 14681 bytes)
- 2 NIDs : PASS (Binary: 9648 bytes | YAML: 16173 bytes)
- 3 NIDs : PASS (Binary: 10120 bytes | YAML: 17665 bytes)
...
- 127 NIDs : PASS (Binary: 65524 bytes | YAML: 191374 bytes)
--> *CRASH*
I have attached the script and crashdump to the ticket.
I have also included the following analysis from Claude of what it determines to be the root cause. The crash appears to occur when the netlink payload generated by 'lnetctl net show' exceeds the 64 KiB limit of a single buffer.
There is a flaw in lnet_net_show_dump() (lnet/lnet/api-ni.c) that reliably panics the kernel or triggers an infinite loop when the number of configured NIs causes the netlink dump payload to exceed the 64 KiB skb buffer limit. The reproducer confirms that when the binary netlink payload crosses 65,535 bytes (e.g., ~128 dummy NIDs at verbosity -v 4), the kernel attempts to fall back to the multi-skb chunking logic but fails due to missing return checks and broken state-resumption logic.
The primary issue causing the kernel panic is that the function makes approximately 18 sequential calls to nla_nest_start() without checking for a NULL return. When the skb buffer fills up, nla_nest_start() returns NULL. Because this is ignored, the subsequent nla_nest_end(msg, NULL) attempts to write the 16-bit nla_len field to address 0, instantly seizing the node with a NULL pointer dereference.
Furthermore, even if the buffer overflow perfectly aligns with a network boundary (avoiding the mid-NI panic), the multi-skb resumption path is functionally broken. When genlmsg_put fails due to lack of space, the code executes GOTO(net_unlock, rc = -EMSGSIZE), completely bypassing the nlist->lngl_idx = idx state-saving assignment. Compounding this, the function initializes idx = nlist->lngl_idx instead of 0, meaning the if (idx++ < nlist->lngl_idx) continue; check never actually skips previously processed NIs. This throws the kernel into an infinite loop, returning the exact same chunk of networks repeatedly until lnetctl crashes in userspace with a realloc(): invalid next size heap corruption.
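To make the two failure modes in the analysis concrete, here is a minimal userspace sketch of a resumable chunked dump. It is only a model under stated assumptions, not the Lustre code and not a proposed patch: every name in it (model_nest_start, model_nest_end, model_dump, model_dump_all, struct chunk, struct dump_state, CHUNK_CAP) is hypothetical, and a four-record array stands in for the 64 KiB skb. It illustrates the three properties the analysis says are missing: the nest-start result is checked for NULL, the skip counter starts at 0, and the resume index is saved on the buffer-full path as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel or Lustre APIs. */
#define CHUNK_CAP 4                 /* records per chunk, stands in for the skb limit */

struct chunk {
    int recs[CHUNK_CAP];
    size_t used;
};

struct dump_state {
    size_t next_idx;                /* plays the role of nlist->lngl_idx */
};

/* Reserve a slot; returns NULL when the chunk is full, which is how the
 * analysis describes nla_nest_start() behaving on a full skb. */
static int *model_nest_start(struct chunk *c)
{
    if (c->used == CHUNK_CAP)
        return NULL;
    return &c->recs[c->used];
}

/* Commit a slot. The caller must have checked for NULL first. */
static void model_nest_end(struct chunk *c, int *slot, int value)
{
    assert(slot != NULL);
    *slot = value;
    c->used++;
}

/* Fill one chunk, resuming from st->next_idx.
 * Returns 1 if items remain (call again), 0 when the dump is complete. */
static int model_dump(const int *items, size_t n,
                      struct dump_state *st, struct chunk *c)
{
    size_t idx = 0;                 /* count from 0 so the skip test can work */
    int more = 0;

    c->used = 0;
    for (size_t i = 0; i < n; i++) {
        int *slot;

        if (idx++ < st->next_idx)
            continue;               /* already sent in an earlier chunk */

        slot = model_nest_start(c);
        if (slot == NULL) {         /* chunk full: stop, do not touch slot */
            idx--;                  /* item i was not emitted */
            more = 1;
            break;
        }
        model_nest_end(c, slot, items[i]);
    }
    st->next_idx = idx;             /* saved on every exit path */
    return more;
}

/* Drive the dump to completion, copying records into out (capacity >= n).
 * Returns the number of chunks used, or -1 if a chunk made no progress. */
static int model_dump_all(const int *items, size_t n, int *out, size_t *out_n)
{
    struct dump_state st = { 0 };
    struct chunk c;
    int chunks = 0, more;

    *out_n = 0;
    do {
        more = model_dump(items, n, &st, &c);
        if (more && c.used == 0)
            return -1;              /* refuse to spin on an empty chunk */
        memcpy(out + *out_n, c.recs, c.used * sizeof(c.recs[0]));
        *out_n += c.used;
        chunks++;
    } while (more);
    return chunks;
}
```

Calling model_dump_all() on ten items should deliver each item exactly once, in order, across several chunks. In this model, initializing idx from st->next_idx or skipping the final assignment on the buffer-full path would re-send the same leading items on every call, which matches the repeated-chunk behaviour described in the analysis.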