LU-20221 (Lustre)

LNet: lnet_net_show_dump NULL pointer dereference during multi-skb netlink dumps

Details

    • Bug
    • Resolution: Unresolved
    • Medium

    Description

      Environment:

      • Lustre version:  v2_17_52
      • Kernel: Rocky 9.7: 5.14.0-611.27.1.el9_7.x86_64

      During testing of adding large numbers of NIDs to a server, I encountered a crash caused by lnetctl net show when I reached a certain threshold of NIDs.

      While analyzing this crash, I was able to reproduce it consistently using the attached script, which simply creates a number of LNet NIDs on a server and calls 'lnetctl net show -v 4'.

      [servers] [root@server1 ~]# ./reproducer.sh --nids 150 --verbosity 4
      =================================================
       Reproducer                                      
       Max NIDs:  150                                
       Verbosity: 4                           
      =================================================
      [Pre-flight] Checking modules and limits...
      [Test] Beginning incremental interface addition...
        - 1 NIDs : PASS (Binary: 9176 bytes | YAML: 14681 bytes)
        - 2 NIDs : PASS (Binary: 9648 bytes | YAML: 16173 bytes)
        - 3 NIDs : PASS (Binary: 10120 bytes | YAML: 17665 bytes)
      ...
        - 127 NIDs : PASS (Binary: 65524 bytes | YAML: 191374 bytes)   
      --> *CRASH*

      I have attached the script and crashdump to the ticket.

      I also include the following analysis from Claude on what it determines the root cause to be. The crash appears to be triggered when the netlink payload generated by 'lnetctl net show' exceeds the 64 KiB limit of a single skb buffer.

      There is a flaw in lnet_net_show_dump() (lnet/lnet/api-ni.c) that reliably panics the kernel or triggers an infinite loop when the number of configured NIs causes the netlink dump payload to exceed the 64 KiB skb buffer limit. The reproducer confirms that when the binary netlink payload crosses 65,535 bytes (e.g., ~128 dummy NIDs at verbosity -v 4), the kernel attempts to fall back to the multi-skb chunking logic but fails due to missing return checks and broken state-resumption logic.
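To illustrate the missing return checks mentioned above, here is a minimal userspace sketch. It is not the Lustre code and does not use the kernel netlink API: model_msg, model_attr, and the model_* functions are hypothetical stand-ins for an skb, a struct nlattr, and the nla_nest_start() / nla_nest_end() pair. The only behaviour it assumes from the kernel is that nla_nest_start() returns NULL when the buffer has no room and that nla_nest_end() writes the length through the pointer it is given.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; NOT the kernel's netlink types. */
struct model_attr {
	uint16_t len;	/* plays the role of nla_len */
	uint16_t type;
};

struct model_msg {
	unsigned char *data;
	size_t size;	/* fixed capacity, like the per-skb limit */
	size_t used;
};

static int model_msg_init(struct model_msg *msg, size_t size)
{
	msg->data = malloc(size);
	msg->size = msg->data ? size : 0;
	msg->used = 0;
	return msg->data ? 0 : -ENOMEM;
}

/* Modelled on nla_nest_start(): NULL when the header does not fit. */
static struct model_attr *model_nest_start(struct model_msg *msg, uint16_t type)
{
	struct model_attr *attr;

	if (msg->size - msg->used < sizeof(*attr))
		return NULL;
	attr = (struct model_attr *)(msg->data + msg->used);
	attr->len = 0;
	attr->type = type;
	msg->used += sizeof(*attr);
	return attr;
}

/* Modelled on nla_nest_end(): stores the final length through 'start'.
 * The kernel helper has no such assert; a NULL 'start' there is a write
 * to address 0. */
static void model_nest_end(struct model_msg *msg, struct model_attr *start)
{
	assert(start != NULL);
	start->len = (uint16_t)((msg->data + msg->used) - (unsigned char *)start);
}

/* Reserve 'len' payload bytes, padded to 4 bytes. */
static int model_put(struct model_msg *msg, size_t len)
{
	size_t padded = (len + 3) & ~(size_t)3;

	if (msg->size - msg->used < padded)
		return -EMSGSIZE;
	memset(msg->data + msg->used, 0, padded);
	msg->used += padded;
	return 0;
}

/* Emit one nested record. Every step is checked, and a half-built
 * record is rolled back so the caller can retry it in the next chunk. */
static int model_emit_record(struct model_msg *msg, size_t payload_len)
{
	size_t mark = msg->used;
	struct model_attr *nest = model_nest_start(msg, 1);

	if (!nest)
		return -EMSGSIZE;	/* the kind of check reported as missing */
	if (model_put(msg, payload_len) < 0) {
		msg->used = mark;	/* cancel the partial nest */
		return -EMSGSIZE;
	}
	model_nest_end(msg, nest);
	return 0;
}
```

With a 16-byte buffer, a first model_emit_record(&msg, 8) succeeds and a second returns -EMSGSIZE with msg.used unchanged. In kernel code the rollback role is played by nla_nest_cancel(); whether that is the right fix for lnet_net_show_dump() is for the maintainers to decide.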
      
      The primary issue causing the kernel panic is that the function makes approximately 18 sequential calls to nla_nest_start() without checking for a NULL return. When the skb buffer fills up, nla_nest_start() returns NULL. Because this is ignored, the subsequent nla_nest_end(msg, NULL) attempts to write the 16-bit nla_len field to address 0, instantly seizing the node with a NULL pointer dereference.

      Furthermore, even if the buffer overflow perfectly aligns with a network boundary (avoiding the mid-NI panic), the multi-skb resumption path is functionally broken. When genlmsg_put fails due to lack of space, the code executes GOTO(net_unlock, rc = -EMSGSIZE), completely bypassing the nlist->lngl_idx = idx state-saving assignment. Compounding this, the function initializes idx = nlist->lngl_idx instead of 0, meaning the if (idx++ < nlist->lngl_idx) continue; check never actually skips previously processed NIs. This throws the kernel into an infinite loop, returning the exact same chunk of networks repeatedly until lnetctl crashes in userspace with a realloc(): invalid next size heap corruption.
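The resumption problem described above can be reproduced in a small userspace model. This is a sketch under stated assumptions, not the Lustre code: dump_state.saved_idx stands in for nlist->lngl_idx, items are plain integers rather than NIs, and cap stands in for the amount that fits in one skb. dump_chunk_buggy follows the two behaviours the analysis describes (idx seeded from the saved value, and no save on the out-of-space path); dump_chunk_fixed shows one possible way to resume correctly, which may differ from whatever patch is eventually applied.

```c
#include <stddef.h>

#define MODEL_CAP_MAX 16	/* hypothetical upper bound on items per chunk */

struct dump_state {
	int saved_idx;	/* plays the role of nlist->lngl_idx */
};

typedef int (*dump_fn)(struct dump_state *st, int total,
		       int *out, int cap, int *more);

/* Behaviour as described in the analysis: nothing is skipped on resume
 * and the position is not saved when the chunk fills up. */
static int dump_chunk_buggy(struct dump_state *st, int total,
			    int *out, int cap, int *more)
{
	int idx = st->saved_idx;	/* seeded from the saved value */
	int n = 0;

	*more = 0;
	for (int item = 0; item < total; item++) {
		if (idx++ < st->saved_idx)
			continue;	/* never true, so nothing is skipped */
		if (n == cap) {
			*more = 1;	/* bails out without saving position */
			return n;
		}
		out[n++] = item;
	}
	return n;
}

/* One possible correct shape: count from 0, skip what was already sent,
 * and record the first unsent item before returning. */
static int dump_chunk_fixed(struct dump_state *st, int total,
			    int *out, int cap, int *more)
{
	int idx = 0;
	int n = 0;

	*more = 0;
	for (int item = 0; item < total; item++) {
		if (idx++ < st->saved_idx)
			continue;	/* sent in an earlier chunk */
		if (n == cap) {
			st->saved_idx = item;	/* resume here next time */
			*more = 1;
			return n;
		}
		out[n++] = item;
	}
	st->saved_idx = total;
	return n;
}

/* Call 'fn' until it reports completion. Returns the number of calls,
 * -1 if 'max_calls' is reached (the dump never finishes), or -2 if
 * 'cap' is out of range. '*emitted' counts all items returned. */
static int drain(dump_fn fn, int total, int cap, int max_calls, int *emitted)
{
	struct dump_state st = { 0 };
	int out[MODEL_CAP_MAX];
	int more = 1;

	*emitted = 0;
	if (cap < 1 || cap > MODEL_CAP_MAX)
		return -2;
	for (int calls = 1; calls <= max_calls; calls++) {
		*emitted += fn(&st, total, out, cap, &more);
		if (!more)
			return calls;
	}
	return -1;
}
```

With 5 items and room for 2 per chunk, drain(dump_chunk_fixed, 5, 2, 100, &emitted) finishes in 3 calls with 5 items emitted, while the buggy variant keeps returning items 0 and 1 until the call limit is hit. When everything fits in one chunk, both variants behave the same, which matches the report that the problem only appears past the single-buffer threshold.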

      Attachments

        1. reproducer.sh
          3 kB
        2. vmcore
          88.37 MB
        3. vmcore-dmesg.txt
          113 kB

            People

              masingh Malkeet Singh
              mrasobarnett Matt Rásó-Barnett