Lustre / LU-17235

kernel panic on kiblnd_startup with logical interfaces

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: None
    • Labels: None
    • Environment: Rocky Linux 8.7 (4.18.0-425.13.1.el8_7.x86_64)
      OFED 5.8-1.1.2.1
      master (commit:d7d1644)
    • Severity: 3

    Description

      The kernel crashes while loading the Lustre modules when LNet Multi-Rail (MR) is configured with logical interfaces such as ib0:1.

      options lnet networks="o2ib12(ib0,ib0:1)"
      # modprobe lustre
      
      [  143.167995] LNet: Using FastReg for registration
      [  143.648939] LNet: Added LNI 10.0.0.1@o2ib12 [8/512/0/180]
      [  144.128240] BUG: unable to handle kernel NULL pointer dereference at 00000000000004f0
      [  144.136091] PGD 0 P4D 0
      [  144.138631] Oops: 0000 [#1] SMP NOPTI
      [  144.142299] CPU: 7 PID: 2739 Comm: modprobe Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-425.13.1.el8_7.x86_64 #1
      [  144.154133] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0015.032120220358 03/21/2022
      [  144.164573] RIP: 0010:kiblnd_startup+0x1194/0x1720 [ko2iblnd]
      [  144.170338] Code: 44 24 08 4c 8b a8 50 01 00 00 41 8b 4d 68 85 c9 0f 84 7d 02 00 00 49 8b 47 38 48 8b bb a0 01 00 00 48 8d 70 24 e8 9c e1 d5 e7 <80> b8 f0 04 00 00 02 74 0d 80 b8 34 02 00 00 06 0f 84 63 04 00 00
      [  144.189114] RSP: 0018:ffffac2a0952fb30 EFLAGS: 00010046
      [  144.194352] RAX: 0000000000000000 RBX: ffff9f37ac0a3400 RCX: 0000000000000028
      [  144.201494] RDX: ffff9f37582aa800 RSI: ffff9f3775d1d224 RDI: 00000000cbaad2c9
      [  144.208638] RBP: ffff9f3775d1d340 R08: a7c5921741031163 R09: 0000000000000005
      [  144.215776] R10: ffff9f3775e3fd80 R11: ffff9f37553718f0 R12: ffff9f3775d1d340
      [  144.222919] R13: ffff9f376d402a00 R14: 0000000000000007 R15: ffff9f3759f07600
      [  144.230061] FS:  00007fbb91ba0740(0000) GS:ffff9f4e20bc0000(0000) knlGS:0000000000000000
      [  144.238156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  144.243906] CR2: 00000000000004f0 CR3: 0000000135c02005 CR4: 00000000007706e0
      [  144.251049] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  144.258189] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  144.265592] PKRU: 55555554
      [  144.268521] Call Trace:
      [  144.271144]  lnet_startup_lndnet+0x14f/0x7e0 [lnet]
      [  144.276207]  LNetNIInit+0x6e1/0xd70 [lnet]
      [  144.280486]  ? 0xffffffffc0c58000
      [  144.283964]  ptlrpc_init_portals+0x27/0x250 [ptlrpc]
      [  144.289160]  ? 0xffffffffc0c58000
      [  144.292646]  ptlrpc_init+0x196/0x1000 [ptlrpc]
      [  144.297307]  do_one_initcall+0x46/0x1d0
      [  144.301306]  ? do_init_module+0x22/0x230
      [  144.305386]  ? kmem_cache_alloc_trace+0x142/0x280
      [  144.310246]  do_init_module+0x5a/0x230
      [  144.314149]  load_module+0x14bf/0x17f0
      [  144.318053]  ? __do_sys_finit_module+0xb1/0x110
      [  144.322745]  __do_sys_finit_module+0xb1/0x110
      [  144.327259]  do_syscall_64+0x5b/0x1b0
      [  144.331081]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
      [  144.336289] RIP: 0033:0x7fbb90ab59bd
      [  144.340020] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 9b 64 38 00 f7 d8 64 89 01 48
      [  144.359098] RSP: 002b:00007ffe07f8ec88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [  144.366826] RAX: ffffffffffffffda RBX: 000055c022b91800 RCX: 00007fbb90ab59bd
      [  144.374125] RDX: 0000000000000000 RSI: 000055c021ebd8b6 RDI: 0000000000000006
      [  144.381423] RBP: 000055c021ebd8b6 R08: 0000000000000000 R09: 0000000000000000
      [  144.388718] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
      [  144.396011] R13: 000055c022b917a0 R14: 0000000000040000 R15: 0000000000000000
      [  144.403304] Modules linked in: ko2iblnd(OE) ptlrpc(OE+) obdclass(OE) lnet(OE) libcfs(OE) beegfs(OE) uio_pci_generic uio vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass cuse rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) sunrpc vfat fat intel_rapl_msr intel_rapl_common isst_if_common ipmi_ssif skx_edac iTCO_wdt nfit iTCO_vendor_support libnvdimm ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm x86_pkg_temp_thermal intel_powerclamp coretemp drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect rapl sysimgblt acpi_ipmi intel_cstate fb_sys_fops ipmi_si mei_me drm joydev pcspkr ipmi_devintf mei intel_uncore ioatdma i2c_i801 lpc_ich wmi ipmi_msghandler acpi_power_meter acpi_pad binfmt_misc knem(OE) ext4 mbcache jbd2 mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) sd_mod t10_pi sg mlx5_core(OE) mlxfw(OE) pci_hyperv_intf ixgbe tls ahci libahci psample mlxdevm(OE) mdio libata crc32c_intel mlx_compat(OE) dca xpmem(OE) fuse
      [  144.403358]  [last unloaded: libcfs]
      [  144.494056] CR2: 00000000000004f0
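
The Code: line and register dump in the trace above are enough to check where the NULL pointer was dereferenced. The following is a minimal Python sketch, not part of the original report, that hand-decodes only the faulting instruction (the bytes starting at the `<80>` marker). The helper names are hypothetical, and the decoder deliberately handles just the single x86-64 encoding that appears here (opcode 0x80 /7, a byte compare against an immediate, with a base register plus 32-bit displacement).

```python
# Code: line and register values copied verbatim from the Oops above.
CODE = ("44 24 08 4c 8b a8 50 01 00 00 41 8b 4d 68 85 c9 0f 84 7d 02 00 00 "
        "49 8b 47 38 48 8b bb a0 01 00 00 48 8d 70 24 e8 9c e1 d5 e7 "
        "<80> b8 f0 04 00 00 02 74 0d 80 b8 34 02 00 00 06 0f 84 63 04 00 00")
RAX = 0x0000000000000000
CR2 = 0x00000000000004f0

# Register numbering for the ModRM r/m field when no REX prefix is present.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes from the faulting instruction (marked <xx>) onward."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_cmp_imm8_disp32(insn):
    """Decode CMP r/m8, imm8 (0x80 /7) with mod=10: base register + disp32.

    Hypothetical helper: any other instruction form is rejected.
    """
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x80 or reg != 7 or mod != 2 or rm == 4:
        raise ValueError("not the instruction form this sketch handles")
    disp = int.from_bytes(insn[2:6], "little", signed=True)
    imm = insn[6]
    return rm, disp, imm


rm, disp, imm = decode_cmp_imm8_disp32(faulting_bytes(CODE))
fault_addr = RAX + disp
print(f"cmpb ${imm:#x}, {disp:#x}(%{REGS[rm]})  ->  address {fault_addr:#x}")
```

The decoded instruction compares a byte at offset 0x4f0 from RAX, and with RAX equal to zero that address matches CR2. The bytes just before the marker (`e8 9c e1 d5 e7`) are a call, whose return value is left in RAX, so one plausible reading, which the report itself does not state, is that a lookup inside kiblnd_startup returned NULL and the result was used without a check.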
      

      With a plain LNet configuration that uses no logical interfaces, for example 'options lnet networks="o2ib12(ib0)"', the setup works.
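
Because the two configurations in this report differ only in whether the networks option names a logical interface, a configuration string can be screened before the modules are loaded on a build without the fix. The sketch below is illustrative only: `alias_interfaces` is a hypothetical helper, it assumes a logical interface is recognisable by a colon in its name (as in ib0:1), and it parses only the simple `net(if1,if2)` form used here rather than the full LNet networks syntax.

```python
import re


def alias_interfaces(networks):
    """Return (net, interface) pairs whose interface name contains ':'.

    Hypothetical helper: handles only the simple "net(if1,if2)" form.
    """
    found = []
    for net, iflist in re.findall(r"([A-Za-z0-9]+)\(([^)]*)\)", networks):
        for ifname in iflist.split(","):
            ifname = ifname.strip()
            if ":" in ifname:
                found.append((net, ifname))
    return found


# The crashing configuration from this report, then the working one.
print(alias_interfaces("o2ib12(ib0,ib0:1)"))  # [('o2ib12', 'ib0:1')]
print(alias_interfaces("o2ib12(ib0)"))        # []
```

A non-empty result marks a configuration of the kind that triggered the panic in this report. It does not by itself prove that a given build is affected.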

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: sihara Shuichi Ihara