[LU-17247] BUG: unable to handle kernel NULL pointer dereference in kiblnd_passive_connect

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.16.0
    • None
    • master, RHEL8.7
    • 3

    Description

      The server crashed due to a NULL pointer dereference in kiblnd_passive_connect; the console log follows:

      [14161.702631] libcfs: HW NUMA nodes: 1, HW CPU cores: 24, npartitions: 4
      [14161.705274] alg: No test for adler32 (adler32-zlib)
      [14162.456545] Key type ._llcrypt registered
      [14162.457357] Key type .llcrypt registered
      [14162.484133] Lustre: Lustre: Build Version: 2.15.58_109_g40074d3
      [14162.540341] LNet: Using FastReg for registration
      [14162.750736] LNet: Added LNI 10.0.11.209@o2ib12 [32/1024/0/180]
      [14162.950680] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      [14162.951989] PGD 0 
      [14162.952520] Oops: 0000 [#1] SMP NOPTI
      [14162.953250] CPU: 22 PID: 201160 Comm: kworker/22:4 Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-425.13.1.el8_lustre.ddn17.x86_64 #1
      [14162.955184] Hardware name: DDN SFA400NVX2E, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      [14162.956667] Workqueue: ib_cm cm_work_handler [ib_cm]
      [14162.957565] RIP: 0010:kiblnd_passive_connect+0x1395/0x1620 [ko2iblnd]
      [14162.958644] Code: c7 05 63 81 01 00 00 01 00 00 e8 26 03 f4 ff 48 89 df ba 40 00 00 00 48 89 c6 e8 06 10 f4 ff 45 8b b4 24 24 01 00 00 49 89 c7 <48> 8b 04 25 40 00 00 00 48 8d 58 38 e8 fa 02 f4 ff 48 89 df ba 40
      [14162.961535] RSP: 0018:ff7a599b4dca79a0 EFLAGS: 00010246
      [14162.962473] RAX: ffffffffc1038f00 RBX: 0005001614010bd1 RCX: 0000000000000000
      [14162.963534] LNet: Added LNI 20.1.11.209@o2ib22 [32/1024/0/180]
      [14162.963649] RDX: ffffffffc1038f12 RSI: 0000000000000000 RDI: 0000000000000000
      [14162.965863] RBP: ff36491ca4dbcc00 R08: 0000000000000001 R09: 0000000000000000
      [14162.967015] R10: ffffffffc1038f40 R11: ffffffffc1038f12 R12: ff364925b2ba2a00
      [14162.968167] R13: ff36492daa67a5b0 R14: 0000000000000000 R15: ffffffffc1038f00
      [14162.969313] FS:  0000000000000000(0000) GS:ff36493e31b80000(0000) knlGS:0000000000000000
      [14162.970594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [14162.971560] CR2: 0000000000000040 CR3: 0000000f8bc10003 CR4: 0000000000771ee0
      [14162.972711] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [14162.973846] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [14162.974976] PKRU: 55555554
      [14162.975553] Call Trace:
      [14162.976085]  ? xas_store+0x56/0x5a0
      [14162.976755]  kiblnd_cm_callback+0x3d7/0x1e90 [ko2iblnd]
      [14162.977639]  ? __xa_alloc_cyclic+0x49/0xe0
      [14162.978375]  cma_cm_event_handler+0x25/0xd0 [rdma_cm]
      [14162.979227]  cma_ib_req_handler+0x7d1/0x1260 [rdma_cm]
      [14162.980090]  ? update_group_capacity+0x25/0x220
      [14162.980872]  cm_process_work+0x22/0xf0 [ib_cm]
      [14162.981638]  cm_req_handler+0x7f1/0xf40 [ib_cm]
      [14162.982416]  cm_work_handler+0x79c/0xf30 [ib_cm]
      [14162.983198]  ? __switch_to+0x10c/0x450
      [14162.983872]  ? finish_task_switch+0xaf/0x2e0
      [14162.984607]  process_one_work+0x1a7/0x360
      [14162.985300]  ? create_worker+0x1a0/0x1a0
      [14162.985979]  worker_thread+0x30/0x390
      [14162.986623]  ? create_worker+0x1a0/0x1a0
      [14162.987292]  kthread+0x10b/0x130
      [14162.987874]  ? set_kthread_struct+0x50/0x50
      [14162.988577]  ret_from_fork+0x1f/0x40
      [14162.989205] Modules linked in: ko2iblnd(OE) ptlrpc(OE+) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) sunrpc intel_rapl_msr intel_rapl_common nfit libnvdimm kvm_intel kvm irqbypass iTCO_wdt ppdev iTCO_vendor_support crct10dif_pclmul crc32_pclmul bochs drm_vram_helper drm_ttm_helper ghash_clmulni_intel ttm rapl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr i2c_i801 drm joydev lpc_ich i6300esb parport_pc parport ext4 mbcache jbd2 sr_mod sd_mod cdrom t10_pi sg mlx5_ib(OE) ib_uverbs(OE) ib_core(OE) mlx5_core(OE) mlxfw(OE) pci_hyperv_intf ahci tls libahci psample mlxdevm(OE) virtio_net libata bnxt_en crc32c_intel net_failover serio_raw virtio_blk mlx_compat(OE) virtio_scsi failover dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
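The faulting address in CR2 (0000000000000040) matches the instruction at the <48> marker in the Code: line. The bytes 48 8b 04 25 40 00 00 00 decode to mov rax, [0x40], a 64-bit load from absolute address 0x40. That pattern is consistent with reading a field at offset 0x40 through a NULL structure pointer. The Python sketch below illustrates only that address arithmetic. `ExampleConn`, its layout, and `fault_address` are hypothetical and do not describe the real ko2iblnd structures, so the sketch does not identify which field was actually accessed.

```python
import ctypes


# Hypothetical layout for illustration only; this is NOT a real ko2iblnd
# structure. Eight 8-byte fields precede `target`, placing it at 0x40.
class ExampleConn(ctypes.Structure):
    _fields_ = [
        ("pad", ctypes.c_uint64 * 8),  # bytes 0x00..0x3f
        ("target", ctypes.c_void_p),   # starts at byte 0x40
    ]


def fault_address(base: int, struct_type, field: str) -> int:
    """Address touched when reading `field` through a pointer equal to `base`."""
    return base + getattr(struct_type, field).offset


# A NULL (0) base pointer turns the field offset into the faulting address.
addr = fault_address(0, ExampleConn, "target")
print(f"{addr:016x}")  # prints 0000000000000040
```

With a non-NULL base the same access would land at base + 0x40, which is why a small CR2 value such as 0x40 usually points to a NULL pointer plus a field offset rather than a wild pointer.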
      

          Activity

            sihara Shuichi Ihara added a comment - I've confirmed that https://review.whamcloud.com/c/fs/lustre-release/+/52202 from LU-17071 solves the problem, so LU-17247 should be closed as a duplicate of LU-17071.

            adilger Andreas Dilger added a comment -
                "Although the logical interfaces and alias are not used, I've applied patch https://review.whamcloud.com/#/c/fs/lustre-release/+/52894/ against master"
            Shuichi, does that patch cause the crash? (That seems unlikely, given the patch is very small.)
            If that is the only patch applied, it looks like this would be based on commit v2_15_58-108-g345a2497d0 "LU-5134 utils: Add parallel option to lctl set_param"?
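The inference above relies on the build version printed in the console log, 2.15.58_109_g40074d3, following the `git describe` shape TAG-COUNT-gHASH. Assuming that mapping (an assumption about the Lustre build scripts that this ticket does not confirm), the build sits 109 commits past the 2.15.58 tag, while v2_15_58-108-g345a2497d0 sits 108 commits past it, so one applied patch on top of that commit would produce the observed count. A minimal Python sketch of that arithmetic; `Describe` and `parse_describe` are hypothetical helpers:

```python
import re
from typing import NamedTuple


class Describe(NamedTuple):
    tag: str      # release tag in dotted form, e.g. "2.15.58"
    commits: int  # number of commits on top of the tag
    abbrev: str   # abbreviated commit hash, without the leading "g"


# Assumption: both strings follow the `git describe` shape TAG-COUNT-gHASH,
# using "_", "-", or "." as separators and an optional leading "v".
_DESCRIBE_RE = re.compile(r"^v?(\d+)[._](\d+)[._](\d+)[-_](\d+)[-_]g([0-9a-f]+)$")


def parse_describe(s: str) -> Describe:
    m = _DESCRIBE_RE.match(s)
    if m is None:
        raise ValueError(f"not a describe-style version: {s!r}")
    major, minor, patch, count, abbrev = m.groups()
    return Describe(f"{major}.{minor}.{patch}", int(count), abbrev)


build = parse_describe("2.15.58_109_g40074d3")     # "Build Version" from the log
base = parse_describe("v2_15_58-108-g345a2497d0")  # commit named in the comment
assert build.tag == base.tag
print(build.commits - base.commits)  # prints 1
```

A difference of one is consistent with that base commit but does not prove it: it assumes both counts are measured along the same history, which is why the comment phrases it as a question.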

            sihara Shuichi Ihara added a comment - Although the logical interfaces and alias are not used, I've applied patch https://review.whamcloud.com/#/c/fs/lustre-release/+/52894/ against master

            People

              wc-triage WC Triage
              sihara Shuichi Ihara
              Votes: 0
              Watchers: 4
