Lustre / LU-13454

NULL dereference in lnet_health_check lnet_incr_hstats


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Labels: None
    • Severity: 3

    Description

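      The LNet messages used for replies to optimized GETs, created via lnet_create_reply_msg(), are only ever committed for rx. As a result, their msg_txni and msg_txpeer fields are NULL. lnet_incr_hstats() does not account for this case, so when it is passed one of these messages it attempts to dereference a NULL pointer. In the oops below, the faulting address (CR2) is 0xec and RDX is zero, which is consistent with an atomic increment of a counter at offset 0xec from a NULL structure pointer.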
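
      The failure mode described above can be illustrated with a minimal user-space sketch. It is not the actual Lustre patch: the sketch_* types, the hs_error counter, and sketch_incr_hstats() are hypothetical stand-ins, and the real LNet structures and counters differ. It only shows the general idea of checking the tx NI and tx peer pointers before updating their health statistics, so that a message committed only for rx is skipped rather than dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the LNet structures.
 * The real definitions in the Lustre tree have many more fields. */
struct sketch_health_stats {
	int hs_error;			/* generic error counter (assumed) */
};

struct sketch_ni {
	struct sketch_health_stats stats;
};

struct sketch_peer_ni {
	struct sketch_health_stats stats;
};

struct sketch_msg {
	struct sketch_ni *msg_txni;		/* NULL if committed for rx only */
	struct sketch_peer_ni *msg_txpeer;	/* NULL if committed for rx only */
};

/*
 * Count one error against the tx NI and the tx peer of a message,
 * skipping whichever pointer is NULL.
 * Returns the number of counters that were updated.
 */
static int sketch_incr_hstats(struct sketch_msg *msg)
{
	int updated = 0;

	assert(msg != NULL);

	if (msg->msg_txni != NULL) {
		msg->msg_txni->stats.hs_error++;
		updated++;
	}

	if (msg->msg_txpeer != NULL) {
		msg->msg_txpeer->stats.hs_error++;
		updated++;
	}

	return updated;
}
```

      With both pointers NULL, as for a reply message created for an optimized GET, the sketch updates nothing and returns 0 instead of faulting. With both pointers set it increments each counter once.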

      [534987.484660] LNet: 33866:0:(o2iblnd_cb.c:2081:kiblnd_close_conn_locked()) Closing conn to 10.16.100.20@o2ib: error 0(sending)(sending_nocred)(waiting)
      [534987.500344] LustreError: 166827:0:(events.c:453:server_bulk_callback()) event type 3, status -103, desc ffff89b65f3aa600
      [534987.500406] LNetError: 166825:0:(lib-msg.c:479:lnet_handle_local_failure()) ni 10.16.100.55@o2ib added to recovery queue. Health = 900
      [534987.500412] LustreError: 166825:0:(events.c:453:server_bulk_callback()) event type 5, status -103, desc ffff89b65494ba00
      [534987.500416] LustreError: 166825:0:(events.c:453:server_bulk_callback()) event type 5, status -103, desc ffff89e751f58800
      [534987.500460] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
      [534987.500498] IP: [<ffffffffc0ea5889>] lnet_finalize+0xb99/0xdc0 [lnet]
      [534987.500499] PGD 0
      [534987.500501] Oops: 0002 [#1] SMP
      [534987.500532] Modules linked in: osd_zfs(OE) mdt(OE) mdd(OE) lod(OE) mgs(OE) osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) raid5_pd(POE) raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 ext4 mbcache jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack xt_multiport iptable_filter xt_CT nf_conntrack libcrc32c iptable_raw dm_service_time dm_multipath mst_pciconf(OE) mlx4_ib(OE) mlx4_en(OE) mlx4_core(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) sd_mod crc_t10dif crct10dif_generic sg ib_umad(OE) ib_ipoib(OE) ib_cm(OE) zfs(POE) zunicode(POE) zlua(POE) edac_mce_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel
      [534987.500557]  lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) mlx5_ib(OE) zavl(POE) pcspkr icp(POE) ib_uverbs(OE) spl(OE) ib_core(OE) ast mlx5_core(OE) ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm drm_panel_orientation_quirks mlx_compat(OE) dm_mod mlxfw devlink mpt3sas(OE) raid_class scsi_transport_sas i2c_piix4 i2c_designware_platform i2c_designware_core pinctrl_amd acpi_cpufreq ip_tables nfsv3 nfs_acl nfs lockd grace fscache team_mode_activebackup team crct10dif_pclmul crct10dif_common crc32c_intel igb i2c_algo_bit dca ptp pps_core nvme nvme_core nfit libnvdimm sunrpc bonding ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: libcfs]
      [534987.500560] CPU: 10 PID: 166825 Comm: kiblnd_connd Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-957.1.3957.1.3.x4.1.6.x86_64 #1
      [534987.500561] Hardware name: Viking Enterprise Solutions VSSEP1EA/VSSEP1EA, BIOS 10.01 03/04/2020
      [534987.500562] task: ffff89baf1de1040 ti: ffff89b659a70000 task.ti: ffff89b659a70000
      [534987.500576] RIP: 0010:[<ffffffffc0ea5889>]  [<ffffffffc0ea5889>] lnet_finalize+0xb99/0xdc0 [lnet]
      [534987.500577] RSP: 0018:ffff89b659a73cf0  EFLAGS: 00010293
      [534987.500578] RAX: ffff89bb25bdfa80 RBX: ffff89adb1978898 RCX: 0000000000000000
      [534987.500579] RDX: 0000000000000000 RSI: ffffffffc0ea5889 RDI: ffff89bb2be8ae00
      [534987.500580] RBP: ffff89b659a73d40 R08: 0000000000000000 R09: 00000001804a0018
      [534987.500581] R10: 000000008c5e9601 R11: fffff16efe317a00 R12: 00000000ffffff99
      [534987.500582] R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000
      [534987.500583] FS:  0000000000000000(0000) GS:ffff89cb2ee80000(0000) knlGS:0000000000000000
      [534987.500584] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [534987.500585] CR2: 00000000000000ec CR3: 0000000fd52f6000 CR4: 0000000000340fe0
      [534987.500586] Call Trace:
      [534987.500601]  [<ffffffffc0d1fd22>] ? kiblnd_pool_free_node+0x82/0x170 [ko2iblnd]
      [534987.500609]  [<ffffffffc0d296dd>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [534987.500616]  [<ffffffffc0d299fb>] kiblnd_txlist_done+0x4b/0x60 [ko2iblnd]
      [534987.500624]  [<ffffffffc0d2f05d>] kiblnd_abort_txs+0xed/0x240 [ko2iblnd]
      [534987.500631]  [<ffffffffc0d2f243>] kiblnd_finalise_conn+0x93/0x120 [ko2iblnd]
      [534987.500637]  [<ffffffffc0d336f1>] kiblnd_connd+0x251/0xa00 [ko2iblnd]
      [534987.500642]  [<ffffffffa1cd6b10>] ? wake_up_state+0x20/0x20
      [534987.500649]  [<ffffffffc0d334a0>] ? kiblnd_cm_callback+0x2380/0x2380 [ko2iblnd]
      [534987.500651]  [<ffffffffa1cc1f81>] kthread+0xd1/0xe0
      [534987.500653]  [<ffffffffa1cc1eb0>] ? insert_kthread_work+0x40/0x40
      [534987.500657]  [<ffffffffa2377c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [534987.500659]  [<ffffffffa1cc1eb0>] ? insert_kthread_work+0x40/0x40
      [534987.500677] Code: c0 e8 cc df ed ff f0 ff 82 e8 00 00 00 83 40 58 01 48 8b 3d ca 17 04 00 31 f6 e8 83 60 ef ff 0f b6 43 6d 83 e0 01 e9 c8 f5 ff ff <f0> ff 82 ec 00 00 00 83 40 5c 01 eb d9 f0 ff 82 e4 00 00 00 83
      [534987.500689] RIP  [<ffffffffc0ea5889>] lnet_finalize+0xb99/0xdc0 [lnet]
      [534987.500690]  RSP <ffff89b659a73cf0>
      [534987.500691] CR2: 00000000000000ec
      

          People

            Assignee: hornc Chris Horn
            Reporter: hornc Chris Horn
            Votes: 0
            Watchers: 4
