Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.14.0, Lustre 2.12.4
-
None
-
3
-
9223372036854775807
Description
The LNet messages used for replies to optimized GETs, created via lnet_create_reply_msg(), are only ever committed for rx. As a result, their msg_txni and msg_txpeer fields are NULL. lnet_incr_hstats() does not account for this, so when it is passed one of these messages it dereferences a NULL pointer and the kernel oopses in lnet_finalize():
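The failure mode described above can be illustrated with a minimal sketch. It is not the actual LNet code or the landed patch: only the field names msg_txni and msg_txpeer come from this report, and every struct, enum, and function name prefixed with sketch_ is a hypothetical stand-in. The sketch assumes the health counters hang off the tx NI and tx peer NI, which fits the oops below, where the faulting instruction bytes (f0 ff 82 ec 00 00 00) decode to a locked increment at offset 0xec from RDX, RDX is 0, and CR2 is 0xec.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real LNet definitions. */
struct sketch_ni {
	int local_dropped;      /* per-NI health counter */
};

struct sketch_peer_ni {
	int remote_dropped;     /* per-peer-NI health counter */
};

struct sketch_msg {
	struct sketch_ni *msg_txni;         /* NULL when committed for rx only */
	struct sketch_peer_ni *msg_txpeer;  /* NULL when committed for rx only */
};

enum sketch_hstatus {
	SKETCH_LOCAL_DROPPED,
	SKETCH_REMOTE_DROPPED
};

/*
 * Bump the health counter that matches the status.
 * Returns 1 if a counter was incremented, 0 if the message has no
 * tx NI / tx peer to account against (e.g. a reply to an optimized GET).
 */
int sketch_incr_hstats(struct sketch_msg *msg, enum sketch_hstatus status)
{
	switch (status) {
	case SKETCH_LOCAL_DROPPED:
		if (msg->msg_txni == NULL)
			return 0;   /* without this guard: NULL dereference */
		msg->msg_txni->local_dropped++;
		return 1;
	case SKETCH_REMOTE_DROPPED:
		if (msg->msg_txpeer == NULL)
			return 0;   /* without this guard: NULL dereference */
		msg->msg_txpeer->remote_dropped++;
		return 1;
	}
	return 0;
}
```

With both pointers NULL, as for a message committed only for rx, the sketch skips the increment instead of faulting. Whether the real fix skips the statistics update or accounts for it elsewhere is not stated in this report.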
[534987.484660] LNet: 33866:0:(o2iblnd_cb.c:2081:kiblnd_close_conn_locked()) Closing conn to 10.16.100.20@o2ib: error 0(sending)(sending_nocred)(waiting)
[534987.500344] LustreError: 166827:0:(events.c:453:server_bulk_callback()) event type 3, status -103, desc ffff89b65f3aa600
[534987.500406] LNetError: 166825:0:(lib-msg.c:479:lnet_handle_local_failure()) ni 10.16.100.55@o2ib added to recovery queue. Health = 900
[534987.500412] LustreError: 166825:0:(events.c:453:server_bulk_callback()) event type 5, status -103, desc ffff89b65494ba00
[534987.500416] LustreError: 166825:0:(events.c:453:server_bulk_callback()) event type 5, status -103, desc ffff89e751f58800
[534987.500460] BUG: unable to handle kernel NULL pointer dereference at 00000000000000ec
[534987.500498] IP: [<ffffffffc0ea5889>] lnet_finalize+0xb99/0xdc0 [lnet]
[534987.500499] PGD 0
[534987.500501] Oops: 0002 [#1] SMP
[534987.500532] Modules linked in: osd_zfs(OE) mdt(OE) mdd(OE) lod(OE) mgs(OE) osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) raid5_pd(POE) raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 ext4 mbcache jbd2 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack xt_multiport iptable_filter xt_CT nf_conntrack libcrc32c iptable_raw dm_service_time dm_multipath mst_pciconf(OE) mlx4_ib(OE) mlx4_en(OE) mlx4_core(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) sd_mod crc_t10dif crct10dif_generic sg ib_umad(OE) ib_ipoib(OE) ib_cm(OE) zfs(POE) zunicode(POE) zlua(POE) edac_mce_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel
[534987.500557] lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) mlx5_ib(OE) zavl(POE) pcspkr icp(POE) ib_uverbs(OE) spl(OE) ib_core(OE) ast mlx5_core(OE) ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm drm_panel_orientation_quirks mlx_compat(OE) dm_mod mlxfw devlink mpt3sas(OE) raid_class scsi_transport_sas i2c_piix4 i2c_designware_platform i2c_designware_core pinctrl_amd acpi_cpufreq ip_tables nfsv3 nfs_acl nfs lockd grace fscache team_mode_activebackup team crct10dif_pclmul crct10dif_common crc32c_intel igb i2c_algo_bit dca ptp pps_core nvme nvme_core nfit libnvdimm sunrpc bonding ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: libcfs]
[534987.500560] CPU: 10 PID: 166825 Comm: kiblnd_connd Kdump: loaded Tainted: P W OE ------------ 3.10.0-957.1.3957.1.3.x4.1.6.x86_64 #1
[534987.500561] Hardware name: Viking Enterprise Solutions VSSEP1EA/VSSEP1EA, BIOS 10.01 03/04/2020
[534987.500562] task: ffff89baf1de1040 ti: ffff89b659a70000 task.ti: ffff89b659a70000
[534987.500576] RIP: 0010:[<ffffffffc0ea5889>] [<ffffffffc0ea5889>] lnet_finalize+0xb99/0xdc0 [lnet]
[534987.500577] RSP: 0018:ffff89b659a73cf0 EFLAGS: 00010293
[534987.500578] RAX: ffff89bb25bdfa80 RBX: ffff89adb1978898 RCX: 0000000000000000
[534987.500579] RDX: 0000000000000000 RSI: ffffffffc0ea5889 RDI: ffff89bb2be8ae00
[534987.500580] RBP: ffff89b659a73d40 R08: 0000000000000000 R09: 00000001804a0018
[534987.500581] R10: 000000008c5e9601 R11: fffff16efe317a00 R12: 00000000ffffff99
[534987.500582] R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000
[534987.500583] FS: 0000000000000000(0000) GS:ffff89cb2ee80000(0000) knlGS:0000000000000000
[534987.500584] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[534987.500585] CR2: 00000000000000ec CR3: 0000000fd52f6000 CR4: 0000000000340fe0
[534987.500586] Call Trace:
[534987.500601] [<ffffffffc0d1fd22>] ? kiblnd_pool_free_node+0x82/0x170 [ko2iblnd]
[534987.500609] [<ffffffffc0d296dd>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[534987.500616] [<ffffffffc0d299fb>] kiblnd_txlist_done+0x4b/0x60 [ko2iblnd]
[534987.500624] [<ffffffffc0d2f05d>] kiblnd_abort_txs+0xed/0x240 [ko2iblnd]
[534987.500631] [<ffffffffc0d2f243>] kiblnd_finalise_conn+0x93/0x120 [ko2iblnd]
[534987.500637] [<ffffffffc0d336f1>] kiblnd_connd+0x251/0xa00 [ko2iblnd]
[534987.500642] [<ffffffffa1cd6b10>] ? wake_up_state+0x20/0x20
[534987.500649] [<ffffffffc0d334a0>] ? kiblnd_cm_callback+0x2380/0x2380 [ko2iblnd]
[534987.500651] [<ffffffffa1cc1f81>] kthread+0xd1/0xe0
[534987.500653] [<ffffffffa1cc1eb0>] ? insert_kthread_work+0x40/0x40
[534987.500657] [<ffffffffa2377c1d>] ret_from_fork_nospec_begin+0x7/0x21
[534987.500659] [<ffffffffa1cc1eb0>] ? insert_kthread_work+0x40/0x40
[534987.500677] Code: c0 e8 cc df ed ff f0 ff 82 e8 00 00 00 83 40 58 01 48 8b 3d ca 17 04 00 31 f6 e8 83 60 ef ff 0f b6 43 6d 83 e0 01 e9 c8 f5 ff ff <f0> ff 82 ec 00 00 00 83 40 5c 01 eb d9 f0 ff 82 e4 00 00 00 83
[534987.500689] RIP [<ffffffffc0ea5889>] lnet_finalize+0xb99/0xdc0 [lnet]
[534987.500690] RSP <ffff89b659a73cf0>
[534987.500691] CR2: 00000000000000ec