Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
Lustre 2.12.4
-
lustre 2.12.4 + patches
lustre 2.10.8 + patches
RHEL 7.8 + patches
-
3
Description
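Router node crashes with apparently infinite recursion in LNet. lnet_finalize() repeats in the call trace, and the kernel reports a stack overrun in the kiblnd_sd_00_00 thread.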
[15037.327128] Thread overran stack, or stack corrupted
[15037.332674] Oops: 0000 [#1] SMP
[15037.336294] Modules linked in: ko2iblnd(OE) lnet(OE) libcfs(OE) mlx4_ib mlx4_en rpcrdma ib_iser iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass pcspkr zfs(POE) ib_qib rdmavt lpc_ich zunicode(POE) zavl(POE) icp(POE) joydev zcommon(POE) znvpair(POE) spl(OE) mlx4_core devlink sg i2c_i801 ioatdma ipmi_si ipmi_devintf ipmi_msghandler ib_ipoib rdma_ucm ib_uverbs ib_umad acpi_cpufreq iw_cxgb4 rdma_cm iw_cm ib_cm iw_cxgb3 ib_core sch_fq_codel binfmt_misc msr_safe(OE) ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache overlay(T) ext4 mbcache jbd2 dm_service_time sd_mod crc_t10dif crct10dif_generic be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs 8021q mgag200 garp mrp
[15037.415946] stp drm_kms_helper llc syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul crct10dif_common ttm crc32_pclmul crc32c_intel ghash_clmulni_intel ahci drm mpt2sas igb isci libahci aesni_intel lrw gf128mul libsas glue_helper dca ablk_helper raid_class ptp cryptd libata dm_multipath scsi_transport_sas drm_panel_orientation_quirks pps_core wmi i2c_algo_bit sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
[15037.461455] CPU: 3 PID: 21567 Comm: kiblnd_sd_00_00 Kdump: loaded Tainted: P OE ------------ T 3.10.0-1127.0.0.1chaos.ch6.x86_64 #1
[15037.475718] Hardware name: cray cray-2628-lr/S2600GZ, BIOS SE5C600.86B.02.06.0002.101320150901 10/13/2015
[15037.486395] task: ffff9fd2b7775230 ti: ffff9fd27c1e8000 task.ti: ffff9fd27c1e8000
[15037.494744] RIP: 0010:[<ffffffffa17acd4d>] [<ffffffffa17acd4d>] strnlen+0xd/0x40
[15037.503106] RSP: 0018:ffff9fd27c1e7e80 EFLAGS: 00010086
[15037.509032] RAX: ffffffffa1e86261 RBX: ffffffffa2402fd6 RCX: fffffffffffffffe
[15037.516994] RDX: 00000000c0ab26ee RSI: ffffffffffffffff RDI: 00000000c0ab26ee
[15037.524957] RBP: ffff9fd27c1e7e80 R08: 000000000000ffff R09: 000000000000ffff
[15037.532917] R10: 0000000000000000 R11: ffff9fd27c1e7e46 R12: 00000000c0ab26ee
[15037.540880] R13: ffffffffa24033a0 R14: 00000000ffffffff R15: 0000000000000000
[15037.548842] FS: 0000000000000000(0000) GS:ffff9fd2be8c0000(0000) knlGS:0000000000000000
[15037.557871] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[15037.564282] CR2: 00000000c0ab26ee CR3: 0000001ffbd04000 CR4: 00000000001607e0
[15037.572244] Call Trace:
[15037.574990] [<ffffffffc08e759a>] ? cfs_print_to_console+0x7a/0x1c0 [libcfs]
[15037.582862] [<ffffffffc08eda74>] ? libcfs_debug_vmsg2+0x574/0xbb0 [libcfs]
[15037.590635] [<ffffffffc08ee107>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15037.598037] [<ffffffffc0a7e226>] ? lnet_finalize+0x976/0x9f0 [lnet]
...
[15039.259572] [<ffffffffc0a7d9e9>] ? lnet_finalize+0x139/0x9f0 [lnet]
[15039.266666] [<ffffffffc08ee107>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[15039.274052] [<ffffffffc0a87b8a>] ? lnet_post_send_locked+0x42a/0xa40 [lnet]
[15039.281922] [<ffffffffc0a89e38>] ? lnet_return_tx_credits_locked+0x238/0x4a0 [lnet]
[15039.290583] [<ffffffffc0a7c88c>] ? lnet_msg_decommit+0xec/0x700 [lnet]
[15039.297960] [<ffffffffc0a7dc3f>] ? lnet_finalize+0x38f/0x9f0 [lnet]
[15039.305055] [<ffffffffc09bc75d>] ? kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
[15039.312634] [<ffffffffc09c7d19>] ? kiblnd_scheduler+0x8c9/0x1160 [ko2iblnd]
[15039.320502] [<ffffffffa142d59e>] ? __switch_to+0xce/0x5a0
[15039.326626] [<ffffffffa14e29b0>] ? wake_up_state+0x20/0x20
[15039.332835] [<ffffffffa14b46bc>] ? mod_timer+0x11c/0x260
[15039.338859] [<ffffffffc09c7450>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
[15039.346338] [<ffffffffa14cca01>] ? kthread+0xd1/0xe0
[15039.351974] [<ffffffffa14cc930>] ? insert_kthread_work+0x40/0x40
[15039.358776] [<ffffffffa1bbff77>] ? ret_from_fork_nospec_begin+0x21/0x21
[15039.366253] [<ffffffffa14cc930>] ? insert_kthread_work+0x40/0x40
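To check whether a full console log shows the same recursion, one option is to count how often each function appears in the call trace. The following is a minimal sketch, not part of the original report: the regular expression and the helper names count_frames and repeated_frames are illustrative assumptions, and frames with no module tag are grouped under the label "kernel".

```python
import re
from collections import Counter

# Assumed frame layout, taken from the trace in this report, e.g.
#   [<ffffffffc0a7e226>] ? lnet_finalize+0x976/0x9f0 [lnet]
# The module tag in brackets is optional (core kernel frames have none).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def count_frames(trace_text):
    """Return a Counter mapping (module, symbol) to the number of frames seen."""
    counts = Counter()
    for match in FRAME_RE.finditer(trace_text):
        module = match.group("mod") or "kernel"  # hypothetical label for untagged frames
        counts[(module, match.group("sym"))] += 1
    return counts


def repeated_frames(trace_text, threshold=2):
    """List (module, symbol) pairs seen at least `threshold` times, most frequent first."""
    return [
        (key, n)
        for key, n in count_frames(trace_text).most_common()
        if n >= threshold
    ]


if __name__ == "__main__":
    # A few frames copied from the trace above; a real run would read the full log.
    sample = """
    [15037.598037] [<ffffffffc0a7e226>] ? lnet_finalize+0x976/0x9f0 [lnet]
    [15039.259572] [<ffffffffc0a7d9e9>] ? lnet_finalize+0x139/0x9f0 [lnet]
    [15039.290583] [<ffffffffc0a7c88c>] ? lnet_msg_decommit+0xec/0x700 [lnet]
    [15039.297960] [<ffffffffc0a7dc3f>] ? lnet_finalize+0x38f/0x9f0 [lnet]
    [15039.346338] [<ffffffffa14cca01>] ? kthread+0xd1/0xe0
    """
    for (module, symbol), n in repeated_frames(sample):
        print(f"{symbol} [{module}]: {n} frames")
```

A high count for a single symbol is only a hint of recursion. Frames marked with "?" are unverified stack contents, so the result should be confirmed against the crash dump.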
Issue Links
- is related to LU-12402 LNet Health: lnet_finalize() recursion (Resolved)