Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.12.5
-
lustre-2.12.5_10.llnl
RHEL 7.9 derivative
3.10.0-1160.4.1.1chaos.ch6.x86_64
-
3
-
9223372036854775807
Description
Lustre router produces the following stack dump when LNet is started:
LNet: Added LNI 192.168.129.134@o2ib27 [8/1024/0/180] LNet: Using FastReg for registration ------------[ cut here ]------------ WARNING: CPU: 22 PID: 5953 at kernel/softirq.c:151 __local_bh_enable_ip+0x82/0xb0 Modules linked in: ko2iblnd(OE) lnet(OE) libcfs(OE) opa_vnic rpcrdma ib_iser mlx5_ib iTCO_wdt iTCO_vendor_support sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi hfi1 kvm sg rdmavt irqbypass joydev pcspkr mlx5_core lpc_ich mlxfw devlink i2c_i801 ioatdma ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_cpufreq sch_fq_codel ib_ipoib rdma_ucm ib_uverbs binfmt_misc msr_safe(OE) ib_umad iw_cxgb4 rdma_cm iw_cm ib_cm iw_cxgb3 ib_core ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache overlay(T) ext4 mbcache jbd2 dm_service_time sd_mod crc_t10dif[ 234.953405] LNet: Added LNI 172.19.3.75@o2ib600 [8/1024/0/180] crct10dif_generic be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi 8021q libcxgb garp mrp stp qla4xxx llc iscsi_boot_sysfs mgag200 drm_kms_helper syscopyarea crct10dif_pclmul sysfillrect crct10dif_common sysimgblt crc32_pclmul fb_sys_fops crc32c_intel ttm ghash_clmulni_intel igb mxm_wmi drm ahci aesni_intel lrw libahci gf128mul dca glue_helper ablk_helper ptp libata cryptd drm_panel_orientation_quirks pps_core i2c_algo_bit dm_multipath wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse CPU: 22 PID: 5953 Comm: kworker/22:2H Kdump: loaded Tainted: G OE ------------ T 3.10.0-1160.4.1.1chaos.ch6.x86_64 #1 Hardware name: Penguin Computing Relion 2900e/S2600WT2R, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018 Workqueue: rdmavt_cq send_complete [rdmavt] Call Trace: [<ffffffff865ae498>] dump_stack+0x19/0x1b [<ffffffff85e9e0e8>] __warn+0xd8/0x100 [<ffffffff85e9e22d>] warn_slowpath_null+0x1d/0x20 [<ffffffff85ea7652>] __local_bh_enable_ip+0x82/0xb0 [<ffffffff865b737e>] _raw_spin_unlock_bh+0x1e/0x20 [<ffffffffc09673c5>] cfs_trace_unlock_tcd+0x65/0xb0 [libcfs] [<ffffffffc096de18>] libcfs_debug_vmsg2+0x728/0xbb0 [libcfs] [<ffffffff85eed858>] ? enqueue_task_fair+0x208/0x6c0 [<ffffffff85edd4df>] ? ttwu_do_activate+0x6f/0x80 [<ffffffffc096e2f7>] libcfs_debug_msg+0x57/0x80 [libcfs] [<ffffffffc086f32a>] kiblnd_cq_completion+0x11a/0x160 [ko2iblnd] [<ffffffffc0d745ae>] send_complete+0x3e/0x60 [rdmavt] [<ffffffff85ec298f>] process_one_work+0x18f/0x4a0 [<ffffffff85ec3776>] worker_thread+0x126/0x3e0 [<ffffffff85ec3650>] ? rescuer_thread+0x430/0x430 [<ffffffff85ecafc1>] kthread+0xd1/0xe0 [<ffffffff85ecaef0>] ? insert_kthread_work+0x40/0x40 [<ffffffff865c1ff7>] ret_from_fork_nospec_begin+0x21/0x21 [<ffffffff85ecaef0>] ? insert_kthread_work+0x40/0x40 ---[ end trace 07f572a412f12f32 ]--- OK Started lnet management.
And the warning indicates that _raw_spin_unlock_bh() was called when either IRQs are disabled, or we are in an IRQ handler.
(gdb) l *(__local_bh_enable_ip+0x82/0xb0) 0xffffffff810a75d0 is in __local_bh_enable_ip (kernel/softirq.c:150). 145 } 146 147 EXPORT_SYMBOL(_local_bh_enable); 148 149 void __local_bh_enable_ip(unsigned long ip, unsigned int cnt) 150 { 151 WARN_ON_ONCE(in_irq() || irqs_disabled()); 152 #ifdef CONFIG_TRACE_IRQFLAGS 153 local_irq_disable(); 154 #endif
Hi Serguei, sorry for the slow response. Actually, I do not think we are seeing this now. My prior search was wrong and I found different stacks that are completely unrelated.