Details
Bug
Resolution: Fixed
Minor
Description
We see the following dead loops on several customer sites:
[167799.937527] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [icon:246965]
[167799.945027] Modules linked in: mst_pciconf(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache libcfs(OE) ve_peermem(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx4_en(OE) mlx4_ib(OE) ve_drv(OE) mlx4_core(OE) vp(OE) sunrpc dm_mirror dm_region_hash dm_log dm_mod vfat fat amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sg ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq pinctrl_amd i2c_designware_platform k10temp i2c_piix4 acpi_cpufreq i2c_designware_core knem(OE) binfmt_misc ip_tables xfs libcrc32c mlx5_ib(OE) ib_uverbs(OE) sd_mod crc_t10dif crct10dif_generic ib_core(OE) ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci mlx5_core(OE) drm crct10dif_pclmul mlxfw(OE) crct10dif_common ptp crc32c_intel libahci pps_core devlink libata drm_panel_orientation_quirks mlx_compat(OE) nfit libnvdimm [last unloaded: mst_pci]
[167800.047456] CPU: 21 PID: 246965 Comm: icon Kdump: loaded Tainted: G OEL ------------ 3.10.0-1062.4.1.el7.x86_64 #1
[167800.058914] Hardware name: GIGABYTE G292-Z21-NJ/MZ22-G20-00, BIOS R07 01/02/2020
[167800.066385] task: ffff9fb8d6751070 ti: ffffa0282bf84000 task.ti: ffffa0282bf84000
[167800.073943] RIP: 0010:[<ffffffffc0ca3696>] [<ffffffffc0ca3696>] vvp_io_rw_lock+0x346/0x7f0 [lustre]
[167800.083183] RSP: 0018:ffffa0282bf87a48 EFLAGS: 00000202
[167800.088571] RAX: 0000000000000000 RBX: 00002ae27e0da6d8 RCX: 0000000000000000
[167800.095783] RDX: 000000000106f3d8 RSI: 0000000000000000 RDI: ffffa0282bf87a90
[167800.102994] RBP: ffffa0282bf87b10 R08: 000000000106f400 R09: ffffa0282bf87ac0
[167800.110206] R10: 00002ae27e0da6d8 R11: 000000000106f401 R12: ffffa0282bf879c0
[167800.117417] R13: ffffa0282bf87a38 R14: ffffffffa78a8b09 R15: ffffa0282bf879b0
[167800.124628] FS: 00002ae279437c00(0000) GS:ffffa025e7540000(0000) knlGS:0000000000000000
[167800.132791] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[167800.138616] CR2: 000000000a390000 CR3: 00000016ab28e000 CR4: 0000000000340fe0
[167800.145826] Call Trace:
[167800.148367] [<ffffffffc0ca3b85>] vvp_io_write_lock+0x45/0x80 [lustre]
[167800.154990] [<ffffffffc1d172af>] cl_io_lock+0x5f/0x3d0 [obdclass]
[167800.161257] [<ffffffffc1d1783a>] cl_io_loop+0xba/0x1c0 [obdclass]
[167800.167518] [<ffffffffc0c5ab8a>] ll_file_io_generic+0x61a/0xb10 [lustre]
[167800.174392] [<ffffffffc0c5b54c>] ll_file_aio_write+0x29c/0x6e0 [lustre]
[167800.181174] [<ffffffffa7a484fb>] do_sync_readv_writev+0x7b/0xd0
[167800.187265] [<ffffffffa7a4a13e>] do_readv_writev+0xce/0x260
[167800.193009] [<ffffffffc0c5b2b0>] ? ll_file_splice_read+0x230/0x230 [lustre]
[167800.200143] [<ffffffffc0c5b990>] ? ll_file_aio_write+0x6e0/0x6e0 [lustre]
[167800.207097] [<ffffffffa7a4bec1>] ? __sb_end_write+0x31/0x70
[167800.212833] [<ffffffffa7a4a365>] vfs_writev+0x35/0x60
[167800.218050] [<ffffffffa7a4a51f>] SyS_writev+0x7f/0x110
[167800.223356] [<ffffffffa7f8bede>] system_call_fastpath+0x25/0x2a
[167800.229440] Code: e8 d0 78 d1 e6 4c 8b 7d 98 4d 85 ff 0f 84 f9 01 00 00 48 8b 45 80 48 8b 55 90 4c 8b 10 48 8b 40 08 48 29 d0 49 39 c7 49 0f 46 c7 <48> 85 c0 48 89 85 60 ff ff ff 74 be 49 01 d2 48 8b bd 70 ff ff
[167827.938477] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [icon:246965]
[167827.945977] Modules linked in: mst_pciconf(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache libcfs(OE) ve_peermem(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(O
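For readers triaging similar reports: by our reading, the "Code:" bytes just before RIP in vvp_io_rw_lock disassemble to `cmp r15, rax; cmovbe rax, r15; test rax, rax; ... je <backwards>`, i.e. a value computed as min(count, len - offset), a test for zero, and a backward jump. RAX is 0 at RIP, so the loop keeps taking that backward branch. The C sketch below only illustrates one way a per-segment loop of that shape can spin: an empty segment is skipped with "continue" (advancing by 0 bytes), but the iterator only steps to the next segment when it actually consumes bytes. Everything here (`struct seg_iter`, `seg_len`, `advance`, `walk`, the `skip_empty` switch, and `max_spins`) is hypothetical; it is not Lustre or kernel code and not a confirmed root cause for this ticket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, simplified segment iterator (NOT the kernel's struct iov_iter). */
struct seg_iter {
	const struct iovec *iov; /* current segment */
	size_t iov_offset;       /* bytes already consumed in *iov */
	size_t count;            /* bytes left in the whole request */
};

/* Usable length of the current segment: min(count, iov_len - iov_offset),
 * the same kind of min() the cmp/cmovbe pair in the "Code:" bytes computes. */
static size_t seg_len(const struct seg_iter *it)
{
	size_t rem = it->iov->iov_len - it->iov_offset;
	return it->count < rem ? it->count : rem;
}

/* Consume n bytes, then step past exhausted segments. With skip_empty == 0
 * the step only happens when n > 0, so advancing by 0 leaves the iterator
 * parked on an exhausted segment (the assumed failure mode). */
static void advance(struct seg_iter *it, size_t n, int skip_empty)
{
	assert(n <= it->count);
	it->count -= n;
	it->iov_offset += n;
	while (it->count > 0 && it->iov_offset == it->iov->iov_len &&
	       (n > 0 || skip_empty)) {
		it->iov++;
		it->iov_offset = 0;
	}
}

/* Walk a request the way a "for each segment" loop does, treating an empty
 * segment as "continue" (advance by 0). Returns the number of non-empty
 * segments visited, or -1 once max_spins consecutive iterations make no
 * progress (a bounded stand-in for the soft lockup). */
static int walk(const struct iovec *iov, size_t total, int skip_empty,
		int max_spins)
{
	struct seg_iter it = { iov, 0, total };
	int visited = 0, spins = 0;

	while (it.count > 0) {
		size_t len = seg_len(&it);

		if (len == 0) {
			if (++spins > max_spins)
				return -1;
			advance(&it, 0, skip_empty);
			continue;
		}
		spins = 0;
		visited++;
		advance(&it, len, skip_empty);
	}
	return visited;
}
```

With an iovec array whose first entry is empty (`{ NULL, 0 }` followed by a 4-byte buffer, total 4 bytes), `walk(v, 4, 0, 1000)` gives up with -1 because no iteration makes progress, while `walk(v, 4, 1, 1000)` skips the empty entry and returns 1. The bounded `max_spins` counter exists only so the sketch terminates; the loop it models has no such bound.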