Details
Bug
Resolution: Unresolved
Minor
Lustre 2.10.2
Description
We get quite a few soft lockups on our Lustre gateways (Lustre clients that export Lustre filesystems over NFS). Example:
Nov 13 00:26:06 foxtrot2 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [nfsd:11973]
Nov 13 00:26:06 foxtrot2 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [rsync:36079]
Nov 13 00:26:06 foxtrot2 kernel: Modules linked in: vfat fat dm_service_time mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase nfsv3 nfs fscache osc(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) dell_rbu libcfs(OE) bonding sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel iTCO_wdt iTCO_vendor_support kvm joydev dcdbas irqbypass sg shpchp ipmi_si ipmi_devintf ipmi_msghandler lpc_ich mei_me mei acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs sd_mod crc_t10dif crct10dif_generic 8021q garp stp llc mrp mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi bnx2x syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ahci drm ghash_clmulni_intel
Nov 13 00:26:06 foxtrot2 kernel: libahci aesni_intel dm_multipath libata lrw gf128mul glue_helper ablk_helper cryptd megaraid_sas i2c_core ptp pps_core mdio libcrc32c wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: usb_storage]
Nov 13 00:26:06 foxtrot2 kernel: CPU: 1 PID: 36079 Comm: rsync Tainted: G W OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
Nov 13 00:26:06 foxtrot2 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.5.4 01/22/2016
Nov 13 00:26:06 foxtrot2 kernel: task: ffff883ff8a04f10 ti: ffff8815a1200000 task.ti: ffff8815a1200000
Nov 13 00:26:06 foxtrot2 kernel: RIP: 0010:[<ffffffff810fa332>] [<ffffffff810fa332>] native_queued_spin_lock_slowpath+0x112/0x1e0
Nov 13 00:26:06 foxtrot2 kernel: RSP: 0018:ffff8815a1203700 EFLAGS: 00000246
Nov 13 00:26:06 foxtrot2 kernel: RAX: 0000000000000000 RBX: ffff883fff017880 RCX: 0000000000090000
Nov 13 00:26:06 foxtrot2 kernel: RDX: ffff883fff4d7880 RSI: 0000000001390101 RDI: ffff881ff99da818
Nov 13 00:26:06 foxtrot2 kernel: RBP: ffff8815a1203700 R08: ffff883fff017880 R09: 0000000000000000
Nov 13 00:26:06 foxtrot2 kernel: R10: 0004c5dab524ba0b R11: 0000000000000000 R12: 0004c5dab524ba0b
Nov 13 00:26:06 foxtrot2 kernel: R13: 0000000000000000 R14: 0004c5dab39dc857 R15: ffff8815a12036e8
Nov 13 00:26:06 foxtrot2 kernel: FS: 00007f0ff1094740(0000) GS:ffff883fff000000(0000) knlGS:0000000000000000
Nov 13 00:26:06 foxtrot2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 13 00:26:06 foxtrot2 kernel: CR2: 00007fd6cb1e9000 CR3: 000000163eff9000 CR4: 00000000001407e0
Nov 13 00:26:06 foxtrot2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 13 00:26:06 foxtrot2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 13 00:26:06 foxtrot2 kernel: Stack:
Nov 13 00:26:06 foxtrot2 kernel: ffff8815a1203710 ffffffff8169e6bf ffff8815a1203720 ffffffff816abbf0
Nov 13 00:26:06 foxtrot2 kernel: ffff8815a12037a0 ffffffffc0c2d421 ffff8815a12037e0 ffffffffc0c2ba60
Nov 13 00:26:06 foxtrot2 kernel: 0000000000000000 00000161000ab602 0004c5dab524ba0b ffff88130fb65c00
Nov 13 00:26:06 foxtrot2 kernel: Call Trace:
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8169e6bf>] queued_spin_lock_slowpath+0xb/0xf
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff816abbf0>] _raw_spin_lock+0x20/0x30
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c2d421>] ldlm_prepare_lru_list+0x361/0x4e0 [ptlrpc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c2ba60>] ? ldlm_cancel_aged_no_wait_policy+0x70/0x70 [ptlrpc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c30c5a>] ldlm_cancel_lru_local+0x1a/0x30 [ptlrpc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c30e8e>] ldlm_prep_elc_req+0x21e/0x490 [ptlrpc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c31128>] ldlm_prep_enqueue_req+0x28/0x30 [ptlrpc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc07c67a3>] mdc_intent_getattr_pack.isra.15+0x93/0x280 [mdc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc07c8f3b>] mdc_enqueue_base+0x9fb/0x18f0 [mdc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810c45a3>] ? try_to_wake_up+0x183/0x340
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc07ca6cb>] mdc_intent_lock+0x26b/0x520 [mdc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c66243>] ? reply_in_callback+0x143/0x5e0 [ptlrpc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0972e30>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c2c7a0>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0910e4f>] lmv_intent_lock+0x5cf/0x1b50 [lmv]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810b8a01>] ? in_group_p+0x31/0x40
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc09738c5>] ? ll_i2suppgid+0x15/0x40 [lustre]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0973914>] ? ll_i2gids+0x24/0xb0 [lustre]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81114b02>] ? from_kgid+0x12/0x20
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0972e30>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0974feb>] ll_lookup_it+0x29b/0xee0 [lustre]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810c8f28>] ? __enqueue_entity+0x78/0x80
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0976fbb>] ll_lookup_nd+0xbb/0x190 [lustre]
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120b3dd>] lookup_real+0x1d/0x50
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120bcb2>] __lookup_hash+0x42/0x60
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff816a13e2>] lookup_slow+0x42/0xa7
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120f25b>] path_lookupat+0x77b/0x7b0
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff811df623>] ? kmem_cache_alloc+0x193/0x1e0
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81211c9f>] ? getname_flags+0x4f/0x1a0
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120f2bb>] filename_lookup+0x2b/0xc0
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81212e37>] user_path_at_empty+0x67/0xc0
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81212ea1>] user_path_at+0x11/0x20
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff812063e3>] vfs_fstatat+0x63/0xc0
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff812069b1>] SYSC_newlstat+0x31/0x60
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81206c3e>] SyS_newlstat+0xe/0x10
Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
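
For anyone comparing several of these lockups, the following is a minimal sketch of how the watchdog lines and the reliable call-trace frames could be pulled out of a syslog excerpt like the one above. The function name `parse_lockups` and the two regular expressions are illustrative assumptions based only on the line formats shown in this ticket; they are not part of Lustre or of any kernel tooling. Frames marked with `?` are skipped because the kernel flags them as unreliable stack entries.

```python
import re

# Watchdog line, e.g. "BUG: soft lockup - CPU#1 stuck for 23s! [rsync:36079]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)
# Call-trace frame, e.g. "[<ffffffffc0c2d421>] ldlm_prepare_lru_list+0x361/0x4e0 [ptlrpc]"
# Group 1 captures the optional "? " marker, group 2 the symbol, group 3 the module.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_lockups(lines):
    """Return (lockups, frames) from an iterable of kernel log lines.

    lockups: list of dicts with cpu, seconds, comm and pid.
    frames:  list of (symbol, module) tuples for frames that follow a
             "Call Trace:" line and are not marked with "?".
    """
    lockups = []
    frames = []
    in_trace = False
    for line in lines:
        hit = LOCKUP_RE.search(line)
        if hit:
            lockups.append({
                "cpu": int(hit.group(1)),
                "seconds": int(hit.group(2)),
                "comm": hit.group(3),
                "pid": int(hit.group(4)),
            })
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            # Skip "?" frames: the unwinder is not sure they belong to the stack.
            if frame and not frame.group(1):
                frames.append((frame.group(2), frame.group(3)))
    return lockups, frames
```

Fed the excerpt above, this should report the two stuck tasks (nfsd and rsync) and a frame list that starts at `queued_spin_lock_slowpath`, which makes it easier to check whether other lockups on the gateways go through the same `ldlm_prepare_lru_list` path.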