Lustre / LU-11693

Soft lockups on Lustre clients


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects version: Lustre 2.10.2
    • Severity: 3

    Description

      We get quite a few soft lockups on our Lustre gateways (Lustre clients that export Lustre filesystems over NFS). Example:

      Nov 13 00:26:06 foxtrot2 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [nfsd:11973]
      Nov 13 00:26:06 foxtrot2 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [rsync:36079]
      Nov 13 00:26:06 foxtrot2 kernel: Modules linked in: vfat fat dm_service_time mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase nfsv3 nfs fscache osc(OE) mgc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) dell_rbu libcfs(OE) bonding sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel iTCO_wdt iTCO_vendor_support kvm joydev dcdbas irqbypass sg shpchp ipmi_si ipmi_devintf ipmi_msghandler lpc_ich mei_me mei acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc ip_tables xfs sd_mod crc_t10dif crct10dif_generic 8021q garp stp llc mrp mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi bnx2x syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ahci drm ghash_clmulni_intel
      Nov 13 00:26:06 foxtrot2 kernel: libahci aesni_intel dm_multipath libata lrw gf128mul glue_helper ablk_helper cryptd megaraid_sas i2c_core ptp pps_core mdio libcrc32c wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: usb_storage]
      Nov 13 00:26:06 foxtrot2 kernel: CPU: 1 PID: 36079 Comm: rsync Tainted: G W OE ------------ 3.10.0-693.5.2.el7_lustre.x86_64 #1
      Nov 13 00:26:06 foxtrot2 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.5.4 01/22/2016
      Nov 13 00:26:06 foxtrot2 kernel: task: ffff883ff8a04f10 ti: ffff8815a1200000 task.ti: ffff8815a1200000
      Nov 13 00:26:06 foxtrot2 kernel: RIP: 0010:[<ffffffff810fa332>] [<ffffffff810fa332>] native_queued_spin_lock_slowpath+0x112/0x1e0
      Nov 13 00:26:06 foxtrot2 kernel: RSP: 0018:ffff8815a1203700 EFLAGS: 00000246
      Nov 13 00:26:06 foxtrot2 kernel: RAX: 0000000000000000 RBX: ffff883fff017880 RCX: 0000000000090000
      Nov 13 00:26:06 foxtrot2 kernel: RDX: ffff883fff4d7880 RSI: 0000000001390101 RDI: ffff881ff99da818
      Nov 13 00:26:06 foxtrot2 kernel: RBP: ffff8815a1203700 R08: ffff883fff017880 R09: 0000000000000000
      Nov 13 00:26:06 foxtrot2 kernel: R10: 0004c5dab524ba0b R11: 0000000000000000 R12: 0004c5dab524ba0b
      Nov 13 00:26:06 foxtrot2 kernel: R13: 0000000000000000 R14: 0004c5dab39dc857 R15: ffff8815a12036e8
      Nov 13 00:26:06 foxtrot2 kernel: FS: 00007f0ff1094740(0000) GS:ffff883fff000000(0000) knlGS:0000000000000000
      Nov 13 00:26:06 foxtrot2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Nov 13 00:26:06 foxtrot2 kernel: CR2: 00007fd6cb1e9000 CR3: 000000163eff9000 CR4: 00000000001407e0
      Nov 13 00:26:06 foxtrot2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Nov 13 00:26:06 foxtrot2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Nov 13 00:26:06 foxtrot2 kernel: Stack:
      Nov 13 00:26:06 foxtrot2 kernel: ffff8815a1203710 ffffffff8169e6bf ffff8815a1203720 ffffffff816abbf0
      Nov 13 00:26:06 foxtrot2 kernel: ffff8815a12037a0 ffffffffc0c2d421 ffff8815a12037e0 ffffffffc0c2ba60
      Nov 13 00:26:06 foxtrot2 kernel: 0000000000000000 00000161000ab602 0004c5dab524ba0b ffff88130fb65c00
      Nov 13 00:26:06 foxtrot2 kernel: Call Trace:
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8169e6bf>] queued_spin_lock_slowpath+0xb/0xf
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff816abbf0>] _raw_spin_lock+0x20/0x30
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c2d421>] ldlm_prepare_lru_list+0x361/0x4e0 [ptlrpc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c2ba60>] ? ldlm_cancel_aged_no_wait_policy+0x70/0x70 [ptlrpc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c30c5a>] ldlm_cancel_lru_local+0x1a/0x30 [ptlrpc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c30e8e>] ldlm_prep_elc_req+0x21e/0x490 [ptlrpc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c31128>] ldlm_prep_enqueue_req+0x28/0x30 [ptlrpc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc07c67a3>] mdc_intent_getattr_pack.isra.15+0x93/0x280 [mdc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc07c8f3b>] mdc_enqueue_base+0x9fb/0x18f0 [mdc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810c45a3>] ? try_to_wake_up+0x183/0x340
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc07ca6cb>] mdc_intent_lock+0x26b/0x520 [mdc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c66243>] ? reply_in_callback+0x143/0x5e0 [ptlrpc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0972e30>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0c2c7a0>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0910e4f>] lmv_intent_lock+0x5cf/0x1b50 [lmv]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810b8a01>] ? in_group_p+0x31/0x40
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc09738c5>] ? ll_i2suppgid+0x15/0x40 [lustre]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0973914>] ? ll_i2gids+0x24/0xb0 [lustre]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81114b02>] ? from_kgid+0x12/0x20
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0972e30>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0974feb>] ll_lookup_it+0x29b/0xee0 [lustre]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff810c8f28>] ? __enqueue_entity+0x78/0x80
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffffc0976fbb>] ll_lookup_nd+0xbb/0x190 [lustre]
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120b3dd>] lookup_real+0x1d/0x50
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120bcb2>] __lookup_hash+0x42/0x60
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff816a13e2>] lookup_slow+0x42/0xa7
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120f25b>] path_lookupat+0x77b/0x7b0
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff811df623>] ? kmem_cache_alloc+0x193/0x1e0
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81211c9f>] ? getname_flags+0x4f/0x1a0
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff8120f2bb>] filename_lookup+0x2b/0xc0
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81212e37>] user_path_at_empty+0x67/0xc0
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81212ea1>] user_path_at+0x11/0x20
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff812063e3>] vfs_fstatat+0x63/0xc0
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff812069b1>] SYSC_newlstat+0x31/0x60
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff81206c3e>] SyS_newlstat+0xe/0x10
      Nov 13 00:26:06 foxtrot2 kernel: [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
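
For anyone triaging similar logs, here is a minimal sketch of how the watchdog reports and the call trace above could be summarized. It is not part of the original report. The helper names (`soft_lockups`, `reliable_frames`) are hypothetical, and the regular expressions assume the syslog layout shown in this ticket (`<date> <host> kernel: ...`); other kernels or log formats may need different patterns.

```python
import re
from collections import Counter

# Assumed format: the "soft lockup" watchdog line as it appears in this ticket.
LOCKUP_RE = re.compile(
    r"kernel:\s+NMI watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Assumed format: a call-trace frame directly after "kernel:", for example
# "[<ffffffffc0c2d421>] ldlm_prepare_lru_list+0x361/0x4e0 [ptlrpc]".
# Anchoring on "kernel:" skips the RIP line, which also contains "[<addr>]".
FRAME_RE = re.compile(
    r"kernel:\s+\[<[0-9a-f]+>\] (?P<unreliable>\? )?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def soft_lockups(lines):
    """Count soft-lockup reports per process name (comm)."""
    counts = Counter()
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            counts[m.group("comm")] += 1
    return counts


def reliable_frames(lines):
    """Return (function, module) pairs for frames not marked '?'.

    Frames without a trailing [module] tag are labeled "kernel".
    """
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            frames.append((m.group("func"), m.group("module") or "kernel"))
    return frames
```

Run over the trace in this ticket, the unmarked frames read, from the bottom up, as an `lstat` call entering `ll_lookup_it`, passing through `lmv_intent_lock` and `mdc_intent_lock`, and ending in `ldlm_prep_elc_req`, `ldlm_cancel_lru_local` and `ldlm_prepare_lru_list`, where the CPU spins in `_raw_spin_lock`.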

       


            People

              Assignee: Jian Yu (yujian)
              Reporter: Campbell Mcleay (cmcl, inactive)