Details

    • Bug
    • Resolution: Incomplete
    • Minor
    • Lustre 2.12.8

    Description

      kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [ptlrpcd_01_10:3531]

      full version: 2.12.8_6_g5457c37-1.el7

      Can you let me know which debugging options I should turn on to get the information needed to diagnose the issue?

          Activity

            [LU-16343] soft lockups ptlrpcd

            Hi Alex,

            We have had only one ptlrpcd lockup since the beginning of December last year. I think we could close this ticket for now and open a new one if needed.

            Thanks,
            Campbell

            dneg Dneg (Inactive) added a comment

            dneg, thanks for the log. Unfortunately, the log contains only that one trace, so I can't identify the other thread holding the spinlock. Ideally we need a crashdump or a full set of traces (echo t >/proc/sysrq-trigger) to find which process was holding the spinlock and thereby blocking ptlrpcd.

            bzzz Alex Zhuravlev added a comment
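The trace collection requested above could be scripted roughly as follows. This is only a sketch, not a procedure confirmed in this ticket: the function names `dump_all_task_traces` and `save_kernel_log` are hypothetical, and it assumes the script runs as root on the affected client while the lockup is still in progress.

```python
import subprocess


def dump_all_task_traces(trigger_path="/proc/sysrq-trigger", key="t"):
    """Write a SysRq key to the trigger file.

    With key "t" the kernel logs a stack trace for every task.
    Writing to the real /proc/sysrq-trigger requires root.
    """
    with open(trigger_path, "w") as trigger:
        trigger.write(key)


def save_kernel_log(out_path, dmesg_cmd=("dmesg",)):
    """Save the kernel ring buffer to out_path and return the line count.

    dmesg_cmd is a parameter only so the helper can be exercised with a
    harmless command; on a real client the default "dmesg" is intended.
    """
    result = subprocess.run(
        list(dmesg_cmd), capture_output=True, text=True, check=True
    )
    with open(out_path, "w") as out:
        out.write(result.stdout)
    return len(result.stdout.splitlines())
```

On the client this would be called as `dump_all_task_traces()` followed by `save_kernel_log(...)` with an output path of your choice. On a node with many tasks the ring buffer may be too small to hold every trace, in which case the syslog file is likely the more complete source to attach.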

            Hi Alex,

            Full syslog file attached.

            dneg Dneg (Inactive) added a comment

            Can you please attach the full dmesg/syslog output? Something bad probably happened before the lockup.

            bzzz Alex Zhuravlev added a comment

            Hi Alex,

            Do you need any further information?

            Kind regards,
            Campbell

            dneg Dneg (Inactive) added a comment

            Hi Alex, yes, sorry, pasted below:

            Nov  9 03:11:25 foxtrot3 kernel: NMI watchdog: BUG: soft lockup - CPU#23 stuck for 22s! [ptlrpcd_01_10:3531]
            Nov  9 03:11:25 foxtrot3 kernel: Modules linked in: rpcsec_gss_krb5 vfat fat mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase nfsv3 nfs fscache mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) dell_rbu libcfs(OE) binfmt_misc bonding iTCO_wdt iTCO_vendor_support dcdbas joydev sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass sg ipmi_si ipmi_devintf ipmi_msghandler acpi_pad wmi acpi_power_meter mei_me mei lpc_ich nfsd auth_rpcgss nfs_acl lockd grace ip_tables xfs sd_mod crc_t10dif crct10dif_generic 8021q garp mrp stp llc mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common bnx2x crc32_pclmul ahci crc32c_intel scsi_transport_iscsi ghash_clmulni_intel libahci
            Nov  9 03:11:25 foxtrot3 kernel: drm aesni_intel libata lrw gf128mul glue_helper ablk_helper cryptd megaraid_sas drm_panel_orientation_quirks dm_multipath ptp pps_core mdio libcrc32c sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: usb_storage]
            Nov  9 03:11:25 foxtrot3 kernel: CPU: 23 PID: 3531 Comm: ptlrpcd_01_10 Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.49.1.el7.x86_64 #1
            Nov  9 03:11:25 foxtrot3 kernel: Hardware name: Dell Inc. PowerEdge R620/0KCKR5, BIOS 2.5.4 01/22/2016
            Nov  9 03:11:25 foxtrot3 kernel: task: ffff9b81766a5280 ti: ffff9b817f080000 task.ti: ffff9b817f080000
            Nov  9 03:11:25 foxtrot3 kernel: RIP: 0010:[<ffffffffa3b17aa2>]  [<ffffffffa3b17aa2>] native_queued_spin_lock_slowpath+0x122/0x200
            Nov  9 03:11:25 foxtrot3 kernel: RSP: 0018:ffff9b817f083ad0  EFLAGS: 00000246
            Nov  9 03:11:25 foxtrot3 kernel: RAX: 0000000000000000 RBX: ffff9b8ab3ab6a00 RCX: 0000000000b90000
            Nov  9 03:11:25 foxtrot3 kernel: RDX: ffff9ba17f15b8c0 RSI: 0000000000590001 RDI: ffff9b95f1d56de4
            Nov  9 03:11:25 foxtrot3 kernel: RBP: ffff9b817f083ad0 R08: ffff9ba17f2db8c0 R09: 0000000000000000
            Nov  9 03:11:25 foxtrot3 kernel: R10: 0000000000000001 R11: ffff9b8ab3ab6a00 R12: ffffffffc09175f8
            Nov  9 03:11:25 foxtrot3 kernel: R13: ffff9b8ab3ab6a00 R14: ffff9b817e897000 R15: ffffffffa3c26900
            Nov  9 03:11:25 foxtrot3 kernel: FS:  0000000000000000(0000) GS:ffff9ba17f2c0000(0000) knlGS:0000000000000000
            Nov  9 03:11:25 foxtrot3 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            Nov  9 03:11:25 foxtrot3 kernel: CR2: 000055f56a905fb8 CR3: 000000293f08e000 CR4: 00000000000607e0
            Nov  9 03:11:25 foxtrot3 kernel: Call Trace:
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa417dcf3>] queued_spin_lock_slowpath+0xb/0xf
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa418baa0>] _raw_spin_lock+0x20/0x30
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc1041bec>] osc_page_delete+0x1fc/0x500 [osc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0ce1550>] cl_page_delete0+0x80/0x220 [obdclass]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0ce1723>] cl_page_delete+0x33/0x110 [obdclass]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc1041861>] discard_pagevec+0x91/0x130 [osc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc104263a>] osc_lru_shrink+0x74a/0x7c0 [osc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc104364c>] lru_queue_work+0x4c/0x230 [osc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0eae31a>] work_interpreter+0x3a/0xf0 [ptlrpc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0eab231>] ptlrpc_check_set.part.23+0x481/0x1dd0 [ptlrpc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa3ae26ec>] ? set_next_entity+0x3c/0xe0
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0eacbdb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0ed810b>] ptlrpcd_check+0x4ab/0x590 [ptlrpc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0ed84f0>] ptlrpcd+0x300/0x560 [ptlrpc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa3adadf0>] ? wake_up_state+0x20/0x20
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc0ed81f0>] ? ptlrpcd_check+0x590/0x590 [ptlrpc]
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa3ac5e61>] kthread+0xd1/0xe0
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa3ac5d90>] ? insert_kthread_work+0x40/0x40
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa4195df7>] ret_from_fork_nospec_begin+0x21/0x21
            Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa3ac5d90>] ? insert_kthread_work+0x40/0x40
            Nov  9 03:11:25 foxtrot3 kernel: Code: 13 48 c1 ea 0d 48 98 83 e2 30 48 81 c2 c0 b8 01 00 48 03 14 c5 60 15 75 a4 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 <41> 8b 40 08 85 c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b
            
            dneg Dneg (Inactive) added a comment
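To read a trace like the one pasted above more easily, the frames can be pulled out of the syslog lines programmatically. The following is a minimal sketch, assuming the RHEL 7 style frame format seen in this log; `FRAME_RE` and `parse_call_trace` are illustrative names, not part of Lustre or any kernel tool, and only the first trace in the input is parsed.

```python
import re

# Matches a call-trace frame such as
#   [<ffffffffc1041bec>] osc_page_delete+0x1fc/0x500 [osc]
#   [<ffffffffa3adadf0>] ? wake_up_state+0x20/0x20
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(lines):
    """Return (function, module, reliable) tuples for the first call trace.

    Frames the kernel marked with "?" are reported with reliable=False.
    Frames without a module suffix are attributed to "kernel".
    """
    frames = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line (e.g. "Code: ...") ends the trace
        frames.append((
            match.group("func"),
            match.group("module") or "kernel",
            match.group("unreliable") is None,
        ))
    return frames


# A few lines copied from the log above.
sample = [
    "Nov  9 03:11:25 foxtrot3 kernel: Call Trace:",
    "Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa418baa0>] _raw_spin_lock+0x20/0x30",
    "Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc1041bec>] osc_page_delete+0x1fc/0x500 [osc]",
    "Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffc104263a>] osc_lru_shrink+0x74a/0x7c0 [osc]",
    "Nov  9 03:11:25 foxtrot3 kernel: [<ffffffffa3ae26ec>] ? set_next_entity+0x3c/0xe0",
    "Nov  9 03:11:25 foxtrot3 kernel: Code: 13 48 c1 ea 0d",
]

frames = parse_call_trace(sample)
# Keep only frames from the Lustre client modules involved in this trace.
lustre_frames = [f for f in frames if f[1] in ("osc", "obdclass", "ptlrpc")]
```

Filtering by module makes the path from `osc_lru_shrink` down to the spinlock in `osc_page_delete` stand out. It does not identify the lock holder, which still requires traces from the other threads.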

            Is there any stack trace following that message?

            bzzz Alex Zhuravlev added a comment

            People

              bzzz Alex Zhuravlev
              dneg Dneg (Inactive)