Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/30ffe57e-345c-11e8-b45c-52540065bddc
test_8 failed with the following error:
Timeout occurred after 82 mins, last suite running was replay-single, restarting cluster to continue tests
The following lockup appears in the client console log:
[ 1465.679476] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 24s! [sssd_be:599]
[ 1465.680466] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev i2c_piix4 virtio_balloon parport_pc parport nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_blk libata crct10dif_pclmul crct10dif_common 8139too crc32c_intel 8139cp mii serio_raw virtio_pci virtio_ring i2c_core virtio floppy
[ 1465.680466] CPU: 1 PID: 599 Comm: sssd_be Tainted: G OE ------------ 3.10.0-693.21.1.el7.x86_64 #1
[ 1465.680466] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 1465.680466] task: ffff88003675cf10 ti: ffff88007ab8c000 task.ti: ffff88007ab8c000
[ 1465.680466] RIP: 0010:[<ffffffff811f682c>] [<ffffffff811f682c>] __mem_cgroup_uncharge_common+0x4c/0x2f0
[ 1465.680466] RSP: 0018:ffff88007ab8fa48 EFLAGS: 00010286
[ 1465.680466] RAX: ffff88007c4b9ea0 RBX: ffffea0001cff7c0 RCX: 00000000000739ea
[ 1465.680466] RDX: ffff88007ff851c0 RSI: 0000000000000001 RDI: 00000000000739ea
[ 1465.680466] RBP: ffff88007ab8fa78 R08: 0000000000000000 R09: 00005652e6738000
[ 1465.680466] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff810cf05c
[ 1465.680466] R13: ffff88007ab8f9d0 R14: ffff88003675cf78 R15: 0000000000020200
[ 1465.680466] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 1465.680466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1465.680466] CR2: 00007ff24c0d6000 CR3: 0000000079eaa000 CR4: 00000000000606e0
[ 1465.680466] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1465.680466] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1465.680466] Call Trace:
[ 1465.680466] [<ffffffff811fae4a>] mem_cgroup_uncharge_page+0x2a/0x30
[ 1465.680466] [<ffffffff811c0e89>] page_remove_rmap+0xb9/0x160
[ 1465.680466] [<ffffffff811b3ae5>] unmap_page_range+0x4a5/0x920
[ 1465.680466] [<ffffffff811b3fe1>] unmap_single_vma+0x81/0xf0
[ 1465.680466] [<ffffffff811b4fe9>] unmap_vmas+0x49/0x90
[ 1465.680466] [<ffffffff811bd68c>] exit_mmap+0xac/0x1a0
[ 1465.680466] [<ffffffff81087967>] mmput+0x67/0xf0
[ 1465.680466] [<ffffffff81090e85>] do_exit+0x285/0xa40
[ 1465.680466] [<ffffffff8109f455>] ? complete_signal+0x205/0x250
[ 1465.680466] [<ffffffff810916bf>] do_group_exit+0x3f/0xa0
[ 1465.680466] [<ffffffff810a18de>] get_signal_to_deliver+0x1ce/0x5e0
[ 1465.680466] [<ffffffff8102a457>] do_signal+0x57/0x6c0
[ 1465.680466] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146
[ 1465.680466] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146
[ 1465.680466] [<ffffffff8102ab1f>] do_notify_resume+0x5f/0xb0
[ 1465.680466] [<ffffffff816c0a5d>] int_signal+0x12/0x17
[ 1465.680466] Code: 85 c0 0f 85 7b 01 00 00 48 8b 07 49 89 fc 41 89 f5 41 be 01 00 00 00 f6 c4 40 0f 85 77 01 00 00 4c 89 e7 e8 b7 50 00 00 48 89 c3 <48> 8b 00 a8 02 0f 84 4d 01 00 00 f0 0f ba 2b 00 19 c0 85 c0 0f
[ 1465.680466] Kernel panic - not syncing: softlockup: hung tasks
[ 1465.680466] CPU: 1 PID: 599 Comm: sssd_be Tainted: G OEL ------------ 3.10.0-693.21.1.el7.x86_64 #1
[ 1465.680466] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 1465.680466] Call Trace:
[ 1465.680466] <IRQ> [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
[ 1465.680466] [<ffffffff816a8634>] panic+0xe8/0x21f
[ 1465.680466] [<ffffffff8102d7cf>] ? show_regs+0x5f/0x210
[ 1465.680466] [<ffffffff811334e1>] watchdog_timer_fn+0x231/0x240
[ 1465.680466] [<ffffffff811332b0>] ? watchdog+0x40/0x40
[ 1465.680466] [<ffffffff810b8196>] __hrtimer_run_queues+0xd6/0x260
[ 1465.680466] [<ffffffff810b872f>] hrtimer_interrupt+0xaf/0x1d0
[ 1465.680466] [<ffffffff8105467b>] local_apic_timer_interrupt+0x3b/0x60
[ 1465.680466] [<ffffffff816c4e73>] smp_apic_timer_interrupt+0x43/0x60
[ 1465.680466] [<ffffffff816c1732>] apic_timer_interrupt+0x162/0x170
[ 1465.680466] <EOI> [<ffffffff811f682c>] ? __mem_cgroup_uncharge_common+0x4c/0x2f0
[ 1465.680466] [<ffffffff811f6829>] ? __mem_cgroup_uncharge_common+0x49/0x2f0
[ 1465.680466] [<ffffffff811fae4a>] mem_cgroup_uncharge_page+0x2a/0x30
[ 1465.680466] [<ffffffff811c0e89>] page_remove_rmap+0xb9/0x160
[ 1465.680466] [<ffffffff811b3ae5>] unmap_page_range+0x4a5/0x920
[ 1465.680466] [<ffffffff811b3fe1>] unmap_single_vma+0x81/0xf0
[ 1465.680466] [<ffffffff811b4fe9>] unmap_vmas+0x49/0x90
[ 1465.680466] [<ffffffff811bd68c>] exit_mmap+0xac/0x1a0
[ 1465.680466] [<ffffffff81087967>] mmput+0x67/0xf0
[ 1465.680466] [<ffffffff81090e85>] do_exit+0x285/0xa40
[ 1465.680466] [<ffffffff8109f455>] ? complete_signal+0x205/0x250
[ 1465.680466] [<ffffffff810916bf>] do_group_exit+0x3f/0xa0
[ 1465.680466] [<ffffffff810a18de>] get_signal_to_deliver+0x1ce/0x5e0
[ 1465.680466] [<ffffffff8102a457>] do_signal+0x57/0x6c0
[ 1465.680466] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146
[ 1465.680466] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146
[ 1465.680466] [<ffffffff8102ab1f>] do_notify_resume+0x5f/0xb0
[ 1465.680466] [<ffffffff816c0a5d>] int_signal+0x12/0x17
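It doesn't look very Lustre-specific: the soft lockup is in the exit path of sssd_be (do_exit -> exit_mmap -> unmap_vmas -> page_remove_rmap -> __mem_cgroup_uncharge_common), and no Lustre functions appear in either call trace.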
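One way to check whether a trace like this involves Lustre is to look for stack frames tagged with a module name. The sketch below assumes the usual kernel console format, in which a frame from a loadable module ends with the module name in square brackets, while frames from the core kernel have no tag. The names FRAME_RE, frames and module_frames are illustrative only and not part of any Lustre or Maloo tooling.

```python
import re

# Matches one stack frame such as
#   [<ffffffff811fae4a>] mem_cgroup_uncharge_page+0x2a/0x30
# plus, for code in a loadable module, a trailing " [module]" tag.
# Timestamps like "[ 1465.680466]" are not mistaken for module tags,
# because they contain a space and a dot.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"              # address, optional '?' marker
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"  # function+offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"                # optional module tag
)

def frames(log_text):
    """Return (function, module-or-None) for every stack frame in log_text."""
    return [(m.group("func"), m.group("module"))
            for m in FRAME_RE.finditer(log_text)]

def module_frames(log_text):
    """Return only the frames that carry a [module] tag."""
    return [(func, mod) for func, mod in frames(log_text) if mod is not None]

# Three frames copied from the console log in this report.
sample = (
    "[ 1465.680466] [<ffffffff811fae4a>] mem_cgroup_uncharge_page+0x2a/0x30 "
    "[ 1465.680466] [<ffffffff811c0e89>] page_remove_rmap+0xb9/0x160 "
    "[ 1465.680466] [<ffffffff8109f455>] ? complete_signal+0x205/0x250"
)
print(module_frames(sample))  # prints []
```

An empty result means every frame resolved to the core kernel. Running the same check over the full console log would be expected to show the same thing, since none of the frames above carries a module tag.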
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-single test_8 - Timeout occurred after 82 mins, last suite running was replay-single, restarting cluster to continue tests