Details

    • Bug
    • Resolution: Fixed
    • Major

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/96df30ce-dccc-11e4-a6e6-5254006e85c2.

      From evidence in the console log, it looks like the client node panicked and even took a crash dump:

      17:25:50:Saving to remote location onyx-4.onyx.hpdd.intel.com:/export/scratch/dumps
      17:25:50:Saving vmcore-dmesg.txt
      17:25:50:Saved vmcore-dmesg.txt

      This seems like a one-off and possibly a TEI issue. Other test runs of patches depending on the one in this test run completed without any problems.

      The sub-test test_120g failed with the following error:

      test failed to respond and timed out
      

      Info required for matching: sanity 120g

          Activity

            [LU-6439] sanity test_120g: panic on client
            pjones Peter Jones added a comment -

            Landed for 2.11

            laisiyao Lai Siyao added a comment -

            This looks to be a dup of LU-3270. There is a backport patch for 2.5, http://review.whamcloud.com/#/c/12901/, which should be able to fix this.

            Found the saved crash dump. In the vmcore-dmesg.txt I see the following:

            <0>BUG: soft lockup - CPU#0 stuck for 67s! [ptlrpcd_0:2379]
            <4>Modules linked in: ext2 lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
            <4>CPU 0
            <4>Modules linked in: ext2 lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U)
            <0>BUG: soft lockup - CPU#1 stuck for 67s! [ll_sa_28008:28009]
            <4>Modules linked in: ext2 lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
            <4>CPU 1
            <4>Modules linked in: ext2 lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
            <4>
            <4>Pid: 28009, comm: ll_sa_28008 Not tainted 2.6.32-431.29.2.el6.x86_64 #1 Red Hat KVM
            <4>RIP: 0010:[<ffffffff8152b84e>] [<ffffffff8152b84e>] _spin_lock+0x1e/0x30
            <4>RSP: 0018:ffff88006f26fd30 EFLAGS: 00000206
            <4>RAX: 0000000000000001 RBX: ffff88006f26fd30 RCX: 0000000000000003
            <4>RDX: 0000000000000000 RSI: 000000001082ebea RDI: ffff88006a1e2440
            <4>RBP: ffffffff8100bb8e R08: 0000000031353433 R09: 0000000000000000
            <4>R10: ffff880067eb68c0 R11: 0000000000000080 R12: 0000000000000000
            <4>R13: 0000000000000eef R14: 0000000200001b71 R15: 0000000000000000
            <4>FS: 0000000000000000(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
            <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
            <4>CR2: 0000000001bb80b8 CR3: 000000007d793000 CR4: 00000000000006e0
            <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            <4>Process ll_sa_28008 (pid: 28009, threadinfo ffff88006f26e000, task ffff88007abec080)
            <4>Stack:
            <4> ffff88006f26fdc0 ffffffffa0b6a275 0000000000000005 0000000000000080
            <4><d> 0000000000001a77 ffff88006cb14080 ffff88006a1e2178 ffff88005fc241c8
            <4><d> 0000000000000000 0000000000000000 ffff88007a38ab00 ffff88006b218200
            <4>Call Trace:
            <4> [<ffffffffa0b6a275>] ? ll_statahead_one+0x295/0xdc0 [lustre]
            <4> [<ffffffffa0b6b11b>] ? ll_statahead_thread+0x37b/0xfb0 [lustre]
            <4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
            <4> [<ffffffffa0b6ada0>] ? ll_statahead_thread+0x0/0xfb0 [lustre]
            <4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
            <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
            <4>Code: 00 00 00 01 74 05 e8 92 3a d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> 1f 44 00 00 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89
            <4>Call Trace:
            <4> [<ffffffffa0b6a275>] ? ll_statahead_one+0x295/0xdc0 [lustre]
            <4> [<ffffffffa0b6b11b>] ? ll_statahead_thread+0x37b/0xfb0 [lustre]
            <4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
            <4> [<ffffffffa0b6ada0>] ? ll_statahead_thread+0x0/0xfb0 [lustre]
            <4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
            <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
            <0>Kernel panic - not syncing: softlockup: hung tasks
            <4>Pid: 28009, comm: ll_sa_28008 Not tainted 2.6.32-431.29.2.el6.x86_64 #1
            <4>Call Trace:
            <4> <IRQ> [<ffffffff8152873c>] ? panic+0xa7/0x16f
            <4> [<ffffffff810e6200>] ? watchdog_timer_fn+0x0/0x1e0
            <4> [<ffffffff810e63ca>] ? watchdog_timer_fn+0x1ca/0x1e0
            <4> [<ffffffff8109f6be>] ? __run_hrtimer+0x8e/0x1a0
            <4> [<ffffffff810a6a9f>] ? ktime_get_update_offsets+0x4f/0xd0
            <4> [<ffffffff8109fa26>] ? hrtimer_interrupt+0xe6/0x260
            <4> [<ffffffff81031f1d>] ? local_apic_timer_interrupt+0x3d/0x70
            <4> [<ffffffff815325e5>] ? smp_apic_timer_interrupt+0x45/0x60
            <4> [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
            <4> <EOI> [<ffffffff8152b84e>] ? _spin_lock+0x1e/0x30
            <4> [<ffffffffa0b6a275>] ? ll_statahead_one+0x295/0xdc0 [lustre]
            <4> [<ffffffffa0b6b11b>] ? ll_statahead_thread+0x37b/0xfb0 [lustre]
            <4> [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
            <4> [<ffffffffa0b6ada0>] ? ll_statahead_thread+0x0/0xfb0 [lustre]
            <4> [<ffffffff8109abf6>] ? kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
            <4> [<ffffffff8109ab60>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20

            So it looks like a statahead issue.

            Dup of LU-4410?
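
The triage above (spot the soft-lockup reports, then see which Lustre functions are on the stuck thread's stack) can be repeated on other saved vmcore-dmesg.txt files with a small script. This is only a sketch: the function names `find_soft_lockups` and `module_frames` are made up for illustration, and the regular expressions assume the 2.6.32-era message format shown in this ticket, which other kernels may not follow.

```python
import re

# Matches lines such as:
#   <0>BUG: soft lockup - CPU#1 stuck for 67s! [ll_sa_28008:28009]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]:]+):(?P<pid>\d+)\]"
)

# Matches call-trace frames such as:
#   <4> [<ffffffffa0b6a275>] ? ll_statahead_one+0x295/0xdc0 [lustre]
# The trailing "[module]" is optional (absent for core kernel symbols).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\] \?? ?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_soft_lockups(dmesg_text):
    """Return one dict per 'soft lockup' report found in the text."""
    return [
        {
            "cpu": int(m["cpu"]),
            "secs": int(m["secs"]),
            "comm": m["comm"],
            "pid": int(m["pid"]),
        }
        for m in LOCKUP_RE.finditer(dmesg_text)
    ]


def module_frames(dmesg_text, module):
    """Return distinct symbols from frames tagged with `module`, in first-seen order."""
    seen = []
    for m in FRAME_RE.finditer(dmesg_text):
        if m["module"] == module and m["sym"] not in seen:
            seen.append(m["sym"])
    return seen


if __name__ == "__main__":
    # Hypothetical local path; point it at a saved copy of the dump's dmesg.
    with open("vmcore-dmesg.txt") as fh:
        text = fh.read()
    for report in find_soft_lockups(text):
        print(report)
    print(module_frames(text, "lustre"))
```

Run against the excerpt in this comment, the script would be expected to report the two stuck threads (ptlrpcd_0 and ll_sa_28008) and list the statahead functions as the only frames from the lustre module.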

            bogl Bob Glossman (Inactive) added a comment - found the saved crash dump; the vmcore-dmesg.txt excerpt and the statahead / LU-4410 question appear in full above.

            People

              laisiyao Lai Siyao
              maloo Maloo