Lustre / LU-4410

sanityn test 40a: BUG: soft lockup - CPU#0 stuck for 67s! [ptlrpcd_0:2892]

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • None
    • Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.2, Lustre 2.5.3

    • Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/70/ (2.4.2 RC2)
      Distro/Arch: RHEL6.4/x86_64
      FSTYPE=zfs
    • 3
    • 12104

    Description

      sanityn test 40a hung and hit the following failure on one client:

      21:36:52:Lustre: DEBUG MARKER: == sanityn test 40a: pdirops: create vs others ================ 21:34:49 (1387604089)
      21:36:53:BUG: soft lockup - CPU#0 stuck for 67s! [ptlrpcd_0:2892]
      21:36:53:Modules linked in: lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode 8139too 8139cp mii virtio_balloon i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      21:36:53:CPU 0 
      21:36:53:Modules linked in: lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U)
      21:36:53:BUG: soft lockup - CPU#1 stuck for 67s! [ll_sa_4070:4079]
      21:36:53:Modules linked in: lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode 8139too 8139cp mii virtio_balloon i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      21:36:53:CPU 1 
      21:36:53:Modules linked in: lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode 8139too 8139cp mii virtio_balloon i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
      21:36:53:
      21:36:53:Pid: 4079, comm: ll_sa_4070 Not tainted 2.6.32-358.23.2.el6.x86_64 #1 Red Hat KVM
      21:36:53:RIP: 0010:[<ffffffff81510aae>]  [<ffffffff81510aae>] _spin_lock+0x1e/0x30
      21:36:53:RSP: 0018:ffff88006c26bda0  EFLAGS: 00000206
      21:36:53:RAX: 0000000000000002 RBX: ffff88006c26bda0 RCX: ffff88007cfd8800
      21:36:54:RDX: 0000000000000000 RSI: ffff88006c25fec0 RDI: ffff88007a737ec0
      21:36:54:RBP: ffffffff8100bb8e R08: ffff88007d860e68 R09: 00000000fffffffe
      21:36:54:R10: 0000000000000000 R11: 0000000000000001 R12: ffff88006c26bd80
      21:36:54:R13: ffff88006d6c9000 R14: 0000000000001000 R15: 0000000000000000
      21:36:54:FS:  00007fb227702700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
      21:36:54:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      21:36:54:CR2: 00007f7bbff64000 CR3: 000000006c183000 CR4: 00000000000006e0
      21:36:54:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      21:36:54:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      21:36:54:Process ll_sa_4070 (pid: 4079, threadinfo ffff88006c26a000, task ffff88006bd25500)
      21:36:54:Stack:
      21:36:54: ffff88006c26be10 ffffffffa0abb680 ffff88007a737bf8 ffff88006e9501c8
      21:36:54:<d> 0000000000000000 ffff88007a737b00 ffff88007caa01c0 ffff88006bf57200
      21:36:54:<d> ffff88006c26bdf0 ffff88007a7ba800 ffff88007a7ba970 ffff88007a737e80
      21:36:54:Call Trace:
      21:36:54: [<ffffffffa0abb680>] ? ll_post_statahead+0x50/0xa80 [lustre]
      21:36:55: [<ffffffffa0abf8c8>] ? ll_statahead_thread+0x268/0xfa0 [lustre]
      21:36:55: [<ffffffff81063990>] ? default_wake_function+0x0/0x20
      21:36:55: [<ffffffffa0abf660>] ? ll_statahead_thread+0x0/0xfa0 [lustre]
      21:36:55: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
      21:36:55: [<ffffffffa0abf660>] ? ll_statahead_thread+0x0/0xfa0 [lustre]
      21:36:55: [<ffffffffa0abf660>] ? ll_statahead_thread+0x0/0xfa0 [lustre]
      21:36:55: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      

      Maloo report: https://maloo.whamcloud.com/test_sets/7cca784a-6b4b-11e3-99ba-52540035b04c
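      The watchdog message in the title encodes the CPU that was stuck, how long it had been spinning, and the command name and PID of the task running on that CPU. In this log two CPUs tripped it at the same moment: CPU#0 in ptlrpcd_0 and CPU#1 in the statahead thread ll_sa_4070, whose trace ends in _spin_lock called from ll_post_statahead. When triaging similar console logs, a minimal sketch of a hypothetical Python helper (not part of the Lustre test framework) that extracts those fields could look like this. It assumes the 2.6.32-era message format shown in the log above.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical triage helper; assumes the message format printed by the
# 2.6.32-era softlockup watchdog seen in this log:
#   "BUG: soft lockup - CPU#<cpu> stuck for <secs>s! [<comm>:<pid>]"
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    cpu: int      # CPU the watchdog found stuck
    seconds: int  # how long it had been stuck
    comm: str     # command name of the task running on that CPU
    pid: int      # PID of that task


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft-lockup report."""
    # search() rather than match(): console lines carry a timestamp prefix
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))


if __name__ == "__main__":
    console = (
        "21:36:53:BUG: soft lockup - CPU#0 stuck for 67s! [ptlrpcd_0:2892]\n"
        "21:36:53:CPU 0 \n"
        "21:36:53:BUG: soft lockup - CPU#1 stuck for 67s! [ll_sa_4070:4079]\n"
    )
    for line in console.splitlines():
        hit = parse_soft_lockup(line)
        if hit is not None:
            print(hit)
```

      Run on the two watchdog lines from this report, it prints one SoftLockup record per stuck CPU and skips the unrelated "CPU 0" line, which makes the simultaneous lockup on both CPUs easy to spot.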



          Activity

            yujian Jian Yu added a comment -

            Here is the back-ported patch for the Lustre b2_5 branch: http://review.whamcloud.com/13846

            yujian Jian Yu added a comment -

            While verifying patch http://review.whamcloud.com/11615 with FSTYPE=zfs on the Lustre b2_5 branch, lustre-rsync-test hit the same failure:
            https://testing.hpdd.intel.com/test_sets/8cc59532-30ac-11e4-9e60-5254006e85c2

            yong.fan nasf (Inactive) added a comment - Another failure instance: https://testing.hpdd.intel.com/test_sessions/6d60501e-1dbb-11e4-8fe8-5254006e85c2
            liwei Li Wei (Inactive) added a comment - lustre-rsync-test 6, master, zfs, single MDT: https://testing.hpdd.intel.com/test_sets/67240e86-1cf6-11e4-9a83-5254006e85c2

            utopiabound Nathaniel Clark added a comment -

            Another lustre-rsync-test test_6 on master branch review-dne-part-1:
            https://testing.hpdd.intel.com/test_sets/5f96c5ec-094a-11e4-b76d-5254006e85c2

            yujian Jian Yu added a comment -

            Another sanityn test 40a failure instance on the Lustre b2_5 branch:
            https://maloo.whamcloud.com/test_sets/17d0d2b0-f5fa-11e3-9d30-52540035b04c

            yujian Jian Yu added a comment -

            Here is the patch back-ported to the Lustre b2_5 branch: http://review.whamcloud.com/10674

            The patch was reverted from the Lustre b2_5 branch because we need to wait until the master version is fully ready.

            di.wang Di Wang added a comment -

            Hmm, I saw a similar problem when I ran my patch http://review.whamcloud.com/#/c/10622/ on master: https://maloo.whamcloud.com/test_sets/196f5da8-f2d5-11e3-b88b-52540035b04c

            Is this also needed on master?

            yujian Jian Yu added a comment -

            Here is the patch back-ported to the Lustre b2_5 branch: http://review.whamcloud.com/10674

            yujian Jian Yu added a comment -

            Would you please try the patch?

            Sure, I'll do this. Thank you!


            People

              wc-triage WC Triage
              yujian Jian Yu
              Votes: 0
              Watchers: 9
