
replay-single test_20b: BUG: soft lockup - osc_makes_rpc()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version: Lustre 2.9.0
    • Fix Version: Lustre 2.9.0
    • Labels: None
    • Severity: 3

    Description

      This issue was created by maloo for Niu Yawei <yawei.niu@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/a265b3dc-8b49-11e6-ad53-5254006e85c2.

      The sub-test test_20b failed with the following error:

      test failed to respond and timed out
      

      Please provide additional information about the failure here.

      Info required for matching: replay-single 20b

      Attachments

        Activity

          [LU-8680] replay-single test_20b: BUG: soft lockup - osc_makes_rpc()

          Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/23326
          Subject: LU-8680 osc: soft lock - osc_makes_rpc()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 4b891b4752273c0852bec39e188a6aecfec800de

          gerrit Gerrit Updater added a comment
          jay Jinshan Xiong (Inactive) added a comment (edited)

          Indeed. This reminds me why I would like to have

          static unsigned int osc_max_write_chunks(const struct client_obd *cli)
          {
                  return PTLRPC_MAX_BRW_SIZE >> cli->cl_chunkbits;
          }
          

          instead of the current implementation.


          One thing I can't see from the patch of LU-8135 is: when will the large extent (one exceeding the chunk limitation) be flushed back?

          niu Niu Yawei (Inactive) added a comment

          This could be a regression caused by LU-8135: that patch limited the chunk number in a write RPC, so for an extent with a large number of chunks, osc_check_rpcs() will run into a loop and never break.

          see osc_check_rpcs()

                          if (osc_makes_rpc(cli, osc, OBD_BRW_WRITE)) {
                                  rc = osc_send_write_rpc(env, cli, osc);
                                  if (rc < 0) {
                                          CERROR("Write request failed with %d\n", rc);
          
                                          /* osc_send_write_rpc failed, mostly because of
                                           * memory pressure.
                                           *
                                           * It can't break here, because if:
                                           *  - a page was submitted by osc_io_submit, so
                                           *    page locked;
                                           *  - no request in flight
                                           *  - no subsequent request
                                           * The system will be in live-lock state,
                                           * because there is no chance to call
                                           * osc_io_unplug() and osc_check_rpcs() any
                                           * more. pdflush can't help in this case,
                                           * because it might be blocked at grabbing
                                           * the page lock as we mentioned.
                                           *
                                           * Anyway, continue to drain pages. */
                                          /* break; */
                                  }
                          }
          

          With the fix of LU-8135, osc_send_write_rpc() will do nothing when the extent is too large, so osc_check_rpcs() never breaks out of the loop and keeps retrying the same object.

          niu Niu Yawei (Inactive) added a comment

          Hi Bobijam,

          Could you please look into this issue?

          Thanks.
          Joe

          jgmitter Joseph Gmitter (Inactive) added a comment

          It looks like all of these failures happened in interop testing. Sarah, did you ever observe such a failure during interop testing?

          niu Niu Yawei (Inactive) added a comment

          I searched maloo, and it seems this has appeared since around Sep 24 (not sure why some results don't have console logs, so it's hard to determine the exact date it first appeared), and it happens quite often.

          niu Niu Yawei (Inactive) added a comment

          Not to my knowledge.

          jay Jinshan Xiong (Inactive) added a comment

          Xiong, is there an open issue for it?

          niu Niu Yawei (Inactive) added a comment

          On client console:

          11:49:29:[ 1435.319733] Lustre: DEBUG MARKER: == replay-single test 20b: write, unlink, eviction, replay, (test mds_cleanup_orphans) =============== 11:48:55 (1475693335)
          11:49:29:[ 1437.189973] LustreError: 167-0: lustre-MDT0000-mdc-ffff880078c87000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
          11:49:29:[ 1437.194981] LustreError: 30004:0:(file.c:3403:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
          11:49:29:[ 1460.070003] BUG: soft lockup - CPU#1 stuck for 22s! [ptlrpcd_00_00:6005]
          11:49:29:[ 1460.070003] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper virtio_blk ttm ata_piix drm serio_raw 8139too libata virtio_pci virtio_ring 8139cp virtio mii i2c_core floppy
          11:49:29:[ 1460.070003] CPU: 1 PID: 6005 Comm: ptlrpcd_00_00 Tainted: G           OE  ------------   3.10.0-327.36.1.el7.x86_64 #1
          11:49:29:[ 1460.070003] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
          11:49:29:[ 1460.070003] task: ffff880036a7f300 ti: ffff880079148000 task.ti: ffff880079148000
          11:49:29:[ 1460.070003] RIP: 0010:[<ffffffffa0ae9315>]  [<ffffffffa0ae9315>] osc_makes_rpc+0x45/0x5f0 [osc]
          11:49:29:[ 1460.070003] RSP: 0018:ffff88007914bac0  EFLAGS: 00000202
          11:49:29:[ 1460.070003] RAX: 0000000000000004 RBX: ffff88007914be58 RCX: 0000000000000004
          11:49:29:[ 1460.070003] RDX: 0000000000000001 RSI: ffff88007b0dd4c0 RDI: ffff880079dc50e0
          11:49:29:[ 1460.070003] RBP: ffff88007914bad8 R08: ffff88007b0dd510 R09: 000000000000045b
          11:49:29:[ 1460.070003] R10: 0000000000010000 R11: 000000000000000b R12: ffffffffa05cbde2
          11:49:29:[ 1460.070003] R13: ffff88007914ba58 R14: ffff88007914ba88 R15: 0000000000000013
          11:49:29:[ 1460.070003] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
          11:49:29:[ 1460.070003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          11:49:29:[ 1460.070003] CR2: 00007f20994b0670 CR3: 000000000194a000 CR4: 00000000000006e0
          11:49:29:[ 1460.070003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          11:49:29:[ 1460.070003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          11:49:29:[ 1460.070003] Stack:
          11:49:29:[ 1460.070003]  ffff88007b0dd568 ffff88007a46ed20 ffff880079dc51c8 ffff88007914bc20
          11:49:29:[ 1460.070003]  ffffffffa0aeaa4d 00000000000003f5 0000000100000000 0000000100000000
          11:49:29:[ 1460.070003]  0000000100000000 0000000000000000 0000000054d8e01c ffff88007c0b1800
          11:49:29:[ 1460.070003] Call Trace:
          11:49:29:[ 1460.070003]  [<ffffffffa0aeaa4d>] osc_check_rpcs+0x8d/0x18b0 [osc]
          11:49:29:[ 1460.070003]  [<ffffffff810c5618>] ? load_balance+0x218/0x890
          11:49:29:[ 1460.070003]  [<ffffffff810bb7d8>] ? sched_clock_cpu+0x98/0xc0
          11:49:29:[ 1460.070003]  [<ffffffff8101cd99>] ? sched_clock+0x9/0x10
          11:49:29:[ 1460.070003]  [<ffffffffa0aec352>] osc_io_unplug0+0xe2/0x130 [osc]
          11:49:29:[ 1460.070003]  [<ffffffffa0aecf50>] osc_io_unplug+0x10/0x20 [osc]
          11:49:29:[ 1460.070003]  [<ffffffffa0ac9441>] brw_queue_work+0x31/0xd0 [osc]
          11:49:29:[ 1460.070003]  [<ffffffffa090f0d7>] work_interpreter+0x37/0xf0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa090bdb5>] ptlrpc_check_set.part.23+0x425/0x1dd0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa090d7bb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa09382bb>] ptlrpcd_check+0x4eb/0x5e0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa093866b>] ptlrpcd+0x2bb/0x560 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
          11:49:29:[ 1460.070003]  [<ffffffffa09383b0>] ? ptlrpcd_check+0x5e0/0x5e0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffff810a5b8f>] kthread+0xcf/0xe0
          11:49:29:[ 1460.070003]  [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
          11:49:29:[ 1460.070003]  [<ffffffff81646958>] ret_from_fork+0x58/0x90
          11:49:29:[ 1460.070003]  [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
          11:49:29:[ 1460.070003] Code: af ff 01 74 0d f6 05 2a a8 af ff 08 0f 85 44 01 00 00 49 8b 44 24 48 48 85 c0 0f 84 a6 01 00 00 0f b6 80 b4 02 00 00 d0 e8 89 c1 <83> e1 01 83 e2 02 74 3b 8b 83 dc 00 00 00 85 c0 0f 85 b5 00 00 
          11:49:29:[ 1460.070003] Kernel panic - not syncing: softlockup: hung tasks
          11:49:29:[ 1460.070003] CPU: 1 PID: 6005 Comm: ptlrpcd_00_00 Tainted: G           OEL ------------   3.10.0-327.36.1.el7.x86_64 #1
          11:49:29:[ 1460.070003] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
          11:49:29:[ 1460.070003]  ffffffff818730cf 0000000054d8e01c ffff88007fd03e18 ffffffff81636301
          11:49:29:[ 1460.070003]  ffff88007fd03e98 ffffffff8162fb90 0000000000000008 ffff88007fd03ea8
          11:49:29:[ 1460.070003]  ffff88007fd03e48 0000000054d8e01c ffff88007fd03e67 0000000000000000
          11:49:29:[ 1460.070003] Call Trace:
          11:49:29:[ 1460.070003]  <IRQ>  [<ffffffff81636301>] dump_stack+0x19/0x1b
          11:49:29:[ 1460.070003]  [<ffffffff8162fb90>] panic+0xd8/0x1e7
          11:49:29:[ 1460.070003]  [<ffffffff8111b846>] watchdog_timer_fn+0x1b6/0x1c0
          11:49:29:[ 1460.070003]  [<ffffffff8111b690>] ? watchdog_enable+0xc0/0xc0
          11:49:29:[ 1460.070003]  [<ffffffff810a9db2>] __hrtimer_run_queues+0xd2/0x260
          11:49:29:[ 1460.070003]  [<ffffffff810aa350>] hrtimer_interrupt+0xb0/0x1e0
          11:49:29:[ 1460.070003]  [<ffffffff8164831c>] ? call_softirq+0x1c/0x30
          11:49:29:[ 1460.070003]  [<ffffffff810495c7>] local_apic_timer_interrupt+0x37/0x60
          11:49:29:[ 1460.070003]  [<ffffffff81648f8f>] smp_apic_timer_interrupt+0x3f/0x60
          11:49:29:[ 1460.070003]  [<ffffffff8164765d>] apic_timer_interrupt+0x6d/0x80
          11:49:29:[ 1460.070003]  <EOI>  [<ffffffffa0ae9315>] ? osc_makes_rpc+0x45/0x5f0 [osc]
          11:49:29:[ 1460.070003]  [<ffffffffa0aeaa4d>] osc_check_rpcs+0x8d/0x18b0 [osc]
          11:49:29:[ 1460.070003]  [<ffffffff810c5618>] ? load_balance+0x218/0x890
          11:49:29:[ 1460.070003]  [<ffffffff810bb7d8>] ? sched_clock_cpu+0x98/0xc0
          11:49:29:[ 1460.070003]  [<ffffffff8101cd99>] ? sched_clock+0x9/0x10
          11:49:29:[ 1460.070003]  [<ffffffffa0aec352>] osc_io_unplug0+0xe2/0x130 [osc]
          11:49:29:[ 1460.070003]  [<ffffffffa0aecf50>] osc_io_unplug+0x10/0x20 [osc]
          11:49:29:[ 1460.070003]  [<ffffffffa0ac9441>] brw_queue_work+0x31/0xd0 [osc]
          11:49:29:[ 1460.070003]  [<ffffffffa090f0d7>] work_interpreter+0x37/0xf0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa090bdb5>] ptlrpc_check_set.part.23+0x425/0x1dd0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa090d7bb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa09382bb>] ptlrpcd_check+0x4eb/0x5e0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffffa093866b>] ptlrpcd+0x2bb/0x560 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
          11:49:29:[ 1460.070003]  [<ffffffffa09383b0>] ? ptlrpcd_check+0x5e0/0x5e0 [ptlrpc]
          11:49:29:[ 1460.070003]  [<ffffffff810a5b8f>] kthread+0xcf/0xe0
          11:49:29:[ 1460.070003]  [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
          11:49:29:[ 1460.070003]  [<ffffffff81646958>] ret_from_fork+0x58/0x90
          11:49:29:[ 1460.070003]  [<ffffffff810a5ac0>] ? kthread_create_on_node+0x140/0x140
          
          niu Niu Yawei (Inactive) added a comment

          People

            Assignee: bobijam Zhenyu Xu
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 7

            Dates

              Created:
              Updated:
              Resolved: