Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.4.0
    • Lustre 2.4.0
    • single node racer
    • 3
    • 5788

    Description

      I saw this several times, before.
      Showing up after multihour racer run:

      Dec  9 10:47:06 centos6-0 kernel: [238404.096020] BUG: soft lockup - CPU#4 stuck for 67s! [ptlrpcd_6:16397]
      Dec  9 10:47:06 centos6-0 kernel: [238404.096267] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] CPU 4 
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] 
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] Pid: 16397, comm: ptlrpcd_6 Not tainted 2.6.32-debug #6 Bochs Bochs
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] RIP: 0010:[<ffffffffa04af93a>]  [<ffffffffa04af93a>] osc_extent_finish+0x35a/0xa20 [osc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] RSP: 0018:ffff88004c843b70  EFLAGS: 00010286
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] RAX: 0000000000201000 RBX: ffff88004c843cb0 RCX: ffff880058592f80
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] RDX: ffff880018852f30 RSI: ffff88002ad42f38 RDI: ffff88002ad42f48
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] RBP: ffffffff8100bc0e R08: 0000000000000001 R09: ffff880000000000
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] R10: 0000000000000527 R11: 0000000087654321 R12: ffff880058592f38
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] R13: ffff880034eb1f30 R14: ffff88002ad42f30 R15: ffff880058592f80
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] FS:  00007f7510cbc700(0000) GS:ffff880006300000(0000) knlGS:0000000000000000
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] CR2: ffff880034eb1f58 CR3: 0000000001a25000 CR4: 00000000000006e0
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] Process ptlrpcd_6 (pid: 16397, threadinfo ffff88004c842000, task ffff8800af572100)
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] Stack:
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  ffff880000000001 00f0000100000000 ffff88000000002d ffffffffa04c35bb
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] <d> ffff88004c843c60 ffff880048d44eb0 ffff880000000000 0000000000000100
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] <d> ffff88000000002b ffffffff0000002d 0000000000000000 ffff880000000100
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] Call Trace:
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffff814faf3e>] ? _spin_unlock+0xe/0x10
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa0493587>] ? brw_interpret+0x2f7/0x1170 [osc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa11f7416>] ? ptlrpc_unregister_bulk+0xa6/0xb10 [ptlrpc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa11f0326>] ? ptlrpc_check_set+0x2b6/0x1db0 [ptlrpc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa12221cb>] ? ptlrpcd_check+0x55b/0x590 [ptlrpc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa122271b>] ? ptlrpcd+0x22b/0x3a0 [ptlrpc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa12224f0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffff8100c14a>] ? child_rip+0xa/0x20
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa12224f0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffffa12224f0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      Dec  9 10:47:06 centos6-0 kernel: [238404.097261] Code: 89 77 08 48 89 3e 48 89 4a 18 48 89 4a 20 48 8b 72 08 48 8b 4a 10 48 89 4e 08 48 89 31 48 89 42 08 48 89 42 10 74 0a 48 8b 42 28 <49> 39 45 28 73 03 49 89 d5 41 83 6c 24 44 01 8b 8d 5c ff ff ff 
      ...
      Dec  9 10:48:31 centos6-0 kernel: [238489.077267] LustreError: 4573:0:(import.c:324:ptlrpc_invalidate_import()) lustre-OST0000_UUID: rc = -110 waiting for callback (1 != 0)
      Dec  9 10:48:31 centos6-0 kernel: [238489.077619] LustreError: 4573:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list  req@ffff88006caa9bf0 x1420891247142946/t0(0) o3->lustre-OST0000-osc-ffff88007156cbf0@0@lo:6/4 lens 488/400 e 0 to 0 dl 1355067992 ref 1 fl Interpret:RE/0/0 rc -5/1048576
      ...
      Dec  9 10:49:43 centos6-0 kernel: [238560.869299] INFO: task ldlm_bl_00:17875 blocked for more than 120 seconds.
      Dec  9 10:49:43 centos6-0 kernel: [238560.869491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Dec  9 10:49:43 centos6-0 kernel: [238560.869781] ldlm_bl_00    D 0000000000000003  4032 17875      2 0x00000000
      Dec  9 10:49:43 centos6-0 kernel: [238560.869968]  ffff88002082fd50 0000000000000046 0000000100000000 0000000000000000
      Dec  9 10:49:43 centos6-0 kernel: [238560.870272]  ffff88002082fce0 ffffffffa0ac93c2 ffff88002082fd40 0000000000000001
      Dec  9 10:49:43 centos6-0 kernel: [238560.870568]  ffff88006b6589f8 ffff88002082ffd8 000000000000fba8 ffff88006b6589f8
      Dec  9 10:49:43 centos6-0 kernel: [238560.870864] Call Trace:
      Dec  9 10:49:43 centos6-0 kernel: [238560.871005]  [<ffffffffa0ac93c2>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs]
      Dec  9 10:49:43 centos6-0 kernel: [238560.871287]  [<ffffffffa0ac9214>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
      Dec  9 10:49:43 centos6-0 kernel: [238560.871567]  [<ffffffff814f9618>] __mutex_lock_slowpath+0x128/0x2c0
      Dec  9 10:49:43 centos6-0 kernel: [238560.871739]  [<ffffffff814f97e1>] mutex_lock+0x31/0x50
      Dec  9 10:49:43 centos6-0 kernel: [238560.871923]  [<ffffffffa0dafbda>] cl_lock_mutex_get+0x7a/0xe0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.872113]  [<ffffffffa04a26fa>] osc_ldlm_blocking_ast+0x7a/0x380 [osc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.872292]  [<ffffffffa0ac4641>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      Dec  9 10:49:43 centos6-0 kernel: [238560.872488]  [<ffffffffa11d5580>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.872779]  [<ffffffffa11d5ad1>] ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.872959]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      Dec  9 10:49:43 centos6-0 kernel: [238560.873144]  [<ffffffffa11d5850>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.873333]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      Dec  9 10:49:43 centos6-0 kernel: [238560.873502]  [<ffffffffa11d5850>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.873692]  [<ffffffffa11d5850>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.873870]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      Dec  9 10:49:43 centos6-0 kernel: [238560.874066] INFO: task dd:31903 blocked for more than 120 seconds.
      Dec  9 10:49:43 centos6-0 kernel: [238560.874235] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Dec  9 10:49:43 centos6-0 kernel: [238560.874519] dd            D 0000000000000007  2240 31903  18571 0x00000000
      Dec  9 10:49:43 centos6-0 kernel: [238560.874704]  ffff8800138b3a08 0000000000000086 ffff8800138b39d0 ffff8800138b39cc
      Dec  9 10:49:43 centos6-0 kernel: [238560.875000]  000000000000002a ffff8800bcc24f00 ffff8800063567c0 0000000000000001
      Dec  9 10:49:43 centos6-0 kernel: [238560.875295]  ffff8800787f2ab8 ffff8800138b3fd8 000000000000fba8 ffff8800787f2ab8
      Dec  9 10:49:43 centos6-0 kernel: [238560.875589] Call Trace:
      Dec  9 10:49:43 centos6-0 kernel: [238560.875718]  [<ffffffff814f9618>] __mutex_lock_slowpath+0x128/0x2c0
      Dec  9 10:49:43 centos6-0 kernel: [238560.875889]  [<ffffffff814f97e1>] mutex_lock+0x31/0x50
      Dec  9 10:49:43 centos6-0 kernel: [238560.876062]  [<ffffffffa0dafbda>] cl_lock_mutex_get+0x7a/0xe0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.876251]  [<ffffffffa0db256a>] cl_lock_state_wait+0x1ca/0x340 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.876431]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      Dec  9 10:49:43 centos6-0 kernel: [238560.876612]  [<ffffffffa0db2a35>] cl_lock_enqueue_wait+0xe5/0x300 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.876803]  [<ffffffffa0dafa2a>] ? cl_lock_mutex_put+0x7a/0x90 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.876995]  [<ffffffffa0556cbb>] lov_lock_enqueue+0x50b/0x7d0 [lov]
      Dec  9 10:49:43 centos6-0 kernel: [238560.877181]  [<ffffffffa0db17fc>] cl_enqueue_try+0xfc/0x310 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.877374]  [<ffffffffa0db2cbd>] cl_enqueue_locked+0x6d/0x210 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.877567]  [<ffffffffa0db395e>] cl_lock_request+0x7e/0x280 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.877755]  [<ffffffffa0db8c86>] cl_io_lock+0x3d6/0x5b0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.877938]  [<ffffffffa0db8f02>] cl_io_loop+0xa2/0x1b0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.878122]  [<ffffffffa0c1463b>] ll_file_io_generic+0x42b/0x550 [lustre]
      Dec  9 10:49:43 centos6-0 kernel: [238560.878304]  [<ffffffffa0c1552c>] ll_file_aio_write+0x13c/0x2c0 [lustre]
      Dec  9 10:49:43 centos6-0 kernel: [238560.878871]  [<ffffffffa0c15819>] ll_file_write+0x169/0x2a0 [lustre]
      Dec  9 10:49:43 centos6-0 kernel: [238560.879054]  [<ffffffff8117b2e8>] vfs_write+0xb8/0x1a0
      Dec  9 10:49:43 centos6-0 kernel: [238560.879212]  [<ffffffff8117bbb1>] sys_write+0x51/0x90
      Dec  9 10:49:43 centos6-0 kernel: [238560.879369]  [<ffffffff8100b0f2>] system_csystem_call_fastpath+0x16/0x1b
      Dec  9 10:49:43 centos6-0 kernel: [238560.879704] "echo 0 > /proc/sys/kernel/hun
      g_task_timeout_secs" disables this message.
      Dec  9 10:49:43 centos6-0 kernel: [238560.879992] cat           D 00000000000000
      01  2608 32058  18604 0x00000000
      Dec  9 10:49:43 centos6-0 kernel: [238560.880534]  ffff880011bcd7e8 000000000000
      0086 ffff880011bcd7b0 ffff880011bcd7ac
      Dec  9 10:49:43 centos6-0 kernel: [238560.880830]  ffff88000e388cb0 ffff8800bcc2
      4300 ffff8800062967c0 0000000000000001
      Dec  9 10:49:43 centos6-0 kernel: [238560.881130]  ffff880071512978 ffff880011bc
      dfd8 000000000000fba8 ffff880071512978
      Dec  9 10:49:43 centos6-0 kernel: [238560.881435] Call Trace:
      Dec  9 10:49:43 centos6-0 kernel: [238560.881565]  [<ffffffff81112320>] ? sync_p
      age+0x0/0x50
      Dec  9 10:49:43 centos6-0 kernel: [238560.881723]  [<ffffffff814f8473>] io_schedule+0x73/0xc0
      Dec  9 10:49:43 centos6-0 kernel: [238560.881882]  [<ffffffff81112360>] sync_page+0x40/0x50
      Dec  9 10:49:43 centos6-0 kernel: [238560.882043]  [<ffffffff814f8cda>] __wait_on_bit_lock+0x5a/0xc0
      Dec  9 10:49:43 centos6-0 kernel: [238560.882210]  [<ffffffff811122f7>] __lock_page+0x67/0x70
      Dec  9 10:49:43 centos6-0 kernel: [238560.882370]  [<ffffffff8108fda0>] ? wake_bit_function+0x0/0x50
      Dec  9 10:49:43 centos6-0 kernel: [238560.882536]  [<ffffffff81112437>] ? unlock_page+0x27/0x30
      Dec  9 10:49:43 centos6-0 kernel: [238560.882707]  [<ffffffffa0c6748a>] vvp_page_own+0x9a/0xe0 [lustre]
      Dec  9 10:49:43 centos6-0 kernel: [238560.882893]  [<ffffffffa0dab1eb>] cl_page_own0+0x11b/0x360 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.883082]  [<ffffffffa0dab460>] cl_page_own+0x10/0x20 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.883265]  [<ffffffffa0db1e7f>] check_and_discard_cb+0xcf/0x160 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.883458]  [<ffffffffa0dacb8c>] cl_page_gang_lookup+0x1fc/0x410 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.883650]  [<ffffffffa0db1db0>] ? check_and_discard_cb+0x0/0x160 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.883940]  [<ffffffffa0db1db0>] ? check_and_discard_cb+0x0/0x160 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.884235]  [<ffffffffa0daf67e>] cl_lock_discard_pages+0x11e/0x1f0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.884528]  [<ffffffffa0da4a55>] ? cl_env_info+0x15/0x20 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.884707]  [<ffffffffa04a0f1f>] osc_lock_flush+0xff/0x280 [osc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.884879]  [<ffffffffa04a1187>] osc_lock_cancel+0xe7/0x1c0 [osc]
      Dec  9 10:49:43 centos6-0 kernel: [238560.885063]  [<ffffffffa0dad3e5>] cl_lock_cancel0+0x75/0x160 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.885254]  [<ffffffffa0daf026>] cl_lock_hold_release+0x1c6/0x2b0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.885558]  [<ffffffffa0db0b97>] cl_lock_unhold+0x37/0x130 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.885740]  [<ffffffffa0553ea8>] lov_sublock_release+0x1a8/0x2a0 [lov]
      Dec  9 10:49:43 centos6-0 kernel: [238560.885922]  [<ffffffffa055599d>] lov_lock_unuse+0xed/0x280 [lov]
      Dec  9 10:49:43 centos6-0 kernel: [238560.886106]  [<ffffffffa0dad5c5>] cl_unuse_try_internal+0x55/0x100 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.886395]  [<ffffffffa0db0f99>] cl_unuse_try+0x199/0x340 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.886581]  [<ffffffffa0db1177>] cl_unuse_locked+0x37/0x1a0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.886768]  [<ffffffffa0db131e>] cl_unuse+0x3e/0x120 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.886949]  [<ffffffffa0db8227>] cl_io_unlock+0x127/0x2d0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.887140]  [<ffffffffa0db8f2d>] cl_io_loop+0xcd/0x1b0 [obdclass]
      Dec  9 10:49:43 centos6-0 kernel: [238560.887318]  [<ffffffffa0c145b3>] ll_file_io_generic+0x3a3/0x550 [lustre]
      Dec  9 10:49:43 centos6-0 kernel: [238560.887502]  [<ffffffffa0c14899>] ll_file_aio_read+0x139/0x2c0 [lustre]
      Dec  9 10:49:43 centos6-0 kernel: [238560.887685]  [<ffffffffa0c14d59>] ll_file_read+0x169/0x2a0 [lustre]
      Dec  9 10:49:43 centos6-0 kernel: [238560.887858]  [<ffffffff8117b9e5>] vfs_read+0xb5/0x1a0
      Dec  9 10:49:43 centos6-0 kernel: [238560.888017]  [<ffffffff8117bb21>] sys_read+0x51/0x90
      Dec  9 10:49:43 centos6-0 kernel: [238560.888175]  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      

      Attachments

        Activity

          People

            green Oleg Drokin
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: