Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.0
-
single node racer
-
3
-
5788
Description
I saw this several times, before.
Showing up after multihour racer run:
Dec 9 10:47:06 centos6-0 kernel: [238404.096020] BUG: soft lockup - CPU#4 stuck for 67s! [ptlrpcd_6:16397] Dec 9 10:47:06 centos6-0 kernel: [238404.096267] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] CPU 4 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] Pid: 16397, comm: ptlrpcd_6 Not tainted 2.6.32-debug #6 Bochs Bochs Dec 9 10:47:06 centos6-0 kernel: [238404.097261] RIP: 0010:[<ffffffffa04af93a>] [<ffffffffa04af93a>] osc_extent_finish+0x35a/0xa20 [osc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] RSP: 0018:ffff88004c843b70 EFLAGS: 00010286 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] RAX: 0000000000201000 RBX: ffff88004c843cb0 RCX: ffff880058592f80 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] RDX: ffff880018852f30 RSI: ffff88002ad42f38 RDI: ffff88002ad42f48 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] RBP: ffffffff8100bc0e R08: 0000000000000001 R09: ffff880000000000 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] R10: 0000000000000527 R11: 0000000087654321 R12: ffff880058592f38 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] R13: ffff880034eb1f30 R14: ffff88002ad42f30 R15: ffff880058592f80 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] FS: 00007f7510cbc700(0000) GS:ffff880006300000(0000) knlGS:0000000000000000 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 9 10:47:06 centos6-0 kernel: [238404.097261] CR2: ffff880034eb1f58 CR3: 0000000001a25000 CR4: 00000000000006e0 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] Process ptlrpcd_6 (pid: 16397, threadinfo ffff88004c842000, task ffff8800af572100) Dec 9 10:47:06 centos6-0 kernel: [238404.097261] Stack: Dec 9 10:47:06 centos6-0 kernel: [238404.097261] ffff880000000001 00f0000100000000 ffff88000000002d ffffffffa04c35bb Dec 9 10:47:06 centos6-0 kernel: [238404.097261] <d> ffff88004c843c60 ffff880048d44eb0 ffff880000000000 0000000000000100 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] <d> ffff88000000002b ffffffff0000002d 0000000000000000 ffff880000000100 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] Call Trace: Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffff814faf3e>] ? _spin_unlock+0xe/0x10 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa0493587>] ? brw_interpret+0x2f7/0x1170 [osc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa11f7416>] ? ptlrpc_unregister_bulk+0xa6/0xb10 [ptlrpc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa11f0326>] ? ptlrpc_check_set+0x2b6/0x1db0 [ptlrpc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa12221cb>] ? ptlrpcd_check+0x55b/0x590 [ptlrpc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa122271b>] ? ptlrpcd+0x22b/0x3a0 [ptlrpc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa12224f0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffff8100c14a>] ? child_rip+0xa/0x20 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa12224f0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffffa12224f0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc] Dec 9 10:47:06 centos6-0 kernel: [238404.097261] [<ffffffff8100c140>] ? child_rip+0x0/0x20 Dec 9 10:47:06 centos6-0 kernel: [238404.097261] Code: 89 77 08 48 89 3e 48 89 4a 18 48 89 4a 20 48 8b 72 08 48 8b 4a 10 48 89 4e 08 48 89 31 48 89 42 08 48 89 42 10 74 0a 48 8b 42 28 <49> 39 45 28 73 03 49 89 d5 41 83 6c 24 44 01 8b 8d 5c ff ff ff ... Dec 9 10:48:31 centos6-0 kernel: [238489.077267] LustreError: 4573:0:(import.c:324:ptlrpc_invalidate_import()) lustre-OST0000_UUID: rc = -110 waiting for callback (1 != 0) Dec 9 10:48:31 centos6-0 kernel: [238489.077619] LustreError: 4573:0:(import.c:350:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff88006caa9bf0 x1420891247142946/t0(0) o3->lustre-OST0000-osc-ffff88007156cbf0@0@lo:6/4 lens 488/400 e 0 to 0 dl 1355067992 ref 1 fl Interpret:RE/0/0 rc -5/1048576 ... Dec 9 10:49:43 centos6-0 kernel: [238560.869299] INFO: task ldlm_bl_00:17875 blocked for more than 120 seconds. Dec 9 10:49:43 centos6-0 kernel: [238560.869491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 9 10:49:43 centos6-0 kernel: [238560.869781] ldlm_bl_00 D 0000000000000003 4032 17875 2 0x00000000 Dec 9 10:49:43 centos6-0 kernel: [238560.869968] ffff88002082fd50 0000000000000046 0000000100000000 0000000000000000 Dec 9 10:49:43 centos6-0 kernel: [238560.870272] ffff88002082fce0 ffffffffa0ac93c2 ffff88002082fd40 0000000000000001 Dec 9 10:49:43 centos6-0 kernel: [238560.870568] ffff88006b6589f8 ffff88002082ffd8 000000000000fba8 ffff88006b6589f8 Dec 9 10:49:43 centos6-0 kernel: [238560.870864] Call Trace: Dec 9 10:49:43 centos6-0 kernel: [238560.871005] [<ffffffffa0ac93c2>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs] Dec 9 10:49:43 centos6-0 kernel: [238560.871287] [<ffffffffa0ac9214>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs] Dec 9 10:49:43 centos6-0 kernel: [238560.871567] [<ffffffff814f9618>] __mutex_lock_slowpath+0x128/0x2c0 Dec 9 10:49:43 centos6-0 kernel: [238560.871739] [<ffffffff814f97e1>] mutex_lock+0x31/0x50 Dec 9 10:49:43 centos6-0 kernel: [238560.871923] [<ffffffffa0dafbda>] cl_lock_mutex_get+0x7a/0xe0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.872113] [<ffffffffa04a26fa>] osc_ldlm_blocking_ast+0x7a/0x380 [osc] Dec 9 10:49:43 centos6-0 kernel: [238560.872292] [<ffffffffa0ac4641>] ? libcfs_debug_msg+0x41/0x50 [libcfs] Dec 9 10:49:43 centos6-0 kernel: [238560.872488] [<ffffffffa11d5580>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc] Dec 9 10:49:43 centos6-0 kernel: [238560.872779] [<ffffffffa11d5ad1>] ldlm_bl_thread_main+0x281/0x3e0 [ptlrpc] Dec 9 10:49:43 centos6-0 kernel: [238560.872959] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 Dec 9 10:49:43 centos6-0 kernel: [238560.873144] [<ffffffffa11d5850>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] Dec 9 10:49:43 centos6-0 kernel: [238560.873333] [<ffffffff8100c14a>] child_rip+0xa/0x20 Dec 9 10:49:43 centos6-0 kernel: [238560.873502] [<ffffffffa11d5850>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] Dec 9 10:49:43 centos6-0 kernel: [238560.873692] [<ffffffffa11d5850>] ? ldlm_bl_thread_main+0x0/0x3e0 [ptlrpc] Dec 9 10:49:43 centos6-0 kernel: [238560.873870] [<ffffffff8100c140>] ? child_rip+0x0/0x20 Dec 9 10:49:43 centos6-0 kernel: [238560.874066] INFO: task dd:31903 blocked for more than 120 seconds. Dec 9 10:49:43 centos6-0 kernel: [238560.874235] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 9 10:49:43 centos6-0 kernel: [238560.874519] dd D 0000000000000007 2240 31903 18571 0x00000000 Dec 9 10:49:43 centos6-0 kernel: [238560.874704] ffff8800138b3a08 0000000000000086 ffff8800138b39d0 ffff8800138b39cc Dec 9 10:49:43 centos6-0 kernel: [238560.875000] 000000000000002a ffff8800bcc24f00 ffff8800063567c0 0000000000000001 Dec 9 10:49:43 centos6-0 kernel: [238560.875295] ffff8800787f2ab8 ffff8800138b3fd8 000000000000fba8 ffff8800787f2ab8 Dec 9 10:49:43 centos6-0 kernel: [238560.875589] Call Trace: Dec 9 10:49:43 centos6-0 kernel: [238560.875718] [<ffffffff814f9618>] __mutex_lock_slowpath+0x128/0x2c0 Dec 9 10:49:43 centos6-0 kernel: [238560.875889] [<ffffffff814f97e1>] mutex_lock+0x31/0x50 Dec 9 10:49:43 centos6-0 kernel: [238560.876062] [<ffffffffa0dafbda>] cl_lock_mutex_get+0x7a/0xe0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.876251] [<ffffffffa0db256a>] cl_lock_state_wait+0x1ca/0x340 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.876431] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 Dec 9 10:49:43 centos6-0 kernel: [238560.876612] [<ffffffffa0db2a35>] cl_lock_enqueue_wait+0xe5/0x300 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.876803] [<ffffffffa0dafa2a>] ? cl_lock_mutex_put+0x7a/0x90 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.876995] [<ffffffffa0556cbb>] lov_lock_enqueue+0x50b/0x7d0 [lov] Dec 9 10:49:43 centos6-0 kernel: [238560.877181] [<ffffffffa0db17fc>] cl_enqueue_try+0xfc/0x310 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.877374] [<ffffffffa0db2cbd>] cl_enqueue_locked+0x6d/0x210 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.877567] [<ffffffffa0db395e>] cl_lock_request+0x7e/0x280 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.877755] [<ffffffffa0db8c86>] cl_io_lock+0x3d6/0x5b0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.877938] [<ffffffffa0db8f02>] cl_io_loop+0xa2/0x1b0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.878122] [<ffffffffa0c1463b>] ll_file_io_generic+0x42b/0x550 [lustre] Dec 9 10:49:43 centos6-0 kernel: [238560.878304] [<ffffffffa0c1552c>] ll_file_aio_write+0x13c/0x2c0 [lustre] Dec 9 10:49:43 centos6-0 kernel: [238560.878871] [<ffffffffa0c15819>] ll_file_write+0x169/0x2a0 [lustre] Dec 9 10:49:43 centos6-0 kernel: [238560.879054] [<ffffffff8117b2e8>] vfs_write+0xb8/0x1a0 Dec 9 10:49:43 centos6-0 kernel: [238560.879212] [<ffffffff8117bbb1>] sys_write+0x51/0x90 Dec 9 10:49:43 centos6-0 kernel: [238560.879369] [<ffffffff8100b0f2>] system_csystem_call_fastpath+0x16/0x1b Dec 9 10:49:43 centos6-0 kernel: [238560.879704] "echo 0 > /proc/sys/kernel/hun g_task_timeout_secs" disables this message. Dec 9 10:49:43 centos6-0 kernel: [238560.879992] cat D 00000000000000 01 2608 32058 18604 0x00000000 Dec 9 10:49:43 centos6-0 kernel: [238560.880534] ffff880011bcd7e8 000000000000 0086 ffff880011bcd7b0 ffff880011bcd7ac Dec 9 10:49:43 centos6-0 kernel: [238560.880830] ffff88000e388cb0 ffff8800bcc2 4300 ffff8800062967c0 0000000000000001 Dec 9 10:49:43 centos6-0 kernel: [238560.881130] ffff880071512978 ffff880011bc dfd8 000000000000fba8 ffff880071512978 Dec 9 10:49:43 centos6-0 kernel: [238560.881435] Call Trace: Dec 9 10:49:43 centos6-0 kernel: [238560.881565] [<ffffffff81112320>] ? sync_p age+0x0/0x50 Dec 9 10:49:43 centos6-0 kernel: [238560.881723] [<ffffffff814f8473>] io_schedule+0x73/0xc0 Dec 9 10:49:43 centos6-0 kernel: [238560.881882] [<ffffffff81112360>] sync_page+0x40/0x50 Dec 9 10:49:43 centos6-0 kernel: [238560.882043] [<ffffffff814f8cda>] __wait_on_bit_lock+0x5a/0xc0 Dec 9 10:49:43 centos6-0 kernel: [238560.882210] [<ffffffff811122f7>] __lock_page+0x67/0x70 Dec 9 10:49:43 centos6-0 kernel: [238560.882370] [<ffffffff8108fda0>] ? wake_bit_function+0x0/0x50 Dec 9 10:49:43 centos6-0 kernel: [238560.882536] [<ffffffff81112437>] ? unlock_page+0x27/0x30 Dec 9 10:49:43 centos6-0 kernel: [238560.882707] [<ffffffffa0c6748a>] vvp_page_own+0x9a/0xe0 [lustre] Dec 9 10:49:43 centos6-0 kernel: [238560.882893] [<ffffffffa0dab1eb>] cl_page_own0+0x11b/0x360 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.883082] [<ffffffffa0dab460>] cl_page_own+0x10/0x20 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.883265] [<ffffffffa0db1e7f>] check_and_discard_cb+0xcf/0x160 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.883458] [<ffffffffa0dacb8c>] cl_page_gang_lookup+0x1fc/0x410 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.883650] [<ffffffffa0db1db0>] ? check_and_discard_cb+0x0/0x160 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.883940] [<ffffffffa0db1db0>] ? check_and_discard_cb+0x0/0x160 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.884235] [<ffffffffa0daf67e>] cl_lock_discard_pages+0x11e/0x1f0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.884528] [<ffffffffa0da4a55>] ? cl_env_info+0x15/0x20 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.884707] [<ffffffffa04a0f1f>] osc_lock_flush+0xff/0x280 [osc] Dec 9 10:49:43 centos6-0 kernel: [238560.884879] [<ffffffffa04a1187>] osc_lock_cancel+0xe7/0x1c0 [osc] Dec 9 10:49:43 centos6-0 kernel: [238560.885063] [<ffffffffa0dad3e5>] cl_lock_cancel0+0x75/0x160 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.885254] [<ffffffffa0daf026>] cl_lock_hold_release+0x1c6/0x2b0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.885558] [<ffffffffa0db0b97>] cl_lock_unhold+0x37/0x130 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.885740] [<ffffffffa0553ea8>] lov_sublock_release+0x1a8/0x2a0 [lov] Dec 9 10:49:43 centos6-0 kernel: [238560.885922] [<ffffffffa055599d>] lov_lock_unuse+0xed/0x280 [lov] Dec 9 10:49:43 centos6-0 kernel: [238560.886106] [<ffffffffa0dad5c5>] cl_unuse_try_internal+0x55/0x100 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.886395] [<ffffffffa0db0f99>] cl_unuse_try+0x199/0x340 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.886581] [<ffffffffa0db1177>] cl_unuse_locked+0x37/0x1a0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.886768] [<ffffffffa0db131e>] cl_unuse+0x3e/0x120 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.886949] [<ffffffffa0db8227>] cl_io_unlock+0x127/0x2d0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.887140] [<ffffffffa0db8f2d>] cl_io_loop+0xcd/0x1b0 [obdclass] Dec 9 10:49:43 centos6-0 kernel: [238560.887318] [<ffffffffa0c145b3>] ll_file_io_generic+0x3a3/0x550 [lustre] Dec 9 10:49:43 centos6-0 kernel: [238560.887502] [<ffffffffa0c14899>] ll_file_aio_read+0x139/0x2c0 [lustre] Dec 9 10:49:43 centos6-0 kernel: [238560.887685] [<ffffffffa0c14d59>] ll_file_read+0x169/0x2a0 [lustre] Dec 9 10:49:43 centos6-0 kernel: [238560.887858] [<ffffffff8117b9e5>] vfs_read+0xb5/0x1a0 Dec 9 10:49:43 centos6-0 kernel: [238560.888017] [<ffffffff8117bb21>] sys_read+0x51/0x90 Dec 9 10:49:43 centos6-0 kernel: [238560.888175] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b