Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
Lustre 2.10.0
-
Lustre: Build Version: 2.10.0_61_g6aabd4a
ZFS: 0.7.1-1
-
3
-
9223372036854775807
Description
We had two test runs stop at the exact same spot last night with the exact same error messages:
[14833.875452] Lustre: testfs-OST0001: new disk, initializing [14833.878250] Lustre: srv-testfs-OST0001: No data found on store. Initialize space [14833.902088] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [14834.388551] Lustre: Failing over testfs-OST0001 [14834.398323] Lustre: server umount testfs-OST0001 complete [14836.789429] sda: sda1 sda9 [14837.830153] sde: sde1 sde9 [14840.744138] sda: sda1 sda9 [14854.985498] sde: sde1 sde9 [14885.314589] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [14891.448270] Lustre: testfs-OST0001: Connection restored to testfs-MDT0000-mdtlov_UUID (at 10.14.83.53@tcp) [15042.279898] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391178/real 1503391178] req@ffff88002681ad00 x1576412291011264/t0(0) o400->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 224/224 e 0 to 1 dl 1503391185 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15042.289748] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message [15042.292641] LustreError: 166-1: MGC10.14.83.52@tcp: Connection to MGS (at 10.14.83.52@tcp) was lost; in progress operations using this service will fail [15074.296779] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.53@tcp) was lost; in progress operations using this service will wait for recovery to complete [15135.302559] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391267/real 1503391267] req@ffff88002681b900 x1576412291011408/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391278 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15135.316461] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [15300.302154] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391417/real 1503391417] req@ffff88002681a700 x1576412291011584/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391443 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15300.313985] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [15615.301742] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391717/real 1503391717] req@ffff880026818c00 x1576412291011776/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391758 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15615.315431] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [16254.304740] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392342/real 1503392342] req@ffff880021452100 x1576412291012080/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503392397 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16254.318093] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages [16879.303144] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392967/real 1503392967] req@ffff880021450300 x1576412291012352/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393022 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16879.317746] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [17529.303018] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503393617/real 1503393617] req@ffff880021451500 x1576412291012656/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393672 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [17529.319175] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages [18129.305069] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394217/real 1503394217] req@ffff88002681a700 x1576412291012912/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503394272 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18129.320603] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [18400.583192] WARNING: can't open objset 51, error 5 [18400.608243] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52) [18400.618086] PANIC at zfeature.c:294:feature_get_enabled_txg() [18400.622541] Showing stack for process 15981 [18400.626693] CPU: 1 PID: 15981 Comm: zpool Tainted: P OE ------------ 3.10.0-514.26.2.el7_lustre.x86_64 #1 [18400.631653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [18400.635887] ffffffffa0e1d9b9 000000009e369596 ffff88007a42f8f8 ffffffff8168729f [18400.640500] ffff88007a42f908 ffffffffa0bee254 ffff88007a42fa90 ffffffffa0bee329 [18400.644779] 000000009e369596 ffff880000000030 ffff88007a42faa0 ffff88007a42fa40 [18400.649018] Call Trace: [18400.652394] [<ffffffff8168729f>] dump_stack+0x19/0x1b [18400.656133] [<ffffffffa0bee254>] spl_dumpstack+0x44/0x50 [spl] [18400.659779] [<ffffffffa0bee329>] spl_panic+0xc9/0x110 [spl] [18400.663224] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18400.666665] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18400.670112] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18400.673268] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18400.676337] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18400.679462] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18400.682391] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18400.685219] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18400.688157] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18400.690844] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18400.693403] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18400.695969] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18400.698506] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18400.700874] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18400.703281] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18400.705594] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18400.707786] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18400.710014] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18400.712253] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18400.714424] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18600.401090] INFO: task zpool:15981 blocked for more than 120 seconds. [18600.406480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18600.410711] zpool D ffff88007b996850 0 15981 15941 0x00000080 [18600.414668] ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8 [18600.418542] ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9 [18600.422329] 0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850 [18600.426025] Call Trace: [18600.428721] [<ffffffff8168c969>] schedule+0x29/0x70 [18600.431740] [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl] [18600.434684] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18600.437608] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18600.440576] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18600.443208] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18600.445769] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18600.448395] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18600.451054] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18600.453420] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18600.455770] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18600.458073] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18600.460240] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18600.462362] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18600.464379] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18600.466391] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18600.468478] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18600.470429] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18600.472467] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18600.474422] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18600.476269] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18600.478246] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18600.480194] INFO: task zpool:16142 blocked for more than 120 seconds. [18600.482172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18600.484232] zpool D ffffffffa0f22fa8 0 16142 1 0x00000084 [18600.486254] ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8 [18600.488358] ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0 [18600.490470] ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8 [18600.492548] Call Trace: [18600.494035] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18600.496033] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18600.497883] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18600.499666] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18600.501572] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18600.503400] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18600.505153] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18600.506874] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18600.508643] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18600.510361] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18600.512129] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.513066] INFO: task zpool:15981 blocked for more than 120 seconds. [18720.516508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.519948] zpool D ffff88007b996850 0 15981 1 0x00000084 [18720.523326] ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8 [18720.526586] ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9 [18720.529828] 0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850 [18720.533034] Call Trace: [18720.535282] [<ffffffff8168c969>] schedule+0x29/0x70 [18720.537973] [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl] [18720.540701] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18720.543465] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18720.546169] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18720.548680] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18720.551141] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18720.553787] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18720.556222] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18720.558741] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18720.561158] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18720.563569] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18720.565793] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18720.568028] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18720.570276] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18720.572496] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18720.574745] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.576819] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.578837] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.582488] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.584451] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.586520] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.588603] INFO: task zpool:16142 blocked for more than 120 seconds. [18720.590724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.592984] zpool D ffffffffa0f22fa8 0 16142 1 0x00000084 [18720.595153] ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8 [18720.597356] ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0 [18720.599540] ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8 [18720.601732] Call Trace: [18720.603309] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18720.605390] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18720.607287] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18720.609093] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18720.611025] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18720.612979] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.614836] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.616581] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.618358] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.620030] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.621907] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.623680] INFO: task zpool:16158 blocked for more than 120 seconds. [18720.625393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.627258] zpool D ffffffffa0f22fa8 0 16158 1 0x00000084 [18720.629098] ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8 [18720.630976] ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0 [18720.632855] ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8 [18720.634731] Call Trace: [18720.636082] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18720.637802] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18720.639652] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18720.641290] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18720.643023] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18720.644796] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.646489] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.648106] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.649788] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.651362] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.653035] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.654733] INFO: task zpool:16174 blocked for more than 120 seconds. [18720.656571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.658449] zpool D ffffffffa0f22fa8 0 16174 1 0x00000084 [18720.660311] ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8 [18720.662247] ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0 [18720.664184] ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8 [18720.666126] Call Trace: [18720.667565] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18720.669366] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18720.671150] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18720.672871] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18720.674837] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18720.676696] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.678479] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.680151] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.681882] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.683524] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.685240] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18754.305036] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394842/real 1503394842] req@ffff88002681b600 x1576412291013184/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503394897 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18754.316977] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [18840.686099] INFO: task zpool:15981 blocked for more than 120 seconds. [18840.691071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.694953] zpool D ffff88007b996850 0 15981 1 0x00000084 [18840.698602] ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8 [18840.702216] ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9 [18840.705869] 0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850 [18840.709323] Call Trace: [18840.711896] [<ffffffff8168c969>] schedule+0x29/0x70 [18840.714985] [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl] [18840.718012] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18840.720950] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18840.723927] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18840.726745] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18840.729416] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18840.732143] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18840.734821] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18840.737421] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18840.739974] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18840.742654] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18840.745083] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18840.747490] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18840.749804] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18840.752141] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18840.754539] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.756837] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.759036] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.761354] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.763490] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.765856] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18840.768014] INFO: task zpool:16142 blocked for more than 120 seconds. [18840.770176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.772502] zpool D ffffffffa0f22fa8 0 16142 1 0x00000084 [18840.774773] ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8 [18840.777131] ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0 [18840.779409] ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8 [18840.781655] Call Trace: [18840.783323] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18840.785410] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18840.787611] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18840.789515] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18840.791552] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18840.793580] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.795556] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.797462] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.799316] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.801034] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.802791] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18840.804499] INFO: task zpool:16158 blocked for more than 120 seconds. [18840.806377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.808248] zpool D ffffffffa0f22fa8 0 16158 1 0x00000084 [18840.810085] ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8 [18840.811988] ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0 [18840.813881] ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8 [18840.815763] Call Trace: [18840.817122] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18840.818878] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18840.820589] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18840.822233] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18840.824108] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18840.825875] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.827558] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.829207] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.830897] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.832483] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.834151] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18840.835868] INFO: task zpool:16174 blocked for more than 120 seconds. [18840.837608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.839502] zpool D ffffffffa0f22fa8 0 16174 1 0x00000084 [18840.841413] ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8 [18840.843350] ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0 [18840.845311] ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8 [18840.847279] Call Trace: [18840.848725] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18840.850537] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18840.852321] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18840.854048] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18840.855873] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18840.857717] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.859633] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.861311] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.863054] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.864694] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.866468] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19354.308243] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503395442/real 1503395442] req@ffff88002681a400 x1576412291013440/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503395497 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [19354.319940] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [19606.633901] Lustre: Failing over testfs-OST0001 [19606.643768] Lustre: server umount testfs-OST0001 complete
Relevant entries in the process list on the machine:
... 19296 ? Ds 0:00 | \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11 ... 15981 ? D 0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk15 16142 ? D 0:00 zpool list -H -o name 16158 ? D 0:00 zpool list -H -o name 16174 ? D 0:00 zpool list -H -o name 16190 ? D 0:00 zpool list -H -o name 16214 ? D 0:00 zpool list -H -o name 16231 ? D 0:00 zpool list -H -o name
Attachments
Issue Links
- links to
We think we've addressed this by integrating with MMP, and removing all force imports / exports.