[LU-9901] PANIC at zfeature.c:294:feature_get_enabled_txg() Created: 22/Aug/17 Updated: 23/Aug/18 Resolved: 11/Nov/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Brian Murrell (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs | ||
| Environment: |
Lustre: Build Version: 2.10.0_61_g6aabd4a |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
We had two test runs stop at the exact same spot last night with the exact same error messages: [14833.875452] Lustre: testfs-OST0001: new disk, initializing [14833.878250] Lustre: srv-testfs-OST0001: No data found on store. Initialize space [14833.902088] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [14834.388551] Lustre: Failing over testfs-OST0001 [14834.398323] Lustre: server umount testfs-OST0001 complete [14836.789429] sda: sda1 sda9 [14837.830153] sde: sde1 sde9 [14840.744138] sda: sda1 sda9 [14854.985498] sde: sde1 sde9 [14885.314589] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [14891.448270] Lustre: testfs-OST0001: Connection restored to testfs-MDT0000-mdtlov_UUID (at 10.14.83.53@tcp) [15042.279898] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391178/real 1503391178] req@ffff88002681ad00 x1576412291011264/t0(0) o400->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 224/224 e 0 to 1 dl 1503391185 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15042.289748] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message [15042.292641] LustreError: 166-1: MGC10.14.83.52@tcp: Connection to MGS (at 10.14.83.52@tcp) was lost; in progress operations using this service will fail [15074.296779] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.53@tcp) was lost; in progress operations using this service will wait for recovery to complete [15135.302559] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391267/real 1503391267] req@ffff88002681b900 x1576412291011408/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391278 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15135.316461] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [15300.302154] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391417/real 1503391417] req@ffff88002681a700 x1576412291011584/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391443 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15300.313985] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [15615.301742] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391717/real 1503391717] req@ffff880026818c00 x1576412291011776/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391758 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15615.315431] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [16254.304740] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392342/real 1503392342] req@ffff880021452100 x1576412291012080/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503392397 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16254.318093] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages [16879.303144] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392967/real 1503392967] req@ffff880021450300 x1576412291012352/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393022 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16879.317746] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [17529.303018] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503393617/real 1503393617] req@ffff880021451500 x1576412291012656/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393672 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [17529.319175] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages [18129.305069] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394217/real 1503394217] req@ffff88002681a700 x1576412291012912/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503394272 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18129.320603] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [18400.583192] WARNING: can't open objset 51, error 5 [18400.608243] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52) [18400.618086] PANIC at zfeature.c:294:feature_get_enabled_txg() [18400.622541] Showing stack for process 15981 [18400.626693] CPU: 1 PID: 15981 Comm: zpool Tainted: P OE ------------ 3.10.0-514.26.2.el7_lustre.x86_64 #1 [18400.631653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [18400.635887] ffffffffa0e1d9b9 000000009e369596 ffff88007a42f8f8 ffffffff8168729f [18400.640500] ffff88007a42f908 ffffffffa0bee254 ffff88007a42fa90 ffffffffa0bee329 [18400.644779] 000000009e369596 ffff880000000030 ffff88007a42faa0 ffff88007a42fa40 [18400.649018] Call Trace: [18400.652394] [<ffffffff8168729f>] dump_stack+0x19/0x1b [18400.656133] [<ffffffffa0bee254>] spl_dumpstack+0x44/0x50 [spl] [18400.659779] [<ffffffffa0bee329>] spl_panic+0xc9/0x110 [spl] [18400.663224] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18400.666665] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18400.670112] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18400.673268] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18400.676337] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18400.679462] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18400.682391] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18400.685219] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18400.688157] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18400.690844] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18400.693403] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18400.695969] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18400.698506] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18400.700874] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18400.703281] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18400.705594] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18400.707786] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18400.710014] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18400.712253] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18400.714424] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18600.401090] INFO: task zpool:15981 blocked for more than 120 seconds. [18600.406480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18600.410711] zpool D ffff88007b996850 0 15981 15941 0x00000080 [18600.414668] ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8 [18600.418542] ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9 [18600.422329] 0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850 [18600.426025] Call Trace: [18600.428721] [<ffffffff8168c969>] schedule+0x29/0x70 [18600.431740] [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl] [18600.434684] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18600.437608] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18600.440576] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18600.443208] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18600.445769] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18600.448395] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18600.451054] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18600.453420] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18600.455770] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18600.458073] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18600.460240] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18600.462362] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18600.464379] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18600.466391] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18600.468478] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18600.470429] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18600.472467] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18600.474422] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18600.476269] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18600.478246] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18600.480194] INFO: task zpool:16142 blocked for more than 120 seconds. [18600.482172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18600.484232] zpool D ffffffffa0f22fa8 0 16142 1 0x00000084 [18600.486254] ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8 [18600.488358] ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0 [18600.490470] ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8 [18600.492548] Call Trace: [18600.494035] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18600.496033] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18600.497883] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18600.499666] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18600.501572] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18600.503400] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18600.505153] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18600.506874] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18600.508643] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18600.510361] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18600.512129] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.513066] INFO: task zpool:15981 blocked for more than 120 seconds. [18720.516508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.519948] zpool D ffff88007b996850 0 15981 1 0x00000084 [18720.523326] ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8 [18720.526586] ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9 [18720.529828] 0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850 [18720.533034] Call Trace: [18720.535282] [<ffffffff8168c969>] schedule+0x29/0x70 [18720.537973] [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl] [18720.540701] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18720.543465] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18720.546169] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18720.548680] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18720.551141] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18720.553787] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18720.556222] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18720.558741] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18720.561158] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18720.563569] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18720.565793] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18720.568028] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18720.570276] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18720.572496] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18720.574745] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.576819] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.578837] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.582488] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.584451] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.586520] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.588603] INFO: task zpool:16142 blocked for more than 120 seconds. [18720.590724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.592984] zpool D ffffffffa0f22fa8 0 16142 1 0x00000084 [18720.595153] ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8 [18720.597356] ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0 [18720.599540] ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8 [18720.601732] Call Trace: [18720.603309] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18720.605390] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18720.607287] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18720.609093] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18720.611025] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18720.612979] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.614836] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.616581] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.618358] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.620030] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.621907] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.623680] INFO: task zpool:16158 blocked for more than 120 seconds. [18720.625393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.627258] zpool D ffffffffa0f22fa8 0 16158 1 0x00000084 [18720.629098] ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8 [18720.630976] ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0 [18720.632855] ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8 [18720.634731] Call Trace: [18720.636082] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18720.637802] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18720.639652] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18720.641290] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18720.643023] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18720.644796] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.646489] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.648106] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.649788] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.651362] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.653035] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18720.654733] INFO: task zpool:16174 blocked for more than 120 seconds. [18720.656571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18720.658449] zpool D ffffffffa0f22fa8 0 16174 1 0x00000084 [18720.660311] ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8 [18720.662247] ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0 [18720.664184] ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8 [18720.666126] Call Trace: [18720.667565] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18720.669366] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18720.671150] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18720.672871] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18720.674837] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18720.676696] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18720.678479] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18720.680151] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18720.681882] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18720.683524] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18720.685240] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18754.305036] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394842/real 1503394842] req@ffff88002681b600 x1576412291013184/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503394897 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18754.316977] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [18840.686099] INFO: task zpool:15981 blocked for more than 120 seconds. [18840.691071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.694953] zpool D ffff88007b996850 0 15981 1 0x00000084 [18840.698602] ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8 [18840.702216] ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9 [18840.705869] 0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850 [18840.709323] Call Trace: [18840.711896] [<ffffffff8168c969>] schedule+0x29/0x70 [18840.714985] [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl] [18840.718012] [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [18840.720950] [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [18840.723927] [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs] [18840.726745] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [18840.729416] [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [18840.732143] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18840.734821] [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs] [18840.737421] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18840.739974] [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [18840.742654] [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs] [18840.745083] [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs] [18840.747490] [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs] [18840.749804] [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs] [18840.752141] [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [18840.754539] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.756837] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.759036] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.761354] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.763490] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.765856] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18840.768014] INFO: task zpool:16142 blocked for more than 120 seconds. [18840.770176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.772502] zpool D ffffffffa0f22fa8 0 16142 1 0x00000084 [18840.774773] ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8 [18840.777131] ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0 [18840.779409] ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8 [18840.781655] Call Trace: [18840.783323] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18840.785410] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18840.787611] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18840.789515] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18840.791552] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18840.793580] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.795556] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.797462] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.799316] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.801034] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.802791] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18840.804499] INFO: task zpool:16158 blocked for more than 120 seconds. [18840.806377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.808248] zpool D ffffffffa0f22fa8 0 16158 1 0x00000084 [18840.810085] ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8 [18840.811988] ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0 [18840.813881] ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8 [18840.815763] Call Trace: [18840.817122] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18840.818878] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18840.820589] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18840.822233] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18840.824108] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18840.825875] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.827558] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.829207] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.830897] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.832483] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.834151] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [18840.835868] INFO: task zpool:16174 blocked for more than 120 seconds. [18840.837608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [18840.839502] zpool D ffffffffa0f22fa8 0 16174 1 0x00000084 [18840.841413] ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8 [18840.843350] ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0 [18840.845311] ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8 [18840.847279] Call Trace: [18840.848725] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [18840.850537] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [18840.852321] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [18840.854048] [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs] [18840.855873] [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [18840.857717] [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs] [18840.859633] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [18840.861311] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [18840.863054] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [18840.864694] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [18840.866468] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19354.308243] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503395442/real 1503395442] req@ffff88002681a400 x1576412291013440/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503395497 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [19354.319940] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [19606.633901] Lustre: Failing over testfs-OST0001 [19606.643768] Lustre: server umount testfs-OST0001 complete Relevant entries in the process list on the machine: ... 19296 ? Ds 0:00 | \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11 ... 15981 ? D 0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk15 16142 ? D 0:00 zpool list -H -o name 16158 ? D 0:00 zpool list -H -o name 16174 ? D 0:00 zpool list -H -o name 16190 ? D 0:00 zpool list -H -o name 16214 ? D 0:00 zpool list -H -o name 16231 ? D 0:00 zpool list -H -o name |
| Comments |
| Comment by Brian Murrell (Inactive) [ 22/Aug/17 ] |
|
From the second server, just in case it sheds any different/new light: [15667.882446] Lustre: testfs-OST0001: new disk, initializing [15667.884987] Lustre: srv-testfs-OST0001: No data found on store. Initialize space [15667.917724] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [15668.707161] Lustre: Failing over testfs-OST0001 [15668.718588] Lustre: server umount testfs-OST0001 complete [15700.116755] sda: sda1 sda9 [15700.134485] sdc: sdc1 sdc9 [15700.156137] sde: sde1 sde9 [15703.114258] sda: sda1 sda9 [15704.405515] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [15709.450388] Lustre: testfs-OST0001: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp) [15711.230525] sdc: sdc1 sdc9 [15886.394557] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385746/real 1503385746] req@ffff8800443cfc00 x1576406407449968/t0(0) o400->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1503385753 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15886.405560] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.84@tcp) was lost; in progress operations using this service will fail [15911.419483] Lustre: Evicted from MGS (at 10.14.83.85@tcp) after server handle changed from 0xbb7c0e830fe6b896 to 0x50fc281a1a2827fe [15911.427491] Lustre: MGC10.14.83.84@tcp: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp) [15918.410283] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385778/real 1503385778] req@ffff8800443cf000 x1576406407450032/t0(0) o400->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 224/224 e 0 to 1 dl 1503385785 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15918.423482] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message [15918.427329] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.85@tcp) was lost; in progress operations using this service will wait for recovery to complete [15925.558268] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.85@tcp) was lost; in progress operations using this service will fail [15925.567055] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [15992.570650] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385849/real 1503385849] req@ffff8800443ce100 x1576406407450176/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385860 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15992.585639] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [16025.570428] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [16121.575838] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385968/real 1503385968] req@ffff8800443cd200 x1576406407450336/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385989 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16121.590976] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 8 previous similar messages [16125.575897] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [16225.581271] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -108 [16325.587390] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [16386.587316] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386218/real 1503386218] req@ffff880020c74000 x1576406407450544/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386254 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16386.604235] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [16930.589449] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386743/real 1503386743] req@ffff880020c75b00 x1576406407450832/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.84@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386798 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16930.606293] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [17530.590000] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387343/real 1503387343] req@ffff8800443cc300 x1576406407451072/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.85@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387398 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [17530.609959] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [18130.596259] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387943/real 1503387943] req@ffff8800443cc900 x1576406407451344/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387998 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18130.615008] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [18805.596314] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503388618/real 1503388618] req@ffff8800443cea00 x1576406407451648/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503388673 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18805.613730] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages [19124.879146] WARNING: can't open objset 51, error 5 [19124.895291] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52) [19124.905980] PANIC at zfeature.c:294:feature_get_enabled_txg() [19124.910754] Showing stack for process 18651 [19124.915295] CPU: 1 PID: 18651 Comm: zpool Tainted: P OE ------------ 3.10.0-514.26.2.el7_lustre.x86_64 #1 [19124.920749] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [19124.925537] ffffffffa0e1f9b9 0000000051f16af8 ffff8800121ab8f8 ffffffff8168729f [19124.930690] ffff8800121ab908 ffffffffa0bf0254 ffff8800121aba90 ffffffffa0bf0329 [19124.935541] 0000000051f16af8 ffff880000000030 ffff8800121abaa0 ffff8800121aba40 [19124.939945] Call Trace: [19124.943537] [<ffffffff8168729f>] dump_stack+0x19/0x1b [19124.947382] [<ffffffffa0bf0254>] spl_dumpstack+0x44/0x50 [spl] [19124.951156] [<ffffffffa0bf0329>] spl_panic+0xc9/0x110 [spl] [19124.954764] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19124.958185] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19124.961438] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19124.964782] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19124.967995] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19124.971047] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19124.974060] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19124.977064] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19124.980079] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19124.982868] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19124.985617] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19124.988234] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19124.990798] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19124.993366] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19124.995935] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19124.998374] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19125.000830] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19125.003126] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19125.005463] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19125.007630] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19125.009651] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19125.011740] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19320.367509] INFO: task zpool:18651 blocked for more than 120 seconds. [19320.372490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19320.376754] zpool D ffff88001d84e850 0 18651 18627 0x00000080 [19320.380708] ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8 [19320.384641] ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9 [19320.388572] 0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850 [19320.392387] Call Trace: [19320.395307] [<ffffffff8168c969>] schedule+0x29/0x70 [19320.398704] [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl] [19320.401801] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19320.404847] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19320.407738] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19320.410561] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19320.413381] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19320.416000] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19320.418611] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19320.421192] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19320.423688] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19320.426251] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19320.428743] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19320.431079] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19320.433308] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19320.435512] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19320.437740] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19320.439853] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19320.442007] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19320.444030] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19320.445995] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19320.448208] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19320.450087] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19320.452013] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19320.453955] INFO: task zpool:18768 blocked for more than 120 seconds. [19320.455929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19320.458108] zpool D ffffffffa0f24fa8 0 18768 1 0x00000084 [19320.460257] ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8 [19320.462466] ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0 [19320.464712] ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8 [19320.466924] Call Trace: [19320.468524] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19320.470708] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19320.472673] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19320.474542] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19320.476510] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19320.478515] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19320.480341] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19320.482125] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19320.483982] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19320.485702] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19320.487660] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.489227] INFO: task zpool:18651 blocked for more than 120 seconds. [19440.492719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.496212] zpool D ffff88001d84e850 0 18651 1 0x00000084 [19440.499703] ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8 [19440.503335] ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9 [19440.506612] 0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850 [19440.509882] Call Trace: [19440.512287] [<ffffffff8168c969>] schedule+0x29/0x70 [19440.515148] [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl] [19440.518148] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19440.521164] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19440.523946] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19440.526764] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19440.529697] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19440.532370] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19440.534989] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19440.537705] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19440.540353] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19440.542846] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19440.545361] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19440.547887] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19440.550464] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19440.552802] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19440.555110] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19440.557406] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19440.559811] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.561972] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.564049] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.566271] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.568328] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.570443] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.572608] INFO: task zpool:18768 blocked for more than 120 seconds. [19440.574650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.576829] zpool D ffffffffa0f24fa8 0 18768 1 0x00000084 [19440.579018] ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8 [19440.581310] ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0 [19440.583615] ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8 [19440.585846] Call Trace: [19440.587477] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19440.589545] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19440.591507] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19440.593495] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19440.595469] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19440.597487] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.599490] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.601293] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.603152] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.604895] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.606739] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.608540] INFO: task zpool:18800 blocked for more than 120 seconds. [19440.610354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.612352] zpool D ffffffffa0f24fa8 0 18800 1 0x00000084 [19440.614246] ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8 [19440.616185] ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0 [19440.618092] ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8 [19440.619977] Call Trace: [19440.621338] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19440.623054] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19440.624759] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19440.626385] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19440.628090] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19440.629920] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.631628] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.633269] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.634968] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.636528] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.638164] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.639822] INFO: task zpool:18832 blocked for more than 120 seconds. [19440.641528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.643369] zpool D ffffffffa0f24fa8 0 18832 1 0x00000084 [19440.645209] ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8 [19440.647187] ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0 [19440.649283] ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8 [19440.651239] Call Trace: [19440.652709] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19440.654518] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19440.656271] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19440.657960] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19440.659744] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19440.661554] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.663293] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.665097] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.666816] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.668431] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.670127] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19480.599139] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389293/real 1503389293] req@ffff8800443ced00 x1576406407451920/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389348 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [19480.611983] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages [19560.671087] INFO: task zpool:18651 blocked for more than 120 seconds. [19560.678715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.682600] zpool D ffff88001d84e850 0 18651 1 0x00000084 [19560.686223] ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8 [19560.690105] ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9 [19560.693820] 0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850 [19560.697666] Call Trace: [19560.700549] [<ffffffff8168c969>] schedule+0x29/0x70 [19560.703956] [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl] [19560.707126] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19560.710365] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19560.713259] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19560.716231] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19560.719335] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19560.722251] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19560.724959] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19560.727796] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19560.730692] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19560.733315] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19560.735957] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19560.738547] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19560.741076] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19560.743504] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19560.745918] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19560.748273] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19560.750754] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.753059] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.755347] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.757527] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.759572] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.761745] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19560.763873] INFO: task zpool:18768 blocked for more than 120 seconds. [19560.765997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.768229] zpool D ffffffffa0f24fa8 0 18768 1 0x00000084 [19560.770453] ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8 [19560.772778] ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0 [19560.774985] ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8 [19560.777209] Call Trace: [19560.778908] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19560.780975] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19560.782995] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19560.784889] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19560.786887] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19560.788918] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.790803] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.792588] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.794332] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.796068] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.797746] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19560.799400] INFO: task zpool:18800 blocked for more than 120 seconds. [19560.801105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.802941] zpool D ffffffffa0f24fa8 0 18800 1 0x00000084 [19560.804751] ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8 [19560.806634] ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0 [19560.808496] ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8 [19560.810349] Call Trace: [19560.811709] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19560.813542] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19560.815241] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19560.816884] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19560.818617] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19560.820356] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.822025] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.823636] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.825297] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.826866] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.828517] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19560.830332] INFO: task zpool:18832 blocked for more than 120 seconds. [19560.832078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.833956] zpool D ffffffffa0f24fa8 0 18832 1 0x00000084 [19560.835816] ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8 [19560.837763] ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0 [19560.839700] ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8 [19560.841638] Call Trace: [19560.843062] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19560.844855] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19560.846633] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19560.848464] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19560.850290] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19560.852156] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.853906] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.855574] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.857280] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.858890] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.860593] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [20080.600849] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389893/real 1503389893] req@ffff880020c77600 x1576406407452176/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389948 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [20080.612492] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [20630.092975] Lustre: Failing over testfs-OST0001 [20630.107077] Lustre: server umount testfs-OST0001 complete And the relevant process list on that machine: ... 23250 ? Ds 0:00 | \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11 ... 18651 ? D 0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk13 18768 ? D 0:00 zpool list -H -o name 18800 ? D 0:00 zpool list -H -o name 18832 ? D 0:00 zpool list -H -o name 18864 ? D 0:00 zpool list -H -o name 18896 ? D 0:00 zpool list -H -o name 19018 ? D 0:00 zpool list -H -o name But to be clear, these two machines are completely unrelated to each other. They are from different clusters running different instances of the same test. |
| Comment by John Hammond [ 22/Aug/17 ] |
|
Brian, are you using 2.10.0 exactly or a descendant? Is the ZFS version 0.6.5.9 or 0.7.1? |
| Comment by Brian Murrell (Inactive) [ 22/Aug/17 ] |
|
jhammond: Apologies for not filling in the Environment field which should have told you what you needed to know. I have updated it. The answer specifically is that we are using the tip of b2_10. |
| Comment by Brian Murrell (Inactive) [ 22/Aug/17 ] |
| Comment by Joe Grund [ 03/Oct/17 ] |
|
We think we've addressed this by integrating with MMP, and removing all force imports / exports. |