[LU-9901] PANIC at zfeature.c:294:feature_get_enabled_txg() Created: 22/Aug/17  Updated: 23/Aug/18  Resolved: 11/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Brian Murrell (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

Lustre: Build Version: 2.10.0_61_g6aabd4a
ZFS: 0.7.1-1


Attachments: Text File chroma_test.log     File sosreport-lotus-59vm18.LU-9901-20170822073158.tar.xz     Text File zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk13.zdb-dump-vvvv.txt    
Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We had two test runs stop at the exact same spot last night with the exact same error messages:

[14833.875452] Lustre: testfs-OST0001: new disk, initializing
[14833.878250] Lustre: srv-testfs-OST0001: No data found on store. Initialize space
[14833.902088] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
[14834.388551] Lustre: Failing over testfs-OST0001
[14834.398323] Lustre: server umount testfs-OST0001 complete
[14836.789429]  sda: sda1 sda9
[14837.830153]  sde: sde1 sde9
[14840.744138]  sda: sda1 sda9
[14854.985498]  sde: sde1 sde9
[14885.314589] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
[14891.448270] Lustre: testfs-OST0001: Connection restored to testfs-MDT0000-mdtlov_UUID (at 10.14.83.53@tcp)
[15042.279898] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391178/real 1503391178]  req@ffff88002681ad00 x1576412291011264/t0(0) o400->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 224/224 e 0 to 1 dl 1503391185 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[15042.289748] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[15042.292641] LustreError: 166-1: MGC10.14.83.52@tcp: Connection to MGS (at 10.14.83.52@tcp) was lost; in progress operations using this service will fail
[15074.296779] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.53@tcp) was lost; in progress operations using this service will wait for recovery to complete
[15135.302559] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391267/real 1503391267]  req@ffff88002681b900 x1576412291011408/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391278 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[15135.316461] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[15300.302154] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391417/real 1503391417]  req@ffff88002681a700 x1576412291011584/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391443 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[15300.313985] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
[15615.301742] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391717/real 1503391717]  req@ffff880026818c00 x1576412291011776/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391758 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[15615.315431] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
[16254.304740] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392342/real 1503392342]  req@ffff880021452100 x1576412291012080/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503392397 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[16254.318093] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
[16879.303144] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392967/real 1503392967]  req@ffff880021450300 x1576412291012352/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393022 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[16879.317746] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
[17529.303018] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503393617/real 1503393617]  req@ffff880021451500 x1576412291012656/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393672 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[17529.319175] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
[18129.305069] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394217/real 1503394217]  req@ffff88002681a700 x1576412291012912/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503394272 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[18129.320603] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
[18400.583192] WARNING: can't open objset 51, error 5
[18400.608243] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52)
[18400.618086] PANIC at zfeature.c:294:feature_get_enabled_txg()
[18400.622541] Showing stack for process 15981
[18400.626693] CPU: 1 PID: 15981 Comm: zpool Tainted: P           OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
[18400.631653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[18400.635887]  ffffffffa0e1d9b9 000000009e369596 ffff88007a42f8f8 ffffffff8168729f
[18400.640500]  ffff88007a42f908 ffffffffa0bee254 ffff88007a42fa90 ffffffffa0bee329
[18400.644779]  000000009e369596 ffff880000000030 ffff88007a42faa0 ffff88007a42fa40
[18400.649018] Call Trace:
[18400.652394]  [<ffffffff8168729f>] dump_stack+0x19/0x1b
[18400.656133]  [<ffffffffa0bee254>] spl_dumpstack+0x44/0x50 [spl]
[18400.659779]  [<ffffffffa0bee329>] spl_panic+0xc9/0x110 [spl]
[18400.663224]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[18400.666665]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[18400.670112]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
[18400.673268]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[18400.676337]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[18400.679462]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18400.682391]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
[18400.685219]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18400.688157]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18400.690844]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
[18400.693403]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
[18400.695969]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
[18400.698506]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
[18400.700874]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[18400.703281]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18400.705594]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18400.707786]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18400.710014]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18400.712253]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18400.714424]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18600.401090] INFO: task zpool:15981 blocked for more than 120 seconds.
[18600.406480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18600.410711] zpool           D ffff88007b996850     0 15981  15941 0x00000080
[18600.414668]  ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8
[18600.418542]  ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9
[18600.422329]  0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850
[18600.426025] Call Trace:
[18600.428721]  [<ffffffff8168c969>] schedule+0x29/0x70
[18600.431740]  [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl]
[18600.434684]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[18600.437608]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[18600.440576]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
[18600.443208]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[18600.445769]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[18600.448395]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18600.451054]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
[18600.453420]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18600.455770]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18600.458073]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
[18600.460240]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
[18600.462362]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
[18600.464379]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
[18600.466391]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[18600.468478]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18600.470429]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18600.472467]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18600.474422]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18600.476269]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18600.478246]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18600.480194] INFO: task zpool:16142 blocked for more than 120 seconds.
[18600.482172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18600.484232] zpool           D ffffffffa0f22fa8     0 16142      1 0x00000084
[18600.486254]  ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8
[18600.488358]  ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0
[18600.490470]  ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8
[18600.492548] Call Trace:
[18600.494035]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[18600.496033]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[18600.497883]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[18600.499666]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
[18600.501572]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[18600.503400]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18600.505153]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18600.506874]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18600.508643]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18600.510361]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18600.512129]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18720.513066] INFO: task zpool:15981 blocked for more than 120 seconds.
[18720.516508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18720.519948] zpool           D ffff88007b996850     0 15981      1 0x00000084
[18720.523326]  ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8
[18720.526586]  ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9
[18720.529828]  0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850
[18720.533034] Call Trace:
[18720.535282]  [<ffffffff8168c969>] schedule+0x29/0x70
[18720.537973]  [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl]
[18720.540701]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[18720.543465]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[18720.546169]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
[18720.548680]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[18720.551141]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[18720.553787]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18720.556222]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
[18720.558741]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18720.561158]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18720.563569]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
[18720.565793]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
[18720.568028]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
[18720.570276]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
[18720.572496]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[18720.574745]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18720.576819]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18720.578837]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18720.582488]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18720.584451]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18720.586520]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18720.588603] INFO: task zpool:16142 blocked for more than 120 seconds.
[18720.590724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18720.592984] zpool           D ffffffffa0f22fa8     0 16142      1 0x00000084
[18720.595153]  ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8
[18720.597356]  ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0
[18720.599540]  ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8
[18720.601732] Call Trace:
[18720.603309]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[18720.605390]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[18720.607287]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[18720.609093]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
[18720.611025]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[18720.612979]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18720.614836]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18720.616581]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18720.618358]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18720.620030]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18720.621907]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18720.623680] INFO: task zpool:16158 blocked for more than 120 seconds.
[18720.625393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18720.627258] zpool           D ffffffffa0f22fa8     0 16158      1 0x00000084
[18720.629098]  ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8
[18720.630976]  ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0
[18720.632855]  ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8
[18720.634731] Call Trace:
[18720.636082]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[18720.637802]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[18720.639652]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[18720.641290]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
[18720.643023]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[18720.644796]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18720.646489]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18720.648106]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18720.649788]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18720.651362]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18720.653035]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18720.654733] INFO: task zpool:16174 blocked for more than 120 seconds.
[18720.656571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18720.658449] zpool           D ffffffffa0f22fa8     0 16174      1 0x00000084
[18720.660311]  ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8
[18720.662247]  ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0
[18720.664184]  ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8
[18720.666126] Call Trace:
[18720.667565]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[18720.669366]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[18720.671150]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[18720.672871]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
[18720.674837]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[18720.676696]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18720.678479]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18720.680151]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18720.681882]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18720.683524]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18720.685240]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18754.305036] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394842/real 1503394842]  req@ffff88002681b600 x1576412291013184/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503394897 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[18754.316977] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
[18840.686099] INFO: task zpool:15981 blocked for more than 120 seconds.
[18840.691071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18840.694953] zpool           D ffff88007b996850     0 15981      1 0x00000084
[18840.698602]  ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8
[18840.702216]  ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9
[18840.705869]  0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850
[18840.709323] Call Trace:
[18840.711896]  [<ffffffff8168c969>] schedule+0x29/0x70
[18840.714985]  [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl]
[18840.718012]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[18840.720950]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[18840.723927]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
[18840.726745]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[18840.729416]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[18840.732143]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18840.734821]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
[18840.737421]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18840.739974]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[18840.742654]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
[18840.745083]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
[18840.747490]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
[18840.749804]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
[18840.752141]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[18840.754539]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18840.756837]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18840.759036]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18840.761354]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18840.763490]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18840.765856]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18840.768014] INFO: task zpool:16142 blocked for more than 120 seconds.
[18840.770176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18840.772502] zpool           D ffffffffa0f22fa8     0 16142      1 0x00000084
[18840.774773]  ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8
[18840.777131]  ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0
[18840.779409]  ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8
[18840.781655] Call Trace:
[18840.783323]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[18840.785410]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[18840.787611]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[18840.789515]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
[18840.791552]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[18840.793580]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18840.795556]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18840.797462]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18840.799316]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18840.801034]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18840.802791]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18840.804499] INFO: task zpool:16158 blocked for more than 120 seconds.
[18840.806377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18840.808248] zpool           D ffffffffa0f22fa8     0 16158      1 0x00000084
[18840.810085]  ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8
[18840.811988]  ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0
[18840.813881]  ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8
[18840.815763] Call Trace:
[18840.817122]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[18840.818878]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[18840.820589]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[18840.822233]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
[18840.824108]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[18840.825875]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18840.827558]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18840.829207]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18840.830897]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18840.832483]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18840.834151]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[18840.835868] INFO: task zpool:16174 blocked for more than 120 seconds.
[18840.837608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18840.839502] zpool           D ffffffffa0f22fa8     0 16174      1 0x00000084
[18840.841413]  ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8
[18840.843350]  ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0
[18840.845311]  ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8
[18840.847279] Call Trace:
[18840.848725]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[18840.850537]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[18840.852321]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[18840.854048]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
[18840.855873]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[18840.857717]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
[18840.859633]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[18840.861311]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[18840.863054]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[18840.864694]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[18840.866468]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19354.308243] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503395442/real 1503395442]  req@ffff88002681a400 x1576412291013440/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503395497 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[19354.319940] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
[19606.633901] Lustre: Failing over testfs-OST0001
[19606.643768] Lustre: server umount testfs-OST0001 complete

Relevant entries in the process list on the machine:

...
19296 ?        Ds     0:00  |   \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11
...
15981 ?        D      0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk15
16142 ?        D      0:00 zpool list -H -o name
16158 ?        D      0:00 zpool list -H -o name
16174 ?        D      0:00 zpool list -H -o name
16190 ?        D      0:00 zpool list -H -o name
16214 ?        D      0:00 zpool list -H -o name
16231 ?        D      0:00 zpool list -H -o name


 Comments   
Comment by Brian Murrell (Inactive) [ 22/Aug/17 ]

From the second server, just in case it sheds any different/new light:

[15667.882446] Lustre: testfs-OST0001: new disk, initializing
[15667.884987] Lustre: srv-testfs-OST0001: No data found on store. Initialize space
[15667.917724] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
[15668.707161] Lustre: Failing over testfs-OST0001
[15668.718588] Lustre: server umount testfs-OST0001 complete
[15700.116755]  sda: sda1 sda9
[15700.134485]  sdc: sdc1 sdc9
[15700.156137]  sde: sde1 sde9
[15703.114258]  sda: sda1 sda9
[15704.405515] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
[15709.450388] Lustre: testfs-OST0001: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp)
[15711.230525]  sdc: sdc1 sdc9
[15886.394557] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385746/real 1503385746]  req@ffff8800443cfc00 x1576406407449968/t0(0) o400->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1503385753 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[15886.405560] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.84@tcp) was lost; in progress operations using this service will fail
[15911.419483] Lustre: Evicted from MGS (at 10.14.83.85@tcp) after server handle changed from 0xbb7c0e830fe6b896 to 0x50fc281a1a2827fe
[15911.427491] Lustre: MGC10.14.83.84@tcp: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp)
[15918.410283] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385778/real 1503385778]  req@ffff8800443cf000 x1576406407450032/t0(0) o400->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 224/224 e 0 to 1 dl 1503385785 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[15918.423482] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
[15918.427329] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.85@tcp) was lost; in progress operations using this service will wait for recovery to complete
[15925.558268] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.85@tcp) was lost; in progress operations using this service will fail
[15925.567055] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
[15992.570650] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385849/real 1503385849]  req@ffff8800443ce100 x1576406407450176/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385860 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[15992.585639] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
[16025.570428] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
[16121.575838] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385968/real 1503385968]  req@ffff8800443cd200 x1576406407450336/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385989 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[16121.590976] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
[16125.575897] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
[16225.581271] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -108
[16325.587390] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
[16386.587316] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386218/real 1503386218]  req@ffff880020c74000 x1576406407450544/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386254 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[16386.604235] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
[16930.589449] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386743/real 1503386743]  req@ffff880020c75b00 x1576406407450832/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.84@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386798 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[16930.606293] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
[17530.590000] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387343/real 1503387343]  req@ffff8800443cc300 x1576406407451072/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.85@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387398 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[17530.609959] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
[18130.596259] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387943/real 1503387943]  req@ffff8800443cc900 x1576406407451344/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387998 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[18130.615008] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
[18805.596314] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503388618/real 1503388618]  req@ffff8800443cea00 x1576406407451648/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503388673 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[18805.613730] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
[19124.879146] WARNING: can't open objset 51, error 5
[19124.895291] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52)
[19124.905980] PANIC at zfeature.c:294:feature_get_enabled_txg()
[19124.910754] Showing stack for process 18651
[19124.915295] CPU: 1 PID: 18651 Comm: zpool Tainted: P           OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
[19124.920749] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[19124.925537]  ffffffffa0e1f9b9 0000000051f16af8 ffff8800121ab8f8 ffffffff8168729f
[19124.930690]  ffff8800121ab908 ffffffffa0bf0254 ffff8800121aba90 ffffffffa0bf0329
[19124.935541]  0000000051f16af8 ffff880000000030 ffff8800121abaa0 ffff8800121aba40
[19124.939945] Call Trace:
[19124.943537]  [<ffffffff8168729f>] dump_stack+0x19/0x1b
[19124.947382]  [<ffffffffa0bf0254>] spl_dumpstack+0x44/0x50 [spl]
[19124.951156]  [<ffffffffa0bf0329>] spl_panic+0xc9/0x110 [spl]
[19124.954764]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[19124.958185]  [<ffffffff8168c969>] ? schedule+0x29/0x70
[19124.961438]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
[19124.964782]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[19124.967995]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
[19124.971047]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[19124.974060]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[19124.977064]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19124.980079]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
[19124.982868]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19124.985617]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19124.988234]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
[19124.990798]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
[19124.993366]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
[19124.995935]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
[19124.998374]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[19125.000830]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19125.003126]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19125.005463]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19125.007630]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19125.009651]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19125.011740]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19320.367509] INFO: task zpool:18651 blocked for more than 120 seconds.
[19320.372490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19320.376754] zpool           D ffff88001d84e850     0 18651  18627 0x00000080
[19320.380708]  ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8
[19320.384641]  ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9
[19320.388572]  0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850
[19320.392387] Call Trace:
[19320.395307]  [<ffffffff8168c969>] schedule+0x29/0x70
[19320.398704]  [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl]
[19320.401801]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[19320.404847]  [<ffffffff8168c969>] ? schedule+0x29/0x70
[19320.407738]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
[19320.410561]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[19320.413381]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
[19320.416000]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[19320.418611]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[19320.421192]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19320.423688]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
[19320.426251]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19320.428743]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19320.431079]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
[19320.433308]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
[19320.435512]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
[19320.437740]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
[19320.439853]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[19320.442007]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19320.444030]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19320.445995]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19320.448208]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19320.450087]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19320.452013]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19320.453955] INFO: task zpool:18768 blocked for more than 120 seconds.
[19320.455929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19320.458108] zpool           D ffffffffa0f24fa8     0 18768      1 0x00000084
[19320.460257]  ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8
[19320.462466]  ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0
[19320.464712]  ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8
[19320.466924] Call Trace:
[19320.468524]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[19320.470708]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[19320.472673]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[19320.474542]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
[19320.476510]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[19320.478515]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19320.480341]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19320.482125]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19320.483982]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19320.485702]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19320.487660]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19440.489227] INFO: task zpool:18651 blocked for more than 120 seconds.
[19440.492719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19440.496212] zpool           D ffff88001d84e850     0 18651      1 0x00000084
[19440.499703]  ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8
[19440.503335]  ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9
[19440.506612]  0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850
[19440.509882] Call Trace:
[19440.512287]  [<ffffffff8168c969>] schedule+0x29/0x70
[19440.515148]  [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl]
[19440.518148]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[19440.521164]  [<ffffffff8168c969>] ? schedule+0x29/0x70
[19440.523946]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
[19440.526764]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[19440.529697]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
[19440.532370]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[19440.534989]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[19440.537705]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19440.540353]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
[19440.542846]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19440.545361]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19440.547887]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
[19440.550464]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
[19440.552802]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
[19440.555110]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
[19440.557406]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[19440.559811]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19440.561972]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19440.564049]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19440.566271]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19440.568328]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19440.570443]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19440.572608] INFO: task zpool:18768 blocked for more than 120 seconds.
[19440.574650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19440.576829] zpool           D ffffffffa0f24fa8     0 18768      1 0x00000084
[19440.579018]  ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8
[19440.581310]  ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0
[19440.583615]  ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8
[19440.585846] Call Trace:
[19440.587477]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[19440.589545]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[19440.591507]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[19440.593495]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
[19440.595469]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[19440.597487]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19440.599490]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19440.601293]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19440.603152]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19440.604895]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19440.606739]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19440.608540] INFO: task zpool:18800 blocked for more than 120 seconds.
[19440.610354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19440.612352] zpool           D ffffffffa0f24fa8     0 18800      1 0x00000084
[19440.614246]  ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8
[19440.616185]  ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0
[19440.618092]  ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8
[19440.619977] Call Trace:
[19440.621338]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[19440.623054]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[19440.624759]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[19440.626385]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
[19440.628090]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[19440.629920]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19440.631628]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19440.633269]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19440.634968]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19440.636528]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19440.638164]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19440.639822] INFO: task zpool:18832 blocked for more than 120 seconds.
[19440.641528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19440.643369] zpool           D ffffffffa0f24fa8     0 18832      1 0x00000084
[19440.645209]  ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8
[19440.647187]  ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0
[19440.649283]  ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8
[19440.651239] Call Trace:
[19440.652709]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[19440.654518]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[19440.656271]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[19440.657960]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
[19440.659744]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[19440.661554]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19440.663293]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19440.665097]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19440.666816]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19440.668431]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19440.670127]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19480.599139] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389293/real 1503389293]  req@ffff8800443ced00 x1576406407451920/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389348 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[19480.611983] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
[19560.671087] INFO: task zpool:18651 blocked for more than 120 seconds.
[19560.678715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19560.682600] zpool           D ffff88001d84e850     0 18651      1 0x00000084
[19560.686223]  ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8
[19560.690105]  ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9
[19560.693820]  0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850
[19560.697666] Call Trace:
[19560.700549]  [<ffffffff8168c969>] schedule+0x29/0x70
[19560.703956]  [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl]
[19560.707126]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
[19560.710365]  [<ffffffff8168c969>] ? schedule+0x29/0x70
[19560.713259]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
[19560.716231]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
[19560.719335]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
[19560.722251]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
[19560.724959]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
[19560.727796]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19560.730692]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
[19560.733315]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19560.735957]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
[19560.738547]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
[19560.741076]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
[19560.743504]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
[19560.745918]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
[19560.748273]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
[19560.750754]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19560.753059]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19560.755347]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19560.757527]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19560.759572]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19560.761745]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19560.763873] INFO: task zpool:18768 blocked for more than 120 seconds.
[19560.765997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19560.768229] zpool           D ffffffffa0f24fa8     0 18768      1 0x00000084
[19560.770453]  ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8
[19560.772778]  ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0
[19560.774985]  ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8
[19560.777209] Call Trace:
[19560.778908]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[19560.780975]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[19560.782995]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[19560.784889]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
[19560.786887]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[19560.788918]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19560.790803]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19560.792588]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19560.794332]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19560.796068]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19560.797746]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19560.799400] INFO: task zpool:18800 blocked for more than 120 seconds.
[19560.801105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19560.802941] zpool           D ffffffffa0f24fa8     0 18800      1 0x00000084
[19560.804751]  ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8
[19560.806634]  ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0
[19560.808496]  ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8
[19560.810349] Call Trace:
[19560.811709]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[19560.813542]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[19560.815241]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[19560.816884]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
[19560.818617]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[19560.820356]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19560.822025]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19560.823636]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19560.825297]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19560.826866]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19560.828517]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[19560.830332] INFO: task zpool:18832 blocked for more than 120 seconds.
[19560.832078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[19560.833956] zpool           D ffffffffa0f24fa8     0 18832      1 0x00000084
[19560.835816]  ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8
[19560.837763]  ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0
[19560.839700]  ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8
[19560.841638] Call Trace:
[19560.843062]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
[19560.844855]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
[19560.846633]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
[19560.848464]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
[19560.850290]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
[19560.852156]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
[19560.853906]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
[19560.855574]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
[19560.857280]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
[19560.858890]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
[19560.860593]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[20080.600849] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389893/real 1503389893]  req@ffff880020c77600 x1576406407452176/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389948 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[20080.612492] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
[20630.092975] Lustre: Failing over testfs-OST0001
[20630.107077] Lustre: server umount testfs-OST0001 complete

And the relevant process list on that machine:

...
23250 ?        Ds     0:00  |   \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11
...
18651 ?        D      0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk13
18768 ?        D      0:00 zpool list -H -o name
18800 ?        D      0:00 zpool list -H -o name
18832 ?        D      0:00 zpool list -H -o name
18864 ?        D      0:00 zpool list -H -o name
18896 ?        D      0:00 zpool list -H -o name
19018 ?        D      0:00 zpool list -H -o name

But to be clear, these two machines are completely unrelated to each other. They are from different clusters running different instances of the same test.

Comment by John Hammond [ 22/Aug/17 ]

Brian, are you using 2.10.0 exactly or a descendant? Is the ZFS version 0.6.5.9 or 0.7.1?

Comment by Brian Murrell (Inactive) [ 22/Aug/17 ]

jhammond: Apologies for not filling in the Environment field which should have told you what you needed to know. I have updated it. The answer specifically is that we are using the tip of b2_10.

Comment by Brian Murrell (Inactive) [ 22/Aug/17 ]

Upstream ticket

Comment by Joe Grund [ 03/Oct/17 ]

We think we've addressed this by integrating with MMP, and removing all force imports / exports.

Generated at Sat Feb 10 02:30:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.