Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9901

PANIC at zfeature.c:294:feature_get_enabled_txg()

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.10.0
    • Lustre: Build Version: 2.10.0_61_g6aabd4a
      ZFS: 0.7.1-1
    • 3
    • 9223372036854775807

    Description

      We had two test runs stop at the exact same spot last night with the exact same error messages:

      [14833.875452] Lustre: testfs-OST0001: new disk, initializing
      [14833.878250] Lustre: srv-testfs-OST0001: No data found on store. Initialize space
      [14833.902088] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
      [14834.388551] Lustre: Failing over testfs-OST0001
      [14834.398323] Lustre: server umount testfs-OST0001 complete
      [14836.789429]  sda: sda1 sda9
      [14837.830153]  sde: sde1 sde9
      [14840.744138]  sda: sda1 sda9
      [14854.985498]  sde: sde1 sde9
      [14885.314589] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
      [14891.448270] Lustre: testfs-OST0001: Connection restored to testfs-MDT0000-mdtlov_UUID (at 10.14.83.53@tcp)
      [15042.279898] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391178/real 1503391178]  req@ffff88002681ad00 x1576412291011264/t0(0) o400->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 224/224 e 0 to 1 dl 1503391185 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [15042.289748] Lustre: 1995:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
      [15042.292641] LustreError: 166-1: MGC10.14.83.52@tcp: Connection to MGS (at 10.14.83.52@tcp) was lost; in progress operations using this service will fail
      [15074.296779] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.53@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [15135.302559] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391267/real 1503391267]  req@ffff88002681b900 x1576412291011408/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391278 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [15135.316461] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
      [15300.302154] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391417/real 1503391417]  req@ffff88002681a700 x1576412291011584/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391443 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [15300.313985] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
      [15615.301742] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503391717/real 1503391717]  req@ffff880026818c00 x1576412291011776/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503391758 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [15615.315431] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
      [16254.304740] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392342/real 1503392342]  req@ffff880021452100 x1576412291012080/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503392397 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [16254.318093] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
      [16879.303144] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503392967/real 1503392967]  req@ffff880021450300 x1576412291012352/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393022 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [16879.317746] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
      [17529.303018] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503393617/real 1503393617]  req@ffff880021451500 x1576412291012656/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503393672 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [17529.319175] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 18 previous similar messages
      [18129.305069] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394217/real 1503394217]  req@ffff88002681a700 x1576412291012912/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.53@tcp:12/10 lens 520/544 e 0 to 1 dl 1503394272 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [18129.320603] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
      [18400.583192] WARNING: can't open objset 51, error 5
      [18400.608243] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52)
      [18400.618086] PANIC at zfeature.c:294:feature_get_enabled_txg()
      [18400.622541] Showing stack for process 15981
      [18400.626693] CPU: 1 PID: 15981 Comm: zpool Tainted: P           OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
      [18400.631653] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [18400.635887]  ffffffffa0e1d9b9 000000009e369596 ffff88007a42f8f8 ffffffff8168729f
      [18400.640500]  ffff88007a42f908 ffffffffa0bee254 ffff88007a42fa90 ffffffffa0bee329
      [18400.644779]  000000009e369596 ffff880000000030 ffff88007a42faa0 ffff88007a42fa40
      [18400.649018] Call Trace:
      [18400.652394]  [<ffffffff8168729f>] dump_stack+0x19/0x1b
      [18400.656133]  [<ffffffffa0bee254>] spl_dumpstack+0x44/0x50 [spl]
      [18400.659779]  [<ffffffffa0bee329>] spl_panic+0xc9/0x110 [spl]
      [18400.663224]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
      [18400.666665]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
      [18400.670112]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
      [18400.673268]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
      [18400.676337]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
      [18400.679462]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18400.682391]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
      [18400.685219]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18400.688157]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18400.690844]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
      [18400.693403]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
      [18400.695969]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
      [18400.698506]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
      [18400.700874]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
      [18400.703281]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18400.705594]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18400.707786]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18400.710014]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18400.712253]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18400.714424]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18600.401090] INFO: task zpool:15981 blocked for more than 120 seconds.
      [18600.406480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18600.410711] zpool           D ffff88007b996850     0 15981  15941 0x00000080
      [18600.414668]  ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8
      [18600.418542]  ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9
      [18600.422329]  0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850
      [18600.426025] Call Trace:
      [18600.428721]  [<ffffffff8168c969>] schedule+0x29/0x70
      [18600.431740]  [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl]
      [18600.434684]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
      [18600.437608]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
      [18600.440576]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
      [18600.443208]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
      [18600.445769]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
      [18600.448395]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18600.451054]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
      [18600.453420]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18600.455770]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18600.458073]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
      [18600.460240]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
      [18600.462362]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
      [18600.464379]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
      [18600.466391]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
      [18600.468478]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18600.470429]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18600.472467]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18600.474422]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18600.476269]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18600.478246]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18600.480194] INFO: task zpool:16142 blocked for more than 120 seconds.
      [18600.482172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18600.484232] zpool           D ffffffffa0f22fa8     0 16142      1 0x00000084
      [18600.486254]  ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8
      [18600.488358]  ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0
      [18600.490470]  ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8
      [18600.492548] Call Trace:
      [18600.494035]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
      [18600.496033]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
      [18600.497883]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
      [18600.499666]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
      [18600.501572]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
      [18600.503400]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18600.505153]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18600.506874]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18600.508643]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18600.510361]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18600.512129]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18720.513066] INFO: task zpool:15981 blocked for more than 120 seconds.
      [18720.516508] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18720.519948] zpool           D ffff88007b996850     0 15981      1 0x00000084
      [18720.523326]  ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8
      [18720.526586]  ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9
      [18720.529828]  0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850
      [18720.533034] Call Trace:
      [18720.535282]  [<ffffffff8168c969>] schedule+0x29/0x70
      [18720.537973]  [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl]
      [18720.540701]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
      [18720.543465]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
      [18720.546169]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
      [18720.548680]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
      [18720.551141]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
      [18720.553787]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18720.556222]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
      [18720.558741]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18720.561158]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18720.563569]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
      [18720.565793]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
      [18720.568028]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
      [18720.570276]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
      [18720.572496]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
      [18720.574745]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18720.576819]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18720.578837]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18720.582488]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18720.584451]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18720.586520]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18720.588603] INFO: task zpool:16142 blocked for more than 120 seconds.
      [18720.590724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18720.592984] zpool           D ffffffffa0f22fa8     0 16142      1 0x00000084
      [18720.595153]  ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8
      [18720.597356]  ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0
      [18720.599540]  ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8
      [18720.601732] Call Trace:
      [18720.603309]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
      [18720.605390]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
      [18720.607287]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
      [18720.609093]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
      [18720.611025]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
      [18720.612979]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18720.614836]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18720.616581]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18720.618358]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18720.620030]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18720.621907]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18720.623680] INFO: task zpool:16158 blocked for more than 120 seconds.
      [18720.625393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18720.627258] zpool           D ffffffffa0f22fa8     0 16158      1 0x00000084
      [18720.629098]  ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8
      [18720.630976]  ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0
      [18720.632855]  ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8
      [18720.634731] Call Trace:
      [18720.636082]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
      [18720.637802]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
      [18720.639652]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
      [18720.641290]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
      [18720.643023]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
      [18720.644796]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18720.646489]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18720.648106]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18720.649788]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18720.651362]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18720.653035]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18720.654733] INFO: task zpool:16174 blocked for more than 120 seconds.
      [18720.656571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18720.658449] zpool           D ffffffffa0f22fa8     0 16174      1 0x00000084
      [18720.660311]  ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8
      [18720.662247]  ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0
      [18720.664184]  ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8
      [18720.666126] Call Trace:
      [18720.667565]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
      [18720.669366]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
      [18720.671150]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
      [18720.672871]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
      [18720.674837]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
      [18720.676696]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18720.678479]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18720.680151]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18720.681882]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18720.683524]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18720.685240]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18754.305036] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503394842/real 1503394842]  req@ffff88002681b600 x1576412291013184/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503394897 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [18754.316977] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
      [18840.686099] INFO: task zpool:15981 blocked for more than 120 seconds.
      [18840.691071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18840.694953] zpool           D ffff88007b996850     0 15981      1 0x00000084
      [18840.698602]  ffff88007a42f8f8 0000000000000086 ffff880026af3ec0 ffff88007a42ffd8
      [18840.702216]  ffff88007a42ffd8 ffff88007a42ffd8 ffff880026af3ec0 ffffffffa0e1d9b9
      [18840.705869]  0000000000000126 ffffffffa0dfef70 ffffffffa0e1dcd8 ffff88007b996850
      [18840.709323] Call Trace:
      [18840.711896]  [<ffffffff8168c969>] schedule+0x29/0x70
      [18840.714985]  [<ffffffffa0bee355>] spl_panic+0xf5/0x110 [spl]
      [18840.718012]  [<ffffffffa0d42fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
      [18840.720950]  [<ffffffffa0d45345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
      [18840.723927]  [<ffffffffa0cd931b>] traverse_impl+0x3db/0x460 [zfs]
      [18840.726745]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
      [18840.729416]  [<ffffffffa0bea319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
      [18840.732143]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18840.734821]  [<ffffffffa0cd9834>] traverse_pool+0x84/0x1d0 [zfs]
      [18840.737421]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18840.739974]  [<ffffffffa0d0ed10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
      [18840.742654]  [<ffffffffa0d7d02a>] ? zio_null+0x6a/0x70 [zfs]
      [18840.745083]  [<ffffffffa0d16128>] spa_load+0x1a58/0x2030 [zfs]
      [18840.747490]  [<ffffffffa0d1675e>] spa_load_best+0x5e/0x290 [zfs]
      [18840.749804]  [<ffffffffa0d184a2>] spa_import+0x212/0x730 [zfs]
      [18840.752141]  [<ffffffffa0d53a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
      [18840.754539]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18840.756837]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18840.759036]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18840.761354]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18840.763490]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18840.765856]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18840.768014] INFO: task zpool:16142 blocked for more than 120 seconds.
      [18840.770176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18840.772502] zpool           D ffffffffa0f22fa8     0 16142      1 0x00000084
      [18840.774773]  ffff880022043d60 0000000000000082 ffff880036440fb0 ffff880022043fd8
      [18840.777131]  ffff880022043fd8 ffff880022043fd8 ffff880036440fb0 ffffffffa0f22fa0
      [18840.779409]  ffffffffa0f22fa4 ffff880036440fb0 00000000ffffffff ffffffffa0f22fa8
      [18840.781655] Call Trace:
      [18840.783323]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
      [18840.785410]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
      [18840.787611]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
      [18840.789515]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
      [18840.791552]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
      [18840.793580]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18840.795556]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18840.797462]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18840.799316]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18840.801034]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18840.802791]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18840.804499] INFO: task zpool:16158 blocked for more than 120 seconds.
      [18840.806377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18840.808248] zpool           D ffffffffa0f22fa8     0 16158      1 0x00000084
      [18840.810085]  ffff880042fd3d60 0000000000000082 ffff880036440000 ffff880042fd3fd8
      [18840.811988]  ffff880042fd3fd8 ffff880042fd3fd8 ffff880036440000 ffffffffa0f22fa0
      [18840.813881]  ffffffffa0f22fa4 ffff880036440000 00000000ffffffff ffffffffa0f22fa8
      [18840.815763] Call Trace:
      [18840.817122]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
      [18840.818878]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
      [18840.820589]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
      [18840.822233]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
      [18840.824108]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
      [18840.825875]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18840.827558]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18840.829207]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18840.830897]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18840.832483]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18840.834151]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [18840.835868] INFO: task zpool:16174 blocked for more than 120 seconds.
      [18840.837608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [18840.839502] zpool           D ffffffffa0f22fa8     0 16174      1 0x00000084
      [18840.841413]  ffff8800115a7d60 0000000000000086 ffff880026169f60 ffff8800115a7fd8
      [18840.843350]  ffff8800115a7fd8 ffff8800115a7fd8 ffff880026169f60 ffffffffa0f22fa0
      [18840.845311]  ffffffffa0f22fa4 ffff880026169f60 00000000ffffffff ffffffffa0f22fa8
      [18840.847279] Call Trace:
      [18840.848725]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
      [18840.850537]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
      [18840.852321]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
      [18840.854048]  [<ffffffffa0d1b6c0>] spa_all_configs+0x40/0x120 [zfs]
      [18840.855873]  [<ffffffffa0d57cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
      [18840.857717]  [<ffffffffa0d58626>] zfsdev_ioctl+0x606/0x650 [zfs]
      [18840.859633]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
      [18840.861311]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
      [18840.863054]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
      [18840.864694]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
      [18840.866468]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
      [19354.308243] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503395442/real 1503395442]  req@ffff88002681a400 x1576412291013440/t0(0) o250->MGC10.14.83.52@tcp@10.14.83.52@tcp:26/25 lens 520/544 e 0 to 1 dl 1503395497 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [19354.319940] Lustre: 1994:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
      [19606.633901] Lustre: Failing over testfs-OST0001
      [19606.643768] Lustre: server umount testfs-OST0001 complete
      

      Relevant entries in the process list on the machine:

      ...
      19296 ?        Ds     0:00  |   \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11
      ...
      15981 ?        D      0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk15
      16142 ?        D      0:00 zpool list -H -o name
      16158 ?        D      0:00 zpool list -H -o name
      16174 ?        D      0:00 zpool list -H -o name
      16190 ?        D      0:00 zpool list -H -o name
      16214 ?        D      0:00 zpool list -H -o name
      16231 ?        D      0:00 zpool list -H -o name
      

      Attachments

        Issue Links

          Activity

            [LU-9901] PANIC at zfeature.c:294:feature_get_enabled_txg()
            joe.grund Joe Grund added a comment -

            We think we've addressed this by integrating with MMP, and removing all force imports / exports.

            joe.grund Joe Grund added a comment - We think we've addressed this by integrating with MMP, and removing all force imports / exports.

            jhammond: Apologies for not filling in the Environment field which should have told you what you needed to know. I have updated it. The answer specifically is that we are using the tip of b2_10.

            brian Brian Murrell (Inactive) added a comment - jhammond : Apologies for not filling in the Environment field which should have told you what you needed to know. I have updated it. The answer specifically is that we are using the tip of b2_10.
            jhammond John Hammond added a comment -

            Brian, are you using 2.10.0 exactly or a descendant? Is the ZFS version 0.6.5.9 or 0.7.1?

            jhammond John Hammond added a comment - Brian, are you using 2.10.0 exactly or a descendant? Is the ZFS version 0.6.5.9 or 0.7.1?

            From the second server, just in case it sheds any different/new light:

            [15667.882446] Lustre: testfs-OST0001: new disk, initializing
            [15667.884987] Lustre: srv-testfs-OST0001: No data found on store. Initialize space
            [15667.917724] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
            [15668.707161] Lustre: Failing over testfs-OST0001
            [15668.718588] Lustre: server umount testfs-OST0001 complete
            [15700.116755]  sda: sda1 sda9
            [15700.134485]  sdc: sdc1 sdc9
            [15700.156137]  sde: sde1 sde9
            [15703.114258]  sda: sda1 sda9
            [15704.405515] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900
            [15709.450388] Lustre: testfs-OST0001: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp)
            [15711.230525]  sdc: sdc1 sdc9
            [15886.394557] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385746/real 1503385746]  req@ffff8800443cfc00 x1576406407449968/t0(0) o400->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1503385753 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [15886.405560] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.84@tcp) was lost; in progress operations using this service will fail
            [15911.419483] Lustre: Evicted from MGS (at 10.14.83.85@tcp) after server handle changed from 0xbb7c0e830fe6b896 to 0x50fc281a1a2827fe
            [15911.427491] Lustre: MGC10.14.83.84@tcp: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp)
            [15918.410283] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385778/real 1503385778]  req@ffff8800443cf000 x1576406407450032/t0(0) o400->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 224/224 e 0 to 1 dl 1503385785 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [15918.423482] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message
            [15918.427329] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.85@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [15925.558268] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.85@tcp) was lost; in progress operations using this service will fail
            [15925.567055] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
            [15992.570650] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385849/real 1503385849]  req@ffff8800443ce100 x1576406407450176/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385860 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [15992.585639] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
            [16025.570428] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
            [16121.575838] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385968/real 1503385968]  req@ffff8800443cd200 x1576406407450336/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385989 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [16121.590976] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
            [16125.575897] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
            [16225.581271] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -108
            [16325.587390] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
            [16386.587316] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386218/real 1503386218]  req@ffff880020c74000 x1576406407450544/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386254 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [16386.604235] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages
            [16930.589449] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386743/real 1503386743]  req@ffff880020c75b00 x1576406407450832/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.84@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386798 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [16930.606293] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
            [17530.590000] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387343/real 1503387343]  req@ffff8800443cc300 x1576406407451072/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.85@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387398 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [17530.609959] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
            [18130.596259] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387943/real 1503387943]  req@ffff8800443cc900 x1576406407451344/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387998 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [18130.615008] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages
            [18805.596314] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503388618/real 1503388618]  req@ffff8800443cea00 x1576406407451648/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503388673 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [18805.613730] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
            [19124.879146] WARNING: can't open objset 51, error 5
            [19124.895291] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52)
            [19124.905980] PANIC at zfeature.c:294:feature_get_enabled_txg()
            [19124.910754] Showing stack for process 18651
            [19124.915295] CPU: 1 PID: 18651 Comm: zpool Tainted: P           OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
            [19124.920749] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [19124.925537]  ffffffffa0e1f9b9 0000000051f16af8 ffff8800121ab8f8 ffffffff8168729f
            [19124.930690]  ffff8800121ab908 ffffffffa0bf0254 ffff8800121aba90 ffffffffa0bf0329
            [19124.935541]  0000000051f16af8 ffff880000000030 ffff8800121abaa0 ffff8800121aba40
            [19124.939945] Call Trace:
            [19124.943537]  [<ffffffff8168729f>] dump_stack+0x19/0x1b
            [19124.947382]  [<ffffffffa0bf0254>] spl_dumpstack+0x44/0x50 [spl]
            [19124.951156]  [<ffffffffa0bf0329>] spl_panic+0xc9/0x110 [spl]
            [19124.954764]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
            [19124.958185]  [<ffffffff8168c969>] ? schedule+0x29/0x70
            [19124.961438]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
            [19124.964782]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
            [19124.967995]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
            [19124.971047]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
            [19124.974060]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
            [19124.977064]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19124.980079]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
            [19124.982868]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19124.985617]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19124.988234]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
            [19124.990798]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
            [19124.993366]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
            [19124.995935]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
            [19124.998374]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
            [19125.000830]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19125.003126]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19125.005463]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19125.007630]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19125.009651]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19125.011740]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19320.367509] INFO: task zpool:18651 blocked for more than 120 seconds.
            [19320.372490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19320.376754] zpool           D ffff88001d84e850     0 18651  18627 0x00000080
            [19320.380708]  ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8
            [19320.384641]  ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9
            [19320.388572]  0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850
            [19320.392387] Call Trace:
            [19320.395307]  [<ffffffff8168c969>] schedule+0x29/0x70
            [19320.398704]  [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl]
            [19320.401801]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
            [19320.404847]  [<ffffffff8168c969>] ? schedule+0x29/0x70
            [19320.407738]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
            [19320.410561]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
            [19320.413381]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
            [19320.416000]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
            [19320.418611]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
            [19320.421192]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19320.423688]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
            [19320.426251]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19320.428743]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19320.431079]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
            [19320.433308]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
            [19320.435512]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
            [19320.437740]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
            [19320.439853]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
            [19320.442007]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19320.444030]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19320.445995]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19320.448208]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19320.450087]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19320.452013]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19320.453955] INFO: task zpool:18768 blocked for more than 120 seconds.
            [19320.455929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19320.458108] zpool           D ffffffffa0f24fa8     0 18768      1 0x00000084
            [19320.460257]  ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8
            [19320.462466]  ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0
            [19320.464712]  ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8
            [19320.466924] Call Trace:
            [19320.468524]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
            [19320.470708]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
            [19320.472673]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
            [19320.474542]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
            [19320.476510]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
            [19320.478515]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19320.480341]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19320.482125]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19320.483982]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19320.485702]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19320.487660]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19440.489227] INFO: task zpool:18651 blocked for more than 120 seconds.
            [19440.492719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19440.496212] zpool           D ffff88001d84e850     0 18651      1 0x00000084
            [19440.499703]  ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8
            [19440.503335]  ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9
            [19440.506612]  0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850
            [19440.509882] Call Trace:
            [19440.512287]  [<ffffffff8168c969>] schedule+0x29/0x70
            [19440.515148]  [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl]
            [19440.518148]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
            [19440.521164]  [<ffffffff8168c969>] ? schedule+0x29/0x70
            [19440.523946]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
            [19440.526764]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
            [19440.529697]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
            [19440.532370]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
            [19440.534989]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
            [19440.537705]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19440.540353]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
            [19440.542846]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19440.545361]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19440.547887]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
            [19440.550464]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
            [19440.552802]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
            [19440.555110]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
            [19440.557406]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
            [19440.559811]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19440.561972]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19440.564049]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19440.566271]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19440.568328]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19440.570443]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19440.572608] INFO: task zpool:18768 blocked for more than 120 seconds.
            [19440.574650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19440.576829] zpool           D ffffffffa0f24fa8     0 18768      1 0x00000084
            [19440.579018]  ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8
            [19440.581310]  ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0
            [19440.583615]  ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8
            [19440.585846] Call Trace:
            [19440.587477]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
            [19440.589545]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
            [19440.591507]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
            [19440.593495]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
            [19440.595469]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
            [19440.597487]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19440.599490]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19440.601293]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19440.603152]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19440.604895]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19440.606739]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19440.608540] INFO: task zpool:18800 blocked for more than 120 seconds.
            [19440.610354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19440.612352] zpool           D ffffffffa0f24fa8     0 18800      1 0x00000084
            [19440.614246]  ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8
            [19440.616185]  ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0
            [19440.618092]  ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8
            [19440.619977] Call Trace:
            [19440.621338]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
            [19440.623054]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
            [19440.624759]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
            [19440.626385]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
            [19440.628090]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
            [19440.629920]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19440.631628]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19440.633269]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19440.634968]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19440.636528]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19440.638164]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19440.639822] INFO: task zpool:18832 blocked for more than 120 seconds.
            [19440.641528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19440.643369] zpool           D ffffffffa0f24fa8     0 18832      1 0x00000084
            [19440.645209]  ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8
            [19440.647187]  ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0
            [19440.649283]  ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8
            [19440.651239] Call Trace:
            [19440.652709]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
            [19440.654518]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
            [19440.656271]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
            [19440.657960]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
            [19440.659744]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
            [19440.661554]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19440.663293]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19440.665097]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19440.666816]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19440.668431]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19440.670127]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19480.599139] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389293/real 1503389293]  req@ffff8800443ced00 x1576406407451920/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389348 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [19480.611983] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages
            [19560.671087] INFO: task zpool:18651 blocked for more than 120 seconds.
            [19560.678715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19560.682600] zpool           D ffff88001d84e850     0 18651      1 0x00000084
            [19560.686223]  ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8
            [19560.690105]  ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9
            [19560.693820]  0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850
            [19560.697666] Call Trace:
            [19560.700549]  [<ffffffff8168c969>] schedule+0x29/0x70
            [19560.703956]  [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl]
            [19560.707126]  [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs]
            [19560.710365]  [<ffffffff8168c969>] ? schedule+0x29/0x70
            [19560.713259]  [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0
            [19560.716231]  [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs]
            [19560.719335]  [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs]
            [19560.722251]  [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0
            [19560.724959]  [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl]
            [19560.727796]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19560.730692]  [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs]
            [19560.733315]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19560.735957]  [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs]
            [19560.738547]  [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs]
            [19560.741076]  [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs]
            [19560.743504]  [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs]
            [19560.745918]  [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs]
            [19560.748273]  [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs]
            [19560.750754]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19560.753059]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19560.755347]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19560.757527]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19560.759572]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19560.761745]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19560.763873] INFO: task zpool:18768 blocked for more than 120 seconds.
            [19560.765997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19560.768229] zpool           D ffffffffa0f24fa8     0 18768      1 0x00000084
            [19560.770453]  ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8
            [19560.772778]  ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0
            [19560.774985]  ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8
            [19560.777209] Call Trace:
            [19560.778908]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
            [19560.780975]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
            [19560.782995]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
            [19560.784889]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
            [19560.786887]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
            [19560.788918]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19560.790803]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19560.792588]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19560.794332]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19560.796068]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19560.797746]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19560.799400] INFO: task zpool:18800 blocked for more than 120 seconds.
            [19560.801105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19560.802941] zpool           D ffffffffa0f24fa8     0 18800      1 0x00000084
            [19560.804751]  ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8
            [19560.806634]  ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0
            [19560.808496]  ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8
            [19560.810349] Call Trace:
            [19560.811709]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
            [19560.813542]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
            [19560.815241]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
            [19560.816884]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
            [19560.818617]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
            [19560.820356]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19560.822025]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19560.823636]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19560.825297]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19560.826866]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19560.828517]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [19560.830332] INFO: task zpool:18832 blocked for more than 120 seconds.
            [19560.832078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            [19560.833956] zpool           D ffffffffa0f24fa8     0 18832      1 0x00000084
            [19560.835816]  ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8
            [19560.837763]  ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0
            [19560.839700]  ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8
            [19560.841638] Call Trace:
            [19560.843062]  [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70
            [19560.844855]  [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0
            [19560.846633]  [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f
            [19560.848464]  [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs]
            [19560.850290]  [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs]
            [19560.852156]  [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs]
            [19560.853906]  [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0
            [19560.855574]  [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150
            [19560.857280]  [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0
            [19560.858890]  [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0
            [19560.860593]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
            [20080.600849] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389893/real 1503389893]  req@ffff880020c77600 x1576406407452176/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389948 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [20080.612492] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
            [20630.092975] Lustre: Failing over testfs-OST0001
            [20630.107077] Lustre: server umount testfs-OST0001 complete
            

            And the relevant process list on that machine:

            ...
            23250 ?        Ds     0:00  |   \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11
            ...
            18651 ?        D      0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk13
            18768 ?        D      0:00 zpool list -H -o name
            18800 ?        D      0:00 zpool list -H -o name
            18832 ?        D      0:00 zpool list -H -o name
            18864 ?        D      0:00 zpool list -H -o name
            18896 ?        D      0:00 zpool list -H -o name
            19018 ?        D      0:00 zpool list -H -o name
            

            But to be clear, these two machines are completely unrelated to each other. They are from different clusters running different instances of the same test.

            brian Brian Murrell (Inactive) added a comment - From the second server, just in case it sheds any different/new light: [15667.882446] Lustre: testfs-OST0001: new disk, initializing [15667.884987] Lustre: srv-testfs-OST0001: No data found on store. Initialize space [15667.917724] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [15668.707161] Lustre: Failing over testfs-OST0001 [15668.718588] Lustre: server umount testfs-OST0001 complete [15700.116755] sda: sda1 sda9 [15700.134485] sdc: sdc1 sdc9 [15700.156137] sde: sde1 sde9 [15703.114258] sda: sda1 sda9 [15704.405515] Lustre: testfs-OST0001: Imperative Recovery not enabled, recovery window 300-900 [15709.450388] Lustre: testfs-OST0001: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp) [15711.230525] sdc: sdc1 sdc9 [15886.394557] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385746/real 1503385746] req@ffff8800443cfc00 x1576406407449968/t0(0) o400->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1503385753 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15886.405560] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.84@tcp) was lost; in progress operations using this service will fail [15911.419483] Lustre: Evicted from MGS (at 10.14.83.85@tcp) after server handle changed from 0xbb7c0e830fe6b896 to 0x50fc281a1a2827fe [15911.427491] Lustre: MGC10.14.83.84@tcp: Connection restored to 10.14.83.85@tcp (at 10.14.83.85@tcp) [15918.410283] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385778/real 1503385778] req@ffff8800443cf000 x1576406407450032/t0(0) o400->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 224/224 e 0 to 1 dl 1503385785 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15918.423482] Lustre: 8675:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 1 previous similar message [15918.427329] Lustre: testfs-MDT0000-lwp-OST0001: Connection to testfs-MDT0000 (at 10.14.83.85@tcp) was lost; in progress operations using this service will wait for recovery to complete [15925.558268] LustreError: 166-1: MGC10.14.83.84@tcp: Connection to MGS (at 10.14.83.85@tcp) was lost; in progress operations using this service will fail [15925.567055] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [15992.570650] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385849/real 1503385849] req@ffff8800443ce100 x1576406407450176/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385860 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [15992.585639] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 6 previous similar messages [16025.570428] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [16121.575838] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503385968/real 1503385968] req@ffff8800443cd200 x1576406407450336/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503385989 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16121.590976] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 8 previous similar messages [16125.575897] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [16225.581271] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -108 [16325.587390] LustreError: 9604:0:(mgc_request.c:603:do_requeue()) failed processing log: -5 [16386.587316] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386218/real 1503386218] req@ffff880020c74000 x1576406407450544/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386254 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16386.604235] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 11 previous similar messages [16930.589449] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503386743/real 1503386743] req@ffff880020c75b00 x1576406407450832/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.84@tcp:12/10 lens 520/544 e 0 to 1 dl 1503386798 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [16930.606293] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [17530.590000] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387343/real 1503387343] req@ffff8800443cc300 x1576406407451072/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.85@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387398 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [17530.609959] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [18130.596259] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503387943/real 1503387943] req@ffff8800443cc900 x1576406407451344/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503387998 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18130.615008] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 16 previous similar messages [18805.596314] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503388618/real 1503388618] req@ffff8800443cea00 x1576406407451648/t0(0) o38->testfs-MDT0000-lwp-OST0001@10.14.83.85@tcp:12/10 lens 520/544 e 0 to 1 dl 1503388673 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [18805.613730] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages [19124.879146] WARNING: can't open objset 51, error 5 [19124.895291] VERIFY3(0 == zap_lookup(spa->spa_meta_objset, spa->spa_feat_enabled_txg_obj, feature->fi_guid, sizeof (uint64_t), 1, res)) failed (0 == 52) [19124.905980] PANIC at zfeature.c:294:feature_get_enabled_txg() [19124.910754] Showing stack for process 18651 [19124.915295] CPU: 1 PID: 18651 Comm: zpool Tainted: P OE ------------ 3.10.0-514.26.2.el7_lustre.x86_64 #1 [19124.920749] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [19124.925537] ffffffffa0e1f9b9 0000000051f16af8 ffff8800121ab8f8 ffffffff8168729f [19124.930690] ffff8800121ab908 ffffffffa0bf0254 ffff8800121aba90 ffffffffa0bf0329 [19124.935541] 0000000051f16af8 ffff880000000030 ffff8800121abaa0 ffff8800121aba40 [19124.939945] Call Trace: [19124.943537] [<ffffffff8168729f>] dump_stack+0x19/0x1b [19124.947382] [<ffffffffa0bf0254>] spl_dumpstack+0x44/0x50 [spl] [19124.951156] [<ffffffffa0bf0329>] spl_panic+0xc9/0x110 [spl] [19124.954764] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19124.958185] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19124.961438] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19124.964782] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19124.967995] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19124.971047] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19124.974060] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19124.977064] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19124.980079] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19124.982868] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19124.985617] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19124.988234] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19124.990798] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19124.993366] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19124.995935] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19124.998374] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19125.000830] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19125.003126] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19125.005463] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19125.007630] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19125.009651] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19125.011740] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19320.367509] INFO: task zpool:18651 blocked for more than 120 seconds. [19320.372490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19320.376754] zpool D ffff88001d84e850 0 18651 18627 0x00000080 [19320.380708] ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8 [19320.384641] ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9 [19320.388572] 0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850 [19320.392387] Call Trace: [19320.395307] [<ffffffff8168c969>] schedule+0x29/0x70 [19320.398704] [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl] [19320.401801] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19320.404847] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19320.407738] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19320.410561] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19320.413381] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19320.416000] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19320.418611] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19320.421192] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19320.423688] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19320.426251] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19320.428743] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19320.431079] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19320.433308] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19320.435512] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19320.437740] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19320.439853] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19320.442007] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19320.444030] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19320.445995] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19320.448208] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19320.450087] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19320.452013] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19320.453955] INFO: task zpool:18768 blocked for more than 120 seconds. [19320.455929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19320.458108] zpool D ffffffffa0f24fa8 0 18768 1 0x00000084 [19320.460257] ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8 [19320.462466] ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0 [19320.464712] ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8 [19320.466924] Call Trace: [19320.468524] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19320.470708] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19320.472673] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19320.474542] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19320.476510] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19320.478515] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19320.480341] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19320.482125] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19320.483982] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19320.485702] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19320.487660] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.489227] INFO: task zpool:18651 blocked for more than 120 seconds. [19440.492719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.496212] zpool D ffff88001d84e850 0 18651 1 0x00000084 [19440.499703] ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8 [19440.503335] ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9 [19440.506612] 0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850 [19440.509882] Call Trace: [19440.512287] [<ffffffff8168c969>] schedule+0x29/0x70 [19440.515148] [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl] [19440.518148] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19440.521164] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19440.523946] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19440.526764] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19440.529697] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19440.532370] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19440.534989] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19440.537705] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19440.540353] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19440.542846] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19440.545361] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19440.547887] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19440.550464] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19440.552802] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19440.555110] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19440.557406] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19440.559811] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.561972] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.564049] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.566271] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.568328] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.570443] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.572608] INFO: task zpool:18768 blocked for more than 120 seconds. [19440.574650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.576829] zpool D ffffffffa0f24fa8 0 18768 1 0x00000084 [19440.579018] ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8 [19440.581310] ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0 [19440.583615] ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8 [19440.585846] Call Trace: [19440.587477] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19440.589545] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19440.591507] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19440.593495] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19440.595469] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19440.597487] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.599490] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.601293] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.603152] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.604895] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.606739] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.608540] INFO: task zpool:18800 blocked for more than 120 seconds. [19440.610354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.612352] zpool D ffffffffa0f24fa8 0 18800 1 0x00000084 [19440.614246] ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8 [19440.616185] ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0 [19440.618092] ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8 [19440.619977] Call Trace: [19440.621338] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19440.623054] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19440.624759] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19440.626385] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19440.628090] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19440.629920] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.631628] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.633269] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.634968] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.636528] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.638164] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19440.639822] INFO: task zpool:18832 blocked for more than 120 seconds. [19440.641528] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19440.643369] zpool D ffffffffa0f24fa8 0 18832 1 0x00000084 [19440.645209] ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8 [19440.647187] ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0 [19440.649283] ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8 [19440.651239] Call Trace: [19440.652709] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19440.654518] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19440.656271] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19440.657960] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19440.659744] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19440.661554] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19440.663293] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19440.665097] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19440.666816] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19440.668431] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19440.670127] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19480.599139] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389293/real 1503389293] req@ffff8800443ced00 x1576406407451920/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389348 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [19480.611983] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 17 previous similar messages [19560.671087] INFO: task zpool:18651 blocked for more than 120 seconds. [19560.678715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.682600] zpool D ffff88001d84e850 0 18651 1 0x00000084 [19560.686223] ffff8800121ab8f8 0000000000000082 ffff880010ba0fb0 ffff8800121abfd8 [19560.690105] ffff8800121abfd8 ffff8800121abfd8 ffff880010ba0fb0 ffffffffa0e1f9b9 [19560.693820] 0000000000000126 ffffffffa0e00f70 ffffffffa0e1fcd8 ffff88001d84e850 [19560.697666] Call Trace: [19560.700549] [<ffffffff8168c969>] schedule+0x29/0x70 [19560.703956] [<ffffffffa0bf0355>] spl_panic+0xf5/0x110 [spl] [19560.707126] [<ffffffffa0d44fe0>] ? zap_lookup_norm+0x60/0xd0 [zfs] [19560.710365] [<ffffffff8168c969>] ? schedule+0x29/0x70 [19560.713259] [<ffffffff8168a3a9>] ? schedule_timeout+0x239/0x2c0 [19560.716231] [<ffffffffa0d47345>] spa_feature_enabled_txg+0x105/0x110 [zfs] [19560.719335] [<ffffffffa0cdb31b>] traverse_impl+0x3db/0x460 [zfs] [19560.722251] [<ffffffff811de835>] ? kmem_cache_alloc+0x35/0x1e0 [19560.724959] [<ffffffffa0bec319>] ? spl_kmem_cache_alloc+0x99/0x150 [spl] [19560.727796] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19560.730692] [<ffffffffa0cdb834>] traverse_pool+0x84/0x1d0 [zfs] [19560.733315] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19560.735957] [<ffffffffa0d10d10>] ? spa_async_suspend+0xa0/0xa0 [zfs] [19560.738547] [<ffffffffa0d7f02a>] ? zio_null+0x6a/0x70 [zfs] [19560.741076] [<ffffffffa0d18128>] spa_load+0x1a58/0x2030 [zfs] [19560.743504] [<ffffffffa0d1875e>] spa_load_best+0x5e/0x290 [zfs] [19560.745918] [<ffffffffa0d1a4a2>] spa_import+0x212/0x730 [zfs] [19560.748273] [<ffffffffa0d55a27>] zfs_ioc_pool_import+0x147/0x160 [zfs] [19560.750754] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.753059] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.755347] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.757527] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.759572] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.761745] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19560.763873] INFO: task zpool:18768 blocked for more than 120 seconds. [19560.765997] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.768229] zpool D ffffffffa0f24fa8 0 18768 1 0x00000084 [19560.770453] ffff88001250fd60 0000000000000086 ffff8800799dce70 ffff88001250ffd8 [19560.772778] ffff88001250ffd8 ffff88001250ffd8 ffff8800799dce70 ffffffffa0f24fa0 [19560.774985] ffffffffa0f24fa4 ffff8800799dce70 00000000ffffffff ffffffffa0f24fa8 [19560.777209] Call Trace: [19560.778908] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19560.780975] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19560.782995] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19560.784889] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19560.786887] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19560.788918] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.790803] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.792588] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.794332] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.796068] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.797746] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19560.799400] INFO: task zpool:18800 blocked for more than 120 seconds. [19560.801105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.802941] zpool D ffffffffa0f24fa8 0 18800 1 0x00000084 [19560.804751] ffff8800122dbd60 0000000000000082 ffff880010ba6dd0 ffff8800122dbfd8 [19560.806634] ffff8800122dbfd8 ffff8800122dbfd8 ffff880010ba6dd0 ffffffffa0f24fa0 [19560.808496] ffffffffa0f24fa4 ffff880010ba6dd0 00000000ffffffff ffffffffa0f24fa8 [19560.810349] Call Trace: [19560.811709] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19560.813542] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19560.815241] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19560.816884] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19560.818617] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19560.820356] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.822025] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.823636] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.825297] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.826866] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.828517] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [19560.830332] INFO: task zpool:18832 blocked for more than 120 seconds. [19560.832078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [19560.833956] zpool D ffffffffa0f24fa8 0 18832 1 0x00000084 [19560.835816] ffff880012507d60 0000000000000082 ffff88001e53edd0 ffff880012507fd8 [19560.837763] ffff880012507fd8 ffff880012507fd8 ffff88001e53edd0 ffffffffa0f24fa0 [19560.839700] ffffffffa0f24fa4 ffff88001e53edd0 00000000ffffffff ffffffffa0f24fa8 [19560.841638] Call Trace: [19560.843062] [<ffffffff8168da59>] schedule_preempt_disabled+0x29/0x70 [19560.844855] [<ffffffff8168b6a5>] __mutex_lock_slowpath+0xc5/0x1d0 [19560.846633] [<ffffffff8168aaff>] mutex_lock+0x1f/0x2f [19560.848464] [<ffffffffa0d1d6c0>] spa_all_configs+0x40/0x120 [zfs] [19560.850290] [<ffffffffa0d59cdb>] zfs_ioc_pool_configs+0x1b/0x70 [zfs] [19560.852156] [<ffffffffa0d5a626>] zfsdev_ioctl+0x606/0x650 [zfs] [19560.853906] [<ffffffff812127a5>] do_vfs_ioctl+0x2d5/0x4b0 [19560.855574] [<ffffffff81693226>] ? trace_do_page_fault+0x56/0x150 [19560.857280] [<ffffffff81212a21>] SyS_ioctl+0xa1/0xc0 [19560.858890] [<ffffffff816928cb>] ? do_async_page_fault+0x1b/0xd0 [19560.860593] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b [20080.600849] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1503389893/real 1503389893] req@ffff880020c77600 x1576406407452176/t0(0) o250->MGC10.14.83.84@tcp@10.14.83.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1503389948 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [20080.612492] Lustre: 8673:0:(client.c:2114:ptlrpc_expire_one_request()) Skipped 15 previous similar messages [20630.092975] Lustre: Failing over testfs-OST0001 [20630.107077] Lustre: server umount testfs-OST0001 complete And the relevant process list on that machine: ... 23250 ? Ds 0:00 | \_ zpool export zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk11 ... 18651 ? D 0:00 zpool import -f zfs_pool_scsi0QEMU_QEMU_HARDDISK_disk13 18768 ? D 0:00 zpool list -H -o name 18800 ? D 0:00 zpool list -H -o name 18832 ? D 0:00 zpool list -H -o name 18864 ? D 0:00 zpool list -H -o name 18896 ? D 0:00 zpool list -H -o name 19018 ? D 0:00 zpool list -H -o name But to be clear, these two machines are completely unrelated to each other. They are from different clusters running different instances of the same test.

            People

              wc-triage WC Triage
              brian Brian Murrell (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: