[LU-9845] ost-pools test_22 hangs with ‘WARNING: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended.’ Created: 08/Aug/17  Updated: 11/Sep/20  Resolved: 22/May/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3, Lustre 2.10.7, Lustre 2.12.3
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

ZFS 0.7.x


Attachments: File mds02_faulted.log     File zdb-lustre-ost1.log    
Issue Links:
Duplicate
is duplicated by LU-10342 sanity test_27F timeout Resolved
is duplicated by LU-10832 sanity-flr test_200: Timeout occurred... Resolved
is duplicated by LU-11217 conf-sanity: test_32c client crashed Resolved
is duplicated by LU-12281 replay-single: ZFS mmp crashed Resolved
is duplicated by LU-9991 sanity-hsm test 31c fails with 'copyt... Closed
is duplicated by LU-10109 sanityn test_55c: Timeout occurred af... Closed
is duplicated by LU-10585 parallel-scale test_compilebench: Lus... Closed
Related
is related to LU-10441 sanityn: test 36 timeout Open
is related to LU-8619 Lustre to build against ZFS 0.7.0 Resolved
is related to LU-10956 sanity-pfl test_3: Kernel panic - not... Open
is related to LU-12258 sanity test_101d timeout when doing r... Open
is related to LU-11704 sanity-lfsck test 2a hangs with ‘Pool... Resolved
is related to LU-12393 insanity: Kernel panic - not syncing:... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

ost-pools test 22 hangs when testing with ZFS 0.7.0. The last lines in the test_log before the hang are:

loop for testpool2 complete
total: 100 open/close in 0.09 seconds: 1061.34 ops/second
CMD: onyx-35vm9 lctl pool_remove lustre.testpool OST[0001-0001/1]
onyx-35vm9: OST lustre-OST0001_UUID removed from pool lustre.testpool
Pool testpool, iteration 2
CMD: onyx-35vm9 lctl pool_remove lustre.testpool OST[0001-0001/1]
onyx-35vm9: OST lustre-OST0001_UUID removed from pool lustre.testpool
Pool testpool, iteration 3
CMD: onyx-35vm9 lctl pool_remove lustre.testpool OST[0001-0001/1]
onyx-35vm9: OST lustre-OST0001_UUID removed from pool lustre.testpool
Pool testpool, iteration 4

In the MDS console logs, we see:

04:17:47:[11669.022658] Lustre: DEBUG MARKER: lctl pool_remove lustre.testpool OST[0001-0001/1]
04:17:47:[11677.206802] Lustre: DEBUG MARKER: lctl pool_add lustre.testpool OST[0000-0001/1]
04:17:47:[11687.391683] Lustre: DEBUG MARKER: lctl pool_remove lustre.testpool OST[0001-0001/1]
04:17:47:[11693.557227] WARNING: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended.
04:17:47:[11693.557227] 
04:17:47:[11697.575911] Lustre: DEBUG MARKER: lctl pool_add lustre.testpool OST[0000-0001/1]
04:17:47:[15221.223579] SysRq : Show State
04:17:47:[15221.224071]   task                        PC stack   pid father
04:17:47:[15221.224729] systemd         S 0000000000000000     0     1      0 0x00000000
04:17:47:[15221.225617]  ffff88007c7bbdb8 0000000000000082 ffff88007c7b0000 ffff88007c7bbfd8
04:17:47:[15221.226469]  ffff88007c7bbfd8 ffff88007c7bbfd8 ffff88007c7b0000 0000000000000000
04:17:47:[15221.227325]  0000000000000000 ffff8800369d1a20 ffff88007c7b0000 0000000000000000
04:17:47:[15221.228217] Call Trace:
04:17:47:[15221.228581]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15221.229094]  [<ffffffff8168bb4d>] schedule_hrtimeout_range_clock+0x12d/0x150
04:17:47:[15221.229864]  [<ffffffff812486f9>] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
04:17:47:[15221.230669]  [<ffffffff8168bb83>] schedule_hrtimeout_range+0x13/0x20
04:17:47:[15221.231363]  [<ffffffff8124898e>] ep_poll+0x23e/0x360
04:17:47:[15221.231927]  [<ffffffff81200873>] ? __fput+0x183/0x260
04:17:47:[15221.232481]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15221.233121]  [<ffffffff81249bfd>] SyS_epoll_wait+0xed/0x120
04:17:47:[15221.233737]  [<ffffffff81697a49>] system_call_fastpath+0x16/0x1b
…
04:17:47:[15222.536161] cfs_rh_00       S ffff88007a8667b0     0  2617      2 0x00000080
04:17:47:[15222.536916]  ffff8800793ebe28 0000000000000046 ffff88000001bec0 ffff8800793ebfd8
04:17:47:[15222.537791]  ffff8800793ebfd8 ffff8800793ebfd8 ffff88000001bec0 ffff88007a866780
04:17:47:[15222.538646]  ffff88007a866798 ffff88007a866790 0000000000000000 ffff88007a8667b0
04:17:47:[15222.539495] Call Trace:
04:17:47:[15222.539749]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.540295]  [<ffffffffa05cab58>] cfs_wi_scheduler+0x2f8/0x440 [libcfs]
04:17:47:[15222.541047]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
04:17:47:[15222.541647]  [<ffffffffa05ca860>] ? cfs_wi_schedule+0x190/0x190 [libcfs]
04:17:47:[15222.542384]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.542905]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.543561]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.544161]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.544827] cfs_rh_01       S ffff88007a8667b0     0  2618      2 0x00000080
04:17:47:[15222.545646]  ffff88007b4ffe28 0000000000000046 ffff880000018fb0 ffff88007b4fffd8
04:17:47:[15222.546465]  ffff88007b4fffd8 ffff88007b4fffd8 ffff880000018fb0 ffff88007a866780
04:17:47:[15222.547284]  ffff88007a866798 ffff88007a866790 0000000000000000 ffff88007a8667b0
04:17:47:[15222.548152] Call Trace:
04:17:47:[15222.548437]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.549020]  [<ffffffffa05cab58>] cfs_wi_scheduler+0x2f8/0x440 [libcfs]
04:17:47:[15222.549696]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
04:17:47:[15222.550348]  [<ffffffffa05ca860>] ? cfs_wi_schedule+0x190/0x190 [libcfs]
04:17:47:[15222.551049]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.551554]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.552281]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.552858]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.553513] spl_system_task S ffff8800791e43c0     0  3362      2 0x00000080
04:17:47:[15222.554288]  ffff88007bae3e00 0000000000000046 ffff88007a24af10 ffff88007bae3fd8
04:17:47:[15222.555149]  ffff88007bae3fd8 ffff88007bae3fd8 ffff88007a24af10 ffff8800791e4300
04:17:47:[15222.556011]  ffff88007a9c6de0 ffff8800791e4390 ffff8800366fbd30 ffff8800791e43c0
04:17:47:[15222.556870] Call Trace:
04:17:47:[15222.557125]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.557656]  [<ffffffffa061f0a9>] taskq_thread+0x429/0x470 [spl]
04:17:47:[15222.558369]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15222.558965]  [<ffffffffa061ec80>] ? taskq_thread_spawn+0x60/0x60 [spl]
04:17:47:[15222.559630]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.560181]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.560842]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.561451]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.562133] spl_delay_taskq S ffff88007b8fe4c0     0  3363      2 0x00000080
04:17:47:[15222.562894]  ffff88007a907e00 0000000000000046 ffff88007a24bec0 ffff88007a907fd8
04:17:47:[15222.563748]  ffff88007a907fd8 ffff88007a907fd8 ffff88007a24bec0 ffff88007b8fe400
04:17:47:[15222.564605]  ffff88007a9c6f00 ffff88007b8fe490 0000000000000000 ffff88007b8fe4c0
04:17:47:[15222.565474] Call Trace:
04:17:47:[15222.565733]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.566242]  [<ffffffffa061f0a9>] taskq_thread+0x429/0x470 [spl]
04:17:47:[15222.566941]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15222.567515]  [<ffffffffa061ec80>] ? taskq_thread_spawn+0x60/0x60 [spl]
04:17:47:[15222.568229]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.568761]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.569429]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.570037]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.570703] spl_dynamic_tas S ffff88007b8fe2c0     0  3364      2 0x00000080
04:17:47:[15222.571482]  ffff88007b28fe00 0000000000000046 ffff88007a248000 ffff88007b28ffd8
04:17:47:[15222.572328]  ffff88007b28ffd8 ffff88007b28ffd8 ffff88007a248000 ffff88007b8fe200
04:17:47:[15222.573178]  ffff88007a9c6c60 ffff88007b8fe290 ffff8800668c9670 ffff88007b8fe2c0
04:17:47:[15222.574045] Call Trace:
04:17:47:[15222.574314]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.574893]  [<ffffffffa061f0a9>] taskq_thread+0x429/0x470 [spl]
04:17:47:[15222.575505]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15222.576122]  [<ffffffffa061ec80>] ? taskq_thread_spawn+0x60/0x60 [spl]
04:17:47:[15222.576785]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.577352]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.578036]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.578596]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.579309] spl_kmem_cache  S ffff88007b8feec0     0  3365      2 0x00000080
04:17:47:[15222.580047]  ffff88007ba3fe00 0000000000000046 ffff88007a24ce70 ffff88007ba3ffd8
04:17:47:[15222.580902]  ffff88007ba3ffd8 ffff88007ba3ffd8 ffff88007a24ce70 ffff88007b8fee00
04:17:47:[15222.581750]  ffff88007a9c6a20 ffff88007b8fee90 ffff8800387fe800 ffff88007b8feec0
04:17:47:[15222.582597] Call Trace:
04:17:47:[15222.582853]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.583358]  [<ffffffffa061f0a9>] taskq_thread+0x429/0x470 [spl]
04:17:47:[15222.584057]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15222.584645]  [<ffffffffa061ec80>] ? taskq_thread_spawn+0x60/0x60 [spl]
04:17:47:[15222.585372]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.585891]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.586549]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.587153]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.587820] zvol            S ffff88007b5abbc0     0  3366      2 0x00000080
04:17:47:[15222.588648]  ffff88007b29be00 0000000000000046 ffff88007a249f60 ffff88007b29bfd8
04:17:47:[15222.589473]  ffff88007b29bfd8 ffff88007b29bfd8 ffff88007a249f60 ffff88007b5abb00
04:17:47:[15222.590300]  ffff8800364fa6c0 ffff88007b5abb90 0000000000000000 ffff88007b5abbc0
04:17:47:[15222.591167] Call Trace:
04:17:47:[15222.591437]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.592015]  [<ffffffffa061f0a9>] taskq_thread+0x429/0x470 [spl]
04:17:47:[15222.592643]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15222.593264]  [<ffffffffa061ec80>] ? taskq_thread_spawn+0x60/0x60 [spl]
04:17:47:[15222.593951]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.594450]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.595156]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.595709]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.596425] arc_prune       S ffff880079bc1ac0     0  3372      2 0x00000080
04:17:47:[15222.597166]  ffff88007b8a7e00 0000000000000046 ffff88007a9d8000 ffff88007b8a7fd8
04:17:47:[15222.598033]  ffff88007b8a7fd8 ffff88007b8a7fd8 ffff88007a9d8000 ffff880079bc1a00
04:17:47:[15222.598901]  ffff8800796aad80 ffff880079bc1a90 0000000000000000 ffff880079bc1ac0
04:17:47:[15222.599755] Call Trace:
04:17:47:[15222.600008]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.600537]  [<ffffffffa061f0a9>] taskq_thread+0x429/0x470 [spl]
04:17:47:[15222.601256]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15222.601857]  [<ffffffffa061ec80>] ? taskq_thread_spawn+0x60/0x60 [spl]
04:17:47:[15222.602514]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.603059]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.603722]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.604329]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.605006] arc_reclaim     S ffffffffa0914a08     0  3373      2 0x00000080
04:17:47:[15222.605764]  ffff88007b933cf8 0000000000000046 ffff88007a9dedd0 ffff88007b933fd8
04:17:47:[15222.606622]  ffff88007b933fd8 ffff88007b933fd8 ffff88007a9dedd0 ffff88007b933dc8
04:17:47:[15222.607473]  00000000000186a0 0000000000000001 ffffffffa0914a88 ffffffffa0914a08
04:17:47:[15222.608292] Call Trace:
04:17:47:[15222.608635]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.609129]  [<ffffffff8168bad2>] schedule_hrtimeout_range_clock+0xb2/0x150
04:17:47:[15222.609843]  [<ffffffff810b4640>] ? hrtimer_get_res+0x50/0x50
04:17:47:[15222.610414]  [<ffffffff8168bac6>] ? schedule_hrtimeout_range_clock+0xa6/0x150
04:17:47:[15222.611121]  [<ffffffff8168bb83>] schedule_hrtimeout_range+0x13/0x20
04:17:47:[15222.611769]  [<ffffffffa06236ab>] cv_timedwait_hires_common+0x12b/0x220 [spl]
04:17:47:[15222.612503]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
04:17:47:[15222.613080]  [<ffffffffa06237d4>] cv_timedwait_sig_hires+0x14/0x20 [spl]
04:17:47:[15222.613844]  [<ffffffffa071ad31>] arc_reclaim_thread+0x121/0x260 [zfs]
04:17:47:[15222.614529]  [<ffffffffa071ac10>] ? arc_shrink+0xc0/0xc0 [zfs]
04:17:47:[15222.615337]  [<ffffffffa061df91>] thread_generic_wrapper+0x71/0x80 [spl]
04:17:47:[15222.616077]  [<ffffffffa061df20>] ? __thread_exit+0x20/0x20 [spl]
04:17:47:[15222.616766]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.617319]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.618001]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.618548]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.619262] dbu_evict       S ffff88003641fbc0     0  3374      2 0x00000080
04:17:47:[15222.620014]  ffff88000005be00 0000000000000046 ffff88007a9daf10 ffff88000005bfd8
04:17:47:[15222.620866]  ffff88000005bfd8 ffff88000005bfd8 ffff88007a9daf10 ffff88003641fb00
04:17:47:[15222.621722]  ffff8800796aad20 ffff88003641fb90 ffff88004caa2070 ffff88003641fbc0
04:17:47:[15222.622573] Call Trace:
04:17:47:[15222.622826]  [<ffffffff8168ca49>] schedule+0x29/0x70
04:17:47:[15222.623336]  [<ffffffffa061f0a9>] taskq_thread+0x429/0x470 [spl]
04:17:47:[15222.624031]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
04:17:47:[15222.624606]  [<ffffffffa061ec80>] ? taskq_thread_spawn+0x60/0x60 [spl]
04:17:47:[15222.625315]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
04:17:47:[15222.625832]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
04:17:47:[15222.626491]  [<ffffffff81697998>] ret_from_fork+0x58/0x90
04:17:47:[15222.627083]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

From ost-pools.sh:

test_22() {
        local POOL_ROOT=${POOL_ROOT:-$DIR/$tdir}
        [[ $OSTCOUNT -le 1 ]] && skip_env "Need at least 2 OSTs" && return

        local numfiles=100

        create_pool_nofail $POOL
        add_pool $POOL "OST0000" "$FSNAME-OST0000_UUID "
        create_pool_nofail $POOL2
        add_pool $POOL2 "OST0000" "$FSNAME-OST0000_UUID "

        add_loop $POOL 1 &
        add_loop $POOL2 2 &
        sleep 5
        create_dir $POOL_ROOT $POOL
        createmany -o $POOL_ROOT/${tfile} $numfiles ||
                error "createmany $POOL_ROOT/${tfile} failed!"
        wait

        return 0
}
run_test 22 "Simultaneous manipulation of a pool"
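
For context, add_loop is a helper in ost-pools.sh that repeatedly removes and re-adds an OST in the named pool while createmany runs in the foreground; it is what produces the "Pool testpool, iteration N" lines in the test log above. A minimal sketch of that behaviour (illustrative only, not the verbatim helper; do_facet, $FSNAME and the OST ranges are assumed from the Lustre test framework):

# Illustrative sketch of the pool churn visible in the test log above.
add_loop() {
        local pool=$1
        local step=$2
        for c in $(seq 10); do
                echo "Pool $pool, iteration $c"
                do_facet mgs lctl pool_remove $FSNAME.$pool OST[0001-0001/1]
                do_facet mgs lctl pool_add $FSNAME.$pool OST[0000-0001/1]
        done
        echo "loop for $pool complete"
}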

There’s only one instance of this failure. Logs are at:
https://testing.hpdd.intel.com/test_sets/d0619996-799f-11e7-9b69-5254006e85c2



 Comments   
Comment by Jinshan Xiong (Inactive) [ 10/Aug/17 ]

new failure at: https://testing.hpdd.intel.com/test_sets/1831f49c-7d82-11e7-9d88-5254006e85c2

This time it was in the recovery-small test suite.

Comment by Nathaniel Clark [ 13/Sep/17 ]

sanity/test_116a with ZFS 0.7.1
https://testing.hpdd.intel.com/test_sets/5c39ea02-980b-11e7-b775-5254006e85c2
OSS (onyx-41vm11)

18:47:36:[ 2630.525032] Lustre: DEBUG MARKER: == sanity test 116a: stripe QOS: free space balance ================================================== 17:47:09 (1505238429)
18:47:36:[ 2664.291335] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
18:47:36:[ 2664.291335] 
18:47:36:[ 2664.364291] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
Comment by Andreas Dilger [ 15/Sep/17 ]

Hit this again https://testing.hpdd.intel.com/test_sets/426d4a64-99b0-11e7-ba20-5254006e85c2

Comment by Jian Yu [ 02/Oct/17 ]

sanity test_101d: https://testing.hpdd.intel.com/test_sets/5e3dbaaa-a73c-11e7-b786-5254006e85c2

Comment by Jian Yu [ 03/Oct/17 ]

replay-single test_70c: https://testing.hpdd.intel.com/test_sets/eb3eccaa-a7b4-11e7-bb19-5254006e85c2
replay-single test_101: https://testing.hpdd.intel.com/test_sets/6bcb137a-a958-11e7-bb19-5254006e85c2

Comment by Bob Glossman (Inactive) [ 25/Oct/17 ]

similar error seen on master in conf-sanity, test_106:
https://testing.hpdd.intel.com/test_sets/09fd20c4-b935-11e7-9d39-52540065bddc

Comment by Bob Glossman (Inactive) [ 25/Oct/17 ]

another on master, this time in conf-sanity, test_50f:
https://testing.hpdd.intel.com/test_sets/71e5d1e8-b9cd-11e7-84a9-52540065bddc

Seems to be happening a lot now.
I wonder if something that recently landed on master is making this more frequent.

Comment by John Hammond [ 13/Nov/17 ]

Nathaniel,

It's really strange to see the "Pool 'blahblah' has encountered..." message with no other messages indicating an I/O error before or after it. Is this a known phenomenon to the ZFS folks?

Comment by Nathaniel Clark [ 13/Nov/17 ]

jhammond,

I'm not seeing this as a known issue, but I also haven't gotten to dig into one of the fubar'd zpools.

Comment by Nathaniel Clark [ 15/Nov/17 ]

Reproduced with Lustre 2.10.55 and ZFS 0.7.3 (conf-sanity test_6):

[root@onyx-121vm4 ~]# zpool status
  pool: lustre-ost1
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

	NAME                STATE     READ WRITE CKSUM
	lustre-ost1         ONLINE       0     0     0
	  /tmp/lustre-ost1  ONLINE       0     0     0

Corresponding messages:

Nov 15 15:40:49 onyx-121vm4.onyx.hpdd.intel.com kernel: WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
Nov 15 15:40:54 onyx-121vm4.onyx.hpdd.intel.com kernel: Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 6. Is it stuck?
Nov 15 15:40:54 onyx-121vm4.onyx.hpdd.intel.com kernel: Lustre: lustre-OST0000: UNLINKED ffff880017838c00 lustre-MDT0000-mdtlov_UUID 10.2.103.29@tcp 1 (0 1 0) 1 0 0 0:           (null)  0 stale:0
Nov 15 15:40:54 onyx-121vm4.onyx.hpdd.intel.com kernel: Lustre: lustre-OST0000: UNLINKED ffff88007bb6f400 eb0ec782-9708-eeb0-2cc9-b935c49ad433 10.2.103.27@tcp 1 (0 1 0) 1 0 0 0:           (null)  51539607553 stale:0
Nov 15 15:41:10 onyx-121vm4.onyx.hpdd.intel.com kernel: Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 2. Is it stuck?

A ZDB Dump of the pool is at attached: zdb-lustre-ost1.log

Doing a zpool clear does remove the error condition. It's weird that there's no actual error count on any of the devices.
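
For anyone hitting this locally, the recovery that the status output points at is just clearing the suspended pool and re-checking it, e.g.:

# Resume the suspended pool, then confirm it is back online.
zpool clear lustre-ost1
zpool status lustre-ost1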

Comment by Jian Yu [ 17/Nov/17 ]

replay-ost-single test 1 also hit this issue:
https://testing.hpdd.intel.com/test_sets/f5106ff6-cbbb-11e7-a066-52540065bddc

This is affecting patch review testing on master branch.

Comment by Jian Yu [ 22/Nov/17 ]

sanity test 181 and conf-sanity test 106 in ZFS test sessions hit this issue:
https://testing.hpdd.intel.com/test_sets/ffa41cec-cf21-11e7-8027-52540065bddc
https://testing.hpdd.intel.com/test_sets/1c200c8e-cf2a-11e7-a066-52540065bddc

Comment by Sarah Liu [ 29/Nov/17 ]

sanity-benchmark test_bonnie and ost-pools test_23b hit a similar issue.
2.10.2 RC1 zfs
https://testing.hpdd.intel.com/test_sets/b94986ae-d4fa-11e7-9c63-52540065bddc
https://testing.hpdd.intel.com/test_sets/bb79157a-d4fa-11e7-9c63-52540065bddc

2.10.2 RC1 zfs DNE
https://testing.hpdd.intel.com/test_sets/1aabe594-d69b-11e7-9c63-52540065bddc

Comment by John Hammond [ 08/Dec/17 ]

https://testing.hpdd.intel.com/test_sets/81c7aa0e-db9d-11e7-8027-52540065bddc

Comment by Mikhail Pershin [ 08/Dec/17 ]

sanity 101d this time:
https://testing.hpdd.intel.com/test_sets/cbf6bcd8-dc2d-11e7-a066-52540065bddc

Comment by Kim Sebo [ 09/Dec/17 ]

We're also seeing what seems to be this problem. It only occurs under very high MDS load and can be reproduced reliably within minutes during an mdtest run. It occurs with all of the following software combinations, all with the CentOS 7.4 3.10.0-693.2.2 kernel.

Lustre 2.10.1 DNE, ZFS 0.7.3, Mellanox ofed 4.2.1
Lustre 2.10.1 DNE, ZFS 0.7.2, Mellanox ofed 4.2.1
Lustre 2.10.1 DNE, ZFS 0.7.2, Mellanox ofed 4.1.1 (locally patched)

The pool that faults appears to be random, and the problem occurs on all four MDS/MDTs. I/O resumes after a "zpool clear", until another random MDT pool faults soon after.

Syslog has only a single entry:
Dec 9 15:21:38 g1b-mds02 kernel: WARNING: Pool 'zmdt1' has encountered an uncorrectable I/O failure and has been suspended.

# zpool status
  pool: zmdt1
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME             STATE  READ WRITE CKSUM
        zmdt1            ONLINE    0     0     0
          mirror-0       ONLINE    0     0     0
             MDT_LUN_03  ONLINE    0     0     0
             MDT_LUN_04  ONLINE    0     0     0

errors: No known data errors

Attached is an excerpt from /proc/spl/kstat/zfs/dbgmsg. The pool faults between timestamps ...3292 and ...3298. Both logging and "zpool status" were hung for ~5 seconds whilst this occurred.
mds02_faulted.log

Comment by John Hammond [ 09/Dec/17 ]

https://testing.hpdd.intel.com/test_sets/ee2fac14-dc7e-11e7-8027-52540065bddc

Comment by John Hammond [ 18/Dec/17 ]

https://testing.hpdd.intel.com/test_sets/185deada-e22b-11e7-8027-52540065bddc

Comment by John Hammond [ 21/Dec/17 ]

https://testing.hpdd.intel.com/test_sets/17be7004-e22b-11e7-8027-52540065bddc

Comment by John Hammond [ 21/Dec/17 ]

I was able to reproduce the "Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended" message on a local setup by running the root FS out of space.

m:~# df -h
Filesystem                   Size  Used Avail Use% Mounted on
/dev/vda1                     24G   20G  2.7G  88% /
...
lustre-mdt1/mdt1              72M  3.4M   67M   5% /mnt/lustre-mds1
lustre-ost1/ost1             239M  4.0M  233M   2% /mnt/lustre-ost1
lustre-ost2/ost2             239M  8.0M  229M   4% /mnt/lustre-ost2
192.168.122.121@tcp:/lustre  478M   12M  462M   3% /mnt/lustre
m:~# du -hs /tmp/lustre-mdt1 
40M	/tmp/lustre-mdt1
m:~# ls -lh /tmp/lustre-mdt1
-rw-r--r-- 1 root root 196M Dec 21 10:12 /tmp/lustre-mdt1
m:~# dd if=/dev/zero of=/tmp/f0 bs=1M
dd: error writing ‘/tmp/f0’: No space left on device
3982+0 records in
3981+0 records out

Consistent with what we see in the AT test logs, there were no other messages besides the uncorrectable I/O failure message:

[692069.367224] Lustre: DEBUG MARKER: == sanity-hsm test 13: Recursively import and restore a directory ==================================== 10:13:00 (1513872780)
[692095.705099] WARNING: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended.

[692095.709069] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.

Perhaps we can avoid this issue by not using sparse backing files for ZFS.
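
If the sparse backing files turn out to be the trigger, one way to rule that out (an illustrative sketch; the /tmp path and 200M size just mirror the local setup above) would be to preallocate the backing store before creating the pool, so the host filesystem can never run out of space underneath ZFS:

# Preallocate the backing file instead of leaving it sparse, so later
# zpool writes cannot fail with ENOSPC on the host filesystem.
fallocate -l 200M /tmp/lustre-mdt1
zpool create -f lustre-mdt1 /tmp/lustre-mdt1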

Comment by John Hammond [ 21/Dec/17 ]

Would it be useful to set the pool failure mode to panic to get more information here?
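
For reference, failmode is a per-pool property, so it can be set on a live pool or passed as an import option (the latter is what the test patch below ends up doing), e.g.:

# Panic instead of suspending on an uncorrectable I/O failure, so we get a
# stack/crash dump rather than a silent hang.
zpool set failmode=panic lustre-mdt1
# or at import time:
zpool import -f -o cachefile=none -o failmode=panic -d /tmp lustre-mdt1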

Comment by John Hammond [ 21/Dec/17 ]

Or patch zfs to print more of the zio?

Comment by Gerrit Updater [ 22/Dec/17 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30649
Subject: LU-9845 test: add failmode=panic to zpool import options
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8f3c50b516bfb82d91efc40efb1ce9b0fe832f76

Comment by John Hammond [ 22/Dec/17 ]

This could also be due to a blocked mmp write. Running locally:

m:~# export FSTYPE=zfs
m:~# bash $LUSTRE/tests/llmount.sh
...
m:~# cat /sys/module/zfs/parameters/zfs_multihost_interval 
1000
m:~# echo 100 > /sys/module/zfs/parameters/zfs_multihost_interval # set mmp interval to 100ms
m:~# chrt -f 20 dd if=/dev/zero of=/dev/null &
m:~# chrt -f 20 dd if=/dev/zero of=/dev/null &
m:~# dmesg | tail
[  122.404321] Lustre: Skipped 1 previous similar message
[  122.406378] Lustre: Mounted lustre-client
[  123.540533] Lustre: DEBUG MARKER: Using TIMEOUT=20
[  171.949008] sched: RT throttling activated
[  308.950021] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.

[  308.950055] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.

[  308.950070] WARNING: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended.

m:~# zpool status lustre-mdt1
  pool: lustre-mdt1
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

	NAME                STATE     READ WRITE CKSUM
	lustre-mdt1         ONLINE       0     0     0
	  /tmp/lustre-mdt1  ONLINE       0     0     0

errors: No known data errors
Comment by Gerrit Updater [ 05/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30649/
Subject: LU-9845 test: add failmode=panic to zpool import options
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 76afb10a76830396b1457f51eb16992da8a894a4

Comment by Oleg Drokin [ 07/Jan/18 ]

I had two panic events with the landed patch, both very similar: sanityn test 77/78 triggers a "sleep under spinlock" warning and then, shortly after, the zpool panic hits.

[164923.664467] Lustre: DEBUG MARKER: == sanityn test 77c: check ORR NRS policy ============================================================ 23:24:29 (1515212669)
[164937.316648] BUG: sleeping function called from invalid context at /home/green/git/lustre-release/libcfs/libcfs/hash.c:1120
[164937.319085] in_atomic(): 1, irqs_disabled(): 0, pid: 9459, name: lctl
[164937.319812] INFO: lockdep is turned off.
[164937.320473] CPU: 0 PID: 9459 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
[164937.322225] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[164937.322915]  ffff8802e79d4b40 000000007ad8fd8d ffff8802c0183d18 ffffffff816fd3e4
[164937.324369]  ffff8802c0183d30 ffffffff810b00e9 0000000000000018 ffff8802c0183d80
[164937.325689]  ffffffffa0295233 ffffffffa0299cf6 ffffc90015e68000 0000000000001000
[164937.336350] Call Trace:
[164937.337079]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[164937.337746]  [<ffffffff810b00e9>] __might_sleep+0xe9/0x110
[164937.338436]  [<ffffffffa0295233>] cfs_hash_putref+0x233/0x500 [libcfs]
[164937.339120]  [<ffffffffa0299cf6>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[164937.339927]  [<ffffffffa11c1909>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[164937.340647]  [<ffffffffa11b9d19>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[164937.354153]  [<ffffffffa11ba688>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[164937.355884]  [<ffffffffa11bb9c4>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[164937.357170]  [<ffffffffa11bbb3b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[164937.357910]  [<ffffffffa11be12f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[164937.359173]  [<ffffffffa11a05b5>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[164937.360594]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[164937.362287]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[164937.362943]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[164937.363595]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[164937.364292]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[164937.364958] BUG: scheduling while atomic: lctl/9459/0x10000002
[164937.365627] INFO: lockdep is turned off.
[164937.366239] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix pcspkr virtio_balloon serio_raw virtio_blk virtio_console i2c_core libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[164937.412757] CPU: 0 PID: 9459 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
[164937.414008] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[164937.416003]  ffff8802e79d4b40 000000007ad8fd8d ffff8802c0183c90 ffffffff816fd3e4
[164937.418242]  ffff8802c0183ca8 ffffffff816f9218 ffff8802c0183fd8 ffff8802c0183d08
[164937.419596]  ffffffff817041a7 ffff8802e79d4b40 ffff8802c0183fd8 ffff8802c0183fd8
[164937.421025] Call Trace:
[164937.421640]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[164937.422370]  [<ffffffff816f9218>] __schedule_bug+0x59/0x68
[164937.423092]  [<ffffffff817041a7>] __schedule+0x807/0x950
[164937.423866]  [<ffffffff810b4936>] __cond_resched+0x26/0x30
[164937.424823]  [<ffffffff817045da>] _cond_resched+0x3a/0x50
[164937.425472]  [<ffffffffa0295238>] cfs_hash_putref+0x238/0x500 [libcfs]
[164937.426355]  [<ffffffffa0299cf6>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[164937.427348]  [<ffffffffa11c1909>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[164937.428483]  [<ffffffffa11b9d19>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[164937.429204]  [<ffffffffa11ba688>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[164937.430526]  [<ffffffffa11bb9c4>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[164937.440376]  [<ffffffffa11bbb3b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[164937.440915]  [<ffffffffa11be12f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[164937.441796]  [<ffffffffa11a05b5>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[164937.442696]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[164937.443151]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[164937.443621]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[164937.444235]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[164937.444903]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[164937.510313] BUG: scheduling while atomic: lctl/9459/0x10000002
[164937.511243] INFO: lockdep is turned off.
[164937.511901] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix pcspkr virtio_balloon serio_raw virtio_blk virtio_console i2c_core libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[164937.539224] CPU: 15 PID: 9459 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
[164937.540796] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[164937.544056]  ffff8802e79d4b40 000000007ad8fd8d ffff8802c0183c90 ffffffff816fd3e4
[164937.578747]  ffff8802c0183ca8 ffffffff816f9218 ffff8802c0183fd8 ffff8802c0183d08
[164937.581213]  ffffffff817041a7 ffff8802e79d4b40 ffff8802c0183fd8 ffff8802c0183fd8
[164937.582630] Call Trace:
[164937.583224]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[164937.583905]  [<ffffffff816f9218>] __schedule_bug+0x59/0x68
[164937.584571]  [<ffffffff817041a7>] __schedule+0x807/0x950
[164937.585220]  [<ffffffff810b4936>] __cond_resched+0x26/0x30
[164937.585885]  [<ffffffff817045da>] _cond_resched+0x3a/0x50
[164937.586567]  [<ffffffffa0295238>] cfs_hash_putref+0x238/0x500 [libcfs]
[164937.587246]  [<ffffffffa0299cf6>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[164937.588052]  [<ffffffffa11c1909>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[164937.588774]  [<ffffffffa11b9d19>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[164937.589496]  [<ffffffffa11ba688>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[164937.590765]  [<ffffffffa11bb9c4>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[164937.592025]  [<ffffffffa11bbb3b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[164937.592729]  [<ffffffffa11be12f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[164937.593993]  [<ffffffffa11a05b5>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[164937.595191]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[164937.596143]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[164937.596779]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[164937.597413]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[164937.598038]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[164937.614894] BUG: scheduling while atomic: lctl/9459/0x10000002
[164937.615623] INFO: lockdep is turned off.
[164937.616240] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix pcspkr virtio_balloon serio_raw virtio_blk virtio_console i2c_core libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[164937.622741] CPU: 15 PID: 9459 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
[164937.623998] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[164937.624663]  ffff8802e79d4b40 000000007ad8fd8d ffff8802c0183c90 ffffffff816fd3e4
[164937.626597]  ffff8802c0183ca8 ffffffff816f9218 ffff8802c0183fd8 ffff8802c0183d08
[164937.627859]  ffffffff817041a7 ffff8802e79d4b40 ffff8802c0183fd8 ffff8802c0183fd8
[164937.630788] Call Trace:
[164937.631507]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[164937.632306]  [<ffffffff816f9218>] __schedule_bug+0x59/0x68
[164937.633014]  [<ffffffff817041a7>] __schedule+0x807/0x950
[164937.634560]  [<ffffffff810b4936>] __cond_resched+0x26/0x30
[164937.635082]  [<ffffffff817045da>] _cond_resched+0x3a/0x50
[164937.635744]  [<ffffffffa0295238>] cfs_hash_putref+0x238/0x500 [libcfs]
[164937.636594]  [<ffffffffa0299cf6>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[164937.643706]  [<ffffffffa11c1909>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[164937.644453]  [<ffffffffa11b9d19>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[164937.645174]  [<ffffffffa11ba688>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[164937.656192]  [<ffffffffa11bb9c4>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[164937.657543]  [<ffffffffa11bbb3b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[164937.658320]  [<ffffffffa11be12f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[164937.659782]  [<ffffffffa11a05b5>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[164937.661211]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[164937.661885]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[164937.663536]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[164937.664179]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[164937.666440]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[164937.681884] BUG: scheduling while atomic: lctl/9459/0x10000002
[164937.682619] INFO: lockdep is turned off.
[164937.683241] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect sysimgblt ttm ata_generic drm_kms_helper pata_acpi drm i2c_piix4 ata_piix pcspkr virtio_balloon serio_raw virtio_blk virtio_console i2c_core libata floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[164937.721046] CPU: 15 PID: 9459 Comm: lctl Tainted: P        W  OE  ------------   3.10.0-debug #2
[164937.722320] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[164937.722980]  ffff8802e79d4b40 000000007ad8fd8d ffff8802c0183c90 ffffffff816fd3e4
[164937.724242]  ffff8802c0183ca8 ffffffff816f9218 ffff8802c0183fd8 ffff8802c0183d08
[164937.748057]  ffffffff817041a7 ffff8802e79d4b40 ffff8802c0183fd8 ffff8802c0183fd8
[164937.755299] Call Trace:
[164937.755938]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[164937.757446]  [<ffffffff816f9218>] __schedule_bug+0x59/0x68
[164937.764367]  [<ffffffff817041a7>] __schedule+0x807/0x950
[164937.765081]  [<ffffffff810b4936>] __cond_resched+0x26/0x30
[164937.765847]  [<ffffffff817045da>] _cond_resched+0x3a/0x50
[164937.766658]  [<ffffffffa0295238>] cfs_hash_putref+0x238/0x500 [libcfs]
[164937.767358]  [<ffffffffa0299cf6>] ? cfs_binheap_destroy+0x396/0x830 [libcfs]
[164937.768142]  [<ffffffffa11c1909>] nrs_orr_stop+0x69/0x270 [ptlrpc]
[164937.768970]  [<ffffffffa11b9d19>] nrs_policy_stop0+0x39/0x1c0 [ptlrpc]
[164937.769730]  [<ffffffffa11ba688>] nrs_policy_stop_primary.isra.9+0x78/0x1f0 [ptlrpc]
[164937.772095]  [<ffffffffa11bb9c4>] nrs_policy_start_locked+0x624/0x670 [ptlrpc]
[164937.775802]  [<ffffffffa11bbb3b>] nrs_policy_ctl+0x12b/0x2b0 [ptlrpc]
[164937.776892]  [<ffffffffa11be12f>] ptlrpc_nrs_policy_control+0x10f/0x2d0 [ptlrpc]
[164937.778593]  [<ffffffffa11a05b5>] ptlrpc_lprocfs_nrs_seq_write+0x3a5/0x560 [ptlrpc]
[164937.779895]  [<ffffffff8125d3cd>] proc_reg_write+0x3d/0x80
[164937.780630]  [<ffffffff811ed36d>] vfs_write+0xbd/0x1e0
[164937.781701]  [<ffffffff811fe3dd>] ? putname+0x3d/0x60
[164937.784249]  [<ffffffff811ede34>] SyS_write+0x84/0xf0
[164937.784959]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
[164941.981909] Kernel panic - not syncing: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.
[164941.985000] CPU: 0 PID: 13624 Comm: mmp Tainted: P        W  OE  ------------   3.10.0-debug #2
[164941.986459] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[164941.987216]  ffffffffa0c844c8 00000000ff7aa3be ffff8802cf5a3cc0 ffffffff816fd3e4
[164941.988699]  ffff8802cf5a3d40 ffffffff816f8c34 ffff880200000010 ffff8802cf5a3d50
[164941.990160]  ffff8802cf5a3cf0 00000000ff7aa3be ffffffffa0565213 ffff8802266ec000
[164941.991644] Call Trace:
[164941.992359]  [<ffffffff816fd3e4>] dump_stack+0x19/0x1b
[164941.993104]  [<ffffffff816f8c34>] panic+0xd8/0x1e7
[164941.993863]  [<ffffffffa0565213>] ? __cv_timedwait_common+0x143/0x240 [spl]
[164942.007808]  [<ffffffffa0bdf279>] zio_suspend+0x1c9/0x1d0 [zfs]
[164942.008631]  [<ffffffffa0b56242>] mmp_thread+0x9c2/0xb30 [zfs]
[164942.009436]  [<ffffffffa0b55640>] ? metaslab_check_free+0x190/0x190 [zfs]
[164942.010251]  [<ffffffffa0b55880>] ? mmp_write_done+0x240/0x240 [zfs]
[164942.011057]  [<ffffffffa055f091>] thread_generic_wrapper+0x71/0x80 [spl]
[164942.011793]  [<ffffffffa055f020>] ? __thread_exit+0x20/0x20 [spl]
[164942.012514]  [<ffffffff810a2eba>] kthread+0xea/0xf0
[164942.013119]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140
[164942.013767]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
[164942.014394]  [<ffffffff810a2dd0>] ? kthread_create_on_node+0x140/0x140

Hm, I was under the full impression there was a ticket for that scheduling-while-atomic stuff in nrs_orr_stop, but I cannot find it.

The other crash was in test 78, with a lot more warnings before hitting the panic; Jira does not like such a long comment.

Comment by nasf (Inactive) [ 08/Jan/18 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/15f738d8-f463-11e7-8c23-52540065bddc

Comment by John Hammond [ 08/Jan/18 ]

The panic in https://testing.hpdd.intel.com/test_sets/15f738d8-f463-11e7-8c23-52540065bddc is due to MMP:

[ 7799.830025] Kernel panic - not syncing: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.
[ 7799.831007] CPU: 1 PID: 15805 Comm: mmp Tainted: P           OE  ------------   3.10.0-693.11.1.el7_lustre.x86_64 #1
[ 7799.831007] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 7799.831007]  ffff880041bda000 000000002c0bba5e ffff880056c73d28 ffffffff816a3e3d
[ 7799.831007]  ffff880056c73da8 ffffffff8169dd24 ffff880000000010 ffff880056c73db8
[ 7799.831007]  ffff880056c73d58 000000002c0bba5e ffff880056c73dc0 ffff880041bda000
[ 7799.831007] Call Trace:
[ 7799.831007]  [<ffffffff816a3e3d>] dump_stack+0x19/0x1b
[ 7799.831007]  [<ffffffff8169dd24>] panic+0xe8/0x20d
[ 7799.831007]  [<ffffffffc08bbcb6>] zio_suspend+0x106/0x110 [zfs]
[ 7799.831007]  [<ffffffffc08427ea>] mmp_thread+0x7ea/0x820 [zfs]
[ 7799.831007]  [<ffffffffc0841e90>] ? metaslab_check_free+0x190/0x190 [zfs]
[ 7799.831007]  [<ffffffffc0842000>] ? mmp_write_done+0x170/0x170 [zfs]
[ 7799.831007]  [<ffffffffc070cfa1>] thread_generic_wrapper+0x71/0x80 [spl]
[ 7799.831007]  [<ffffffffc070cf30>] ? __thread_exit+0x20/0x20 [spl]
[ 7799.831007]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[ 7799.831007]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[ 7799.831007]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[ 7799.831007]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
Comment by Oleg Drokin [ 09/Jan/18 ]

aha, LU-6849 is where I was putting my reports

Comment by Sebastien Buisson (Inactive) [ 10/Jan/18 ]

Hit with sanity test_64d:
https://testing.hpdd.intel.com/test_sets/696dfeac-f562-11e7-854b-52540065bddc

Panic is due to MMP:

[ 2212.716735] Kernel panic - not syncing: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.
[ 2212.718347] CPU: 0 PID: 777 Comm: mmp Tainted: P           OE  ------------   3.10.0-693.11.1.el7_lustre.x86_64 #1
[ 2212.719366] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 2212.719946]  ffff880058852000 00000000554e0bda ffff88005335bd28 ffffffff816a3e3d
[ 2212.720772]  ffff88005335bda8 ffffffff8169dd24 ffff880000000010 ffff88005335bdb8
[ 2212.721598]  ffff88005335bd58 00000000554e0bda ffff88005335bdc0 ffff880058852000
[ 2212.722421] Call Trace:
[ 2212.722687]  [<ffffffff816a3e3d>] dump_stack+0x19/0x1b
[ 2212.723211]  [<ffffffff8169dd24>] panic+0xe8/0x20d
[ 2212.723744]  [<ffffffffc07facb6>] zio_suspend+0x106/0x110 [zfs]
[ 2212.724356]  [<ffffffffc07817ea>] mmp_thread+0x7ea/0x820 [zfs]
[ 2212.724969]  [<ffffffffc0780e90>] ? metaslab_check_free+0x190/0x190 [zfs]
[ 2212.725676]  [<ffffffffc0781000>] ? mmp_write_done+0x170/0x170 [zfs]
[ 2212.726312]  [<ffffffffc064bfa1>] thread_generic_wrapper+0x71/0x80 [spl]
[ 2212.726988]  [<ffffffffc064bf30>] ? __thread_exit+0x20/0x20 [spl]
[ 2212.727604]  [<ffffffff810b099f>] kthread+0xcf/0xe0
[ 2212.728092]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40
[ 2212.728724]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
[ 2212.729261]  [<ffffffff810b08d0>] ? insert_kthread_work+0x40/0x40

ZFS 0.7.3

Comment by Nathaniel Clark [ 11/Jan/18 ]

vdev_random_leaf() (which is used to pick a vdev to write to in mmp_thread) was rewritten as mmp_random_leaf() (see ZFS pull request 6665) for ZFS 0.7.4/0.7.5. Looking at the code, I don't think that is the issue we are hitting here: that fix only prevents an infinite loop when an upper vdev is writeable but has no writable children.

Comment by Nathaniel Clark [ 11/Jan/18 ]

b2_10, ZFS 0.7.5
https://testing.hpdd.intel.com/test_sets/db69c948-f5b5-11e7-94c7-52540065bddc

sanityn/test_40d (MMP trips during test 40c, but nothing hangs until beginning of test 40d)

Jan  9 23:47:51 trevis-4vm7 kernel: WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
Jan  9 23:47:52 trevis-4vm7 kernel: WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
Comment by Sarah Liu [ 11/Jan/18 ]

more instances on 2.10.3:
https://testing.hpdd.intel.com/test_sets/9e9b0404-f6f1-11e7-a7cd-52540065bddc
https://testing.hpdd.intel.com/test_sets/a02edd7c-f6f1-11e7-a7cd-52540065bddc

[ 4280.251316] Lustre: DEBUG MARKER: == replay-ost-single test 5: Fail OST during iozone ================================================== 17:57:51 (1515607071)
[ 4288.629724] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost1' ' /proc/mounts
[ 4289.211154] Lustre: DEBUG MARKER: umount -d /mnt/lustre-ost1
[ 4289.591211] Lustre: Failing over lustre-OST0000
[ 4292.839279] Lustre: server umount lustre-OST0000 complete
[ 4293.002308] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
[ 4293.343927] Lustre: DEBUG MARKER: ! zpool list -H lustre-ost1 >/dev/null 2>&1 ||
			grep -q ^lustre-ost1/ /proc/mounts ||
			zpool export  lustre-ost1
[ 4304.792431] Lustre: DEBUG MARKER: hostname
[ 4305.199104] Lustre: DEBUG MARKER: lsmod | grep zfs >&/dev/null || modprobe zfs;
			zpool list -H lustre-ost1 >/dev/null 2>&1 ||
			zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost1
[ 4306.781168] Lustre: DEBUG MARKER: zfs get -H -o value 						lustre:svname lustre-ost1/ost1
[ 4307.103538] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/lustre-ost1
[ 4307.465621] Lustre: lustre-OST0000: Imperative Recovery enabled, recovery window shrunk from 60-180 down to 60-180
[ 4307.633766] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
[ 4307.958667] Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 6 clients reconnect
[ 4307.960237] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
[ 4308.555493] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-40vm8.onyx.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 4308.567312] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-40vm8.onyx.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 4308.764786] Lustre: DEBUG MARKER: onyx-40vm8.onyx.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 4308.765360] Lustre: DEBUG MARKER: onyx-40vm8.onyx.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 4308.940402] Lustre: DEBUG MARKER: zfs get -H -o value 				lustre:svname lustre-ost1/ost1 2>/dev/null | 				grep -E ':[a-zA-Z]{3}[0-9]{4}'
[ 4309.261238] Lustre: DEBUG MARKER: zfs get -H -o value 				lustre:svname lustre-ost1/ost1 2>/dev/null | 				grep -E ':[a-zA-Z]{3}[0-9]{4}'
[ 4309.590687] Lustre: DEBUG MARKER: zfs get -H -o value lustre:svname 		                           lustre-ost1/ost1 2>/dev/null
[ 4310.483010] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-40vm7.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
[ 4310.555992] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-40vm6.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
[ 4310.686461] Lustre: DEBUG MARKER: onyx-40vm7.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
[ 4310.773159] Lustre: DEBUG MARKER: onyx-40vm6.onyx.hpdd.intel.com: executing wait_import_state_mount FULL osc.lustre-OST0000-osc-*.ost_server_uuid
[ 4310.799378] Lustre: lustre-OST0000: Recovery over after 0:03, of 6 clients 6 recovered and 0 were evicted.
[ 4310.803867] Lustre: lustre-OST0000: deleting orphan objects from 0x300000401:34 to 0x300000401:417
[ 4310.808957] Lustre: lustre-OST0000: deleting orphan objects from 0x0:2436 to 0x0:2465
[ 4310.810587] Lustre: lustre-OST0000: deleting orphan objects from 0x300000400:34 to 0x300000400:417
[ 4310.812828] Lustre: lustre-OST0000: deleting orphan objects from 0x300000402:34 to 0x300000402:417
[ 4311.105228] Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
[ 4311.497754] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 0 sec
[ 4312.069870] Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 1 sec
[ 4312.375301] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-*.ost_server_uuid in FULL state after 1 sec
[ 6614.243576] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.

[ 6614.285585] WARNING: Pool 'lustre-ost3' has encountered an uncorrectable I/O failure and has been suspended.

[ 6614.397584] WARNING: Pool 'lustre-ost7' has encountered an uncorrectable I/O failure and has been suspended.

[ 6614.501582] WARNING: Pool 'lustre-ost6' has encountered an uncorrectable I/O failure and has been suspended.

[ 6614.504581] WARNING: Pool 'lustre-ost8' has encountered an uncorrectable I/O failure and has been suspended.

[ 6614.510587] WARNING: Pool 'lustre-ost5' has encountered an uncorrectable I/O failure and has been suspended.

[ 6614.796590] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.

[ 6615.011582] WARNING: Pool 'lustre-ost4' has encountered an uncorrectable I/O failure and has been suspended.

[ 6649.202587] LNet: Service thread pid 30548 was inactive for 40.12s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[ 6649.204384] Pid: 30548, comm: ll_ost_io00_003
[ 6649.204827] 
Call Trace:
[ 6649.205292]  [<ffffffff816ab6b9>] schedule+0x29/0x70
[ 6649.205810]  [<ffffffffc0647515>] cv_wait_common+0x125/0x150 [spl]
[ 6649.206485]  [<ffffffff810b34b0>] ? autoremove_wake_function+0x0/0x40
[ 6649.207126]  [<ffffffffc0647555>] __cv_wait+0x15/0x20 [spl]
[ 6649.207776]  [<ffffffffc0796a2f>] txg_wait_synced+0xef/0x140 [zfs]
[ 6649.208409]  [<ffffffffc074bdf5>] dmu_tx_wait+0x275/0x3c0 [zfs]
[ 6649.209018]  [<ffffffffc074bfd1>] dmu_tx_assign+0x91/0x490 [zfs]
[ 6649.209628]  [<ffffffffc0fbcf3a>] osd_trans_start+0xaa/0x3c0 [osd_zfs]
[ 6649.210273]  [<ffffffffc11101cb>] ofd_trans_start+0x6b/0xe0 [ofd]
[ 6649.210880]  [<ffffffffc11165eb>] ofd_commitrw_write+0x94b/0x1c50 [ofd]
[ 6649.211528]  [<ffffffffc111a5a9>] ofd_commitrw+0x4b9/0xac0 [ofd]
[ 6649.212188]  [<ffffffffc0e12eb7>] obd_commitrw+0x2ed/0x330 [ptlrpc]
[ 6649.212847]  [<ffffffffc0de5d91>] tgt_brw_write+0xff1/0x17c0 [ptlrpc]
[ 6649.213489]  [<ffffffff813300bb>] ? string.isra.7+0x3b/0xf0
[ 6649.214075]  [<ffffffffc0d37e60>] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[ 6649.214787]  [<ffffffffc0de1da5>] tgt_request_handle+0x925/0x1370 [ptlrpc]
[ 6649.215494]  [<ffffffffc0d8ab16>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
[ 6649.216262]  [<ffffffff810bc0f8>] ? __wake_up_common+0x58/0x90
[ 6649.216871]  [<ffffffffc0d8e252>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[ 6649.217527]  [<ffffffffc0d8d7c0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[ 6649.218155]  [<ffffffff810b252f>] kthread+0xcf/0xe0
[ 6649.218647]  [<ffffffff810b2460>] ? kthread+0x0/0xe0
[ 6649.219151]  [<ffffffff816b8798>] ret_from_fork+0x58/0x90
[ 6649.219687]  [<ffffffff810b2460>] ? kthread+0x0/0xe0

[ 6649.220352] LustreError: dumping log to /tmp/lustre-log.1515609441.30548
[ 7204.082593] Lustre: 11314:0:(service.c:1346:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
  req@ffff8800694ce050 x1589224788664784/t0(0) o4->1a4123cb-0667-4fc1-a984-578de9d574bd@10.2.8.124@tcp:350/0 lens 784/488 e 24 to 0 dl 1515610000 ref 2 fl Interpret:/0/0 rc 0/0
[ 7209.956336] Lustre: lustre-OST0000: Client 1a4123cb-0667-4fc1-a984-578de9d574bd (at 10.2.8.124@tcp) reconnecting
[ 7209.957438] Lustre: Skipped 3 previous similar messages
[ 7209.957986] Lustre: lustre-OST0000: Connection restored to 1bcb3ced-3143-3f53-430e-80a587c85a02 (at 10.2.8.124@tcp)
[ 7209.959086] Lustre: Skipped 23 previous similar messages
[ 7806.066639] Lustre: 11899:0:(service.c:1346:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (4/4), not sending early reply
  req@ffff880069229c50 x1589224788689760/t0(0) o4->1a4123cb-0667-4fc1-a984-578de9d574bd@10.2.8.124@tcp:196/0 lens 608/448 e 1 to 0 dl 1515610601 ref 2 fl Interpret:/2/0 rc 0/0
[ 7810.960502] Lustre: lustre-OST0000: Client 1a4123cb-0667-4fc1-a984-578de9d574bd (at 10.2.8.124@tcp) reconnecting
[ 7810.961613] Lustre: lustre-OST0000: Connection restored to 1bcb3ced-3143-3f53-430e-80a587c85a02 (at 10.2.8.124@tcp)

Comment by Bob Glossman (Inactive) [ 12/Jan/18 ]

another on master:
https://testing.hpdd.intel.com/test_sets/68134712-f759-11e7-a7cd-52540065bddc

Comment by John Hammond [ 15/Jan/18 ]

Upstream zfsonlinux issue to track message improvement: https://github.com/zfsonlinux/zfs/issues/7045

Comment by Andreas Dilger [ 16/Jan/18 ]

Digging through the ZFS code a bit, I see there is a zfs_multihost_history tunable (default = 0) that can be set to track the history of MMP überblock writes. If the patches on https://github.com/zfsonlinux/zfs/issues/7045 do not solve the problem, it might be useful to collect the multihost stats in case of a test timeout/failure to see if there is some useful information (e.g. whether mmp_delay is increasing steadily or if it is a step function, what mmp_delay * vdev_count is vs. mmp_interval, etc).

Comment by Brian Behlendorf [ 16/Jan/18 ]

Yes, it would be great if you could enable the multihost history so we can get a clear idea of exactly what's happening with the delay. You'll want to set zfs_multihost_history=100, where 100 means keep a history of the last 100 mmp writes. You can then dump the log by reading /proc/spl/kstat/zfs/<pool>/multihost.

echo 100 >/sys/module/zfs/parameters/zfs_multihost_history
cat /proc/spl/kstat/zfs/jet22-1/multihost
38 0 0x01 13 832 12020281026402 14414698950019
txg        timestamp  mmp_delay    vdev_guid                vdev_label vdev_path
...
28271769   1516145833 102524916    13587132990859797684     1          /dev/disk/by-vdev/L1
28271769   1516145833 102498562    18032937067254868643     0          /dev/disk/by-vdev/L56
28271769   1516145833 102470510    18032937067254868643     0          /dev/disk/by-vdev/L56
28271769   1516145833 102443525    18235850673591164513     1          /dev/disk/by-vdev/L15
28271769   1516145833 102435221    8353277438441160016      2          /dev/disk/by-vdev/L28


Comment by Jian Yu [ 20/Jan/18 ]

+1 on master branch:
https://testing.hpdd.intel.com/test_sets/72655ed0-fcef-11e7-bd00-52540065bddc

Comment by Minh Diep [ 22/Jan/18 ]

+1 on b2_10
https://testing.hpdd.intel.com/test_sessions/d2abdea2-15b2-43c2-a1aa-d9e3a8d322c6

Comment by Minh Diep [ 23/Jan/18 ]

FYI, we just added the parameters below in our infrastructure.

options zfs zfs_multihost_history=100
options zfs zfs_multihost_fail_intervals=20
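
For ZFS 0.7.x the suspension threshold is, as I understand it, zfs_multihost_interval × zfs_multihost_fail_intervals, so with the default 1000 ms interval the settings above give MMP a 20 s window before the pool is suspended (which would be consistent with the "have not succeeded in over 20s" warnings reported later in this ticket). The same values can also be applied at runtime without a module reload:

# Runtime equivalents of the modprobe.d options above (take effect immediately).
echo 100 > /sys/module/zfs/parameters/zfs_multihost_history
echo 20 > /sys/module/zfs/parameters/zfs_multihost_fail_intervals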

Comment by nasf (Inactive) [ 26/Jan/18 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/df6be592-01e7-11e8-bd00-52540065bddc

Comment by Jian Yu [ 15/Feb/18 ]

sanity test 124a hit this failure on master branch:
https://testing.hpdd.intel.com/test_sets/7d550360-11f6-11e8-a7cd-52540065bddc

Comment by Jian Yu [ 21/Mar/18 ]

+1 on master branch:
https://testing.hpdd.intel.com/test_sets/db7af22c-2c6a-11e8-b3c6-52540065bddc

Comment by Joe Grund [ 06/Jun/18 ]

Looks like we saw this in IML testing: https://github.com/intel-hpdd/intel-manager-for-lustre/issues/645

Comment by Nathaniel Clark [ 13/Sep/18 ]

With the upgrade to ZFS 0.7.9 and the increase in the multihost fail interval, this is no longer an issue.

Comment by James Nunez (Inactive) [ 12/Mar/19 ]

We're seeing this error message again with ZFS testing. Looking at a recent 2.10.7 RC1 test session using ZFS 0.7.9-1, https://testing.whamcloud.com/test_sessions/a35ede3e-36a0-4a9f-92f6-e22a3cda1bec , we see this error in the OST (vm3) console for ost-pools test 23b

============================================================= 23:14:20 \(1552259660\)
[14220.079856] Lustre: DEBUG MARKER: == ost-pools test 23b: OST pools and OOS ============================================================= 23:14:20 (1552259660)
[14414.872497] WARNING: MMP writes to pool 'lustre-ost4' have not succeeded in over 20s; suspending pool
[14414.874142] WARNING: Pool 'lustre-ost4' has encountered an uncorrectable I/O failure and has been suspended.
[14414.874142] 
[14414.878399] WARNING: MMP writes to pool 'lustre-ost8' have not succeeded in over 20s; suspending pool
[14414.880115] WARNING: Pool 'lustre-ost8' has encountered an uncorrectable I/O failure and has been suspended.
[14414.880115] 
[14415.175128] WARNING: MMP writes to pool 'lustre-ost7' have not succeeded in over 20s; suspending pool
[14415.176759] WARNING: Pool 'lustre-ost7' has encountered an uncorrectable I/O failure and has been suspended.
[14415.176759] 
[14415.251535] WARNING: MMP writes to pool 'lustre-ost2' have not succeeded in over 20s; suspending pool
[14415.253116] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
[14415.253116] 
[14465.182431] LNet: Service thread pid 16167 was inactive for 78.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[14465.185338] Pid: 16167, comm: ll_ost_io00_006 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Fri Mar 8 06:17:38 UTC 2019
[14465.187046] Call Trace:
[14465.187545]  [<ffffffffc03ae2b5>] cv_wait_common+0x125/0x150 [spl]
[14465.188787]  [<ffffffffc03ae2f5>] __cv_wait+0x15/0x20 [spl]
[14465.189791]  [<ffffffffc0505c0f>] txg_wait_synced+0xef/0x140 [zfs]
[14465.190919]  [<ffffffffc04bba15>] dmu_tx_wait+0x275/0x3c0 [zfs]
[14465.191992]  [<ffffffffc04bbbf1>] dmu_tx_assign+0x91/0x490 [zfs]
[14465.193078]  [<ffffffffc1123f57>] osd_trans_start+0xa7/0x3a0 [osd_zfs]
[14465.194216]  [<ffffffffc124e31e>] ofd_trans_start+0x6e/0xf0 [ofd]
[14465.195327]  [<ffffffffc125482b>] ofd_commitrw_write+0x94b/0x1c90 [ofd]
[14465.196500]  [<ffffffffc1258879>] ofd_commitrw+0x4c9/0xae0 [ofd]
[14465.197563]  [<ffffffffc0f8fd84>] obd_commitrw+0x2f3/0x336 [ptlrpc]
[14465.198979]  [<ffffffffc0f627cd>] tgt_brw_write+0xffd/0x17d0 [ptlrpc]
[14465.200198]  [<ffffffffc0f5e7ca>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[14465.201458]  [<ffffffffc0f0705b>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[14465.202814]  [<ffffffffc0f0a7a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[14465.203954]  [<ffffffff92ac1c31>] kthread+0xd1/0xe0
[14465.204831]  [<ffffffff93174c37>] ret_from_fork_nospec_end+0x0/0x39
[14465.205926]  [<ffffffffffffffff>] 0xffffffffffffffff
[14465.206820] LustreError: dumping log to /tmp/lustre-log.1552259905.16167

Do you think this is a new issue?

Other failures for the same test, ost-pools test 23b hanging, this year are at
09 March - https://testing.whamcloud.com/test_sets/cbf1b00c-4280-11e9-92fe-52540065bddc
28 Feb - https://testing.whamcloud.com/test_sets/ed47664a-3bc1-11e9-b98a-52540065bddc
17 Jan - https://testing.whamcloud.com/test_sets/e2791e6a-1a93-11e9-b7d4-52540065bddc

Comment by Andreas Dilger [ 13/Mar/19 ]

A patch has been submitted to upstream ZFS master that increases the MMP interval/retry interval and improves other aspects of the code (https://github.com/zfsonlinux/zfs/issues/7709), along with a smaller patch that just updates the interval/retry count and seems suitable for inclusion into 0.7.x (https://github.com/zfsonlinux/zfs/pull/8495).

Comment by Andreas Dilger [ 16/Mar/19 ]

If this is becoming really problematic, we could disable MMP in our VM test configurations by setting "zfs_multihost_fail_intervals=0", but that should be a last resort as it will make our testing less representative of real deployments. At least the MMP feature would still be enabled and running, but a timeout would not take the pool offline.
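
For completeness, that last-resort setting would look something like the following at runtime (a modprobe.d option would be needed to make it persistent across reboots):

# Keep MMP writes running, but never suspend the pool on missed intervals.
echo 0 > /sys/module/zfs/parameters/zfs_multihost_fail_intervals
cat /sys/module/zfs/parameters/zfs_multihost_fail_intervals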

Comment by Minh Diep [ 20/Mar/19 ]

+1 on b2_12 https://testing.whamcloud.com/test_sessions/e73dd201-fae1-4123-92ab-1961d681f34e

Comment by Andreas Dilger [ 20/Apr/19 ]

+1 on master conf-sanity test_32c:
https://testing.whamcloud.com/test_sets/ef12e78e-6343-11e9-aeec-52540065bddc

Comment by Nathaniel Clark [ 13/May/19 ]

I've asked in ATM-827 that Autotest bump zfs_multihost_fail_intervals to 60, which is twice the default SCSI timeout.

Comment by Nathaniel Clark [ 22/May/19 ]

zfs_multihost_fail_intervals has been bumped to 60 seconds in autotest via ATM-827.

This should address the zpool MMP timeouts.

Comment by Homer Li (Inactive) [ 23/May/19 ]

Hi Nathaniel

     Does this make zpool import too slow on the other OSS of an HA pair?

     I mean, would an HA failover have to wait a long time in an environment with 100+ SAS devices?

Comment by Nathaniel Clark [ 23/May/19 ]

ieelusername,

This setting is for our test environment only. Since the test nodes are VMs, they can be pretty heavily I/O contended on the underlying host.

Comment by Homer Li (Inactive) [ 24/May/19 ]

Hi Nathaniel

     Oh, Got it.

Comment by Sarah Liu [ 07/Aug/19 ]

Hit the same problem when doing soak on b2_12-ib build #31

soak-7 console shows

WARNING: MMP writes to pool 'soaked-ost11' have not succeeded in over 5s; suspending pool
WARNING: Pool 'soaked-ost11' has encountered an uncorrectable I/O failure and has been suspended.
Comment by Andreas Dilger [ 28/Aug/20 ]

Hit the same problem when doing soak on b2_12-ib build #31

soak-7 console shows

WARNING: MMP writes to pool 'soaked-ost11' have not succeeded in over 5s; suspending pool
WARNING: Pool 'soaked-ost11' has encountered an uncorrectable I/O failure and has been suspended.

It looks like zfs_multihost_fail_intervals=60 was not set, given that the suspend happened after only 5s.
