[LU-10342] sanity test_27F timeout Created: 07/Dec/17  Updated: 11/Jan/18  Resolved: 11/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-9845 ost-pools test_22 hangs with ‘WARNING... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/440e96aa-dae4-11e7-9c63-52540065bddc.

The OST log shows the following:

[ 1088.077958] 			zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost2
[ 1093.764051] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
[ 1093.764051] 
[ 1094.209023] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
[ 1094.209023] 
[ 1100.550162] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.4.20@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 1100.555357] LustreError: Skipped 12 previous similar messages
[ 1135.550139] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.4.20@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 1135.555258] LustreError: Skipped 20 previous similar messages
[ 1200.550232] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.4.20@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 1200.555384] LustreError: Skipped 38 previous similar messages
[ 1320.092067] INFO: task spa_async:22725 blocked for more than 120 seconds.
[ 1320.094598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1320.097123] spa_async       D ffffffffc0a69488     0 22725      2 0x00000080
[ 1320.099681]  ffff88005a5b3dc0 0000000000000046 ffff880061984f10 ffff88005a5b3fd8
[ 1320.102286]  ffff88005a5b3fd8 ffff88005a5b3fd8 ffff880061984f10 ffffffffc0a69480
[ 1320.104826]  ffffffffc0a69484 ffff880061984f10 00000000ffffffff ffffffffc0a69488
[ 1320.107426] Call Trace:
[ 1320.109542]  [<ffffffff816aa3e9>] schedule_preempt_disabled+0x29/0x70
[ 1320.112010]  [<ffffffff816a8317>] __mutex_lock_slowpath+0xc7/0x1d0
[ 1320.114418]  [<ffffffffc085abc0>] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 1320.116812]  [<ffffffff816a772f>] mutex_lock+0x1f/0x2f
[ 1320.119131]  [<ffffffffc085ae04>] spa_async_thread+0x244/0x300 [zfs]
[ 1320.121483]  [<ffffffff811deec3>] ? kfree+0x103/0x140
[ 1320.123735]  [<ffffffffc085abc0>] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 1320.126116]  [<ffffffffc0710fa1>] thread_generic_wrapper+0x71/0x80 [spl]
[ 1320.128452]  [<ffffffffc0710f30>] ? __thread_exit+0x20/0x20 [spl]
[ 1320.130681]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[ 1320.132832]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 1320.135025]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[ 1320.137103]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
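
The two signatures in the log above (the ZFS pool-suspended warning and the hung-task report) can be picked out of a console log mechanically when triaging similar timeouts. The following is a minimal sketch, not part of the test framework: the function name `scan_console_log` and both regular expressions are illustrative assumptions based only on the message wording seen in this ticket, and other kernel or ZFS versions may word the messages differently.

```python
import re

# Patterns assume the exact message wording seen in this ticket's OST log.
SUSPENDED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: Pool '(?P<pool>[^']+)' has encountered "
    r"an uncorrectable I/O failure and has been suspended"
)
HUNG = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<task>[^:]+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def scan_console_log(text):
    """Return (suspended_pools, hung_tasks) found in a console log string.

    suspended_pools: list of (timestamp, pool_name)
    hung_tasks:      list of (timestamp, task_name, pid)
    """
    suspended = []
    hung = []
    for line in text.splitlines():
        m = SUSPENDED.search(line)
        if m:
            suspended.append((float(m["ts"]), m["pool"]))
            continue
        m = HUNG.search(line)
        if m:
            hung.append((float(m["ts"]), m["task"], int(m["pid"])))
    return suspended, hung


# Lines copied from the OST log in this ticket.
SAMPLE = """\
[ 1093.764051] WARNING: Pool 'lustre-ost2' has encountered an uncorrectable I/O failure and has been suspended.
[ 1094.209023] WARNING: Pool 'lustre-ost1' has encountered an uncorrectable I/O failure and has been suspended.
[ 1320.092067] INFO: task spa_async:22725 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    pools, tasks = scan_console_log(SAMPLE)
    for ts, pool in pools:
        print(f"{ts:.6f} pool suspended: {pool}")
    for ts, task, pid in tasks:
        print(f"{ts:.6f} hung task: {task} (pid {pid})")
```

On the sample lines, the scan reports both pools as suspended before the `spa_async` thread is flagged as blocked, which matches the ordering in the log above.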


 Comments   
Comment by Mikhail Pershin [ 08/Dec/17 ]

Isn't this LU-9845?

Comment by Nathaniel Clark [ 11/Jan/18 ]

Yes, this is a duplicate of LU-9845.

Generated at Sat Feb 10 02:34:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.