  Lustre / LU-9247

replay-ost-single test_5: test failed to respond and timed out


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.7
    • Component/s: None
    • Environment: onyx-32vm1-8, Full Group test, RHEL7.3/zfs, branch master, v2.9.54, b3541
    • Severity: 3
    • 9223372036854775807

    Description

      https://testing.hpdd.intel.com/test_sessions/afc7f4b0-0af4-11e7-8c9f-5254006e85c2

      It appears that ZFS was hung and caused this timeout. Here are a couple of indications of this:

      test_log:

      Starting ost1: lustre-ost1/ost1 /mnt/lustre-ost1
      CMD: onyx-32vm8 mkdir -p /mnt/lustre-ost1; mount -t lustre lustre-ost1/ost1 /mnt/lustre-ost1
      onyx-32vm8: e2label: No such file or directory while trying to open lustre-ost1/ost1
      onyx-32vm8: Couldn't find valid filesystem superblock.
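
      Not from this run, just a minimal diagnostic sketch: since the mount of lustre-ost1/ost1 failed, the backing pool and dataset on the OST node could be checked directly. The host onyx-32vm8 and the dataset name are taken from the log above; the commands are standard ZFS tooling, not something the test framework ran here.

      # Sketch only: verify that the pool behind the OST is imported and healthy,
      # and that the dataset the mount command refers to actually exists.
      ssh onyx-32vm8 '
        zpool status -v lustre-ost1     # pool state, suspended I/O, device errors
        zfs list lustre-ost1/ost1       # confirm the OST dataset is present
        dmesg | tail -n 50              # recent kernel/ZFS messages around the mount
      '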
      

      OST console:

      10:35:06:[31399.498089] txg_sync        D 0000000000000001     0 27626      2 0x00000080
      10:35:06:[31399.498090]  ffff880049607ac0 0000000000000046 ffff88003d98edd0 ffff880049607fd8
      10:35:06:[31399.498091]  ffff880049607fd8 ffff880049607fd8 ffff88003d98edd0 ffff88007fc16c40
      10:35:06:[31399.498092]  0000000000000000 7fffffffffffffff ffff88005ac587a8 0000000000000001
      10:35:06:[31399.498092] Call Trace:
      10:35:06:[31399.498093]  [<ffffffff8168bac9>] schedule+0x29/0x70
      10:35:06:[31399.498095]  [<ffffffff81689519>] schedule_timeout+0x239/0x2d0
      10:35:06:[31399.498096]  [<ffffffff810c4fe2>] ? default_wake_function+0x12/0x20
      10:35:06:[31399.498098]  [<ffffffff810ba238>] ? __wake_up_common+0x58/0x90
      10:35:06:[31399.498101]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
      10:35:06:[31399.498103]  [<ffffffff8168b06e>] io_schedule_timeout+0xae/0x130
      10:35:06:[31399.498104]  [<ffffffff810b1416>] ? prepare_to_wait_exclusive+0x56/0x90
      10:35:06:[31399.498106]  [<ffffffff8168b108>] io_schedule+0x18/0x20
      10:35:06:[31399.498109]  [<ffffffffa0677617>] cv_wait_common+0xa7/0x130 [spl]
      10:35:06:[31399.498111]  [<ffffffff810b1720>] ? wake_up_atomic_t+0x30/0x30
      10:35:06:[31399.498114]  [<ffffffffa06776f8>] __cv_wait_io+0x18/0x20 [spl]
      10:35:06:[31399.498150]  [<ffffffffa07d151b>] zio_wait+0x10b/0x1f0 [zfs]
      10:35:06:[31399.498169]  [<ffffffffa075acdf>] dsl_pool_sync+0xbf/0x440 [zfs]
      10:35:06:[31399.498187]  [<ffffffffa0775868>] spa_sync+0x388/0xb50 [zfs]
      10:35:06:[31399.498189]  [<ffffffff810b174b>] ? autoremove_wake_function+0x2b/0x40
      10:35:06:[31399.498191]  [<ffffffff81689c72>] ? mutex_lock+0x12/0x2f
      10:35:06:[31399.498208]  [<ffffffffa07874e5>] txg_sync_thread+0x3c5/0x620 [zfs]
      10:35:06:[31399.498226]  [<ffffffffa0787120>] ? txg_init+0x280/0x280 [zfs]
      10:35:06:[31399.498229]  [<ffffffffa0672851>] thread_generic_wrapper+0x71/0x80 [spl]
      10:35:06:[31399.498232]  [<ffffffffa06727e0>] ? __thread_exit+0x20/0x20 [spl]
      10:35:06:[31399.498234]  [<ffffffff810b064f>] kthread+0xcf/0xe0
      10:35:06:[31399.498235]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
      10:35:06:[31399.498237]  [<ffffffff81696958>] ret_from_fork+0x58/0x90
      10:35:06:[31399.498239]  [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
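
      The trace shows txg_sync (PID 27626) blocked in zio_wait() via spa_sync(), i.e. the pool's sync thread stuck waiting on I/O that never completes. Below is a hedged sketch of how that could be confirmed on the OST node; the PID and pool name come from the console output above, and none of this was actually run as part of the ticket.

      # Sketch only: inspect the blocked txg_sync thread and recent txg activity.
      cat /proc/27626/stack                      # live kernel stack of txg_sync
      echo w > /proc/sysrq-trigger               # dump all D-state (blocked) tasks to dmesg
      dmesg | grep -i 'blocked for more than'    # hung-task watchdog messages, if any
      # Only populated when the zfs_txg_history module parameter is non-zero:
      cat /proc/spl/kstat/zfs/lustre-ost1/txgs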
      

Attachments

Issue Links

Activity

People

Assignee: Alex Zhuravlev (bzzz)
Reporter: James Casper (jcasper) (Inactive)
Votes: 0
Watchers: 7

Dates

Created:
Updated:
Resolved: