[LU-9122] replay-ost-single test_5 test failed to respond and timed out Created: 14/Feb/17  Updated: 20/Nov/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0, Lustre 2.11.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: James Casper
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None
Environment:

onyx-30vm1-3/7/8, Full Group test,
master branch, v2.9.52, b3520,
DNE, ZFS


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sets/39bcbd8a-efe9-11e6-8c0d-5254006e85c2

The client 1 dmesg log shows the writemany task failing with rc 108:

[ 3197.720173] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_50: @@@@@@ IGNORE \(bz13652\): writemany returned rc 108 
[ 3198.045985] Lustre: DEBUG MARKER: recovery-small test_50: @@@@@@ IGNORE (bz13652): writemany returned rc 108

and, about 160 seconds later in the same log, the writemany task is reported as hung in unlink:

[ 3360.095050] INFO: task writemany:29170 blocked for more than 120 seconds.
[ 3360.097638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3360.100227] writemany       D ffff88007b13bbc0     0 29170  29166 0x00000080
[ 3360.102784]  ffff880057be3df0 0000000000000082 ffff880051bc1f60 ffff880057be3fd8
[ 3360.105352]  ffff880057be3fd8 ffff880057be3fd8 ffff880051bc1f60 ffff88007b13bbb8
[ 3360.107917]  ffff88007b13bbbc ffff880051bc1f60 00000000ffffffff ffff88007b13bbc0
[ 3360.110428] Call Trace:
[ 3360.112480]  [<ffffffff8168c989>] schedule_preempt_disabled+0x29/0x70
[ 3360.115023]  [<ffffffff8168a5e5>] __mutex_lock_slowpath+0xc5/0x1c0
[ 3360.117307]  [<ffffffff81689a4f>] mutex_lock+0x1f/0x2f
[ 3360.119528]  [<ffffffff8120f40b>] do_unlinkat+0x13b/0x2b0
[ 3360.121668]  [<ffffffff8120031e>] ? ____fput+0xe/0x10
[ 3360.123853]  [<ffffffff810acdec>] ? task_work_run+0xac/0xe0
[ 3360.125937]  [<ffffffff8102ab22>] ? do_notify_resume+0x92/0xb0
[ 3360.128057]  [<ffffffff81210486>] SyS_unlink+0x16/0x20
[ 3360.130052]  [<ffffffff816967c9>] system_call_fastpath+0x16/0x1b
[ 3374.831443] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_51: failover in 25 sec
[ 3375.133682] Lustre: DEBUG MARKER: test_51: failover in 25 sec
[ 3423.727627] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_51: failover in 30 sec
[ 3424.041587] Lustre: DEBUG MARKER: test_51: failover in 30 sec
[ 3477.622211] LustreError: 11-0: MGC10.2.4.99@tcp: operation obd_ping to node 10.2.4.99@tcp failed: rc = -107
[ 3477.624949] LustreError: Skipped 9 previous similar messages
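
For triage, the excerpts above can be reduced to the hung-task reports and the return codes they contain. The following is a minimal sketch in Python, not part of the Lustre test suite: the `summarize` helper, the two regular expressions, and the hard-coded errno table are assumptions made for illustration. The table holds the Linux errno values 107 (ENOTCONN, transport endpoint is not connected) and 108 (ESHUTDOWN, cannot send after transport endpoint shutdown), which match the rc values in the log.

```python
import re

# Linux errno values that appear in this log. Hard-coded (an assumption of
# this sketch) so the result does not depend on the platform running it.
LINUX_ERRNO = {107: "ENOTCONN", 108: "ESHUTDOWN"}

# Matches the kernel hung-task report, e.g.
# "INFO: task writemany:29170 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Matches both "rc 108" and "rc = -107" as they appear in the excerpts.
RC_RE = re.compile(r"\brc\s*=?\s*(-?\d+)")


def summarize(lines):
    """Return (hung_tasks, return_codes) found in dmesg lines.

    hung_tasks   -- list of (task name, pid, seconds blocked)
    return_codes -- list of (absolute rc value, Linux errno name)
    """
    hung = []
    codes = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            hung.append((m.group(1), int(m.group(2)), int(m.group(3))))
        m = RC_RE.search(line)
        if m:
            rc = abs(int(m.group(1)))
            codes.append((rc, LINUX_ERRNO.get(rc, "unknown")))
    return hung, codes


if __name__ == "__main__":
    sample = [
        "[ 3198.045985] Lustre: DEBUG MARKER: recovery-small test_50: "
        "@@@@@@ IGNORE (bz13652): writemany returned rc 108",
        "[ 3360.095050] INFO: task writemany:29170 blocked for more than 120 seconds.",
        "[ 3477.622211] LustreError: 11-0: MGC10.2.4.99@tcp: operation obd_ping "
        "to node 10.2.4.99@tcp failed: rc = -107",
    ]
    hung, codes = summarize(sample)
    print(hung)
    print(codes)
```

Feeding it the full client dmesg would list every task flagged by the hung-task detector alongside the error codes logged around it, which may help correlate the writemany stall with the earlier rc 108 failure.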
