Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.10.0, Lustre 2.11.0
None
onyx-30vm1-3/7/8, Full Group test, master branch, v2.9.52, b3520, DNE, ZFS
3
Description
https://testing.hpdd.intel.com/test_sets/39bcbd8a-efe9-11e6-8c0d-5254006e85c2
The client 1 dmesg log shows the writemany task failing:
[ 3197.720173] Lustre: DEBUG MARKER: /usr/sbin/lctl mark recovery-small test_50: @@@@@@ IGNORE \(bz13652\): writemany returned rc 108
[ 3198.045985] Lustre: DEBUG MARKER: recovery-small test_50: @@@@@@ IGNORE (bz13652): writemany returned rc 108
and
[ 3360.095050] INFO: task writemany:29170 blocked for more than 120 seconds.
[ 3360.097638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3360.100227] writemany D ffff88007b13bbc0 0 29170 29166 0x00000080
[ 3360.102784] ffff880057be3df0 0000000000000082 ffff880051bc1f60 ffff880057be3fd8
[ 3360.105352] ffff880057be3fd8 ffff880057be3fd8 ffff880051bc1f60 ffff88007b13bbb8
[ 3360.107917] ffff88007b13bbbc ffff880051bc1f60 00000000ffffffff ffff88007b13bbc0
[ 3360.110428] Call Trace:
[ 3360.112480] [<ffffffff8168c989>] schedule_preempt_disabled+0x29/0x70
[ 3360.115023] [<ffffffff8168a5e5>] __mutex_lock_slowpath+0xc5/0x1c0
[ 3360.117307] [<ffffffff81689a4f>] mutex_lock+0x1f/0x2f
[ 3360.119528] [<ffffffff8120f40b>] do_unlinkat+0x13b/0x2b0
[ 3360.121668] [<ffffffff8120031e>] ? ____fput+0xe/0x10
[ 3360.123853] [<ffffffff810acdec>] ? task_work_run+0xac/0xe0
[ 3360.125937] [<ffffffff8102ab22>] ? do_notify_resume+0x92/0xb0
[ 3360.128057] [<ffffffff81210486>] SyS_unlink+0x16/0x20
[ 3360.130052] [<ffffffff816967c9>] system_call_fastpath+0x16/0x1b
[ 3374.831443] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_51: failover in 25 sec
[ 3375.133682] Lustre: DEBUG MARKER: test_51: failover in 25 sec
[ 3423.727627] Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_51: failover in 30 sec
[ 3424.041587] Lustre: DEBUG MARKER: test_51: failover in 30 sec
[ 3477.622211] LustreError: 11-0: MGC10.2.4.99@tcp: operation obd_ping to node 10.2.4.99@tcp failed: rc = -107
[ 3477.624949] LustreError: Skipped 9 previous similar messages
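The "rc 108" reported for writemany and the "rc = -107" from the MGC obd_ping are most likely Linux errno values (Lustre logs errors as negative errno). The sketch below decodes them; it assumes a Linux host whose errno numbering matches the client's, and decode_rc is a hypothetical helper for illustration, not part of the Lustre test suite.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Map a return code to its symbolic errno name and message.

    Assumes rc is a Linux errno value. Lustre reports errors as negative
    errno (e.g. "rc = -107"), so the sign is dropped before the lookup.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


# Return codes seen in the client 1 log (writemany rc, obd_ping rc).
for rc in (108, -107):
    print(decode_rc(rc))
```

On Linux, 108 is ESHUTDOWN and 107 is ENOTCONN, which fits a client whose connection to the server went away during the test_50/test_51 failovers.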