[LU-10250] replay-single test_74: hang and timed out

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This issue was created by maloo for Jinshan Xiong <jinshan.xiong@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/96539eb0-cadb-11e7-8027-52540065bddc.

      The sub-test test_74 failed with the following error:

      Timeout occurred after 128 mins, last suite running was replay-single, restarting cluster to continue tests
      

      Please provide additional information about the failure here.

      Info required for matching: replay-single 74

      On the client side, there is a stack trace:

      [ 4440.087054] INFO: task touch:18639 blocked for more than 120 seconds.
      [ 4440.089288] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 4440.091541] touch           D 0000000000000000     0 18639  17432 0x00000080
      [ 4440.093752]  ffff880069033a40 0000000000000086 ffff880036a79fa0 ffff880069033fd8
      [ 4440.096088]  ffff880069033fd8 ffff880069033fd8 ffff880036a79fa0 ffff880066647650
      [ 4440.098281]  7fffffffffffffff ffff880066647648 ffff880036a79fa0 0000000000000000
      [ 4440.100517] Call Trace:
      [ 4440.102304]  [<ffffffff816a9589>] schedule+0x29/0x70
      [ 4440.104209]  [<ffffffff816a7099>] schedule_timeout+0x239/0x2c0
      [ 4440.106187]  [<ffffffffc09e3108>] ? ptlrpc_set_add_new_req+0xd8/0x150 [ptlrpc]
      [ 4440.108288]  [<ffffffffc0bc2bc0>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 4440.110272]  [<ffffffffc0a1324b>] ? ptlrpcd_add_req+0x22b/0x300 [ptlrpc]
      [ 4440.112208]  [<ffffffffc09f5e40>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 4440.114221]  [<ffffffff816a993d>] wait_for_completion+0xfd/0x140
      [ 4440.116026]  [<ffffffff810c4820>] ? wake_up_state+0x20/0x20
      [ 4440.117835]  [<ffffffffc0bc2d84>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 4440.119639]  [<ffffffffc0bc4381>] ? osc_io_setattr_start+0x471/0x6e0 [osc]
      [ 4440.121517]  [<ffffffffc0c15450>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 4440.123344]  [<ffffffffc08201ed>] cl_io_end+0x5d/0x150 [obdclass]
      [ 4440.125122]  [<ffffffffc0c1552b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 4440.126825]  [<ffffffffc0c15b75>] lov_io_call.isra.5+0x85/0x140 [lov]
      [ 4440.128586]  [<ffffffffc0c15c66>] lov_io_end+0x36/0xb0 [lov]
      [ 4440.130210]  [<ffffffffc08201ed>] cl_io_end+0x5d/0x150 [obdclass]
      [ 4440.131901]  [<ffffffffc082287f>] cl_io_loop+0x13f/0xc70 [obdclass]
      [ 4440.133544]  [<ffffffffc0cbb4d0>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 4440.135229]  [<ffffffffc0c95b05>] ll_setattr_raw+0x1185/0x1290 [lustre]
      [ 4440.136821]  [<ffffffffc0c95c7c>] ll_setattr+0x6c/0xd0 [lustre]
      [ 4440.138380]  [<ffffffff8121ee71>] notify_change+0x2c1/0x420
      [ 4440.139813]  [<ffffffff81233b59>] utimes_common+0xd9/0x1c0
      [ 4440.141276]  [<ffffffff81233d7e>] do_utimes+0x13e/0x180
      [ 4440.142646]  [<ffffffff81233ec4>] SyS_utimensat+0x64/0xb0
      [ 4440.144085]  [<ffffffff816ac5c8>] ? page_fault+0x28/0x30
      [ 4440.145450]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
      [ 4560.146045] INFO: task touch:18639 blocked for more than 120 seconds.
      

      This shows the client was waiting for a setattr to complete. On the OFD side:

      [ 4163.936021] LNet: Service thread pid 15315 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 4163.941814] Pid: 15315, comm: ll_ost00_000
      [ 4163.944340] 
      Call Trace:
      [ 4163.948946]  [<ffffffff816a9589>] schedule+0x29/0x70
      [ 4163.951469]  [<ffffffffc07084d5>] cv_wait_common+0x125/0x150 [spl]
      [ 4163.954051]  [<ffffffff810b1920>] ? autoremove_wake_function+0x0/0x40
      [ 4163.956663]  [<ffffffffc0708515>] __cv_wait+0x15/0x20 [spl]
      [ 4163.959223]  [<ffffffffc085781f>] txg_wait_synced+0xef/0x140 [zfs]
      [ 4163.961745]  [<ffffffffc080cc65>] dmu_tx_wait+0x275/0x3c0 [zfs]
      [ 4163.964242]  [<ffffffffc080ce41>] dmu_tx_assign+0x91/0x490 [zfs]
      [ 4163.966725]  [<ffffffffc0ecbe1a>] ? tgt_txn_start_cb+0x1da/0x3b0 [ptlrpc]
      [ 4163.969213]  [<ffffffffc1098f37>] osd_trans_start+0xa7/0x3a0 [osd_zfs]
      [ 4163.971617]  [<ffffffffc11c1fab>] ofd_trans_start+0x6b/0xe0 [ofd]
      [ 4163.973952]  [<ffffffffc11c4943>] ofd_attr_set+0x433/0xb00 [ofd]
      [ 4163.976281]  [<ffffffffc11afff3>] ofd_setattr_hdl+0x303/0x950 [ofd]
      [ 4163.978596]  [<ffffffffc0ed7c05>] tgt_request_handle+0x925/0x13b0 [ptlrpc]
      [ 4163.980977]  [<ffffffffc0e7be8e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
      [ 4163.983391]  [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90
      [ 4163.985676]  [<ffffffffc0e7f632>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [ 4163.987983]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
      [ 4163.990182]  [<ffffffff816a9000>] ? __schedule+0x350/0x8b0
      [ 4163.992348]  [<ffffffffc0e7eba0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      [ 4163.994528]  [<ffffffff810b099f>] kthread+0xcf/0xe0
      [ 4163.996574]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      [ 4163.998609]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
      [ 4164.000651]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0
      
      [ 4164.004284] LustreError: dumping log to /tmp/lustre-log.1510820580.15315
      

      It's waiting for a sync transaction, which handled the setattr from the client, and it seems the transaction never completes.
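
      For reference, here is a minimal C sketch of the transaction-start pattern visible in the OFD stack above (assumed structure and names, not the actual osd_trans_start()/ofd_attr_set() code): dmu_tx_assign() with TXG_WAIT can fall back into dmu_tx_wait(), which may end up in txg_wait_synced() and block the OST service thread until the open TXG syncs.

      #include <sys/dmu.h>

      /* Sketch of a setattr transaction start as seen in the OFD stack.
       * If the assignment fails with ERESTART (e.g. dirty-data throttling),
       * dmu_tx_assign(TXG_WAIT) retries via dmu_tx_wait(), which can end
       * up blocked in txg_wait_synced(). */
      static int setattr_trans_start(objset_t *os, uint64_t object)
      {
              dmu_tx_t *tx = dmu_tx_create(os);
              int rc;

              /* declare the dnode/bonus update this transaction will make */
              dmu_tx_hold_bonus(tx, object);

              rc = dmu_tx_assign(tx, TXG_WAIT);       /* may block here */
              if (rc != 0) {
                      dmu_tx_abort(tx);
                      return rc;
              }

              /* ... apply the attribute change under this tx, then ... */
              dmu_tx_commit(tx);
              return 0;
      }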

      I guess this is the same issue as LU-4440. Unfortunately, there is no useful description on that ticket.

    Attachments

    Issue Links

    Activity

      bzzz Alex Zhuravlev added a comment - it would be helpful to get a full set of backtraces. dmu_tx_wait() can be waiting for a new TXG while the current TXG is pinned by another process.
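
      To illustrate the scenario Alex describes, here is a minimal two-thread sketch (names invented for illustration, not actual Lustre/ZFS code): one thread assigns a transaction in the currently open TXG and stalls before committing, so that TXG can never quiesce and sync, and another thread blocked in txg_wait_synced() (reached via dmu_tx_wait()) never returns.

      #include <sys/dmu.h>
      #include <sys/txg.h>

      /* Thread A: assigns a tx in the open TXG, then stalls before
       * committing; until dmu_tx_commit() runs, that TXG cannot sync. */
      static void thread_a_pins_txg(objset_t *os, uint64_t object)
      {
              dmu_tx_t *tx = dmu_tx_create(os);

              dmu_tx_hold_bonus(tx, object);
              VERIFY0(dmu_tx_assign(tx, TXG_WAIT));
              /* ... blocks here on a lock or an RPC without committing ... */
              dmu_tx_commit(tx);      /* only now can the pinned TXG sync */
      }

      /* Thread B: like the ll_ost00_000 thread above, it cannot get out
       * of txg_wait_synced() until thread A commits and the TXG syncs. */
      static void thread_b_waits(objset_t *os)
      {
              txg_wait_synced(dmu_objset_pool(os), 0);
      }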

      simmonsja James A Simmons added a comment - This is for our NOAA file system, which is also in production. It is 2.12 LTS using ZFS as the back end.

      adilger Andreas Dilger added a comment - James, could you please file a separate ticket for your issue? It is of course fine to link to this ticket, but a separate ticket allows tracking the problem with your production system better, and it may be that the problem you are seeing is unrelated (being stuck in dmu_tx_wait() is basically the equivalent of being stuck in start_this_handle() for ldiskfs, so there are a hundred ways to get there).

      Also, I wasn't aware that you were running ZFS in production - I thought you had ldiskfs-based filesystems on Spider2?

      simmonsja James A Simmons added a comment - This is a problem hit on real hardware. It took down our production file system.

      adilger Andreas Dilger added a comment - This seems to hit occasionally - an unexplained and unreported hang in ZFS dmu_tx_wait() (see the linked tickets). I don't know if this is an artifact of our VM testing or a problem that also hits on real hardware, but it is something to be aware of and keep an eye open for.

      adilger Andreas Dilger added a comment - The stack traces dumped on the client are because osc_io_setattr_end() is using wait_for_completion() directly. It would be better to use wait_for_completion_interruptible() so that the wait is interruptible in case the user wants to kill the process and get on with their day.

      That doesn't resolve the issue on the server, but the client may as well be fixed at the same time.
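
      A minimal sketch of the client-side change being suggested (the helper name is invented for illustration, not the actual osc_io_setattr_end() code): switch the uninterruptible wait to an interruptible one so a signal can get the task out of the uninterruptible D state.

      #include <linux/completion.h>

      /* Hypothetical helper showing the suggested change on the client. */
      static int wait_for_setattr_reply(struct completion *done)
      {
              /* before: the task is unkillable until the OST reply arrives */
              /* wait_for_completion(done); */

              /* after: returns -ERESTARTSYS if the task catches a signal,
               * letting the user kill the stuck process */
              return wait_for_completion_interruptible(done);
      }

      The caller would then need to handle the interrupted case (for example, detach from the pending request and return the error) rather than assuming the setattr completed.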

    People

      Assignee: WC Triage (wc-triage)
      Reporter: Maloo (maloo)
      Votes: 0
      Watchers: 10
