Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
This issue was created by maloo for Jinshan Xiong <jinshan.xiong@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/96539eb0-cadb-11e7-8027-52540065bddc.
The sub-test test_74 failed with the following error:
Timeout occurred after 128 mins, last suite running was replay-single, restarting cluster to continue tests
Please provide additional information about the failure here.
Info required for matching: replay-single 74
On the client stack, there is a stack trace:
[ 4440.087054] INFO: task touch:18639 blocked for more than 120 seconds. [ 4440.089288] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4440.091541] touch D 0000000000000000 0 18639 17432 0x00000080 [ 4440.093752] ffff880069033a40 0000000000000086 ffff880036a79fa0 ffff880069033fd8 [ 4440.096088] ffff880069033fd8 ffff880069033fd8 ffff880036a79fa0 ffff880066647650 [ 4440.098281] 7fffffffffffffff ffff880066647648 ffff880036a79fa0 0000000000000000 [ 4440.100517] Call Trace: [ 4440.102304] [<ffffffff816a9589>] schedule+0x29/0x70 [ 4440.104209] [<ffffffff816a7099>] schedule_timeout+0x239/0x2c0 [ 4440.106187] [<ffffffffc09e3108>] ? ptlrpc_set_add_new_req+0xd8/0x150 [ptlrpc] [ 4440.108288] [<ffffffffc0bc2bc0>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 4440.110272] [<ffffffffc0a1324b>] ? ptlrpcd_add_req+0x22b/0x300 [ptlrpc] [ 4440.112208] [<ffffffffc09f5e40>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 4440.114221] [<ffffffff816a993d>] wait_for_completion+0xfd/0x140 [ 4440.116026] [<ffffffff810c4820>] ? wake_up_state+0x20/0x20 [ 4440.117835] [<ffffffffc0bc2d84>] osc_io_setattr_end+0xc4/0x180 [osc] [ 4440.119639] [<ffffffffc0bc4381>] ? osc_io_setattr_start+0x471/0x6e0 [osc] [ 4440.121517] [<ffffffffc0c15450>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 4440.123344] [<ffffffffc08201ed>] cl_io_end+0x5d/0x150 [obdclass] [ 4440.125122] [<ffffffffc0c1552b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 4440.126825] [<ffffffffc0c15b75>] lov_io_call.isra.5+0x85/0x140 [lov] [ 4440.128586] [<ffffffffc0c15c66>] lov_io_end+0x36/0xb0 [lov] [ 4440.130210] [<ffffffffc08201ed>] cl_io_end+0x5d/0x150 [obdclass] [ 4440.131901] [<ffffffffc082287f>] cl_io_loop+0x13f/0xc70 [obdclass] [ 4440.133544] [<ffffffffc0cbb4d0>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 4440.135229] [<ffffffffc0c95b05>] ll_setattr_raw+0x1185/0x1290 [lustre] [ 4440.136821] [<ffffffffc0c95c7c>] ll_setattr+0x6c/0xd0 [lustre] [ 4440.138380] [<ffffffff8121ee71>] notify_change+0x2c1/0x420 [ 4440.139813] [<ffffffff81233b59>] utimes_common+0xd9/0x1c0 [ 4440.141276] [<ffffffff81233d7e>] do_utimes+0x13e/0x180 [ 4440.142646] [<ffffffff81233ec4>] SyS_utimensat+0x64/0xb0 [ 4440.144085] [<ffffffff816ac5c8>] ? page_fault+0x28/0x30 [ 4440.145450] [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b [ 4560.146045] INFO: task touch:18639 blocked for more than 120 seconds.
That shows the client was waiting for a setattr to complete, and on the OFD side:
[ 4163.936021] LNet: Service thread pid 15315 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [ 4163.941814] Pid: 15315, comm: ll_ost00_000 [ 4163.944340] Call Trace: [ 4163.948946] [<ffffffff816a9589>] schedule+0x29/0x70 [ 4163.951469] [<ffffffffc07084d5>] cv_wait_common+0x125/0x150 [spl] [ 4163.954051] [<ffffffff810b1920>] ? autoremove_wake_function+0x0/0x40 [ 4163.956663] [<ffffffffc0708515>] __cv_wait+0x15/0x20 [spl] [ 4163.959223] [<ffffffffc085781f>] txg_wait_synced+0xef/0x140 [zfs] [ 4163.961745] [<ffffffffc080cc65>] dmu_tx_wait+0x275/0x3c0 [zfs] [ 4163.964242] [<ffffffffc080ce41>] dmu_tx_assign+0x91/0x490 [zfs] [ 4163.966725] [<ffffffffc0ecbe1a>] ? tgt_txn_start_cb+0x1da/0x3b0 [ptlrpc] [ 4163.969213] [<ffffffffc1098f37>] osd_trans_start+0xa7/0x3a0 [osd_zfs] [ 4163.971617] [<ffffffffc11c1fab>] ofd_trans_start+0x6b/0xe0 [ofd] [ 4163.973952] [<ffffffffc11c4943>] ofd_attr_set+0x433/0xb00 [ofd] [ 4163.976281] [<ffffffffc11afff3>] ofd_setattr_hdl+0x303/0x950 [ofd] [ 4163.978596] [<ffffffffc0ed7c05>] tgt_request_handle+0x925/0x13b0 [ptlrpc] [ 4163.980977] [<ffffffffc0e7be8e>] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc] [ 4163.983391] [<ffffffff810ba598>] ? __wake_up_common+0x58/0x90 [ 4163.985676] [<ffffffffc0e7f632>] ptlrpc_main+0xa92/0x1e40 [ptlrpc] [ 4163.987983] [<ffffffff81029557>] ? __switch_to+0xd7/0x510 [ 4163.990182] [<ffffffff816a9000>] ? __schedule+0x350/0x8b0 [ 4163.992348] [<ffffffffc0e7eba0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc] [ 4163.994528] [<ffffffff810b099f>] kthread+0xcf/0xe0 [ 4163.996574] [<ffffffff810b08d0>] ? kthread+0x0/0xe0 [ 4163.998609] [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90 [ 4164.000651] [<ffffffff810b08d0>] ? kthread+0x0/0xe0 [ 4164.004284] LustreError: dumping log to /tmp/lustre-log.1510820580.15315
It's waiting a sync trans, which handled the setattr from client, and it seems the trans is never completed.
I guess this is the same issue LU-4440. Unfortunately there is no useful description on the ticket.
Attachments
Issue Links
- is duplicated by
-
LU-10009 sanity-benchmark test_iozone: test failed to respond and timed out
- Open
-
LU-10065 ost-pools test_5a: test failed to respond and timed out
- Resolved
- is related to
-
LU-10670 sanity-flr test 43 timeout
- Resolved
-
LU-12510 mds server hangs cv_wait_common
- Resolved