[LU-10670] sanity-flr test 43 timeout Created: 15/Feb/18 Updated: 17/Jul/18 Resolved: 27/Feb/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Mikhail Pershin | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||||||
| Description |
|
https://testing.hpdd.intel.com/test_sets/713fb70e-119d-11e8-a6ad-52540065bddc

It fails very often:

Error: 'Timeout occurred after 227 mins, last suite running was sanity-flr, restarting cluster to continue tests'
Failure Rate: 41.18% of most recent 17 runs, 22 skipped (all branches)

On a client:

[10077.749514] Lustre: DEBUG MARKER: == sanity-flr test 43: mirror pick on write ========================================================== 12:14:55 (1518610495)
[10320.098013] INFO: task dd:23892 blocked for more than 120 seconds.
[10320.114074] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10320.116709] dd D ffff88007b96dee0 0 23892 23675 0x00000080
[10320.119330] Call Trace:
[10320.125475] [<ffffffff810c6632>] ? default_wake_function+0x12/0x20
[10320.150782] [<ffffffff810bc2d8>] ? __wake_up_common+0x58/0x90
[10320.154162] [<ffffffff816ab8a9>] schedule+0x29/0x70
[10320.170306] [<ffffffff816a92b9>] schedule_timeout+0x239/0x2c0
[10320.176336] [<ffffffffc09f5e88>] ? ptlrpc_set_add_new_req+0xd8/0x150 [ptlrpc]
[10320.178829] [<ffffffffc0bd50c0>] ? osc_io_ladvise_end+0x50/0x50 [osc]
[10320.181237] [<ffffffffc0a25ffb>] ? ptlrpcd_add_req+0x22b/0x300 [ptlrpc]
[10320.183701] [<ffffffffc09fbe99>] ? ptlrpc_request_bufs_pack+0x1d9/0x480 [ptlrpc]
[10320.186106] [<ffffffff816abc5d>] wait_for_completion+0xfd/0x140
[10320.188437] [<ffffffff810c6620>] ? wake_up_state+0x20/0x20
[10320.190651] [<ffffffffc0bd5284>] osc_io_setattr_end+0xc4/0x180 [osc]
[10320.192955] [<ffffffffc0bd63d0>] ? osc_io_setattr_start+0x260/0x700 [osc]
[10320.195231] [<ffffffffc0c28490>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
[10320.197659] [<ffffffffc0832e8d>] cl_io_end+0x5d/0x150 [obdclass]
[10320.199802] [<ffffffffc0c2856b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
[10320.202033] [<ffffffffc0c28bc5>] lov_io_call.isra.5+0x85/0x140 [lov]
[10320.204170] [<ffffffffc0c28cb6>] lov_io_end+0x36/0xb0 [lov]
[10320.206291] [<ffffffffc0832e8d>] cl_io_end+0x5d/0x150 [obdclass]
[10320.208353] [<ffffffffc083551f>] cl_io_loop+0x13f/0xc70 [obdclass]
[10320.210509] [<ffffffffc0cd1460>] cl_setattr_ost+0x250/0x3c0 [lustre]
[10320.212550] [<ffffffffc0cab495>] ll_setattr_raw+0x1165/0x1270 [lustre]
[10320.214631] [<ffffffffc0cab60c>] ll_setattr+0x6c/0xd0 [lustre]
[10320.217542] [<ffffffff81220fc1>] notify_change+0x2c1/0x420
[10320.228621] [<ffffffff812b45b6>] ? security_inode_need_killpriv+0x16/0x20
[10320.230605] [<ffffffff81200ad5>] do_truncate+0x75/0xc0
[10320.232485] [<ffffffff81211d97>] do_last+0x627/0x12c0
[10320.234244] [<ffffffff81212af2>] path_openat+0xc2/0x490
[10320.236065] [<ffffffff811af746>] ? do_read_fault.isra.44+0xe6/0x130
[10320.237871] [<ffffffff8121508b>] do_filp_open+0x4b/0xb0
[10320.239642] [<ffffffff8122233a>] ? __alloc_fd+0x8a/0x130
[10320.241313] [<ffffffff81201bc3>] do_sys_open+0xf3/0x1f0
[10320.243068] [<ffffffff816b8945>] ? system_call_after_swapgs+0x172/0x214
[10320.244820] [<ffffffff81201cde>] SyS_open+0x1e/0x20
[10320.246469] [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[10320.248096] [<ffffffff816b889d>] ? system_call_after_swapgs+0xca/0x214 |
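
For reading the trace above: frames marked with `?` are addresses the kernel found on the stack but could not verify as part of the call chain, so the blocking path is easier to see once they are filtered out. The following is a minimal sketch of that filtering, not part of any Lustre tooling. The names `TRACE`, `FRAME_RE` and `reliable_frames` are hypothetical, and the sample is a hand-picked excerpt of the frames quoted in the description.

```python
import re

# Excerpt of the hung-task trace for dd (pid 23892), copied from the description.
TRACE = """\
[10320.176336] [<ffffffffc09f5e88>] ? ptlrpc_set_add_new_req+0xd8/0x150 [ptlrpc]
[10320.186106] [<ffffffff816abc5d>] wait_for_completion+0xfd/0x140
[10320.188437] [<ffffffff810c6620>] ? wake_up_state+0x20/0x20
[10320.190651] [<ffffffffc0bd5284>] osc_io_setattr_end+0xc4/0x180 [osc]
[10320.192955] [<ffffffffc0bd63d0>] ? osc_io_setattr_start+0x260/0x700 [osc]
[10320.197659] [<ffffffffc0832e8d>] cl_io_end+0x5d/0x150 [obdclass]
[10320.202033] [<ffffffffc0c28bc5>] lov_io_call.isra.5+0x85/0x140 [lov]
[10320.210509] [<ffffffffc0cd1460>] cl_setattr_ost+0x250/0x3c0 [lustre]
[10320.212550] [<ffffffffc0cab495>] ll_setattr_raw+0x1165/0x1270 [lustre]
[10320.230605] [<ffffffff81200ad5>] do_truncate+0x75/0xc0
"""

# One frame per line: timestamp, address, optional "?", symbol+offset/size,
# and an optional [module] suffix for symbols that live in a kernel module.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(\?\s+)?"
    r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def reliable_frames(trace):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a stack frame line
        unreliable, symbol, module = match.groups()
        if unreliable is None:
            # Symbols without a [module] suffix belong to the core kernel.
            frames.append((symbol, module or "kernel"))
    return frames


if __name__ == "__main__":
    for symbol, module in reliable_frames(TRACE):
        print(f"{symbol} [{module}]")
```

On this excerpt the filtered chain starts at `wait_for_completion` called from `osc_io_setattr_end` and ends at `do_truncate`, which matches the full trace: `dd` is stuck waiting for the OST setattr to complete during a truncating open.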
| Comments |
| Comment by Jian Yu [ 15/Feb/18 ] |
|
This is a regression failure introduced by patch https://review.whamcloud.com/30711 for |
| Comment by Gerrit Updater [ 15/Feb/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/31315 |
| Comment by Bob Glossman (Inactive) [ 21/Feb/18 ] |
|
another on master: |
| Comment by Patrick Farrell (Inactive) [ 21/Feb/18 ] |
|
Another on master: |
| Comment by Gerrit Updater [ 27/Feb/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31315/ |
| Comment by Peter Jones [ 27/Feb/18 ] |
|
Landed for 2.11 |
| Comment by Patrick Farrell (Inactive) [ 02/Mar/18 ] |
|
Looks like a hit with this fix in: |
| Comment by Zhenyu Xu [ 02/Mar/18 ] |
|
The build's parent is e528677e1630093362394ae36d725c321d0da4f2, which does not have this fix. |
| Comment by Patrick Farrell (Inactive) [ 02/Mar/18 ] |
|
Ah, OK, I will ask him to rebase. |