[LU-13745] tasks hang with copy_file_range: ll_file_splice_read() Created: 02/Jul/20 Updated: 23/Dec/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Zhenyu Xu | Assignee: | Zhenyu Xu |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | always_except |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
copy_file_range(2) hangs when reading from a Lustre file. Because Lustre does not implement the .copy_file_range VFS method, the kernel falls back to do_splice_direct()->splice_direct_to_actor()->do_splice_to()->ll_file_splice_read(). The call chain then continues as ll_file_splice_read()->ll_file_io_generic()->generic_file_splice_read()->ll_file_read_iter()->ll_file_io_generic(). ll_file_io_generic() is therefore entered twice and tries to take the LDLM lock a second time, so the task hangs. |
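The nested call chain described above can be illustrated with a small user-space toy model. This is only a sketch, not Lustre code: the Python functions borrow the kernel function names purely as labels, a non-re-entrant `threading.Lock` stands in for the LDLM lock (an assumption; real LDLM locks are distributed locks, not mutexes), and the `timeout` parameter is a hypothetical addition so the example terminates instead of hanging.

```python
import threading

# Stand-in for the LDLM lock (assumption: modelled as a plain,
# non-re-entrant mutex; the real lock is a distributed LDLM lock).
ldlm_lock = threading.Lock()
call_trace = []


def ll_file_io_generic(inner_io, timeout=0.1):
    """Take the lock, then run the inner I/O step while holding it."""
    call_trace.append("ll_file_io_generic")
    # The timeout is hypothetical: it lets this sketch report the
    # problem instead of blocking forever on the second acquisition.
    if not ldlm_lock.acquire(timeout=timeout):
        return "would hang"
    try:
        return inner_io()
    finally:
        ldlm_lock.release()


def ll_file_read_iter():
    call_trace.append("ll_file_read_iter")
    return ll_file_io_generic(lambda: "read done")


def generic_file_splice_read():
    call_trace.append("generic_file_splice_read")
    return ll_file_read_iter()


def ll_file_splice_read():
    call_trace.append("ll_file_splice_read")
    # The outer call holds the lock while the splice path re-enters
    # ll_file_io_generic() through ll_file_read_iter().
    return ll_file_io_generic(generic_file_splice_read)


result = ll_file_splice_read()
print(result)                    # prints "would hang"
print(" -> ".join(call_trace))
```

In this model the plain read path (`ll_file_read_iter()` called on its own) completes, while the splice path fails on the second lock acquisition, which mirrors the hang reported here.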
| Comments |
| Comment by Gerrit Updater [ 02/Jul/20 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/39246 |
| Comment by Gerrit Updater [ 03/Jul/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/39272 |
| Comment by Gerrit Updater [ 20/Aug/20 ] |
|
Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39695 |
| Comment by Gerrit Updater [ 15/Sep/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/39910 |
| Comment by Gerrit Updater [ 02/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39272/ |
| Comment by James A Simmons [ 02/Oct/20 ] |
|
The new test still needs to land. |
| Comment by Gerrit Updater [ 19/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39695/ |
| Comment by Andreas Dilger [ 21/Oct/20 ] |
|
The landing of patch https://review.whamcloud.com/39695 "LU-13745 test: add splice test for lustre" has caused 100% sanity.sh test_426 failure on aarch64. |
| Comment by Gerrit Updater [ 21/Oct/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40326 |
| Comment by Wang Shilong (Inactive) [ 21/Oct/20 ] |
|
adilger, that means the newly added test is helpful. |
| Comment by Bruno Faccini (Inactive) [ 21/Oct/20 ] |
|
adilger, this may be of interest: my patch https://review.whamcloud.com/35856 has also failed every "test review-ldiskfs-ubuntu on CentOS 7.8/x86_64, Ubuntu 18.04/x86_64" stage I have attempted, with the same client-side sanity/test_426 crash whose stack you provided in LU-14045. So this looks more like a kernel v4.x issue than an architecture-specific one. |
| Comment by Gerrit Updater [ 21/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40326/ |
| Comment by John Hammond [ 22/Oct/20 ] |
|
This is also failing on Ubuntu 18.04, which uses kernel 4.15.0-72-generic. See https://testing.whamcloud.com/test_sets/bdfb8c4b-a6a2-493a-ab57-6a9923f96e7c. |
| Comment by Bruno Faccini (Inactive) [ 22/Oct/20 ] |
|
adilger, after I rebased my patch on top of change #40326, the same crash during sanity/test_426 still occurs with the Ubuntu client: its kernel version is 4.15, so the sub-test is not skipped. |
| Comment by Gerrit Updater [ 22/Oct/20 ] |
|
John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40366 |
| Comment by Gerrit Updater [ 23/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40366/ |
| Comment by Gerrit Updater [ 26/Oct/20 ] |
|
Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/40396 |
| Comment by Gerrit Updater [ 22/Dec/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40396/ |
| Comment by Peter Jones [ 22/Dec/20 ] |
|
The fix itself has landed for 2.14. All that remains tracked by this ticket is a test. Are there still plans to land that test imminently, or can we either abandon that changeset or move it to a new JIRA ticket? |