[LU-13745] tasks hang with copy_file_range: ll_file_splice_read() Created: 02/Jul/20  Updated: 23/Dec/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Zhenyu Xu Assignee: Zhenyu Xu
Resolution: Unresolved Votes: 0
Labels: always_except

Issue Links:
Related
is related to LU-12425 Add test for splice_read Open
is related to LU-14060 sanity test_426 (splice-test) issues Open
is related to LU-14045 Fix O_DIRECT and encrypted files Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

copy_file_range(2) hangs when reading from a Lustre file.

Lustre does not implement the .copy_file_range VFS method, so the kernel falls back to do_splice_direct()->splice_direct_to_actor()->do_splice_to()->ll_file_splice_read().

ll_file_splice_read() then follows the call chain ll_file_splice_read()->ll_file_io_generic()->generic_file_splice_read()->ll_file_read_iter()->ll_file_io_generic().

As a result, ll_file_io_generic() is entered twice and tries to take the LDLM lock a second time, so the task hangs.
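The re-entrancy problem can be illustrated with a toy model. This is not Lustre code: a non-reentrant threading.Lock stands in for the LDLM lock (a large simplification), the class ToyFileIO and its methods are hypothetical names that only mirror the call chain above, and a timeout is used so the example reports the nested acquisition instead of hanging.

```python
import threading


class ToyFileIO:
    """Toy model only: a plain non-reentrant lock stands in for the LDLM lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.trace = []

    def io_generic(self, body, timeout=0.1):
        # Stand-in for ll_file_io_generic(): take the lock, then run the I/O body.
        if not self.lock.acquire(timeout=timeout):
            # Without the timeout this would block forever, i.e. the hang.
            self.trace.append("nested acquire timed out")
            return False
        try:
            self.trace.append("lock taken")
            return body()
        finally:
            self.lock.release()

    def read_iter(self):
        # Stand-in for ll_file_read_iter(), which calls io_generic() itself.
        return self.io_generic(lambda: True)

    def splice_read(self):
        # Stand-in for ll_file_splice_read(): io_generic() ends up calling
        # read_iter(), which enters io_generic() a second time.
        return self.io_generic(self.read_iter)
```

In this model a direct read_iter() call succeeds, while splice_read() fails on the nested acquisition, which mirrors why only the splice path was affected.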



 Comments   
Comment by Gerrit Updater [ 02/Jul/20 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/39246
Subject: LU-13745 llite: avoid lock ldlm twice in splice read
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4f31a54bad899b69d775f971d572b6536b6da0b0

Comment by Gerrit Updater [ 03/Jul/20 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/39272
Subject: LU-13745 llite: switch generic_file_splice_read() to use of ->read_iter()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5a957f5de38543418c5af9b0435299aace81062c

Comment by Gerrit Updater [ 20/Aug/20 ]

Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39695
Subject: LU-13745 test: add splice test for lustre
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f997041face6a8ce5c60af1e37da3a5368194083

Comment by Gerrit Updater [ 15/Sep/20 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/39910
Subject: LU-13745 test: add PCC test for splice handling
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 81abcfe3241090a61af2dcaee3bc1dd9b1023d4b

Comment by Gerrit Updater [ 02/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39272/
Subject: LU-13745 llite: switch generic_file_splice_read() to use of ->read_iter()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1635dc9de0bc1d6701ca5f4bc0d342fca416f89a

Comment by James A Simmons [ 02/Oct/20 ]

The new test still needs to land.

Comment by Gerrit Updater [ 19/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39695/
Subject: LU-13745 test: add splice test for lustre
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7ad8bf544f4132c7a291c2e8a2035afcb98f94c6

Comment by Andreas Dilger [ 21/Oct/20 ]

The landing of patch https://review.whamcloud.com/39695 "LU-13745 test: add splice test for lustre" has caused 100% sanity.sh test_426 failure on aarch64.

Comment by Gerrit Updater [ 21/Oct/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40326
Subject: LU-13745 tests: skip sanity test_426 for 4.18+
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 37563cfa713b2bfe8a71dc1b23fea66f6329f4c5

Comment by Wang Shilong (Inactive) [ 21/Oct/20 ]

adilger That means the newly added test is helpful, and there is something we still need to fix.

Comment by Bruno Faccini (Inactive) [ 21/Oct/20 ]

adilger this may be of interest: I think my patch https://review.whamcloud.com/35856 has also failed 100% of the "test review-ldiskfs-ubuntu on CentOS 7.8/x86_64, Ubuntu 18.04/x86_64" stages I have attempted, with the same sanity/test_426 crash on the client side for which you provided the significant stack in LU-14045. So it looks more like a kernel v4.x issue than an architecture-related one.

Comment by Gerrit Updater [ 21/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40326/
Subject: LU-13745 tests: skip sanity test_426 for 4.18+
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 010425898fa4b2abc6325a8073e20cb994ce7947

Comment by John Hammond [ 22/Oct/20 ]

This is also failing on Ubuntu 18.04 which is using 4.15.0-72-generic. See https://testing.whamcloud.com/test_sets/bdfb8c4b-a6a2-493a-ab57-6a9923f96e7c.

Comment by Bruno Faccini (Inactive) [ 22/Oct/20 ]

adilger, after I rebased my patch on top of change #40326, the same crash during sanity/test_426 still occurs with the Ubuntu client, because its kernel version is 4.15 and so the sub-test is not skipped.
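The effect of the skip threshold can be sketched as follows. This is only an illustration of the version comparison, not the actual check in sanity.sh; the function names kernel_version_tuple and should_skip are hypothetical. It shows why a 4.18+ threshold leaves the 4.15.0-72-generic Ubuntu kernel unskipped, while a 4.15+ threshold covers it.

```python
import re


def kernel_version_tuple(release):
    """Parse a release string such as '4.15.0-72-generic' into (4, 15, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    # A missing patch level is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def should_skip(release, minimum=(4, 15, 0)):
    """Return True when the kernel is at or above the skip threshold."""
    return kernel_version_tuple(release) >= minimum


# Threshold from change 40326 (4.18+) versus change 40366 (4.15+).
ubuntu = "4.15.0-72-generic"
print(should_skip(ubuntu, minimum=(4, 18, 0)))  # False
print(should_skip(ubuntu, minimum=(4, 15, 0)))  # True
```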

Comment by Gerrit Updater [ 22/Oct/20 ]

John L. Hammond (jhammond@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40366
Subject: LU-13745 tests: skip sanity test_426 for 4.15+
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a8a6a55a1e4e11ee3b61aa7e230d752b5c1a476a

Comment by Gerrit Updater [ 23/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40366/
Subject: LU-13745 tests: skip sanity test_426 for 4.15+
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f8a8d3f83db67be9dcc724ff49757cce81b13a5e

Comment by Gerrit Updater [ 26/Oct/20 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/40396
Subject: LU-13745 pcc: fall back normal splice read for detached file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7b45c8d9c77a9c45862bd61ea05b8c46117cffa4

Comment by Gerrit Updater [ 22/Dec/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40396/
Subject: LU-13745 pcc: fall back normal splice read for detached file
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cca45ad8aeaa8e124e9e48361bf7cff89a035f82

Comment by Peter Jones [ 22/Dec/20 ]

The fix itself has landed for 2.14. All that remains tracked by this ticket is a test. Are there still plans to land that test imminently, or can we either abandon that changeset or move it to a new JIRA?
