[LU-16412] check truncated page in ->readpage() Created: 19/Dec/22  Updated: 07/Jul/23  Resolved: 03/Feb/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0, Lustre 2.15.3

Type: Bug Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocker
Duplicate
is duplicated by LU-16638 LustreError: 18531:0:(osc_object.c:41... Resolved
Related
is related to LU-16579 llite: Fix the wrong ending offset ca... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I found that the page end offset calculation in filemap_get_read_batch() is off by one in 5.x kernels.

When a read is submitted with end offset 1048575, the kernel calculates an end page index of 1024 for the read when it should be 1023. As a result, readpage() is called on a page that lies beyond the stripe boundary and may not be covered by a DLM extent lock.
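
The off-by-one can be reproduced with plain arithmetic. The following is a minimal userspace sketch in Python, not kernel code. It assumes 4 KiB pages and assumes the end index is obtained by round-up division (an exclusive bound) and then used as an inclusive bound by the batch lookup; the function names are hypothetical. Under those assumptions an end offset of 1048575 yields indexes 255 and 256, and the 1023/1024 pair quoted above corresponds to an end offset of 4194303.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def last_wanted_index(pos, count, page_size=PAGE_SIZE):
    """Index of the last page holding a byte of the read [pos, pos + count)."""
    return (pos + count - 1) // page_size


def exclusive_end_index(pos, count, page_size=PAGE_SIZE):
    """Round-up division: the index one past the last wanted page."""
    return (pos + count + page_size - 1) // page_size


def batched_indexes(first, max_index):
    """Model of a batch lookup that treats max_index as inclusive."""
    return list(range(first, max_index + 1))


pos, count = 0, 1048576  # a read of bytes 0..1048575
end = exclusive_end_index(pos, count)

buggy = batched_indexes(0, end)      # exclusive bound used as inclusive
fixed = batched_indexes(0, end - 1)  # bound corrected by one

print(buggy[-1], fixed[-1], last_wanted_index(pos, count))  # 256 255 255
```

The extra index in `buggy` is the page beyond the requested range, which is the one that can fall outside the stripe and its DLM extent lock.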

In a rare race, filemap_get_read_batch() batches the page with index 1024 for read, but the page is later truncated and removed from the page cache because the lock protecting it was revoked. The page in the read path is then not covered by a DLM lock, which triggers an assertion in the code:

LustreError: 14129:0:(osc_object.c:397:osc_req_attr_set()) uncovered page!
Pid: 14129, comm: ptlrpcd_04_18 5.14.0-1038-oem #42-Ubuntu SMP Thu May 19 05:03:08 UTC 2022
LustreError: 14129:0:(osc_object.c:411:osc_req_attr_set()) LBUG

To work around this kernel bug, ->readpage() can simply check whether the page was truncated and removed from the page cache and, if so, return AOP_TRUNCATED_PAGE to the upper layer. The upper layer then retries batching pages, and the truncated page is not added to the new batch because it is no longer in the page cache.
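
The retry behaviour described above can be illustrated with a toy model. This is a Python sketch, not the llite patch: the `Page` class, the `in_cache` flag (standing in for the page still being attached to its mapping), and the helpers `truncate`, `get_read_batch`, and `read_range` are all hypothetical. For simplicity it calls `readpage()` on every batched page and restarts the whole batch on truncation.

```python
AOP_TRUNCATED_PAGE = 0x80001  # same value as the kernel's AOP_TRUNCATED_PAGE


class Page:
    def __init__(self, index):
        self.index = index
        self.in_cache = True  # stands in for the page still having a mapping


def truncate(cache, index):
    """Drop a page from the cache, as happens when its lock is revoked."""
    page = cache.pop(index, None)
    if page is not None:
        page.in_cache = False


def get_read_batch(cache, first, last):
    """Collect the cached pages with first <= index <= last."""
    return [cache[i] for i in range(first, last + 1) if i in cache]


def readpage(page):
    """The workaround: refuse to read a page that has left the cache."""
    if not page.in_cache:
        return AOP_TRUNCATED_PAGE
    return 0


def read_range(cache, first, last, race=None):
    """Read pages, re-batching whenever readpage() reports truncation."""
    while True:
        batch = get_read_batch(cache, first, last)
        if race is not None:
            race(cache)  # simulate truncation between batching and reading
            race = None
        if all(readpage(p) == 0 for p in batch):
            return [p.index for p in batch]


cache = {i: Page(i) for i in range(5)}
# The off-by-one batches index 4; the race truncates it before it is read.
result = read_range(cache, 0, 4, race=lambda c: truncate(c, 4))
print(result)  # [0, 1, 2, 3]
```

On the second pass the truncated page is no longer in the cache, so it never reaches `readpage()` again and no uncovered page is submitted for I/O.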



 Comments   
Comment by Gerrit Updater [ 19/Dec/22 ]

"Qian Yingjin <qian@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49433
Subject: LU-16412 llite: check truncated page in ->readpage()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: be18ce2d198ffa48093b0125ebb30188aabf0213

Comment by Gerrit Updater [ 20/Jan/23 ]

"Patrick Farrell <farr0186@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49723
Subject: LU-16412 llite: check read page past requested
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6cd31b52dda862131b619b4e5acccf248611b358

Comment by Gerrit Updater [ 31/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49433/
Subject: LU-16412 llite: check truncated page in ->readpage()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 209afbe28b5f164bda81f54ff0797be459e14b44

Comment by Gerrit Updater [ 03/Feb/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49723/
Subject: LU-16412 llite: check read page past requested
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2f8f38effac3a95199cdcdbd4854f958cdb0c72c

Comment by Peter Jones [ 03/Feb/23 ]

Landed for 2.16

Comment by Andreas Dilger [ 13/Mar/23 ]

Patch for the upstream kernel submitted:
https://lore.kernel.org/linux-fsdevel/20230208022400.28962-1-coolqyj@163.com/

Accepted into the kernel and backported to the 6.1 and 5.15 stable trees:
https://lore.kernel.org/stable/20230220133603.227781589@linuxfoundation.org/
https://lore.kernel.org/stable/20230220133556.188276389@linuxfoundation.org/

This patch also ends up improving fxmark benchmark performance by 13%, likely due to avoiding extraneous reads of pages not actually requested by the application:
https://lore.kernel.org/linux-fsdevel/202302171032.69bd3cf7-yujie.liu@intel.com/

Comment by Gerrit Updater [ 13/Mar/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50277
Subject: LU-16412 llite: check read page past requested
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: 3c35c311211e561dd83b665f081e0112044de98a

Comment by Gerrit Updater [ 11/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50277/
Subject: LU-16412 llite: check read page past requested
Project: fs/lustre-release
Branch: b2_15
Current Patch Set:
Commit: c6388ef80ff593936296f394a0154729578ac6eb

Generated at Sat Feb 10 03:26:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.