[LU-12179] allocate continuous pages when disabled page caches Created: 11/Apr/19  Updated: 11/Jun/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11347 Do not use pagecache for SSD I/O when... Resolved
is related to LU-12071 bypass pagecache for large files Resolved
is related to LU-13309 performance optimizations for brw Resolved
Rank (Obsolete): 9223372036854775807

 Description   

LU-11347 ("Do not use pagecache for SSD I/O when read/write cache are disabled") allocates the physical pages per OSS I/O thread on demand at runtime. As a result, many of the allocated pages may not be physically contiguous. This is bad for I/O merging in the block layer because of the limited number of segments per block I/O request.

For testing purposes, we should pre-allocate pages that are as physically contiguous as possible when the OSS threads start, for the case where the page cache is disabled.
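A minimal sketch of the idea in kernel-style C (the chunk order and function name below are illustrative assumptions, not the actual patch): preallocate each thread's BRW page array in higher-order chunks so that consecutive entries are physically contiguous, and fall back to single pages when a chunk cannot be allocated.

#include <linux/gfp.h>
#include <linux/mm.h>

/* assumed chunk size: order-4 = 16 pages (64KB) per contiguous chunk */
#define PREALLOC_CHUNK_ORDER	4

static int prealloc_thread_pages(struct page **pages, int npages)
{
	int i = 0;

	while (i < npages) {
		unsigned int order = PREALLOC_CHUNK_ORDER;
		struct page *page;
		int j;

		/* do not allocate past the requested page count */
		while (order > 0 && (1 << order) > npages - i)
			order--;

		page = alloc_pages(GFP_KERNEL, order);
		if (page == NULL && order > 0) {
			/* fall back to a single page if the chunk fails */
			order = 0;
			page = alloc_pages(GFP_KERNEL, 0);
		}
		if (page == NULL)
			return -ENOMEM;

		/* make every page of the chunk individually refcounted */
		if (order > 0)
			split_page(page, order);
		for (j = 0; j < (1 << order); j++)
			pages[i + j] = page + j;
		i += 1 << order;
	}
	return 0;
}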



 Comments   
Comment by Gerrit Updater [ 11/Apr/19 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/34644
Subject: LU-12179 osd: allocate continuous pages when disabled page caches
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8e8d1d32ad55786735823135d6ca126f7fadf40d

Comment by Alex Zhuravlev [ 15/Apr/19 ]

While I understand that SG constraints can prevent large I/Os and thus limit throughput, I don't see how this is related to pagecache bypassing, i.e. how the pagecache could do better in this context.

Comment by Qian Yingjin [ 15/Apr/19 ]

IIRC, Ihara ran a series of tests using obdfilter-survey, and we found that performance is bad, especially with a 64MB RPC size, and the SFA backend cannot get large I/Os. After analyzing the debug logs, we found the reason is that the physical pages per I/O thread are not contiguous, so the SG constraints are hit.

The current pagecache bypassing allocates pages on demand at runtime. When many OSS I/O threads perform I/Os concurrently, the page allocations may interleave between the threads, so many of the pages allocated for each OSS I/O thread are not physically contiguous.

If we allocate the pages when the OSS threads start, I think it avoids the interleaved page allocation, so the pages per I/O thread are much more physically contiguous.
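To illustrate why this matters for the SG constraints, here is a small hypothetical helper (not from the patch) that counts how many block-layer segments a BRW page array would need, assuming only physically adjacent pages can be merged into one segment; with heavily interleaved allocation the count approaches the number of pages, and the per-request segment limit is hit long before a 64MB RPC worth of pages is queued.

#include <linux/mm.h>

/* hypothetical helper: count merged segments for a BRW page array */
static int brw_count_segments(struct page **pages, int npages)
{
	int segs = 0;
	int i;

	for (i = 0; i < npages; i++) {
		/* a new segment starts whenever this page is not
		 * physically adjacent to the previous one */
		if (i == 0 ||
		    page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1)
			segs++;
	}
	return segs;
}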

 

Hi Ihara,

Did you do any testing with this patch? Any results?

 

Regards,

Qian

 

Comment by Alex Zhuravlev [ 15/Apr/19 ]

> The current pagecache bypassing allocates pages on demand at runtime

Only once for each thread.

Comment by Qian Yingjin [ 15/Apr/19 ]

From the code, it is once per I/O if the already-allocated pages are not enough, until PTLRPC_MAX_BRW_PAGES is reached.

If we allocate the pages when the I/O threads start, and start max_threads during OSS setup, then the PTLRPC_MAX_BRW_PAGES pages for each I/O thread are allocated together, without being interrupted and interleaved by other threads, so we can get a much larger number of physically contiguous pages, I think.

 

If pages are allocated during I/O at runtime, there may be many I/O threads allocating pages (via __page_cache_alloc()) concurrently, and obdfilter-survey may test with I/O sizes varying from 1M to 64M; all of this makes the physical pages allocated per I/O thread non-contiguous...

Comment by Andreas Dilger [ 11/Jun/20 ]

Now that there are preallocated BRW pages to avoid page cache overhead, since patch https://review.whamcloud.com/32875 "LU-11347 osd: do not use pagecache for I/O" landed, it probably makes sense to update the patch here so those pages can be allocated in larger chunks, making I/O submission to the lower storage more efficient.
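One common way to do such chunked allocation in the kernel (a sketch under assumptions, with an invented helper name, not necessarily how the patch will do it) is to attempt the high-order allocation with __GFP_NORETRY so a fragmented system falls back to order-0 pages quickly instead of stalling in compaction or reclaim on the I/O path:

#include <linux/gfp.h>
#include <linux/mm.h>

/* hypothetical helper: allocate one chunk for the BRW page pool,
 * downgrading to a single page when high-order memory is fragmented */
static struct page *brw_alloc_chunk(unsigned int *order)
{
	struct page *page = NULL;

	if (*order > 0)
		page = alloc_pages(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN,
				   *order);
	if (page == NULL) {
		*order = 0;
		page = alloc_pages(GFP_KERNEL, 0);
	}
	return page;
}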
