[LU-16613] Future: folio_batch BIO write path Created: 03/Mar/23  Updated: 08/Mar/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Shaun Tancheff Assignee: Shaun Tancheff
Resolution: Unresolved Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

Prototype folio_batch support on the page-cache-backed buffered I/O path.

Relies on a patched kernel to provide:

  • generic_perform_batch_write
  • aops->write_batch_begin and aops->write_batch_end
  • filemap_dirty_folio_batched
  • grab_cache_folios_fast


 Comments   
Comment by Gerrit Updater [ 03/Mar/23 ]

"Shaun Tancheff <shaun.tancheff@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50194
Subject: LU-16613 clio: Use folio_batch write path
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6dc4f8a03385bd507f5ebb5d67503c10e05053c7

Comment by Patrick Farrell [ 03/Mar/23 ]

Shaun,

This is very exciting, I didn’t know the requisite folio stuff existed yet - have you been able to benchmark this at all?  And is there some form of read batch available as well?  The read path is also stuck in page cache work as its main performance limitation.

Comment by Patrick Farrell [ 03/Mar/23 ]

Ah, I see it requires a patched kernel.  Can you say more about this in general? Where are these patches from, are they expected upstream, etc, etc

Comment by Shaun Tancheff [ 05/Mar/23 ]

I have patches here:
https://github.com/stancheff/linux/
branch: dedupe-v1

The basic idea is to enable 'batch' adding to the page cache (and batch dirty pages). I did the first pass on the write path.

There is a device mapper that can be used to do local perf testing, ex:

sudo mkdir -p /vol/fcp 
echo "0 `sudo blockdev --getsz /dev/vda` dedupe /dev/vda 0" | sudo dmsetup create dedupe
sudo mkfs.ext4 -F -E 'lazy_itable_init=0 packed_meta_blocks=1' /dev/mapper/dedupe
sudo mount /dev/mapper/dedupe /vol/fcp
sudo fio --filename=/vol/fcp/fio.bin \
  --size=100GB \
  --buffered=1 \
  --rw=write \
  --bs=64k \
  --buffer_pattern=0x00000000 \
  --ioengine=sync \
  --iodepth=64 \
  --runtime=120 \
  --numjobs=16 \
  --time_based \
  --group_reporting \
  --name=throughput-test-job \
  --eta-newline=1

Locally I see about 2 GB/s buffered. Still a long way from direct I/O, which is closer to 12 GB/s.

Comment by Shaun Tancheff [ 08/Mar/23 ]

The read path is already batched:

 generic_file_read_iter()
    filemap_read()
       filemap_get_pages()