[LU-5598] osd-zfs should allow more concurrent reads Created: 09/Sep/14  Updated: 13/Jul/15  Resolved: 13/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Improvement Priority: Critical
Reporter: Isaac Huang (Inactive) Assignee: Isaac Huang (Inactive)
Resolution: Won't Fix Votes: 0
Labels: zfs

Rank (Obsolete): 15644

 Description   

In the OBD layer, ofd_preprw_read() calls dt_bufs_get() (dbo_bufs_get == osd_bufs_get_read) once per (offset, len) range to prepare local buffers, and then calls dt_read_prep() (dbo_read_prep == osd_read_prep) once for all of the buffers, in the expectation that the OSD driver will issue all reads concurrently in one go.

But the osd-zfs driver currently issues each read and waits for the I/O to complete inside osd_bufs_get_read(), while osd_read_prep() is a no-op (see also LU-4820). As a result, the reads in ofd_preprw_read() are performed sequentially, i.e. there is no concurrent I/O.

The ZFS I/O scheduler services reads from osd-zfs in FIFO order, i.e. with no reordering or merging. It might be possible to reduce seeks by keeping more reads concurrently in flight and using the Linux deadline I/O scheduler on the devices under the pool - see HP-5 for an example of poor read throughput due to excessive seeks.



 Comments   
Comment by Alex Zhuravlev [ 09/Sep/14 ]

we'd have to duplicate dmu_buf_hold_array_by_dnode() in osd_bufs_get_read()

Comment by Isaac Huang (Inactive) [ 09/Sep/14 ]

I'm thinking about:
1. In osd_bufs_get_read(), call dmu_buf_hold_array_by_bonus(read=FALSE).
2. In osd_read_prep(), call dbuf_read() and wait for IOs to complete, i.e. duplicate parts of dmu_buf_hold_array_by_dnode().

Comment by Isaac Huang (Inactive) [ 11/Sep/14 ]

This would increase concurrency only for OBD_BRW_READ RPCs that carry multiple (offset, len) ranges. I wonder whether it's worthwhile, as the fix would likely involve patching ZFS, and it seems read concurrency could be increased simply by adding more service threads.

Comment by Peter Jones [ 13/Jul/15 ]

As per Isaac, we can mark this as "Won't Fix".

Generated at Sat Feb 10 01:52:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.