[LU-5598] osd-zfs should allow more concurrent reads

Details

    • Improvement
    • Resolution: Won't Fix
    • Critical
    • None
    • Lustre 2.6.0
    • 15644

    Description

      In the OBD layer, ofd_preprw_read() calls dt_bufs_get() (dbo_bufs_get, implemented by osd_bufs_get_read() for reads) multiple times to prepare local buffers, and then calls dt_read_prep() (dbo_read_prep, implemented by osd_read_prep()) once for all local buffers, expecting the OSD driver to issue all reads concurrently in one go.

      However, the osd-zfs driver currently issues each read and waits for the IO to complete inside osd_bufs_get_read(), and osd_read_prep() is a no-op (see also LU-4820). As a result, the reads in ofd_preprw_read() are effectively done sequentially, with no concurrent IO.
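
      The cost of this behaviour can be illustrated with a toy model. This is only a sketch, not Lustre or ZFS code: the function names and the per-range latencies are assumptions made for illustration, and the concurrent case assumes the underlying devices can service all reads fully in parallel, which real pools will not always do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not Lustre code: lat[i] is the assumed latency, in
 * arbitrary time units, of reading the i-th (offset, len) range. */

/* Issue-and-wait per buffer, as described for osd_bufs_get_read():
 * each read completes before the next is issued, so latencies add up. */
static unsigned long sequential_cost(const unsigned long *lat, size_t n)
{
    unsigned long total = 0;

    assert(lat != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += lat[i];
    return total;
}

/* Idealised concurrent case: all reads are issued up front and the
 * caller waits once, so the cost is that of the slowest read. */
static unsigned long concurrent_cost(const unsigned long *lat, size_t n)
{
    unsigned long worst = 0;

    assert(lat != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (lat[i] > worst)
            worst = lat[i];
    return worst;
}
```

      For assumed latencies of 5, 8 and 3 units, the sequential path costs 16 units while the idealised concurrent path costs 8.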

      The ZFS IO scheduler services reads from osd-zfs in FIFO order, without reordering or merging. Issuing more reads concurrently and using the Linux deadline IO scheduler under the pool might reduce seeks; see HP-5 for an example of poor read throughput caused by excessive seeks.

        Activity

          pjones Peter Jones added a comment -

          As per Isaac we can mark this as "Won't Fix"

          isaac Isaac Huang (Inactive) added a comment -

          This would increase concurrency only for OBD_BRW_READ RPCs with multiple (offset, len) ranges. I wonder whether it's worthwhile, as the fix would likely involve patching ZFS, and it seems read concurrency could be increased simply by adding more service threads.

          isaac Isaac Huang (Inactive) added a comment -

          I'm thinking about:
          1. In osd_bufs_get_read(), call dmu_buf_hold_array_by_bonus(read=FALSE).
          2. In osd_read_prep(), call dbuf_read() and wait for IOs to complete, i.e. duplicate parts of dmu_buf_hold_array_by_dnode().
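
          The two steps above can be sketched with mock types. This is an illustration only: mock_buf, mock_bufs_get() and mock_read_prep() are hypothetical stand-ins, not the real DMU or osd-zfs interfaces, and IO completion is simulated rather than waited on. The point is the ordering: holds are taken first without IO, then every read is issued before any of them is waited for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a held dbuf; not a real ZFS structure. */
enum mock_state { MOCK_FREE, MOCK_HELD, MOCK_IN_FLIGHT, MOCK_READY };

struct mock_buf {
    enum mock_state state;
};

/* Step 1: take holds only, analogous to holding with read=FALSE.
 * No IO is started here. */
static void mock_bufs_get(struct mock_buf *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(bufs[i].state == MOCK_FREE);
        bufs[i].state = MOCK_HELD;
    }
}

/* Step 2: start a read on every held buffer before waiting on any of
 * them, then complete them all. Returns the peak number of reads that
 * were in flight at the same time. */
static size_t mock_read_prep(struct mock_buf *bufs, size_t n)
{
    size_t in_flight = 0, peak = 0;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].state != MOCK_HELD)
            continue;
        bufs[i].state = MOCK_IN_FLIGHT;   /* issue, do not wait */
        if (++in_flight > peak)
            peak = in_flight;
    }
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].state != MOCK_IN_FLIGHT)
            continue;
        bufs[i].state = MOCK_READY;       /* simulated completion */
        in_flight--;
    }
    return peak;
}
```

          With four held buffers, mock_read_prep() reports a peak of four reads in flight, whereas an issue-and-wait loop would never exceed one.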

          bzzz Alex Zhuravlev added a comment -

          We'd have to duplicate dmu_buf_hold_array_by_dnode() in osd_bufs_get_read().

          People

            isaac Isaac Huang (Inactive)
            isaac Isaac Huang (Inactive)