  1. Lustre
  2. LU-7226

improve osd-zfs blocksize heuristics

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Lustre 2.9.0
    • Fix Version/s: None
    • Labels:
      None
    • Rank (Obsolete):
      9223372036854775807

      Description

      Since LU-4865 landed, the osd-zfs blocksize selection heuristic has been "grow blocksize up to maximum as long as linear writes are done from the start of the file". That is OK when sequential file IO is done, but does not handle other use cases very well. It can also have a serious negative performance impact if the ZFS blocksize is larger than the Lustre RPC size. For example, in a mixed-client-version environment where some clients have 1MB RPCs and others have 4MB RPCs, with recordsize=4M, the blocksize would grow up to 4MB, but clients writing to a file with 1MB RPCs would cause three extra read-modify-write cycles for every RPC.
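
      To put a rough number on the mismatch described above, here is a minimal sketch of the cost of writing a small RPC into a larger block. It is a simplified model, not osd-zfs code: it assumes RPCs are block-aligned, that each RPC is committed on its own, and that nothing is cached or aggregated on the server. The function name `write_amplification` is hypothetical.

      ```python
      MIB = 1 << 20


      def write_amplification(blocksize, rpc_size):
          """Bytes of block data rewritten per byte of RPC payload.

          Simplified model (assumption): an RPC smaller than the block forces
          the whole block to be read, modified, and written back, so the cost
          scales with blocksize / rpc_size. An RPC at least as large as the
          block overwrites whole blocks and needs no read-modify-write.
          """
          if blocksize <= 0 or rpc_size <= 0:
              raise ValueError("sizes must be positive")
          if rpc_size >= blocksize:
              return 1.0
          return blocksize / rpc_size


      # 1MB RPCs into 4MB blocks: each RPC rewrites four times its payload.
      print(write_amplification(4 * MIB, 1 * MIB))  # 4.0
      # 4MB RPCs into 4MB blocks: whole-block overwrite, no amplification.
      print(write_amplification(4 * MIB, 4 * MIB))  # 1.0
      ```

      Under these assumptions the 1MB clients pay for a full 4MB block on every write, which is the case the heuristic should avoid.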

      LU-7225 was filed to allow the client to explicitly specify the blocksize for applications/libraries that are very Lustre-savvy, but until that is implemented, and for applications that are not using the ladvise API, the osd-zfs code should have a better blocksize selection heuristic.

      One option is to check the niobuf sizes in the BRW write RPCs, and limit the blocksize to the minimum or median niobuf size within those RPCs. That ensures the blocksize does not grow beyond the maximum RPC size, and for clients that are aggregating smaller client-side writes into one RPC it gives an estimate of the natural IO size. This is not ideal, since random write workloads do not necessarily imply random read workloads, but since the client is already doing IO aggregation it should avoid the worst offenders. Also, if the blocksize is too small then RAID-Z/Z2 would have a lot of space/parity overhead, but possibly not worse than read-modify-write of large blocks.
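
      The niobuf-based option could be sketched roughly as follows. This is an illustration of the idea only, not the osd-zfs implementation: the function name `pick_blocksize`, the `MIN_BLOCKSIZE` floor, the fallback for an empty RPC, and the choice to round down to a power of two are all assumptions made for the example.

      ```python
      from statistics import median_low

      MIN_BLOCKSIZE = 4096  # hypothetical floor, to limit RAID-Z parity overhead


      def round_down_pow2(n):
          """Largest power of two that is <= n (n must be positive)."""
          return 1 << (n.bit_length() - 1)


      def pick_blocksize(niobuf_sizes, recordsize, use_median=True):
          """Suggest a blocksize from the niobuf sizes seen in one BRW write RPC.

          niobuf_sizes: byte lengths of the niobufs in the RPC
          recordsize:   dataset recordsize, used as the upper bound
          use_median:   pick the median niobuf size, else the minimum
          """
          if not niobuf_sizes:
              # Assumption: with no information, fall back to the upper bound.
              return recordsize
          natural = median_low(niobuf_sizes) if use_median else min(niobuf_sizes)
          natural = max(natural, MIN_BLOCKSIZE)
          # Never exceed recordsize, so the block cannot outgrow the configured cap.
          return min(round_down_pow2(natural), recordsize)


      MIB = 1 << 20
      # A client sending a single 1MB niobuf with recordsize=4M gets 1MB blocks.
      print(pick_blocksize([1 * MIB], 4 * MIB))  # 1048576
      # Mostly 64KB writes aggregated into one RPC suggest a 64KB blocksize.
      print(pick_blocksize([65536, 65536, 1 * MIB], 4 * MIB))  # 65536
      ```

      Using the median rather than the minimum makes the result less sensitive to a single short niobuf, such as the tail of a file, at the cost of some read-modify-write for the smaller writes.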

              People

              • Assignee:
                wc-triage WC Triage
                Reporter:
                adilger Andreas Dilger
              • Votes:
                0
                Watchers:
                4
