Lustre / LU-18269

NRS TBF bucket prioritization


Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.17.0
    • 3
    • 9223372036854775807

    Description

      The NRS TBF implementation schedules RPCs by preferring those with the oldest deadline. However, if a job submits more RPCs than can be processed in each timeslice (e.g. a "bad job"), then those RPCs actually gain a higher priority in later timeslices, because they have already missed their deadline.
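The following Python sketch models deadline-ordered selection under stated assumptions. It is a simplified model, not the actual Lustre TBF code. The `Bucket` class, `pick_oldest_deadline()`, `simulate()` and the fixed `SERVICE_TIME` are hypothetical. So is the rule that a served bucket's deadline advances by its token interval from the previous deadline rather than from the current time. In the model, a "bad job" earns tokens twice as fast as the server can complete RPCs, and a "good job" submits a single RPC at time 30.

```python
from dataclasses import dataclass

SERVICE_TIME = 10  # hypothetical: the server completes one RPC every 10 time units


@dataclass
class Bucket:
    """Hypothetical, simplified TBF bucket (one per job); not Lustre's structure."""
    name: str
    interval: int       # time units between tokens, i.e. 1 / rate
    deadline: int = 0   # earliest time the next queued RPC may be sent
    queued: int = 0     # RPCs waiting in this bucket


def pick_oldest_deadline(buckets, now):
    """Return the eligible bucket with the oldest deadline, or None."""
    ready = [b for b in buckets if b.queued > 0 and b.deadline <= now]
    return min(ready, key=lambda b: b.deadline, default=None)


def simulate(buckets, arrivals, until):
    """arrivals maps time -> [(bucket, count)]; returns [(time, job name)] served."""
    order = []
    for now in range(0, until, SERVICE_TIME):
        for bucket, count in arrivals.get(now, []):
            if bucket.queued == 0:
                # An idle bucket restarts its deadline at the current time.
                bucket.deadline = max(bucket.deadline, now)
            bucket.queued += count
        chosen = pick_oldest_deadline(buckets, now)
        if chosen is not None:
            chosen.queued -= 1
            # The deadline advances from the previous deadline, not from "now",
            # so a backlogged bucket falls further and further behind.
            chosen.deadline += chosen.interval
            order.append((now, chosen.name))
    return order


# The "bad job" earns tokens twice as fast as the server can serve RPCs.
bad = Bucket("bad", interval=5)
good = Bucket("good", interval=10)
order = simulate([bad, good], {0: [(bad, 100)], 30: [(good, 1)]}, until=100)
print(order)
```

The `good` RPC arrives at time 30 but is only served at time 70. From time 30 to 60, `bad` holds deadlines of 15, 20, 25 and 30. Each is no later than `good`'s deadline of 30, with ties broken in list order, so the backlog that caused the delay keeps winning.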

      While this is acceptable if "real time" scheduling is more important, it is the opposite of what we want for normal behavior. That goal is fair share among the jobs: process the small number of RPCs from most jobs first, and only process the many RPCs from the "bad job" afterward.

      This is especially true when the "bad job" is sending more RPCs than the server can process, so that all of them are delayed. The current policy then prioritizes those delayed RPCs even though they are the source of the server load.

      The TBF processing needs to be changed so that it selects RPCs "round robin" among buckets with the same priority, rather than always preferring the bucket with the oldest deadline.
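One way to realize the proposed "round robin" selection is sketched below as a hypothetical Python model, not a patch. Deadlines still decide *whether* a bucket may send, which preserves the rate limit. Among buckets that are already eligible, however, a rotating cursor picks the next one instead of the oldest deadline. The `Bucket` class, `RoundRobinPicker`, `simulate()` and `SERVICE_TIME` are illustrative names, and all buckets are assumed to share a single priority level. The workload is a "bad job" that earns tokens twice as fast as the server completes RPCs, plus a single RPC from a "good job" at time 30.

```python
from collections import deque
from dataclasses import dataclass

SERVICE_TIME = 10  # hypothetical: the server completes one RPC every 10 time units


@dataclass
class Bucket:
    """Hypothetical, simplified TBF bucket (one per job)."""
    name: str
    interval: int       # time units between tokens, i.e. 1 / rate
    deadline: int = 0   # earliest time the next queued RPC may be sent
    queued: int = 0     # RPCs waiting in this bucket


class RoundRobinPicker:
    """Rotate among eligible buckets; an older deadline gives no extra priority."""

    def __init__(self, buckets):
        self.ring = deque(buckets)

    def pick(self, now):
        for _ in range(len(self.ring)):
            bucket = self.ring[0]
            self.ring.rotate(-1)  # the next search starts after this bucket
            if bucket.queued > 0 and bucket.deadline <= now:
                return bucket
        return None


def simulate(buckets, arrivals, until):
    """arrivals maps time -> [(bucket, count)]; returns [(time, job name)] served."""
    picker = RoundRobinPicker(buckets)
    order = []
    for now in range(0, until, SERVICE_TIME):
        for bucket, count in arrivals.get(now, []):
            if bucket.queued == 0:
                # An idle bucket restarts its deadline at the current time.
                bucket.deadline = max(bucket.deadline, now)
            bucket.queued += count
        chosen = picker.pick(now)
        if chosen is not None:
            chosen.queued -= 1
            chosen.deadline += chosen.interval  # rate limiting is unchanged
            order.append((now, chosen.name))
    return order


bad = Bucket("bad", interval=5)
good = Bucket("good", interval=10)
order = simulate([bad, good], {0: [(bad, 100)], 30: [(good, 1)]}, until=100)
print(order)
```

Here the `good` RPC is served at time 30, the first timeslice after it arrives, and `bad` still receives every remaining slot. `bad`'s deadline continues to fall behind wall-clock time, but that no longer earns it priority. Buckets of different priority would need a separate rotation per priority level, which this sketch does not model.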


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Andreas Dilger (adilger)
            Votes: 0
            Watchers: 5
