Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None

      Description

      When writing to many files at once, one bottleneck on a
      client currently appears to be the grant code, specifically
      spinning on the lock taken around:
      osc_enter_cache_try

      The contention is confined to osc_enter_cache_try, so there
      is no obvious way to restructure the locking. Instead, we
      can look at where the time goes inside the function.

      Two things stand out:
      obd_dirty_pages is an atomic, but it is always accessed
      under cl_loi_list_lock, so it can be a regular long.
      In my perf tracing, the add_return on this counter accounts
      for 50% of the time spent in this function.
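
As a rough user-space illustration of the first point (not the Lustre code itself), the sketch below contrasts an atomic counter with a plain long when both are only ever touched under the same lock. Everything here is hypothetical: struct demo_client, the enter_cache_* helpers, and the use of a pthread mutex in place of the kernel spinlock are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the client state; not the real Lustre struct. */
struct demo_client {
	pthread_mutex_t list_lock;    /* plays the role of cl_loi_list_lock */
	atomic_long     dirty_atomic; /* "before": atomic dirty-page counter */
	long            dirty_plain;  /* "after": plain counter, lock-protected */
	long            dirty_max;    /* limit on dirty pages */
};

static void demo_client_init(struct demo_client *c, long dirty_max)
{
	pthread_mutex_init(&c->list_lock, NULL);
	atomic_init(&c->dirty_atomic, 0);
	c->dirty_plain = 0;
	c->dirty_max = dirty_max;
}

/*
 * "Before": the lock already serializes all callers, yet every update
 * still pays for an atomic read-modify-write.
 */
static int enter_cache_atomic(struct demo_client *c)
{
	int ok = 0;

	pthread_mutex_lock(&c->list_lock);
	if (atomic_fetch_add(&c->dirty_atomic, 1) + 1 <= c->dirty_max)
		ok = 1;
	else
		atomic_fetch_sub(&c->dirty_atomic, 1); /* over the limit: undo */
	pthread_mutex_unlock(&c->list_lock);
	return ok;
}

/*
 * "After": same logic with a plain long. This is only safe because
 * every access to dirty_plain happens with list_lock held.
 */
static int enter_cache_plain(struct demo_client *c)
{
	int ok = 0;

	pthread_mutex_lock(&c->list_lock);
	if (c->dirty_plain + 1 <= c->dirty_max) {
		c->dirty_plain++;
		ok = 1;
	}
	pthread_mutex_unlock(&c->list_lock);
	return ok;
}
```

Both helpers admit the same number of pages; the plain version simply avoids the extra atomic operation inside the critical section. If any access to the counter happens outside the lock, this substitution is no longer valid.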

      The assert_spin_locked in osc_consume_write_grant generates
      an atomic read of the cl_loi_list_lock lock value. This
      is not very expensive, but it would be nice to remove it
      from the hot path. A comment already states that
      cl_loi_list_lock must be held, and that is considered
      sufficient in most places in Lustre.
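
The issue proposes relying on the existing comment alone. Purely as an illustration of keeping such a check out of the hot path, the user-space sketch below uses a lock-held assertion that compiles to nothing unless a debug macro is defined. This is a different technique from the one described above, and all names (struct demo_lock, DEMO_ASSERT_LOCKED, DEMO_DEBUG_LOCKS, consume_write_grant) are hypothetical, not Lustre or kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical lock wrapper that records whether it is currently held. */
struct demo_lock {
	pthread_mutex_t mutex;
	bool            held; /* only written while mutex is held */
};

/* The check costs nothing unless DEMO_DEBUG_LOCKS is defined at build time. */
#ifdef DEMO_DEBUG_LOCKS
#define DEMO_ASSERT_LOCKED(l) assert((l)->held)
#else
#define DEMO_ASSERT_LOCKED(l) ((void)0)
#endif

static void demo_lock_init(struct demo_lock *l)
{
	pthread_mutex_init(&l->mutex, NULL);
	l->held = false;
}

static void demo_lock_acquire(struct demo_lock *l)
{
	pthread_mutex_lock(&l->mutex);
	l->held = true;
}

static void demo_lock_release(struct demo_lock *l)
{
	l->held = false;
	pthread_mutex_unlock(&l->mutex);
}

/* Hypothetical grant accounting, loosely modelled on the description. */
struct demo_grant {
	struct demo_lock lock;
	long             avail_grant; /* bytes still available to consume */
	long             dirty_grant; /* bytes consumed by dirty data */
};

/* Caller must hold g->lock. */
static void consume_write_grant(struct demo_grant *g, long bytes)
{
	DEMO_ASSERT_LOCKED(&g->lock); /* no-op in non-debug builds */
	g->avail_grant -= bytes;
	g->dirty_grant += bytes;
}
```

In a normal build the function body is just the two arithmetic updates; compiling with -DDEMO_DEBUG_LOCKS restores the check for testing.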

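      Test case:
      mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F

      That is 36 processes on one client, each writing to its
      own file (-F), using 1 MiB transfers (-t 1M) and 2 GiB
      per process (-b 2G), for a single iteration (-i 1).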

      Before patch: 5942.34 MiB/s
      After patch:  11541 MiB/s (roughly 1.9x)

      In perf, the difference is dramatic: time spent in
      osc_enter_cache_try drops from 60% to less than 1%.


              People

              • Assignee:
                paf0186 Patrick Farrell
              • Reporter:
                paf0186 Patrick Farrell