Lustre / LU-13419

Simplify osc_enter_cache_try

Details

    • Improvement
    • Resolution: Unresolved
    • Major

    Description

      When writing to many files, one bottleneck on a client
      currently seems to be the grant code, specifically
      spinning on the lock taken around:
      osc_enter_cache_try

      The contention is only on the lock around
      osc_enter_cache_try, so there is no obvious way to
      restructure the locking. Instead, we can look at where the
      time is going inside the function.

      Two things stand out:
      obd_dirty_pages is an atomic, but it is always accessed
      under the cl_loi_list_lock, so it can be a regular long.
      In my perf tracing, the add_return on this counter
      accounts for 50% of the time in this function.
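
      As a rough illustration of the first point, here is a
      minimal userspace sketch (not Lustre code). It assumes a
      pthread mutex standing in for cl_loi_list_lock; the
      demo_client struct and the demo_* functions are
      hypothetical names made up for this example. Both
      variants return the same totals, but the second avoids
      the atomic read-modify-write. This is only safe if every
      access to the counter happens with the lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the client state; not the Lustre structs. */
struct demo_client {
    pthread_mutex_t list_lock;    /* plays the role of cl_loi_list_lock */
    atomic_long     dirty_atomic; /* counter kept as an atomic ("before") */
    long            dirty_plain;  /* counter kept as a plain long ("after") */
};

static void demo_client_init(struct demo_client *cli)
{
    assert(cli != NULL);
    pthread_mutex_init(&cli->list_lock, NULL);
    atomic_init(&cli->dirty_atomic, 0);
    cli->dirty_plain = 0;
}

/* "Before": an atomic add-and-return although the lock is already held. */
static long demo_add_dirty_atomic(struct demo_client *cli, long pages)
{
    long new_total;

    pthread_mutex_lock(&cli->list_lock);
    /* atomic_fetch_add returns the old value, so add pages to get the new one */
    new_total = atomic_fetch_add(&cli->dirty_atomic, pages) + pages;
    pthread_mutex_unlock(&cli->list_lock);
    return new_total;
}

/* "After": the lock already serializes access, so a plain add is enough. */
static long demo_add_dirty_plain(struct demo_client *cli, long pages)
{
    long new_total;

    pthread_mutex_lock(&cli->list_lock);
    cli->dirty_plain += pages;
    new_total = cli->dirty_plain;
    pthread_mutex_unlock(&cli->list_lock);
    return new_total;
}
```

      The sketch only shows the equivalence of the results; the
      actual cost difference depends on the hardware and on how
      contended the cache line is.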

      The assert_spin_locked in osc_consume_write_grant
      generates an atomic read of the cl_loi_list_lock lock
      value. This isn't too painful, but it would be nice to cut
      it out of the hot path. There is already a comment saying
      that cl_loi_list_lock must be held, and this is considered
      enough in most places in Lustre.
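
      For illustration only, here is a userspace sketch of a
      related option: keeping the lock check but compiling it
      out of normal builds. This is not what the ticket
      proposes (it relies on the existing comment instead), and
      none of these names exist in Lustre. DEMO_DEBUG_LOCKING,
      DEMO_ASSERT_LOCKED, demo_grant and the demo_* functions
      are hypothetical, and a default (non-recursive) pthread
      mutex stands in for cl_loi_list_lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#ifdef DEMO_DEBUG_LOCKING
/* Debug builds: probe the lock word; trylock returns EBUSY if it is held. */
#define DEMO_ASSERT_LOCKED(m) assert(pthread_mutex_trylock(m) == EBUSY)
#else
/* Normal builds: the check disappears from the hot path entirely. */
#define DEMO_ASSERT_LOCKED(m) ((void)0)
#endif

/* Hypothetical stand-in for the grant state; not the Lustre structs. */
struct demo_grant {
    pthread_mutex_t list_lock;   /* plays the role of cl_loi_list_lock */
    long            avail_grant; /* bytes of grant still available */
};

/* Caller must hold g->list_lock. */
static void demo_consume_grant(struct demo_grant *g, long bytes)
{
    DEMO_ASSERT_LOCKED(&g->list_lock);
    g->avail_grant -= bytes;
}

/* Takes the lock, consumes grant, and returns what is left. */
static long demo_write(struct demo_grant *g, long bytes)
{
    long left;

    pthread_mutex_lock(&g->list_lock);
    demo_consume_grant(g, bytes);
    left = g->avail_grant;
    pthread_mutex_unlock(&g->list_lock);
    return left;
}
```

      Building with -DDEMO_DEBUG_LOCKING enables the check;
      without it, the function body is just the subtraction.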

      Test case:
      mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F

      That's 36 processes on one client, each writing to its
      own file.

      Before patch:
      5942.34 MiB/s
      After patch:
      11541 MiB/s

      Looking in perf, the change is huge: time spent in
      osc_enter_cache_try drops from 60% to less than 1%.

            People

              paf0186 Patrick Farrell (Inactive)