Lustre / LU-13419

Simplify osc_enter_cache_try


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major

      When writing to many files, one bottleneck on a client
      currently appears to be the grant code, specifically
      spinning on the lock held around:
      osc_enter_cache_try

      The contention is confined to osc_enter_cache_try, so there
      is no obvious way to restructure the locking. Instead, we
      can look at where the time goes inside the function.

      Two things stand out:
      obd_dirty_pages is an atomic, but it is always accessed
      under the cl_loi_list_lock, so it can be a regular long.
      In my perf tracing, the add_return on it accounts for 50%
      of the time in this function.
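The first point can be sketched in userspace. This is only a model of the idea, not the Lustre code: struct client_model, its fields, and the enter_cache_try helpers are hypothetical names, and a pthread mutex stands in for the cl_loi_list_lock spinlock. When every access to the counter happens with the lock held, a plain long is enough and the atomic read-modify-write can go.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the client state. The names
 * are illustrative and do not match the real Lustre structures. */
struct client_model {
	pthread_mutex_t list_lock;  /* plays the role of cl_loi_list_lock */
	long dirty_pages;           /* plain long: only touched under list_lock */
	long dirty_max_pages;       /* limit on dirty pages for this model */
};

/* Caller must hold c->list_lock. */
static bool enter_cache_try_locked(struct client_model *c)
{
	if (c->dirty_pages + 1 > c->dirty_max_pages)
		return false;

	/* The lock already serializes all readers and writers of
	 * dirty_pages, so an ordinary increment is sufficient here. */
	c->dirty_pages++;
	return true;
}

/* Convenience wrapper that takes the lock around the locked helper. */
static bool enter_cache_try(struct client_model *c)
{
	bool ok;

	pthread_mutex_lock(&c->list_lock);
	ok = enter_cache_try_locked(c);
	pthread_mutex_unlock(&c->list_lock);
	return ok;
}
```

The trade-off is that the counter is only safe as long as every access really is under the lock. A single unlocked reader or writer elsewhere would need the atomic (or the lock) back.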

      The assert_spin_locked in osc_consume_write_grant generates
      an atomic read of the cl_loi_list_lock lock value. This
      isn't too painful, but it would be nice to cut it out of
      the hot path. There is already a comment saying the
      cl_loi_list_lock must be held, and this is considered
      enough in most places in Lustre.
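For illustration, one possible way to keep a lock-held check available for debug builds while compiling it out of the normal hot path is sketched below. This is an assumption-laden userspace model, not what the patch does (the description above simply relies on the existing comment): struct grant_model, GRANT_DEBUG_LOCKING, and the helper names are all hypothetical, and a pthread mutex stands in for cl_loi_list_lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model; field and function names are illustrative only. */
struct grant_model {
	pthread_mutex_t list_lock;  /* stands in for cl_loi_list_lock */
	int lock_held;              /* debug-only bookkeeping flag */
	long avail_grant;           /* grant remaining, in bytes */
};

#ifdef GRANT_DEBUG_LOCKING
/* Debug build: track and check lock ownership. */
#define SET_LIST_LOCK_HELD(g, v)  ((g)->lock_held = (v))
#define ASSERT_LIST_LOCK_HELD(g)  assert((g)->lock_held)
#else
/* Normal build: both macros compile to nothing. */
#define SET_LIST_LOCK_HELD(g, v)  ((void)0)
#define ASSERT_LIST_LOCK_HELD(g)  ((void)0)
#endif

static void grant_lock(struct grant_model *g)
{
	pthread_mutex_lock(&g->list_lock);
	SET_LIST_LOCK_HELD(g, 1);
}

static void grant_unlock(struct grant_model *g)
{
	SET_LIST_LOCK_HELD(g, 0);
	pthread_mutex_unlock(&g->list_lock);
}

/* Caller must hold g->list_lock. */
static void consume_write_grant(struct grant_model *g, long bytes)
{
	ASSERT_LIST_LOCK_HELD(g);   /* free unless GRANT_DEBUG_LOCKING is set */
	g->avail_grant -= bytes;
}
```

Without GRANT_DEBUG_LOCKING defined, consume_write_grant reduces to the subtraction alone, and the "caller must hold the lock" comment documents the contract.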

      mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F

      That's 36 processes on one client, writing to separate
      files.

      Before patch:
      5942.34 MiB/s
      After patch:
      11541 MiB/s

      Looking in perf, the change is huge:
      I go from spending 60% of the time in osc_enter_cache_try
      to less than 1%.

            paf0186 Patrick Farrell
