Lustre / LU-20204

Growing lock counts need some solutions


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Medium
    • Lustre 2.18.0
    • None
    • None
    • 3

    Description

      System sizes keep growing, including RAM and client counts.

      This leads to greatly increasing lock counts in the wild: the other day I saw a system with nearly 100 million locks on one of its servers. That may not sound like much, but with our object hash size of 1<<16 buckets (which already consumes 1M of RAM), every bucket ends up holding about 1500 entries. That is how many entries we need to iterate over in the worst case to do a handle -> object lookup, and we do a lot of those!

      Certain workloads are disproportionately affected, e.g. cancel RPCs with ELC, where every request can carry hundreds of lock handles that are then looked up in rapid succession (there could be other such workloads as well). LU-20205 covers this particular case.

      For the handle->object hash table, possible solutions include:

      increasing the hash size even more (but that is again only a temporary fix, and it uses lots of RAM even on smaller systems where it is not really needed)

      replacing the hashtable entirely with something else (possible options: xarray and rbtrees)


      For other hash tables (like the per-namespace resource hashes), making them resizable might make sense. I'll probably open a separate ticket for that case so we can concentrate on one type of hash table here.


            People

              Assignee: WC Triage
              Reporter: Oleg Drokin
              Votes: 0
              Watchers: 5
