LU-917 (technical task under LU-874: Client eviction on lock callback timeout)

shared single file client IO submission with many cores starves OST DLM locks due to max_rpcs_in_flight


Details

    • Type: Technical task
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.0, Lustre 2.2.0
    • 10219

    Description

      In LU-874, shared single-file IOR testing was run with 512 threads on 32 clients (16 cores per client), writing 128MB chunks to a file striped over 2 OSTs. This showed clients timing out on DLM locks. The threads on a single client are writing to disjoint parts of the file (i.e. each thread has its own DLM extent that is not adjacent to the extents written by other threads on that client).

      For example, to reproduce this workload with 4 clients (A, B, C, D) against 2 OSTs (1, 2):

      Client  A B C D A B C D A B C D ...
      OST     1 2 1 2 1 2 1 2 1 2 1 2 ...
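
      To make the chunk-to-OST pattern above concrete, here is a minimal C sketch (not Lustre code) of RAID-0 offset-to-OST mapping; it assumes the stripe size equals the 128MB chunk size, as the pattern above implies:

      #include <stdio.h>

      #define STRIPE_SIZE  (128ULL << 20)  /* assume stripe size == chunk size */
      #define STRIPE_COUNT 2               /* file striped over 2 OSTs */

      /* RAID-0: stripe index is (offset / stripe_size) mod stripe_count */
      static unsigned ost_for_offset(unsigned long long off)
      {
              return (unsigned)((off / STRIPE_SIZE) % STRIPE_COUNT);
      }

      int main(void)
      {
              /* 4 clients (A..D) write consecutive 128MB chunks in turn */
              for (int chunk = 0; chunk < 12; chunk++) {
                      unsigned long long off = chunk * STRIPE_SIZE;

                      printf("chunk %2d: client %c -> OST %u\n",
                             chunk, 'A' + chunk % 4, ost_for_offset(off) + 1);
              }
              return 0;
      }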

      While this IOR test is running, other tests are also running on different clients to create a very heavy IO load on the OSTs.

      It may be that DLM locks on the OST are not getting any IO requests sent to refresh the DLM locks:

      • due to the number of active DLM locks on the client for a single OST being larger than max_rpcs_in_flight, some of the locks may be starved of sending BRW RPCs under those locks to the OST to refresh their lock timeouts (see the sketch after this list)
      • due to the IO ordering of the BRW requests on the client, it may be that all of the pages for a lower-offset extent are sent to the OST before the pages for a higher-offset extent are ever sent
      • the high-priority request queue on the OST may not be enough to help if several locks on the client for one OST are canceled at the same time
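
      The following toy model (an assumption-laden sketch, not Lustre code) illustrates the first two points: if BRW RPCs are dispatched strictly in offset order through a limited number of RPC slots, the highest-offset extent lock sends no RPC at all until every lower extent has drained, so under a heavily loaded OST its lock timeout can expire first. The lock count, RPCs per extent, and per-RPC service time below are invented for illustration:

      #include <stdio.h>

      #define NLOCKS             16   /* one extent lock per core/thread */
      #define RPCS_PER_EXTENT     8   /* BRW RPCs needed to flush one extent */
      #define MAX_RPCS_IN_FLIGHT  8   /* client RPC concurrency limit */
      #define RPC_TIME_MS       500   /* assumed OST service time under load */

      int main(void)
      {
              for (int i = 0; i < NLOCKS; i++) {
                      /* RPCs queued ahead of lock i's first RPC */
                      int ahead = i * RPCS_PER_EXTENT;
                      /* each service round retires MAX_RPCS_IN_FLIGHT RPCs */
                      int wait_ms = ahead / MAX_RPCS_IN_FLIGHT * RPC_TIME_MS;

                      printf("lock %2d: first BRW RPC sent after ~%5d ms\n",
                             i, wait_ms);
              }
              return 0;
      }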

      Some solutions that might help (individually, or in combination); a sketch of (2) and (3) follows this list:
      1. increase max_rpcs_in_flight to match the core count (e.g. via lctl set_param osc.*.max_rpcs_in_flight=<count>), but I think this is bad in the long run since it can dramatically increase the number of RPCs that need to be handled at one time by each OST
      2. always allow at least one BRW RPC in flight for each lock that is being canceled
      3. prioritize ALL BRW RPCs for a blocked lock ahead of non-blocked BRW requests (e.g. like a high-priority request queue on the client)
      4. both (2) and (3) may be needed in order to avoid starvation as the client core count increases
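
      Here is a minimal C sketch of what (2) and (3) could look like on the client; the structures and function names are invented for illustration and do not match the actual osc/ptlrpc code:

      #include <stdbool.h>
      #include <stddef.h>

      struct brw_rpc {
              struct brw_rpc *next;
      };

      struct rpc_queues {
              struct brw_rpc *hp_head;    /* RPCs under blocked/canceling locks */
              struct brw_rpc *norm_head;  /* all other BRW RPCs */
              int in_flight;              /* caller increments when sending */
              int max_in_flight;          /* max_rpcs_in_flight */
      };

      static struct brw_rpc *dequeue(struct brw_rpc **head)
      {
              struct brw_rpc *rpc = *head;

              if (rpc != NULL)
                      *head = rpc->next;
              return rpc;
      }

      /*
       * Pick the next BRW RPC to send, or NULL if we must wait.
       * Solution (3): the high-priority queue always drains first.
       * Solution (2): one in-flight slot is reserved for high-priority
       * RPCs, so a lock being canceled can always refresh its timeout.
       */
      static struct brw_rpc *next_rpc(struct rpc_queues *q)
      {
              if (q->in_flight >= q->max_in_flight)
                      return NULL;
              if (q->hp_head != NULL)
                      return dequeue(&q->hp_head);
              if (q->in_flight < q->max_in_flight - 1)
                      return dequeue(&q->norm_head);
              return NULL;  /* reserve the last slot for high priority */
      }

      With a single reserved slot, normal traffic can use at most max_in_flight - 1 slots whenever the high-priority queue is empty; guaranteeing one slot per canceling lock, as (2) proposes, would instead reserve slots based on a count of locks currently being canceled.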


          People

            Assignee: Jinshan Xiong (Inactive) (jay)
            Reporter: Andreas Dilger (adilger)
            Votes: 0
            Watchers: 3
