Lustre / LU-8307

Add cond_resched between work items in ldlm_bl_thread_main


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.10.0
    • None
    • None
    • 3

    Description

      When clearing all of the ldlm LRUs (as Cray does at the end of
      a job), an ldlm_bl_work_item is generated for each namespace,
      and these items are placed on a list for the ldlm_bl threads
      to process.
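      The queueing step can be sketched in userspace C as below,
      assuming POSIX threads. The names (struct work_item, struct
      work_queue, queue_clear_all_lrus) are hypothetical stand-ins
      for ldlm_bl_work_item and the blocking-callback queue, not
      Lustre's actual API. The sketch only shows that clearing all
      LRUs produces one queued item per namespace, so the queue
      length scales with the namespace count, not the thread count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for ldlm_bl_work_item: one item per namespace. */
struct work_item {
    int ns_id;                  /* namespace whose LRU should be cleared */
    struct work_item *next;
};

/* Hypothetical shared FIFO that the worker threads would drain. */
struct work_queue {
    pthread_mutex_t lock;
    struct work_item *head;
    struct work_item *tail;
    int count;
};

void queue_init(struct work_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->count = 0;
}

/* Append one item to the tail of the queue under the queue lock. */
static int queue_push(struct work_queue *q, int ns_id)
{
    struct work_item *it = malloc(sizeof(*it));
    if (it == NULL)
        return -1;
    it->ns_id = ns_id;
    it->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = it;
    else
        q->head = it;
    q->tail = it;
    q->count++;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* "Clear all LRUs": generate one work item per namespace. */
int queue_clear_all_lrus(struct work_queue *q, int n_namespaces)
{
    assert(n_namespaces >= 0);
    for (int ns = 0; ns < n_namespaces; ns++)
        if (queue_push(q, ns) != 0)
            return -1;
    return 0;
}
```

      For example, calling queue_clear_all_lrus(&q, 500) on an
      initialized queue leaves 500 items on it, no matter how many
      threads will later drain them.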

      If the number of namespaces greatly exceeds the number of
      ldlm_bl threads, a given thread will iterate over many
      namespaces looking for work without ever sleeping. This can
      go on for a very long time and result in an RCU stall.

      This patch adds a cond_resched() call between completing one
      work item and looking for the next. This is cheap: cond_resched()
      only schedules if a reschedule is already pending, and it is
      not called often, since even the largest file systems currently
      have fewer than 100 namespaces per ldlm_bl thread.
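      A minimal userspace sketch of the fix, assuming POSIX threads:
      several hypothetical worker threads drain one shared queue and
      call sched_yield() between items as a stand-in for the kernel's
      cond_resched(). All names (run_bl_threads, bl_thread_main,
      process_item) are illustrative and are not the Lustre code.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

struct work_item {
    int ns_id;                       /* hypothetical namespace id */
    struct work_item *next;
};

/* Shared queue and a processed-item counter, protected by one lock. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static struct work_item *q_head;
static int processed;

/* Hypothetical "clear this namespace's LRU" step; here it only counts. */
static void process_item(struct work_item *it)
{
    pthread_mutex_lock(&q_lock);
    processed++;
    pthread_mutex_unlock(&q_lock);
    free(it);
}

/* Pop one item from the head, or return NULL once the queue is empty. */
static struct work_item *pop_item(void)
{
    pthread_mutex_lock(&q_lock);
    struct work_item *it = q_head;
    if (it != NULL)
        q_head = it->next;
    pthread_mutex_unlock(&q_lock);
    return it;
}

static void *bl_thread_main(void *arg)
{
    (void)arg;
    struct work_item *it;

    while ((it = pop_item()) != NULL) {
        process_item(it);
        /* Stand-in for cond_resched(): give up the CPU between work
         * items. Unlike cond_resched(), sched_yield() yields
         * unconditionally instead of only when a reschedule is pending. */
        sched_yield();
    }
    return NULL;
}

/* Queue n_namespaces items and drain them with n_threads workers.
 * Returns the number of items processed, or -1 on error. */
int run_bl_threads(int n_namespaces, int n_threads)
{
    assert(n_namespaces >= 0 && n_threads > 0);
    processed = 0;
    q_head = NULL;
    for (int ns = n_namespaces - 1; ns >= 0; ns--) {  /* push to front */
        struct work_item *it = malloc(sizeof(*it));
        if (it == NULL)
            return -1;
        it->ns_id = ns;
        it->next = q_head;
        q_head = it;
    }

    pthread_t *tids = malloc(sizeof(*tids) * (size_t)n_threads);
    if (tids == NULL)
        return -1;
    int started = 0;
    for (; started < n_threads; started++)
        if (pthread_create(&tids[started], NULL, bl_thread_main, NULL) != 0)
            break;
    /* Any started worker keeps popping until the queue is empty. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return started > 0 ? processed : -1;
}
```

      For example, run_bl_threads(1000, 4) gives each of 4 workers
      roughly 250 items, and each worker yields after every one.
      sched_yield() is only an analogy. In the kernel, cond_resched()
      returns immediately unless a reschedule is pending, which is
      what keeps the per-item cost low.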

    People

      WC Triage (wc-triage)
      Patrick Farrell (paf)