Lustre / LU-513

cfs_wait_event_interruptible_exclusive is not exclusive

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.2.0, Lustre 2.1.1
    • None
    • None
    • PPC64, RHEL6.1 (but should apply to all platforms)
    • 3
    • 4877

    Description

      It looks like cfs_wait_event_interruptible_exclusive() was accidentally written to use wait_event_interruptible(), when it should use wait_event_interruptible_exclusive().

      This was especially apparent when running lnet selftest on Sequoia IO nodes, which have 17 cores with 4-way SMT. Linux therefore sees 68 "cpus", and Lustre creates 68 cfs_wi_sd* scheduler threads.

      Even when writing to a single peer (over o2iblnd) with concurrency 1, I saw ALL of the cfs_wi_sd* threads eating all of the CPU time. Throughput was also quite low (680MB/s writing to an x86_64 server, concurrency level 16). My suspicion was that every waiting thread was being woken for each work item, instead of only one. That is what eventually led me to the missing _exclusive suffix.
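      The suspected behaviour can be illustrated with a toy, single-threaded user-space model of a wait queue. This is a sketch under stated assumptions, not kernel or Lustre code: `toy_waiter`, `toy_wake_up`, and `wakeups_for_one_item` are hypothetical names, and the model only mirrors the general rule that a wake-up pass wakes every non-exclusive sleeper but stops after the requested number of exclusive sleepers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_WAITERS 68   /* one per cfs_wi_sd* thread in the report */

/* Toy stand-in for a wait-queue entry; not the kernel's structure. */
struct toy_waiter {
    bool exclusive;     /* was the waiter queued as "exclusive"? */
    bool sleeping;
};

/*
 * Toy model of one wake-up pass: wake sleepers in queue order, but stop
 * as soon as nr_exclusive exclusive sleepers have been woken.
 * Returns the number of waiters woken.
 */
static int toy_wake_up(struct toy_waiter *q, size_t n, int nr_exclusive)
{
    int woken = 0;

    assert(nr_exclusive > 0);
    for (size_t i = 0; i < n; i++) {
        if (!q[i].sleeping)
            continue;
        q[i].sleeping = false;
        woken++;
        if (q[i].exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}

/* Queue NR_WAITERS sleepers, post one work item, count the wake-ups. */
static int wakeups_for_one_item(bool exclusive)
{
    struct toy_waiter q[NR_WAITERS];

    for (size_t i = 0; i < NR_WAITERS; i++) {
        q[i].exclusive = exclusive;
        q[i].sleeping = true;
    }
    return toy_wake_up(q, NR_WAITERS, 1);
}
```

      In this model, `wakeups_for_one_item(false)` returns 68 and `wakeups_for_one_item(true)` returns 1: with non-exclusive waiters, a single work item wakes every scheduler thread, and all but one of them find nothing to do and go back to sleep.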

      When I changed it to use the _exclusive() version of the function, the cfs_wi_sd* threads no longer appeared at the top of "top"'s display, and throughput jumped to 2750MB/s on the write test.

      This would be a problem on all architectures; it is just more obvious on Sequoia, with its 68 "processors".

      I'll submit a one-line patch.
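      A hedged sketch of what such a one-line change might look like follows. The exact form of the libcfs wrapper (a three-argument macro that assigns to `ret`) is an assumption, the `_old` suffix is a hypothetical name used only to show the two versions side by side, and the two `wait_event_*` macros here are user-space stand-ins so the fragment compiles outside the kernel. The real primitives come from the kernel's `<linux/wait.h>` and actually sleep.

```c
/*
 * User-space stand-ins (assumption, for illustration only): each one
 * records which primitive was invoked and returns 0 instead of sleeping.
 */
static int used_exclusive;
#define wait_event_interruptible(wq, condition) \
        ((void)(wq), (void)(condition), used_exclusive = 0, 0)
#define wait_event_interruptible_exclusive(wq, condition) \
        ((void)(wq), (void)(condition), used_exclusive = 1, 0)

/* Before, as described in the report: exclusive in name only. */
#define cfs_wait_event_interruptible_exclusive_old(wq, condition, ret) \
        ret = wait_event_interruptible(wq, condition)

/* After: the wrapper calls the exclusive primitive it is named for. */
#define cfs_wait_event_interruptible_exclusive(wq, condition, ret) \
        ret = wait_event_interruptible_exclusive(wq, condition)
```

      Callers would not need to change under this sketch, since only the primitive behind the wrapper differs.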

            People

              liang Liang Zhen (Inactive)
              morrone Christopher Morrone (Inactive)