  Lustre / LU-18053

Add active osc_lru_shrink() limit


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.5
    • Environment: RHEL 8.9 client running lustre 2.15.5

    Description

      When running a single shared file IOR on a compute node with a large number of cores, it's possible to trigger soft lockups.  Applying LU-17630 helps but doesn't entirely resolve the issue.  The stack traces logged by the soft lockup watchdog indicate the cause is heavy contention on the page cache spin lock in delete_from_page_cache().

      RIP: 0010:delete_from_page_cache+0x52/0x70
      [ 9375.915829]  generic_error_remove_page+0x36/0x60
      [ 9375.915837]  cl_page_discard+0x47/0x80 [obdclass]
      [ 9375.915883]  discard_pagevec+0x7d/0x150 [osc]
      [ 9375.915900]  osc_lru_shrink+0x87f/0x8b0 [osc] 
      [ 9375.915913]  lru_queue_work+0xfd/0x230 [osc]
      [ 9375.915925]  work_interpreter+0x32/0x110 [ptlrpc]
      [ 9375.915992]  ptlrpc_check_set+0x5cf/0x1fc0 [ptlrpc]
      [ 9375.916052]  ptlrpcd+0x6df/0xa70 [ptlrpc]
      [ 9375.916176]  kthread+0x14c/0x170

      It looks like this is possible because:
      1. Multiple callers pass 'force=1' to osc_lru_shrink(), allowing multiple threads to run it concurrently.  lru_queue_work() does use 'force=0', which is good.
      2. There is no per-filesystem or per-node limit on how many threads can run osc_lru_shrink(); concurrency is only limited per client_obd using the 'cl_lru_shrinkers' atomic (see the sketch below).
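
      For reference, here is a simplified sketch of the existing per-client_obd gating described above (assuming the 2.15-style osc_lru_shrink() entry point and the existing client_obd fields; this is not the verbatim osc_page.c code):

      /*
       * Simplified sketch, not the actual code: non-forced callers back off
       * when another shrinker is already active on the same client_obd, but
       * 'force=1' callers skip the check entirely.
       */
      static long osc_lru_shrink_sketch(struct client_obd *cli, long target,
                                        bool force)
      {
              long count = 0;

              if (!force) {
                      /* e.g. lru_queue_work(): give up if someone else is
                       * already shrinking this client_obd's LRU. */
                      if (atomic_read(&cli->cl_lru_shrinkers) > 0)
                              return -EBUSY;
                      if (atomic_inc_return(&cli->cl_lru_shrinkers) > 1) {
                              atomic_dec(&cli->cl_lru_shrinkers);
                              return -EBUSY;
                      }
              } else {
                      /* Forced callers pile in unconditionally, so many
                       * threads can discard pages at once and contend on the
                       * page cache spin lock seen in the trace above. */
                      atomic_inc(&cli->cl_lru_shrinkers);
              }

              /* ... walk the per-client_obd LRU and discard up to 'target'
               * pages via discard_pagevec() ... */

              atomic_dec(&cli->cl_lru_shrinkers);
              return count;
      }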

      I'll push a patch for review which adds a per-filesystem limit.  Interestingly, it looks like portions of this may have been implemented long ago but never completed.  The proposed patch still needs to be tested on a system with a large number of OSCs, but I wanted to post it for initial feedback.
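
      To make the idea concrete (illustration only, not the patch that will be pushed for review): the limit could live in the cl_client_cache that the OSCs of a mount share.  The 'ccc_lru_shrinkers' and 'ccc_max_shrinkers' fields below are made-up names for this sketch:

      /*
       * Hypothetical illustration: cap the number of concurrent shrinkers per
       * filesystem by adding a counter to the shared struct cl_client_cache.
       */
      static bool osc_lru_shrinker_get(struct cl_client_cache *cache)
      {
              if (atomic_inc_return(&cache->ccc_lru_shrinkers) >
                  cache->ccc_max_shrinkers) {
                      /* Too many shrinkers already active on this filesystem;
                       * tell the caller to back off. */
                      atomic_dec(&cache->ccc_lru_shrinkers);
                      return false;
              }
              return true;
      }

      static void osc_lru_shrinker_put(struct cl_client_cache *cache)
      {
              atomic_dec(&cache->ccc_lru_shrinkers);
      }

      /*
       * osc_lru_shrink() callers (at least the non-forced ones) would then
       * bracket the discard loop, roughly:
       *
       *      if (!osc_lru_shrinker_get(cli->cl_cache))
       *              return -EBUSY;
       *      ... discard pages ...
       *      osc_lru_shrinker_put(cli->cl_cache);
       */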

            People

              Assignee: Brian Behlendorf
              Reporter: Brian Behlendorf
              Votes: 0
              Watchers: 7
