  Lustre / LU-17181

lu_sites_guard semaphore causes page reclaim starvation


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Lustre 2.16.0
    • None
    • None
    • 3

    Description

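      The Linux MM can run several cache reclaim passes in parallel,
      but lu_sites_guard blocks other threads from making progress,
      especially because of the cond_resched() reached while it is held.
      In the traces below, PID 98811 is purging objects from
      lu_cache_shrink_scan() and is rescheduled in mutex_lock() called
      from lod_striping_free(), while PID 98822 waits for the semaphore
      in lu_cache_shrink_count() (rwsem_down_read_slowpath).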

      PID: 98822  TASK: ffff9766015e0000  CPU: 11  COMMAND: "zabbix_agent2"
       #0 [ffffbeed4e617920] __schedule+708 at ffffffff9b54e1d4
       #1 [ffffbeed4e6179b8] schedule+56 at ffffffff9b54e648
       #2 [ffffbeed4e6179c8] rwsem_down_read_slowpath+864 at ffffffff9b5511d0
       #3 [ffffbeed4e617a60] lu_cache_shrink_count+30 at ffffffffc0fb34fe [obdclass]
       #4 [ffffbeed4e617a70] do_shrink_slab+84 at ffffffff9ae74344
       #5 [ffffbeed4e617ae0] shrink_slab+190 at ffffffff9ae74b6e
       #6 [ffffbeed4e617b60] shrink_node+412 at ffffffff9ae795ec
       #7 [ffffbeed4e617be0] do_try_to_free_pages+201 at ffffffff9ae79bb9
       #8 [ffffbeed4e617c30] try_to_free_pages+239 at ffffffff9ae79fbf
       #9 [ffffbeed4e617cd0] __alloc_pages_slowpath+945 at ffffffff9aebd7b1
      #10 [ffffbeed4e617dc8] __alloc_pages_nodemask+643 at ffffffff9aebe3a3
      #11 [ffffbeed4e617e28] __get_free_pages+10 at ffffffff9aeb86ca
      

      vs

      PID: 98811  TASK: ffff977ba2865f00  CPU: 16  COMMAND: "p_check_lustre_"
       #0 [ffffbeed6cf3f828] __schedule+708 at ffffffff9b54e1d4
       #1 [ffffbeed6cf3f8c0] preempt_schedule_common+10 at ffffffff9b54e6fa
       #2 [ffffbeed6cf3f8c8] _cond_resched+29 at ffffffff9b54e72d
       #3 [ffffbeed6cf3f8d0] mutex_lock+14 at ffffffff9b55087e
       #4 [ffffbeed6cf3f8e0] lod_striping_free+27 at ffffffffc1693a2b [lod]
       #5 [ffffbeed6cf3f900] lod_object_free+158 at ffffffffc169c43e [lod]
       #6 [ffffbeed6cf3f910] lu_object_free+216 at ffffffffc0fb2ed8 [obdclass]
       #7 [ffffbeed6cf3f978] lu_site_purge_objects+982 at ffffffffc0fb5d16 [obdclass]
       #8 [ffffbeed6cf3fa18] lu_cache_shrink_scan+146 at ffffffffc0fb5fe2 [obdclass]
       #9 [ffffbeed6cf3fa70] do_shrink_slab+300 at ffffffff9ae7441c
      #10 [ffffbeed6cf3fae0] shrink_slab+190 at ffffffff9ae74b6e
      


          People

            Assignee: Alexey Lyashkov (shadow)
            Reporter: Alexey Lyashkov (shadow)
