LU-1128: Complete investigation of the LDLM pool shrinker and SLV handling


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.2.0, Lustre 2.1.2
    • Affects Version/s: Lustre 2.1.0
    • Environment: lustre 2.1.0-21chaos (github.com/chaos/lustre)
    • Severity: 3

    Description

      We have an OSS where two OST I/O threads are consuming significant CPU time and, I suspect, are not making real progress:

      29703 root      20   0     0    0    0 R 40.7  0.0 309:55.91 ll_ost_io_375
      14028 root      20   0     0    0    0 R 39.7  0.0 311:25.93 ll_ost_io_75
      

      When I can catch them between active cycles on the CPU, they show the following backtraces from "crash":

      crash> foreach 14028 29703 bt
      PID: 14028  TASK: ffff88041345ab00  CPU: 7   COMMAND: "ll_ost_io_75"
       #0 [ffff88041270f390] schedule at ffffffff814ee380
       #1 [ffff88041270f458] schedule_timeout at ffffffff814ef235
       #2 [ffff88041270f508] __down at ffffffff814f0152
       #3 [ffff88041270f558] down at ffffffff810968b1
       #4 [ffff88041270f588] ldlm_pools_shrink at ffffffffa05c9fd8 [ptlrpc]
       #5 [ffff88041270f5d8] ldlm_pools_srv_shrink at ffffffffa05ca213 [ptlrpc]
       #6 [ffff88041270f5e8] shrink_slab at ffffffff81129cba
       #7 [ffff88041270f648] zone_reclaim at ffffffff8112c364
       #8 [ffff88041270f728] get_page_from_freelist at ffffffff81122934
       #9 [ffff88041270f848] __alloc_pages_nodemask at ffffffff81123bb1
      #10 [ffff88041270f968] kmem_getpages at ffffffff8115e0d2
      #11 [ffff88041270f998] cache_grow at ffffffff8115e73f
      #12 [ffff88041270fa08] cache_alloc_refill at ffffffff8115e992
      #13 [ffff88041270fa78] __kmalloc at ffffffff8115f6b9
      #14 [ffff88041270fac8] cfs_alloc at ffffffffa032a863 [libcfs]
      #15 [ffff88041270faf8] ptlrpc_prep_bulk_exp at ffffffffa05d286d [ptlrpc]
      #16 [ffff88041270fb58] ost_brw_write at ffffffffa0a459c1 [ost]
      #17 [ffff88041270fcc8] ost_handle at ffffffffa0a49a19 [ost]
      #18 [ffff88041270fdf8] ptlrpc_main at ffffffffa05eb181 [ptlrpc]
      #19 [ffff88041270ff48] kernel_thread at ffffffff8100c14a
      
      PID: 29703  TASK: ffff88030b1be100  CPU: 5   COMMAND: "ll_ost_io_375"
       #0 [ffff8802e04771f0] schedule at ffffffff814ee380
       #1 [ffff8802e04772b8] schedule_timeout at ffffffff814ef235
       #2 [ffff8802e0477368] __down at ffffffff814f0152
       #3 [ffff8802e04773b8] down at ffffffff810968b1
       #4 [ffff8802e04773e8] ldlm_pools_shrink at ffffffffa05c9fd8 [ptlrpc]
       #5 [ffff8802e0477438] ldlm_pools_srv_shrink at ffffffffa05ca213 [ptlrpc]
       #6 [ffff8802e0477448] shrink_slab at ffffffff81129cba
       #7 [ffff8802e04774a8] zone_reclaim at ffffffff8112c364
       #8 [ffff8802e0477588] get_page_from_freelist at ffffffff81122934
       #9 [ffff8802e04776a8] __alloc_pages_nodemask at ffffffff81123bb1
      #10 [ffff8802e04777c8] kmem_getpages at ffffffff8115e0d2
      #11 [ffff8802e04777f8] cache_grow at ffffffff8115e73f
      #12 [ffff8802e0477868] cache_alloc_refill at ffffffff8115e992
      #13 [ffff8802e04778d8] __kmalloc at ffffffff8115f6b9
      #14 [ffff8802e0477928] cfs_alloc at ffffffffa032a863 [libcfs]
      #15 [ffff8802e0477958] filter_alloc_iobuf at ffffffffa0aa799e [obdfilter]
      #16 [ffff8802e04779a8] filter_iobuf_get at ffffffffa0a89c0e [obdfilter]
      #17 [ffff8802e04779c8] filter_preprw_read at ffffffffa0a9ff7e [obdfilter]
      #18 [ffff8802e0477af8] filter_preprw at ffffffffa0aa09b0 [obdfilter]
      #19 [ffff8802e0477b38] obd_preprw at ffffffffa0a3e33e [ost]
      #20 [ffff8802e0477ba8] ost_brw_read at ffffffffa0a44315 [ost]
      #21 [ffff8802e0477cc8] ost_handle at ffffffffa0a48d27 [ost]
      #22 [ffff8802e0477df8] ptlrpc_main at ffffffffa05eb181 [ptlrpc]
      #23 [ffff8802e0477f48] kernel_thread at ffffffff8100c14a
      

      So it appears that they are failing to make progress in the shrinker code.
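
      To illustrate where the threads are sleeping, here is a minimal, illustrative sketch (not the actual Lustre source; the names example_pools_shrink and pool_list_guard are made up) of the 2.6.32-era slab shrinker pattern that ldlm_pools_srv_shrink() plugs into. Any task in direct reclaim (__alloc_pages_nodemask -> zone_reclaim -> shrink_slab) runs the registered callback, so if the callback serializes on a global semaphore, allocating threads can pile up in down() exactly as the backtraces show. The callback prototype below is the old (nr_to_scan, gfp_mask) style; the exact signature varies by kernel version.

      /* Illustrative sketch only -- not the Lustre source. */
      #include <linux/module.h>
      #include <linux/mm.h>
      #include <linux/semaphore.h>

      static struct semaphore pool_list_guard =
              __SEMAPHORE_INITIALIZER(pool_list_guard, 1);

      static int example_pools_shrink(int nr_to_scan, gfp_t gfp_mask)
      {
              int remaining = 0;

              down(&pool_list_guard);   /* where the ll_ost_io threads sit */
              /* ... walk the lock pools, cancel unused locks, count what
               *     is left ... */
              up(&pool_list_guard);

              return remaining;         /* hint to the VM: objects left in cache */
      }

      static struct shrinker example_shrinker = {
              .shrink = example_pools_shrink,
              .seeks  = DEFAULT_SEEKS,
      };

      static int __init example_init(void)
      {
              register_shrinker(&example_shrinker);
              return 0;
      }

      static void __exit example_exit(void)
      {
              unregister_shrinker(&example_shrinker);
      }

      module_init(example_init);
      module_exit(example_exit);
      MODULE_LICENSE("GPL");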

      The immediate symptom that users see is a client that is unable to reconnect to an OST on this OSS:

      Lustre: 15630:0:(ldlm_lib.c:852:target_handle_connect()) lsd-OST01bb: refuse reconnection from 59381921-e835-2c32-ab83-d3f9e3fbad7f@192.168.120.181@o2ib3 to 0xffff88040d93ec00/1
      

      (As an aside, we really need to improve these console messages. Brian had a set of patches attached to bugzilla bug 16909 that greatly improved them, and LLNL carried those patches for quite a while, but Sun/Oracle never landed them. I'll work on porting them forward and getting them into gerrit when I have a chance.)

      That message means that one RPC is still outstanding for that export, and the client will not be allowed to reconnect until that RPC is removed. We have seen the refused connections continue for hours, and I suspect the situation will never fix itself without intervention. My assumption is that one of the runaway ll_ost_io threads is responsible for the outstanding RPC.
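
      To make that gate concrete, here is a toy userspace model (my own illustration, not Lustre code; this struct export and its plain-int rpc_count are simplified stand-ins): the connect request itself counts as one in-flight RPC, so a printed value of 1 means one other RPC is still outstanding on the export and reconnection is refused until it drains.

      /* Toy model of the reconnect gate -- not Lustre code. */
      #include <stdio.h>

      struct export {
              const char *client_uuid;
              int         rpc_count;   /* in-flight RPCs, incl. this connect */
      };

      static int handle_reconnect(struct export *exp)
      {
              if (exp->rpc_count > 1) {
                      /* loosely mirrors the console message format */
                      printf("refuse reconnection from %s to %p/%d\n",
                             exp->client_uuid, (void *)exp,
                             exp->rpc_count - 1);
                      return -1;       /* client must keep retrying */
              }
              printf("reconnect accepted for %s\n", exp->client_uuid);
              return 0;
      }

      int main(void)
      {
              /* One bulk I/O RPC stuck on an ll_ost_io thread, plus the
               * connect request itself: reconnection is refused. */
              struct export exp = { "59381921-e835-2c32-ab83-d3f9e3fbad7f", 2 };

              handle_reconnect(&exp);  /* refused: 1 RPC still outstanding */
              exp.rpc_count = 1;       /* the stuck RPC finally completes */
              handle_reconnect(&exp);  /* now accepted */
              return 0;
      }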

      Memory is not especially tight on the node at this time:

      > cat /proc/meminfo 
      MemTotal:       16202948 kB
      MemFree:         2805328 kB
      Buffers:         7522692 kB
      Cached:          1631292 kB
      SwapCached:            0 kB
      Active:          6426516 kB
      Inactive:        2743512 kB
      Active(anon):      58184 kB
      Inactive(anon):   331836 kB
      Active(file):    6368332 kB
      Inactive(file):  2411676 kB
      Unevictable:       31840 kB
      Mlocked:           31840 kB
      SwapTotal:             0 kB
      SwapFree:              0 kB
      Dirty:              7844 kB
      Writeback:           344 kB
      AnonPages:         48572 kB
      Mapped:            12160 kB
      Shmem:            367384 kB
      Slab:            3405396 kB
      SReclaimable:     965160 kB
      SUnreclaim:      2440236 kB
      KernelStack:       13216 kB
      PageTables:         4556 kB
      NFS_Unstable:          0 kB
      Bounce:                0 kB
      WritebackTmp:          0 kB
      CommitLimit:     8101472 kB
      Committed_AS:     584152 kB
      VmallocTotal:   34359738367 kB
      VmallocUsed:      360960 kB
      VmallocChunk:   34350502472 kB
      HardwareCorrupted:     0 kB
      AnonHugePages:         0 kB
      HugePages_Total:       0
      HugePages_Free:        0
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:       2048 kB
      DirectMap4k:        9856 kB
      DirectMap2M:     2086912 kB
      DirectMap1G:    14680064 kB
      

      Attachments

        1. diff_between_before_and_after.txt (407 kB, Christopher Morrone)
        2. ldlm_pools_shrink2.stp (0.7 kB, Christopher Morrone)
        3. servers_pool_state_and_stats.txt (526 kB, Christopher Morrone)
        4. servers_pool_state_and_stats2.txt (692 kB, Christopher Morrone)
        5. stap_script_output.txt.gz (292 kB, Christopher Morrone)

        Issue Links

          Activity

            People

              Assignee: Niu Yawei (Inactive)
              Reporter: Christopher Morrone (Inactive)
              Votes: 0
              Watchers: 9
