LU-685: Wide busy lock in kiblnd_pool_alloc_node


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.2.0, Lustre 2.1.2

    Description

      Hi,

      At CEA they have experienced at least 3 hangs of an MDS in the following context:

      • most of the memory (> 99%) is allocated by the kernel, caching inodes + attributes (the system has 64 GB of memory)
      • 31 threads (out of 32 cores) are busy waiting with the following backtrace (all are kiblnd_sd_xx)

      _spin_lock
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_check_sends
      kiblnd_post_rx
      kiblnd_recv
      lnet_ni_recv
      lnet_recv_put
      lnet_parse
      kiblnd_handle_rx
      kiblnd_rx_complete
      kiblnd_complete
      kiblnd_scheduler
      kernel_thread

      • 1 thread (kiblnd_sd_11 in this case) is scheduled out (sleeping) with the following bt

      schedule
      cfs_schedule
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_check_sends
      kiblnd_post_rx
      kiblnd_recv
      lnet_ni_recv
      lnet_recv_put
      lnet_parse
      kiblnd_handle_rx
      kiblnd_rx_complete
      kiblnd_complete
      kiblnd_scheduler
      kernel_thread

      • 1 thread (router_checker) is busy waiting with the following bt (the 32nd thread)

      spin_lock
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_send
      lnet_ni_send
      lnet_send
      LNetGet
      lnet_router_checker
      kernel_thread

      • 1 thread (kiblnd_connd) is scheduled out with the following bt (the one owning the lock?)

      schedule_timeout
      __down
      down
      ldlm_pools_shrink
      ldlm_pools_cli_shrink
      shrink_slab
      do_try_to_free_pages
      __alloc_pages_nodemask
      kmem_getpages
      fallback_alloc
      ____cache_alloc_node
      __kmalloc
      cfs_alloc
      kiblnd_alloc_pages
      kiblnd_create_tx_pool
      kiblnd_pool_alloc_node
      kiblnd_get_idle_tx
      kiblnd_check_sends
      kiblnd_check_conns
      kiblnd_connd
      kernel_kthread

      So, having #cpu threads busy waiting on a spinlock is not ideal, especially when the lock owner seems to be scheduled out, waiting for something else to happen. At the moment it is not possible to verify whether the lock is the same for all threads because the crash dump is not accessible.
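
      The pattern suggested by these backtraces is sketched below. This is a simplified, hypothetical illustration (the structure and the grow_pool() helper are invented, it is not the actual kiblnd_pool_alloc_node() code): when the free list is empty, one thread takes the "grow the pool" path and performs a blocking kernel allocation that can end up in the slab shrinkers (ldlm_pools_shrink above), while every other scheduler thread drops the lock, yields and immediately retries, hammering the same spinlock for as long as that allocation sleeps.

      /* Simplified, hypothetical sketch of the contention pattern seen in the
       * backtraces -- not the actual kiblnd_pool_alloc_node() implementation. */
      #include <linux/spinlock.h>
      #include <linux/list.h>
      #include <linux/sched.h>

      struct pool_set {
              spinlock_t       ps_lock;        /* protects the free list */
              struct list_head ps_free;        /* free tx descriptors */
              int              ps_increasing;  /* a pool grow is in progress */
      };

      /* Hypothetical helper: blocking allocation of a new pool.  Under memory
       * pressure this is where the thread ends up in direct reclaim and the
       * ldlm_pools_shrink path of the backtrace above. */
      static void grow_pool(struct pool_set *ps)
      {
      }

      static struct list_head *pool_alloc_node(struct pool_set *ps)
      {
              struct list_head *node;
      again:
              spin_lock(&ps->ps_lock);         /* 31 threads spin here in the dump */
              if (!list_empty(&ps->ps_free)) {
                      node = ps->ps_free.next;
                      list_del(node);
                      spin_unlock(&ps->ps_lock);
                      return node;
              }

              if (ps->ps_increasing) {
                      /* Someone else is already growing the pool: drop the lock,
                       * yield and retry.  If the grower sleeps in reclaim, this
                       * turns into a busy loop on ps_lock across all threads. */
                      spin_unlock(&ps->ps_lock);
                      schedule();
                      goto again;
              }

              ps->ps_increasing = 1;
              spin_unlock(&ps->ps_lock);

              grow_pool(ps);                   /* may sleep for a long time */

              spin_lock(&ps->ps_lock);
              ps->ps_increasing = 0;
              spin_unlock(&ps->ps_lock);
              goto again;
      }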

      To work around this issue, we made a patch that allows choosing how many kiblnd scheduler threads are started, via a kernel module option. We set this number to 24 (for 32 cores).
      The issue reproduced, this time with 24 threads spinning on a spinlock, 4 kswapd threads waiting to acquire a lock in lu_cache_shrink, and most of the other threads stuck in congestion_wait.
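
      The workaround patch itself is not reproduced here; a minimal sketch of what such a tunable could look like follows (the parameter name nscheds and the helper are hypothetical, not the actual CEA patch):

      /* Hypothetical sketch of a module option limiting the number of kiblnd
       * scheduler threads started -- the option name is invented for illustration. */
      #include <linux/module.h>
      #include <linux/moduleparam.h>
      #include <linux/cpumask.h>

      static int nscheds;              /* 0 = default: one thread per online CPU */
      module_param(nscheds, int, 0444);
      MODULE_PARM_DESC(nscheds, "number of kiblnd scheduler threads to start");

      static int kiblnd_start_schedulers(void)
      {
              int nthrs = nscheds > 0 ? nscheds : num_online_cpus();
              int i;

              for (i = 0; i < nthrs; i++) {
                      /* kthread_run() of the kiblnd_scheduler loop would go here */
              }
              return 0;
      }

      Such an option would be set at module load time, e.g. "options ko2iblnd nscheds=24" in modprobe.conf (again, the option name is only illustrative).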

      Cheers,
      Sebastien.

    People

      Assignee: Lai Siyao
      Reporter: Sebastien Buisson (Inactive)

              Dates

                Created:
                Updated:
                Resolved: