Lustre / LU-7330

spinlock lockup on ldlm blp_lock in ldlm_bl_* threads

    Description

      Our system administrators have reported that Lustre clients on our big BG/Q system are locking up due to Lustre. There are some soft lockups, and then the kernel reports a "spinlock lockup" and dumps backtraces for many processes that are all stuck spinning on the same ldlm blp_lock.

      Many ldlm_bl_* threads are all spinning here:

      _raw_spin_lock
      _spin_lock
      ldlm_bl_get_work
      ldlm_bl_thread_main
      

      Meanwhile, many sysiod processes (the I/O node portion of the I/O forwarding system on BG/Q) are stuck in either a read or a write system call path that goes through shrink_slab() and ends up spinning on the same spin lock. Here is an example backtrace for the read case:

      _raw_spin_lock
      _spin_lock
      _ldlm_bl_to_thread
      ldlm_bl_to_thread
      ldlm_cancel_lru
      ldlm_cli_pool_shrink
      ldlm_pool_shrink
      ldlm_pools_shrink
      shrink_slab
      do_try_free_pages
      try_free_pages
      __alloc_pages_nodemask
      generic_file_aio_read
      vvp_io_read_start
      cl_io_start
      cl_io_loop
      ll_file_io_generic
      ll_file_aio_read
      ll_file_read
      vfs_read
      sys_read
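
When many backtraces like the two above have to be triaged by hand, a small script can group them by the function that is trying to take the lock. The following is only a minimal sketch, not part of Lustre or of any existing tool: the helper names (`lock_caller`, `entered_via_reclaim`, `summarize`) are hypothetical, it assumes each trace is plain text with the innermost frame first, and the repeated traces in the demo are made up for illustration rather than taken from the affected system.

```python
from collections import Counter

# Frames that belong to the spinlock primitive itself, as seen in the traces above.
LOCK_FRAMES = {"_raw_spin_lock", "_spin_lock"}


def frames_of(trace):
    """Split a plain-text backtrace into a list of frame names, innermost first."""
    return [line.strip() for line in trace.splitlines() if line.strip()]


def lock_caller(trace):
    """Return the first frame above the spinlock primitives, or None if not spinning."""
    frames = frames_of(trace)
    if not frames or frames[0] not in LOCK_FRAMES:
        return None
    for frame in frames:
        if frame not in LOCK_FRAMES:
            return frame
    return None


def entered_via_reclaim(trace):
    """True if the trace passes through shrink_slab, i.e. memory reclaim."""
    return "shrink_slab" in frames_of(trace)


def summarize(traces):
    """Count spinning threads per lock-acquiring function."""
    return Counter(c for c in map(lock_caller, traces) if c is not None)


# The two traces quoted in this report.
BL_THREAD = """
_raw_spin_lock
_spin_lock
ldlm_bl_get_work
ldlm_bl_thread_main
"""

SYSIOD_READ = """
_raw_spin_lock
_spin_lock
_ldlm_bl_to_thread
ldlm_bl_to_thread
ldlm_cancel_lru
ldlm_cli_pool_shrink
ldlm_pool_shrink
ldlm_pools_shrink
shrink_slab
do_try_free_pages
try_free_pages
__alloc_pages_nodemask
generic_file_aio_read
"""

if __name__ == "__main__":
    # Illustrative input only: the number of copies is invented for the demo.
    sample = [BL_THREAD, BL_THREAD, SYSIOD_READ]
    for caller, count in summarize(sample).items():
        print(caller, count)
    print("via reclaim:", sum(entered_via_reclaim(t) for t in sample))
```

Grouping by the first non-lock frame separates the two sides described in this report: the ldlm_bl_* threads pulling work in ldlm_bl_get_work, and the sysiod processes queuing work through _ldlm_bl_to_thread from the memory reclaim path.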
      

      Crash dumps are unfortunately not available on this system. That is about all the information that I can currently get.

      Those clients are running Lustre 2.5.4-4chaos.

            People

              niu Niu Yawei (Inactive)
              morrone Christopher Morrone (Inactive)