[LU-7330] spinlock lockup on ldlm blp_lock in ldlm_bl_* threads - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.8.0
Affects Version/s: None
Labels:
- llnl
- llnlfixready

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

Our system administrators have reported that Lustre clients on our big BG/Q system are locking up because of Lustre. There are some soft lockups, and then the kernel reports a "spinlock lockup" and dumps backtraces for many processes that are all stuck spinning on the same ldlm blp_lock.

Many ldlm_bl_* threads are all spinning here:

_raw_spin_lock
_spin_lock
ldlm_bl_get_work
ldlm_bl_thread_main

Meanwhile, many sysiod processes (the I/O node portion of the I/O forwarding system on BG/Q) are stuck in either a read or a write system call path that goes through shrink_slab() and ends up spinning on the same spin lock. Here is an example backtrace for the read case:

_raw_spin_lock
_spin_lock
_ldlm_bl_to_thread
ldlm_bl_to_thread
ldlm_cancel_lru
ldlm_cli_pool_shrink
ldlm_pool_shrink
ldlm_pools_shrink
shrink_slab
do_try_free_pages
try_free_pages
__alloc_pages_nodemask
generic_file_aio_read
vvp_io_read_start
cl_io_start
cl_io_loop
ll_file_io_generic
ll_file_aio_read
ll_file_read
vfs_read
sys_read

Crash dumps are unfortunately not available on this system. That is about all the information that I can currently get.

Those clients are running Lustre 2.5.4-4chaos.
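
To make the comparison of the two backtraces above concrete, here is a minimal Python sketch that finds, for each trace, the first frame outside the spinlock primitive, i.e. the function that tried to take the lock. The helper names (`lock_caller`, `summarize`) and the trace labels are hypothetical and are not part of Lustre or any kernel tooling; the frame lists are copied from this report and are assumed to be listed innermost frame first. The script only shows which functions call into the spinlock; that both are contending on blp_lock is taken from the kernel's report, not derived by the code.

```python
# Frames that belong to the spinlock primitive itself.
SPIN_FRAMES = {"_raw_spin_lock", "_spin_lock"}

# Backtraces copied from the report, innermost frame first.
BL_THREAD_TRACE = [
    "_raw_spin_lock", "_spin_lock",
    "ldlm_bl_get_work", "ldlm_bl_thread_main",
]
SYSIOD_READ_TRACE = [
    "_raw_spin_lock", "_spin_lock",
    "_ldlm_bl_to_thread", "ldlm_bl_to_thread", "ldlm_cancel_lru",
    "ldlm_cli_pool_shrink", "ldlm_pool_shrink", "ldlm_pools_shrink",
    "shrink_slab", "do_try_free_pages", "try_free_pages",
    "__alloc_pages_nodemask", "generic_file_aio_read",
    "vvp_io_read_start", "cl_io_start", "cl_io_loop",
    "ll_file_io_generic", "ll_file_aio_read", "ll_file_read",
    "vfs_read", "sys_read",
]


def lock_caller(trace):
    """Return the first frame that is not a spinlock frame, or None."""
    for frame in trace:
        if frame not in SPIN_FRAMES:
            return frame
    return None


def summarize(traces):
    """Map each labelled trace to (lock caller, whether it came via shrink_slab)."""
    return {
        label: (lock_caller(trace), "shrink_slab" in trace)
        for label, trace in traces.items()
    }


if __name__ == "__main__":
    summary = summarize({
        "ldlm_bl thread": BL_THREAD_TRACE,
        "sysiod read": SYSIOD_READ_TRACE,
    })
    for label, (caller, via_shrinker) in summary.items():
        print(f"{label}: lock taken in {caller}, via shrink_slab: {via_shrinker}")
```

Run against the two traces, this separates the consumer side (`ldlm_bl_get_work`) from the producer side (`_ldlm_bl_to_thread`, reached from memory reclaim under a read), which is the pattern described above.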

People

Assignee: Niu Yawei (Inactive)

Reporter: Christopher Morrone (Inactive)

Votes: 0

Watchers: 7

Dates

Created: 23/Oct/15 12:39 AM

Updated: 27/Jan/17 10:38 PM

Resolved: 11/Nov/15 6:14 PM