Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
Description
Our system administrators have reported that Lustre clients on our big BG/Q system are locking up due to Lustre. There are some soft lockups, and then the kernel reports a "spinlock lockup" and dumps backtraces for many processes that are all stuck spinning on the same ldlm blp_lock.
Many ldlm_bl_* threads are all spinning here:
_raw_spin_lock
_spin_lock
ldlm_bl_get_work
ldlm_bl_thread_main
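For anyone unfamiliar with this code path, the consumer side has roughly the following shape: every ldlm_bl_* thread serializes on a single spinlock (blp_lock) to pull the next item off a shared work list. The sketch below is an illustrative userspace model of that pattern, not the actual Lustre source; blp_lock/blp_list and the bl_* names echo the backtrace, while struct work_item and everything else are hypothetical stand-ins.

/* Illustrative userspace model of the consumer side of the backtrace:
 * N "ldlm_bl_*" worker threads all dequeue from one list guarded by a
 * single spinlock. Only blp_lock/blp_list and the bl_* names echo the
 * report; everything else is hypothetical.
 * Build: cc -O2 -pthread bl_workers.c -o bl_workers */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct work_item {
    struct work_item *next;
};

static pthread_spinlock_t blp_lock;   /* the contended lock */
static struct work_item *blp_list;    /* shared work queue */

/* Model of ldlm_bl_get_work(): pop one item under blp_lock. */
static struct work_item *bl_get_work(void)
{
    struct work_item *w;

    pthread_spin_lock(&blp_lock);     /* every worker serializes here */
    w = blp_list;
    if (w != NULL)
        blp_list = w->next;
    pthread_spin_unlock(&blp_lock);
    return w;
}

/* Model of ldlm_bl_thread_main(): drain the queue, then exit. */
static void *bl_thread_main(void *arg)
{
    struct work_item *w;

    (void)arg;
    while ((w = bl_get_work()) != NULL)
        free(w);                      /* "process" one cancellation */
    return NULL;
}

int main(void)
{
    enum { NTHREADS = 8, NITEMS = 1 << 20 };
    pthread_t tid[NTHREADS];

    pthread_spin_init(&blp_lock, PTHREAD_PROCESS_PRIVATE);

    /* Pre-load the queue, then let the workers fight over blp_lock. */
    for (int i = 0; i < NITEMS; i++) {
        struct work_item *w = malloc(sizeof(*w));
        if (w == NULL)
            break;
        w->next = blp_list;
        blp_list = w;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, bl_thread_main, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    puts("queue drained");
    return 0;
}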
Meanwhile, many sysiod processes (the I/O node portion of the I/O forwarding system on BG/Q) are stuck in either a read or a write system call, in paths that go through shrink_slab() and end up spinning on the same spin lock. Here is an example backtrace for the read case:
_raw_spin_lock
_spin_lock
__ldlm_bl_to_thread
ldlm_bl_to_thread
ldlm_cancel_lru
ldlm_cli_pool_shrink
ldlm_pool_shrink
ldlm_pools_shrink
shrink_slab
do_try_to_free_pages
try_to_free_pages
__alloc_pages_nodemask
generic_file_aio_read
vvp_io_read_start
cl_io_start
cl_io_loop
ll_file_io_generic
ll_file_aio_read
ll_file_read
vfs_read
sys_read
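This trace is what closes the loop: under memory pressure, every allocating thread runs the registered slab shrinkers, and the ldlm pool shrinker feeds lock-cancellation work back through ldlm_bl_to_thread, which takes the very same blp_lock to enqueue. So the producers (every sysiod doing I/O under memory pressure) and the consumers (the ldlm_bl_* threads) all pile onto one spinlock. Below is a minimal sketch of that producer side, under the same caveats as above: a userspace model, with all names hypothetical except those taken from the backtrace.

/* Illustrative model of the producer side: threads in "direct reclaim"
 * (the sysiod read/write paths above) push cancellation work onto the
 * same spinlock-protected list the workers drain. Names other than
 * those in the backtrace are hypothetical.
 * Build: cc -O2 -pthread bl_producers.c -o bl_producers */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct work_item {
    struct work_item *next;
};

static pthread_spinlock_t blp_lock;
static struct work_item *blp_list;

/* Model of ldlm_bl_to_thread(): enqueue under the shared blp_lock. */
static void bl_to_thread(struct work_item *w)
{
    pthread_spin_lock(&blp_lock);     /* same lock the consumers take */
    w->next = blp_list;
    blp_list = w;
    pthread_spin_unlock(&blp_lock);
}

/* Model of the shrink_slab() -> ldlm_pools_shrink() -> ldlm_cancel_lru()
 * path: a thread that merely wanted memory ends up queueing ldlm work. */
static void pool_shrink(int nr_to_cancel)
{
    for (int i = 0; i < nr_to_cancel; i++) {
        struct work_item *w = malloc(sizeof(*w));
        if (w != NULL)
            bl_to_thread(w);
    }
}

/* Model of a sysiod thread: under memory pressure, every read()/write()
 * allocation re-enters the shrinker, i.e. calls pool_shrink(). */
static void *sysiod_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        pool_shrink(4);
    return NULL;
}

int main(void)
{
    enum { NPRODUCERS = 32 };
    pthread_t tid[NPRODUCERS];
    struct work_item *w;

    pthread_spin_init(&blp_lock, PTHREAD_PROCESS_PRIVATE);
    for (int i = 0; i < NPRODUCERS; i++)
        pthread_create(&tid[i], NULL, sysiod_thread, NULL);
    for (int i = 0; i < NPRODUCERS; i++)
        pthread_join(tid[i], NULL);

    /* Combined with the consumer sketch above, every thread in the
     * system ends up spinning on blp_lock, which is exactly what the
     * softlockup and "spinlock lockup" reports show. */
    while ((w = blp_list) != NULL) {
        blp_list = w->next;
        free(w);
    }
    puts("done");
    return 0;
}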
Crash dumps are unfortunately not available on this system. That is about all the information that I can currently get.
Those clients are running Lustre 2.5.4-4chaos.