[LU-7330] spinlock lockup on ldlm blp_lock in ldlm_bl_* threads Created: 23/Oct/15 Updated: 27/Jan/17 Resolved: 11/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Christopher Morrone | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl, llnlfixready |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Our system administrators have reported that Lustre clients on our big BG/Q system are locking up due to Lustre. There are some soft lockups, and then the kernel reports a "spinlock lockup" and dumps backtraces for many processes that are all stuck spinning on the same ldlm blp_lock.

Many ldlm_bl_* threads are all spinning here:

    _raw_spin_lock
    _spin_lock
    ldlm_bl_get_work
    ldlm_bl_thread_main

Meanwhile, many sysiod processes (the I/O node portion of the I/O forwarding system on BG/Q) are stuck in either a read or a write system call path that goes through shrink_slab() and gets stuck on the same spin lock. Here is an example backtrace for a read case:

    _raw_spin_lock
    _spin_lock
    _ldlm_bl_to_thread
    ldlm_bl_to_thread
    ldlm_cancel_lru
    ldlm_cli_pool_shrink
    ldlm_pool_shrink
    ldlm_pools_shrink
    shrink_slab
    do_try_free_pages
    try_free_pages
    __alloc_pages_nodemask
    generic_file_aio_read
    vvp_io_read_start
    cl_io_start
    cl_io_loop
    ll_file_io_generic
    ll_file_aio_read
    ll_file_read
    vfs_read
    sys_read

Crash dumps are unfortunately not available on this system. That is about all the information that I can currently get.

Those clients are running Lustre 2.5.4-4chaos. |
| Comments |
| Comment by Andreas Dilger [ 23/Oct/15 ] |
|
When the clients lock up, do you know how many ldlm_bl_* threads are running? There was another issue reported recently about too many ldlm_bl threads being started when a process is interrupted, but setting options ptlrpc ldlm_num_threads=16 has avoided the problem. |
| Comment by Christopher Morrone [ 23/Oct/15 ] |
|
The "spinlock lockup" backtraces are showing in the 50+ threads, and the thread numbers are up in the 120s. I am also seeing multiple ldlm_bl_* threads duplicate names. On one node that is not currently locked up, there are 160 ldlm_bl_* threads with only 72 unique names between them. |
| Comment by Andreas Dilger [ 24/Oct/15 ] |
|
It sounds as if this may be a similar problem. The thread names are not important for their functionality, but the duplicate names do imply that many of the threads started around the same time. You could try the module option, but that won't take effect until the next remount. |
| Comment by Peter Jones [ 26/Oct/15 ] |
|
Niu, could you please assist with this issue? Thanks, Peter |
| Comment by Andreas Dilger [ 26/Oct/15 ] |
|
[ Comments from the other ticket, which cannot be made public ]
There is one problem that is shown by the stack traces, which may only be cosmetic, but may also be a sign of a larger problem. There are a large number of threads named ldlm_bl_38, and in fact most of the ldlm_bl_* threads have duplicate names. That in itself is harmless (though confusing while debugging), but may indicate that there is some race condition when starting up these threads that causes so many to be started in such a short time.

    static int ldlm_bl_thread_start(struct ldlm_bl_pool *blp)
    {
            struct ldlm_bl_thread_data bltd = { .bltd_blp = blp };
            struct task_struct *task;

            init_completion(&bltd.bltd_comp);
            bltd.bltd_num = atomic_read(&blp->blp_num_threads);
            snprintf(bltd.bltd_name, sizeof(bltd.bltd_name) - 1,
                     "ldlm_bl_%02d", bltd.bltd_num);
            task = kthread_run(ldlm_bl_thread_main, &bltd, bltd.bltd_name);
            if (IS_ERR(task)) {
                    CERROR("cannot start LDLM thread ldlm_bl_%02d: rc %ld\n",
                           atomic_read(&blp->blp_num_threads), PTR_ERR(task));
                    return PTR_ERR(task);
            }
            wait_for_completion(&bltd.bltd_comp);
            return 0;
    }

The naming problem is clearly in the code above - it picks the thread name before actually starting the thread and incrementing the counter. I counted about 200 threads started on this node, none of which were doing anything.

It appears that the check for the number of running threads is also racy:

    /* Not fatal if racy and have a few too many threads */
    static int ldlm_bl_thread_need_create(struct ldlm_bl_pool *blp,
                                          struct ldlm_bl_work_item *blwi)
    {
            int busy = atomic_read(&blp->blp_busy_threads);

            if (busy >= blp->blp_max_threads)
                    return 0;

            if (busy < atomic_read(&blp->blp_num_threads))
                    return 0;
            :
            :
    }

and could be improved. I agree that a few extra threads are not harmful, but it seems that this can go badly wrong in some cases.

There is a module parameter that could be used to limit the number of threads, to add in /etc/modprobe.d/lustre.conf:

    options ptlrpc ldlm_num_threads=16

which should reduce the number of threads significantly, but shouldn't cause any problems for clients since they don't handle many of these RPCs at one time.

Note that this tunable may only fix the "symptom" (many LDLM threads) and not the root cause of the hang. It's hard to know for sure what is causing this problem since there are no error messages and the threads themselves are not doing anything either.
|
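To make the race described above concrete, here is a minimal, purely illustrative sketch (not the patch that eventually landed via http://review.whamcloud.com/17026) of one way the duplicate names could be avoided: claim the thread number with atomic_inc_return() before kthread_run(), so concurrent callers can never read the same value. It assumes blp_num_threads would then be incremented only here (and released on failure), rather than by the new thread itself.

    /*
     * Illustrative sketch only -- not the landed fix.  The thread slot is
     * reserved atomically before the thread is created, so two racing
     * callers of ldlm_bl_thread_start() cannot pick the same number/name.
     * Assumes blp_num_threads is incremented nowhere else and is given
     * back if kthread_run() fails.
     */
    static int ldlm_bl_thread_start(struct ldlm_bl_pool *blp)
    {
            struct ldlm_bl_thread_data bltd = { .bltd_blp = blp };
            struct task_struct *task;

            init_completion(&bltd.bltd_comp);

            /* atomic_inc_return() hands each caller a unique slot number */
            bltd.bltd_num = atomic_inc_return(&blp->blp_num_threads) - 1;
            snprintf(bltd.bltd_name, sizeof(bltd.bltd_name) - 1,
                     "ldlm_bl_%02d", bltd.bltd_num);

            task = kthread_run(ldlm_bl_thread_main, &bltd, bltd.bltd_name);
            if (IS_ERR(task)) {
                    /* release the reserved slot so the count stays honest */
                    atomic_dec(&blp->blp_num_threads);
                    CERROR("cannot start LDLM thread ldlm_bl_%02d: rc %ld\n",
                           bltd.bltd_num, PTR_ERR(task));
                    return PTR_ERR(task);
            }
            wait_for_completion(&bltd.bltd_comp);
            return 0;
    }

Reserving the number up front also means a failed kthread_run() never leaves a duplicate name behind, only a gap in the numbering.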
| Comment by Andreas Dilger [ 26/Oct/15 ] |
|
I think there are three items that need to be fixed here:
|
| Comment by Niu Yawei (Inactive) [ 03/Nov/15 ] |
|
I scrutinized the code, and it looks like the only possible reason for so many bl threads being started is the racy thread-starting code itself. When there are lots of unused locks cached on the client and the kernel senses memory pressure, lots of bl work items can be generated and processed by the bl threads, and if several bl threads run into ldlm_bl_thread_need_create() in parallel, many more threads could be created by mistake (the count could exceed blp_max_threads). Perhaps that caused the spinlock lockup problem as well (too many threads contending on the blp_lock)? I'll post a patch first to fix the problem of forking too many threads beyond the limit (along with the wrong thread names). That could fix the spinlock lockup too.

Andreas, usually the server sends blocking callbacks to the client, so the client should have more bl callbacks to handle, right? In addition, the bl threads on the client need to handle cancellation of unused locks, so I don't think we should decrease the maximum thread number for the client. (It is a per-module parameter now, and the bl threads are shared by all namespaces.) |
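For illustration, a hypothetical sketch (again, not necessarily what the gerrit 17026 patch does) of how the check-and-create step could be made atomic, so that racing callers can never push the thread count past blp_max_threads. It assumes the slot is reserved here, decremented again if the subsequent kthread_run() fails, and not incremented anywhere else.

    /*
     * Hypothetical sketch: decide whether a new thread is needed and
     * reserve its slot in one atomic step.  atomic_add_unless() refuses
     * to increment once the count has reached blp_max_threads, so the
     * limit cannot be exceeded even when many threads race through here.
     */
    static int ldlm_bl_thread_need_create(struct ldlm_bl_pool *blp,
                                          struct ldlm_bl_work_item *blwi)
    {
            /* an idle thread already exists to pick up this work item */
            if (atomic_read(&blp->blp_busy_threads) <
                atomic_read(&blp->blp_num_threads))
                    return 0;

            /* non-zero only if a slot below the cap was actually reserved */
            return atomic_add_unless(&blp->blp_num_threads, 1,
                                     blp->blp_max_threads);
    }

With the cap actually enforced, the number of threads contending on blp_lock stays bounded by blp_max_threads even under heavy memory pressure.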
| Comment by Gerrit Updater [ 03/Nov/15 ] |
|
Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/17026 |
| Comment by Gerrit Updater [ 11/Nov/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17026/ |
| Comment by Joseph Gmitter (Inactive) [ 11/Nov/15 ] |
|
Landed for 2.8 |
| Comment by Andreas Dilger [ 29/Mar/16 ] |
|
Just to restate the workaround for this problem (so that it is more easily visible): add options ptlrpc ldlm_num_threads=16 to /etc/modprobe.d/lustre.conf to avoid the problem until upgrading to a release in which this has been fixed. |
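For reference, the workaround is a single line in the options file named above; the value of 16 is the count suggested earlier in this ticket and can be tuned per site:

    # /etc/modprobe.d/lustre.conf
    # limit the number of ldlm_bl_* threads on the client (LU-7330 workaround)
    options ptlrpc ldlm_num_threads=16

As noted in the 24/Oct/15 comment, the module option only takes effect on the next module load/remount.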