[LU-5722] memory allocation deadlock under lu_cache_shrink()
Created: 09/Oct/14  Updated: 18/Sep/16  Resolved: 08/Feb/15
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Dilger | Assignee: | Cliff White (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mq414, patch |
| Environment: | single-node testing on master (5c4f68be57 + http://review.whamcloud.com/11258) |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 16062 |
| Description |
While running sanity-benchmark.sh dbench, I hit the following memory allocation deadlock under mdc_read_page_remote():

dbench D 0000000000000001 0 14532 1 0x00000004
Call Trace:
resched_task+0x68/0x80
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x203/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x478/0x8d0
alloc_pages_current+0xaa/0x110
__page_cache_alloc+0x87/0x90
mdc_read_page_remote+0x13c/0xd90 [mdc]
do_read_cache_page+0x7b/0x180
read_cache_page_async+0x19/0x20
read_cache_page+0xe/0x20
mdc_read_page+0x192/0x950 [mdc]
lmv_read_page+0x1e0/0x1210 [lmv]
ll_get_dir_page+0xbc/0x370 [lustre]
ll_dir_read+0x9e/0x300 [lustre]
ll_readdir+0x12a/0x4d0 [lustre]
vfs_readdir+0xc0/0xe0
sys_getdents+0x89/0xf0

The page allocation is recursing into the Lustre and DLM slab shrinkers, which then block on a lock that is already held. Presumably the allocation needs to use GFP_NOFS? I didn't actually check which locks were held, since the machine hung while I was trying to get more information.
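For context on the GFP_NOFS suggestion: lu_cache_shrink() returns early when the allocation lacks __GFP_FS (see the gdb listing in the first comment), so clearing __GFP_FS on the directory mapping would keep reclaim triggered by these page-cache allocations from re-entering the shrinker at all. The following is a minimal sketch of that idea, assuming the read_cache_page()/__page_cache_alloc() path in the trace above; the helper name is hypothetical and this is not the landed fix.

#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/pagemap.h>

/*
 * Illustrative sketch only, not the landed patch.  Clearing __GFP_FS on
 * the directory inode's mapping makes read_cache_page() and
 * __page_cache_alloc() allocate with GFP_NOFS semantics, so direct
 * reclaim triggered by readdir can no longer call back into filesystem
 * shrinkers such as lu_cache_shrink().
 */
static void example_mark_dir_mapping_nofs(struct inode *dir)
{
        mapping_set_gfp_mask(dir->i_mapping,
                             mapping_gfp_mask(dir->i_mapping) & ~__GFP_FS);
}

With the mask restricted this way, lu_cache_shrink() takes its early-return path for these allocations instead of queueing on lu_sites_guard.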
| Comments |
| Comment by Andreas Dilger [ 09/Oct/14 ] |
It may be that this is a larger problem than just mdc_read_page_local(). Running sanity.sh again, I see multiple threads stuck in test_49() calling lu_cache_shrink(), even threads unrelated to Lustre:

kswapd0 D 0000000000000001 0 38 2 0x00000000
Call Trace:
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x203/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
balance_pgdat+0x59a/0x820
kswapd+0x134/0x3c0
kthread+0x96/0xa0

This will impact all threads trying to allocate memory. Some threads also get stuck in direct reclaim if memory is low (this is just one of many threads stuck at lu_cache_shrink+0x203/0x310):

sendmail D 0000000000000001 0 15730 2260 0x00000000
Call Trace:
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x203/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x478/0x8d0
kmem_getpages+0x62/0x170
fallback_alloc+0x1ba/0x270
____cache_alloc_node+0x99/0x160
kmem_cache_alloc_node+0x89/0x1d0
__alloc_skb+0x4f/0x190
sk_stream_alloc_skb+0x41/0x110
tcp_sendmsg+0x350/0xa20
sock_aio_write+0x19b/0x1c0
do_sync_write+0xfa/0x140
vfs_write+0x184/0x1a0
sys_write+0x51/0x90

Some threads are stuck at lu_cache_shrink+0x144/0x310 instead:

Oct 9 15:43:18 sookie-gig kernel: irqbalance D 0000000000000000 0 1616 1 0x00000000
Oct 9 15:43:18 sookie-gig kernel: Call Trace:
__mutex_lock_slowpath+0x13e/0x180
mutex_lock+0x2b/0x50
lu_cache_shrink+0x144/0x310 [obdclass]
shrink_slab+0x11a/0x1a0
do_try_to_free_pages+0x3f7/0x610
try_to_free_pages+0x92/0x120
__alloc_pages_nodemask+0x478/0x8d0
alloc_pages_vma+0x9a/0x150
handle_pte_fault+0x76b/0xb50
handle_mm_fault+0x23a/0x310
__do_page_fault+0x139/0x480
do_page_fault+0x3e/0xa0
page_fault+0x25/0x30
proc_reg_read+0x7e/0xc0
vfs_read+0xb5/0x1a0
sys_read+0x51/0x90

It seems some of the code has been inlined, but all callers are blocked on getting lu_sites_guard.

(gdb) list *(lu_cache_shrink+0x203)
0x51c33 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1961).
1956
1957 if (!(sc->gfp_mask & __GFP_FS))
1958         return 0;
1959
1960 mutex_lock(&lu_sites_guard);
1961 list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
1962         memset(&stats, 0, sizeof(stats));
1963         lu_site_stats_get(s->ls_obj_hash, &stats, 0);
1964         cached += stats.lss_total - stats.lss_busy;
1965 }

(gdb) list *(lu_cache_shrink+0x144)
0x51b74 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1996).
1991  * anyways.
1992  */
1993 return SHRINK_STOP;
1994
1995 mutex_lock(&lu_sites_guard);
1996 list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
1997         remain = lu_site_purge(&lu_shrink_env, s, remain);
1998         /*
1999          * Move just shrunk site to the tail of site list to
2000          * assure shrinking fairness.

It isn't clear which thread is holding the lu_sites_guard mutex. In both hangs so far, it appears there is a running thread that may be holding this mutex. The stack trace is unclear because it is entirely marked with "?", but that may always be the case for running threads, or this may be leftover garbage on the stack and the process is running in userspace (though I don't see any "cleanup" routines on the stack)?

cpuspeed R running task 0 1606 1 0x00000000
Call Trace:
? thread_return+0x4e/0x76e
? apic_timer_interrupt+0xe/0x20
? mutex_lock+0x1e/0x50
? cfs_hash_spin_lock+0xe/0x10 [libcfs]
? lu_site_purge+0x134/0x4e0 [obdclass]
? _spin_lock+0x12/0x30
? cfs_hash_spin_lock+0xe/0x10 [libcfs]
? lu_site_stats_get+0x98/0x170 [obdclass]
? lu_cache_shrink+0x242/0x310 [obdclass]
? shrink_slab+0x12a/0x1a0
? do_try_to_free_pages+0x3f7/0x610
? try_to_free_pages+0x92/0x120
? __alloc_pages_nodemask+0x478/0x8d0
? alloc_pages_vma+0x9a/0x150
? handle_pte_fault+0x76b/0xb50
? handle_mm_fault+0x23a/0x310
? __do_page_fault+0x139/0x480
? do_page_fault+0x3e/0xa0
? page_fault+0x25/0x30
? proc_reg_read+0x7e/0xc0
? vfs_read+0xb5/0x1a0
? sys_read+0x51/0x90
runnable tasks:
task PID tree-key switches prio exec-runtime sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
R cpuspeed 1606 2735264.028890 1596454 120 2735264.028890 1068807.826562 11067012.338451 /
(gdb) list *(lu_cache_shrink+0x242)
0x51c72 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1964).
1959
1960 mutex_lock(&lu_sites_guard);
1961 list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
1962 memset(&stats, 0, sizeof(stats));
1963 lu_site_stats_get(s->ls_obj_hash, &stats, 0);
1964 cached += stats.lss_total - stats.lss_busy;
1965 }
1966 mutex_unlock(&lu_sites_guard);
1967
1968 cached = (cached / 100) * sysctl_vfs_cache_pressure;
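Both listings show the shrinker taking lu_sites_guard unconditionally once the __GFP_FS check passes, which is why every reclaiming thread piles up on the mutex. One way to avoid parking kswapd and direct reclaim here is to back off when the mutex is contended. Below is an illustrative sketch of the counting path only, reusing the names from the listing above; it assumes it sits next to lu_cache_shrink() in lu_object.c and is not necessarily what the eventual fix does.

/*
 * Illustrative sketch, not the actual patch.  If lu_sites_guard is
 * already held, report "nothing cached" instead of sleeping, so memory
 * reclaim is never blocked behind another thread's lu_object traversal.
 * Assumes the declarations visible in lu_object.c (lu_sites,
 * lu_sites_guard, lu_site_stats_get(), lu_site_stats_t).
 */
static unsigned long lu_cache_shrink_count_sketch(struct shrinker *sk,
                                                  struct shrink_control *sc)
{
        lu_site_stats_t stats;
        struct lu_site *s, *tmp;
        unsigned long cached = 0;

        if (!(sc->gfp_mask & __GFP_FS))
                return 0;

        if (!mutex_trylock(&lu_sites_guard))
                return 0;       /* contended: don't stall reclaim */

        list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
                memset(&stats, 0, sizeof(stats));
                lu_site_stats_get(s->ls_obj_hash, &stats, 0);
                cached += stats.lss_total - stats.lss_busy;
        }
        mutex_unlock(&lu_sites_guard);

        return (cached / 100) * sysctl_vfs_cache_pressure;
}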
| Comment by Frank Zago (Inactive) [ 29/Oct/14 ] |
I don't know whether it's the same bug or not, but I've seen something that looks similar under 2.5: a lot of processes stuck in or around lu_cache_shrink(). The machine is not really hung, but not usable either, and needs rebooting. I made a patch for it; here is a forward-ported (but untested) version for the head of tree: http://review.whamcloud.com/#/c/12468/
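For reference, the same back-off idea applied to the purge path shown in the lu_cache_shrink+0x144 listing above might look roughly like the sketch below. This is only an assumption about one possible approach, not a description of what http://review.whamcloud.com/12468 actually implements.

/*
 * Illustrative sketch, not the code from the patch above.  Returning
 * SHRINK_STOP when lu_sites_guard is contended lets reclaim move on to
 * other caches instead of queueing every thread on the mutex.
 * Assumes the declarations visible in lu_object.c (lu_sites,
 * lu_sites_guard, lu_shrink_env, lu_site_purge()).
 */
static unsigned long lu_cache_shrink_scan_sketch(struct shrinker *sk,
                                                 struct shrink_control *sc)
{
        struct lu_site *s, *tmp;
        unsigned long remain = sc->nr_to_scan;
        LIST_HEAD(splice);

        if (!(sc->gfp_mask & __GFP_FS))
                return SHRINK_STOP;     /* caller cannot re-enter the FS */

        if (!mutex_trylock(&lu_sites_guard))
                return SHRINK_STOP;     /* contended: let reclaim move on */

        list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
                remain = lu_site_purge(&lu_shrink_env, s, remain);
                /* collect just-shrunk sites on a side list ... */
                list_move_tail(&s->ls_linkage, &splice);
                if (remain == 0)
                        break;
        }
        /* ... and splice them back at the tail to keep shrinking fair */
        list_splice(&splice, lu_sites.prev);
        mutex_unlock(&lu_sites_guard);

        return sc->nr_to_scan - remain;
}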
| Comment by Gerrit Updater [ 08/Feb/15 ] |
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12468/
| Comment by Peter Jones [ 08/Feb/15 ] |
Landed for 2.7