Lustre / LU-5722

memory allocation deadlock under lu_cache_shrink()

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version: Lustre 2.7.0
    • Fix Version: Lustre 2.7.0
    • Environment: single-node testing on master (5c4f68be57 + http://review.whamcloud.com/11258 )
      kernel: 2.6.32-358.23.2.el6_lustre.gc9be53c.x86_64
      combined MDS+MGS+OSS, 2x MDT, 3x OST on LVM
    • 3
    • 16062

    Description

      While running sanity-benchmark.sh dbench, I hit the following memory allocation deadlock under mdc_read_page_remote():

      dbench D 0000000000000001 0 14532 1 0x00000004
      Call Trace:
      resched_task+0x68/0x80
      __mutex_lock_slowpath+0x13e/0x180
      mutex_lock+0x2b/0x50
      lu_cache_shrink+0x203/0x310 [obdclass]
      shrink_slab+0x11a/0x1a0
      do_try_to_free_pages+0x3f7/0x610
      try_to_free_pages+0x92/0x120
      __alloc_pages_nodemask+0x478/0x8d0
      alloc_pages_current+0xaa/0x110
      __page_cache_alloc+0x87/0x90
      mdc_read_page_remote+0x13c/0xd90 [mdc]
      do_read_cache_page+0x7b/0x180
      read_cache_page_async+0x19/0x20
      read_cache_page+0xe/0x20
      mdc_read_page+0x192/0x950 [mdc]
      lmv_read_page+0x1e0/0x1210 [lmv]
      ll_get_dir_page+0xbc/0x370 [lustre]
      ll_dir_read+0x9e/0x300 [lustre]
      ll_readdir+0x12a/0x4d0 [lustre]
      vfs_readdir+0xc0/0xe0
      sys_getdents+0x89/0xf0
      

      The page allocation is recursing back into the Lustre slab shrinker, and lu_cache_shrink() is blocking on a lock that is already held. Presumably the allocation needs to use GFP_NOFS? I didn't actually check which locks were held, since the machine hung while I was trying to get more info.
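      For illustration only, here is a minimal sketch of the GFP_NOFS idea. The helper name mdc_dir_mapping_init() and its placement are hypothetical, not the change that landed: dropping __GFP_FS from the directory inode's mapping means __page_cache_alloc() (reached via do_read_cache_page() in the trace above) allocates with GFP_NOFS semantics, so reclaim from this path skips filesystem shrinkers such as lu_cache_shrink().

      #include <linux/fs.h>
      #include <linux/pagemap.h>

      /* Hypothetical helper, sketch only: make all future page cache
       * allocations for this directory use GFP_NOFS semantics so that
       * direct reclaim cannot re-enter the filesystem shrinkers. */
      static void mdc_dir_mapping_init(struct inode *dir)
      {
              struct address_space *mapping = dir->i_mapping;

              mapping_set_gfp_mask(mapping,
                                   mapping_gfp_mask(mapping) & ~__GFP_FS);
      }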

      Attachments

        Issue Links

          Activity

            [LU-5722] memory allocation deadlock under lu_cache_shrink()
            pjones Peter Jones added a comment -

            Landed for 2.7


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12468/
            Subject: LU-5722 obdclass: reorganize busy object accounting
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ff0b34274d4f8754ebba0a5a812bd117cbec37b1


            fzago Frank Zago (Inactive) added a comment -

            I don't know whether it's the same bug or not, but I've seen something that looks similar under 2.5. A lot of processes are stuck in or around lu_cache_shrink(). The machine is not really hung, but not usable either, and needs rebooting. I made a patch for it.

            Here's a forward-ported, but untested, patch for head of tree: http://review.whamcloud.com/#/c/12468/


            adilger Andreas Dilger added a comment -

            It may be that this is a larger problem than just mdc_read_page_remote(). Running sanity.sh again, I see multiple threads stuck in lu_cache_shrink() during test_49(), even threads unrelated to Lustre:

            kswapd0       D 0000000000000001     0    38      2 0x00000000
            Call Trace:
            __mutex_lock_slowpath+0x13e/0x180
            mutex_lock+0x2b/0x50
            lu_cache_shrink+0x203/0x310 [obdclass]
            shrink_slab+0x11a/0x1a0
            balance_pgdat+0x59a/0x820
            kswapd+0x134/0x3c0
            kthread+0x96/0xa0
            

            This will impact all threads trying to allocate memory. Some threads also get stuck in direct reclaim if memory is low (this is just one of many threads stuck at lu_cache_shrink+0x203/0x310):

            sendmail      D 0000000000000001     0 15730   2260 0x00000000
            Call Trace:
            __mutex_lock_slowpath+0x13e/0x180
            mutex_lock+0x2b/0x50
            lu_cache_shrink+0x203/0x310 [obdclass]
            shrink_slab+0x11a/0x1a0
            do_try_to_free_pages+0x3f7/0x610
            try_to_free_pages+0x92/0x120
            __alloc_pages_nodemask+0x478/0x8d0
            kmem_getpages+0x62/0x170
            fallback_alloc+0x1ba/0x270
            ____cache_alloc_node+0x99/0x160
            kmem_cache_alloc_node+0x89/0x1d0
            __alloc_skb+0x4f/0x190
            sk_stream_alloc_skb+0x41/0x110
            tcp_sendmsg+0x350/0xa20
            sock_aio_write+0x19b/0x1c0
            do_sync_write+0xfa/0x140
            vfs_write+0x184/0x1a0
            sys_write+0x51/0x90
            

            Some threads are stuck at lu_cache_shrink+0x144/0x310 instead:

            Oct  9 15:43:18 sookie-gig kernel: irqbalance    D 0000000000000000     0  1616     1 0x00000000
            Oct  9 15:43:18 sookie-gig kernel: Call Trace:
            __mutex_lock_slowpath+0x13e/0x180
            mutex_lock+0x2b/0x50
            lu_cache_shrink+0x144/0x310 [obdclass]
            shrink_slab+0x11a/0x1a0
            do_try_to_free_pages+0x3f7/0x610
            try_to_free_pages+0x92/0x120
            __alloc_pages_nodemask+0x478/0x8d0
            alloc_pages_vma+0x9a/0x150
            handle_pte_fault+0x76b/0xb50
            handle_mm_fault+0x23a/0x310
            __do_page_fault+0x139/0x480
            do_page_fault+0x3e/0xa0
            page_fault+0x25/0x30
            proc_reg_read+0x7e/0xc0
            vfs_read+0xb5/0x1a0
            sys_read+0x51/0x90
            

            It seems some of the code has been inlined, but all callers are blocked on getting lu_sites_guard.

            (gdb) list *(lu_cache_shrink+0x203)
            0x51c33 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1961).
            1956	
            1957		if (!(sc->gfp_mask & __GFP_FS))
            1958			return 0;
            1959	
            1960		mutex_lock(&lu_sites_guard);
            1961		list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
            1962			memset(&stats, 0, sizeof(stats));
            1963			lu_site_stats_get(s->ls_obj_hash, &stats, 0);
            1964			cached += stats.lss_total - stats.lss_busy;
            1965		}
            (gdb) list *(lu_cache_shrink+0x144)
            0x51b74 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1996).
            1991                     * anyways.
            1992                     */
            1993                    return SHRINK_STOP;
            1994
            1995            mutex_lock(&lu_sites_guard);
            1996            list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
            1997                    remain = lu_site_purge(&lu_shrink_env, s, remain);
            1998                    /*
            1999                     * Move just shrunk site to the tail of site list to
            2000                     * assure shrinking fairness.
            

            It isn't clear which thread is holding the lu_sites_guard mutex. In both hangs so far there appears to be a running thread that may be holding it. Its stack trace is ambiguous because every frame is marked with "?", but that may always be the case for running threads, or the trace may be leftover garbage on the stack and the process may actually be running in userspace (though I don't see any "cleanup" routines on the stack).

            cpuspeed      R  running task        0  1606      1 0x00000000
            Call Trace:
            ? thread_return+0x4e/0x76e
            ? apic_timer_interrupt+0xe/0x20
            ? mutex_lock+0x1e/0x50
            ? cfs_hash_spin_lock+0xe/0x10 [libcfs]
            ? lu_site_purge+0x134/0x4e0 [obdclass]
            ? _spin_lock+0x12/0x30
            ? cfs_hash_spin_lock+0xe/0x10 [libcfs]
            ? lu_site_stats_get+0x98/0x170 [obdclass]
            ? lu_cache_shrink+0x242/0x310 [obdclass]
            ? shrink_slab+0x12a/0x1a0
            ? do_try_to_free_pages+0x3f7/0x610
            ? try_to_free_pages+0x92/0x120
            ? __alloc_pages_nodemask+0x478/0x8d0
            ? alloc_pages_vma+0x9a/0x150
            ? handle_pte_fault+0x76b/0xb50
            ? handle_mm_fault+0x23a/0x310
            ? __do_page_fault+0x139/0x480
            ? do_page_fault+0x3e/0xa0
            ? page_fault+0x25/0x30
            ? proc_reg_read+0x7e/0xc0
            ? vfs_read+0xb5/0x1a0
            ? sys_read+0x51/0x90
            runnable tasks:
                       task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
            ----------------------------------------------------------------------------------------------------------
            R       cpuspeed  1606   2735264.028890   1596454   120   2735264.028890   1068807.826562  11067012.338451 /
            
            (gdb) list *(lu_cache_shrink+0x242)
            0x51c72 is in lu_cache_shrink (/usr/src/lustre-head/lustre/obdclass/lu_object.c:1964).
            1959
            1960            mutex_lock(&lu_sites_guard);
            1961            list_for_each_entry_safe(s, tmp, &lu_sites, ls_linkage) {
            1962                    memset(&stats, 0, sizeof(stats));
            1963                    lu_site_stats_get(s->ls_obj_hash, &stats, 0);
            1964                    cached += stats.lss_total - stats.lss_busy;
            1965            }
            1966            mutex_unlock(&lu_sites_guard);
            1967
            1968            cached = (cached / 100) * sysctl_vfs_cache_pressure;
            

            People

              Assignee: cliffw Cliff White (Inactive)
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated:
                Resolved: