Details
- Type: Bug
- Resolution: Unresolved
- Priority: Minor
- Affects Version/s: Lustre 2.17.0, Lustre 2.16.1, Lustre 2.15.7
Description
If a Lustre client is mounted on the Lustre metadata server, a deadlock can occur when the system comes under memory pressure.
Sample stacktrace:
__switch_to+0xbc/0xfc
__schedule+0x28c/0x718
schedule+0x4c/0xcc
schedule_timeout+0x84/0x170
ptlrpc_set_wait+0x464/0x7c8 [ptlrpc]
ptlrpc_queue_wait+0xa4/0x364 [ptlrpc]
mdc_close+0x224/0xe64 [mdc]
lmv_close+0x1a8/0x480 [lmv]
ll_close_inode_openhandle+0x404/0xcc8 [lustre]
ll_md_real_close+0xa4/0x280 [lustre]
ll_clear_inode+0x1a0/0x7e0 [lustre]
ll_delete_inode+0x70/0x260 [lustre]
evict+0xdc/0x240
iput_final+0x8c/0x1c0
iput+0x10c/0x128
dentry_unlink_inode+0xc8/0x150
__dentry_kill+0xec/0x21c
shrink_dentry_list+0xa8/0x138
prune_dcache_sb+0x64/0x94
super_cache_scan+0x128/0x1a4
do_shrink_slab+0x194/0x394
shrink_slab+0xbc/0x13c
shrink_node_memcgs+0x1d4/0x230
shrink_node+0x150/0x5e0
shrink_zones+0x98/0x220
do_try_to_free_pages+0xac/0x2e0
try_to_free_pages+0x120/0x25c
__alloc_pages_slowpath.constprop.0+0x40c/0x85c
__alloc_pages_nodemask+0x2b4/0x308
alloc_pages_current+0x8c/0x13c
allocate_slab+0x3b8/0x4cc
new_slab_objects+0x9c/0x160
___slab_alloc+0x1b0/0x300
__slab_alloc+0x50/0x80
kmem_cache_alloc+0x30c/0x32c
spl_kmem_cache_alloc+0x84/0x1ac [spl]
zfs_btree_add_idx+0x1b4/0x248 [zfs]
range_tree_add_impl+0x868/0xd94 [zfs]
range_tree_add+0x18/0x20 [zfs]
dnode_free_range+0x194/0x6c0 [zfs]
dmu_object_free+0x6c/0xc0 [zfs]
osd_destroy+0x40c/0xb70 [osd_zfs]
lod_sub_destroy+0x204/0x480 [lod]
lod_destroy+0x2d8/0x800 [lod]
mdd_close+0x250/0x1000 [mdd]
mo_close+0x18/0x60 [mdt]
mdt_hsm_release+0x534/0x16a0 [mdt]
mdt_mfd_close+0x1b4/0xe68 [mdt]
mdt_close_internal+0x104/0x3c8 [mdt]
mdt_close+0x270/0x518 [mdt]
tgt_handle_request0+0x2b4/0x658 [ptlrpc]
tgt_request_handle+0x268/0xaac [ptlrpc]
ptlrpc_server_handle_request.isra.0+0x460/0xf20 [ptlrpc]
ptlrpc_main+0xd24/0x15bc [ptlrpc]
kthread+0x118/0x120
Analysis
The deadlock begins in mdt_close() → mdt_hsm_release() → osd_destroy() → dmu_object_free(), where ZFS object destruction triggers a memory allocation (zfs_btree_add_idx() → kmem_cache_alloc()). Under memory pressure that allocation enters direct reclaim (try_to_free_pages() → shrink_dentry_list() → prune_dcache_sb()), which evicts Lustre client-side inodes (evict() → ll_clear_inode() → ll_md_real_close()) and therefore sends a synchronous close RPC back to the same server (mdc_close() → ptlrpc_queue_wait() → ptlrpc_set_wait()). The MDT service thread is now blocked, still inside the original close handling, waiting for an RPC that the same overloaded server has to process.
The resulting resource exhaustion also prevents ZFS from assigning new transaction groups in osd_trans_start() → dmu_tx_assign() → dmu_tx_wait(), so other operations hang as well.
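For illustration only, a check along the following lines (a sketch, not existing Lustre code; the helper name is hypothetical) could tell ll_clear_inode()/ll_delete_inode() that it is running inside kernel memory reclaim, which is exactly the context in which blocking on a synchronous close RPC is unsafe. PF_MEMALLOC and current_is_kswapd() are standard kernel facilities.

#include <linux/sched.h>
#include <linux/swap.h>

/* Hypothetical helper: return true when the current task is performing
 * memory reclaim.  Direct reclaim sets PF_MEMALLOC on the allocating
 * task; kswapd is detected via current_is_kswapd().  In either case the
 * Lustre eviction path should not block on a synchronous RPC. */
static inline bool ll_in_memory_reclaim(void)
{
	return (current->flags & PF_MEMALLOC) || current_is_kswapd();
}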
Related Issues
- Similar issue on a different path: https://jira.whamcloud.com/browse/LU-18246
- Previously attempted fix (not merged; it also would not fix this issue, since it only covers one specific path): https://review.whamcloud.com/c/fs/lustre-release/+/56442
- Not a deadlock, but if the same pattern is triggered from kthreadd it results in a kernel panic: https://jira.whamcloud.com/browse/LU-18826
Proposed Fix
A straightforward fix would be to wrap the paths that call into the ZFS API with spl_fstrans_mark(), but the same problem can appear in other places as well. The root cause is that Lustre allocates memory and blocks on RPCs while on the inline memory-allocation/reclaim critical path, and that behavior can cause further crashes or deadlocks (LU-18826, for example).
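A minimal sketch of that narrow fix, assuming the osd-zfs call site can simply be wrapped before entering the DMU (the wrapper function and its placement are illustrative, not the actual patch): spl_fstrans_mark() maps to memalloc_nofs_save() in current OpenZFS, so allocations made by ZFS below this point lose __GFP_FS and super_cache_scan() declines to prune the dcache from this thread.

#include <sys/kmem.h>	/* SPL header providing spl_fstrans_mark()/spl_fstrans_unmark() */
#include <sys/dmu.h>	/* dmu_object_free(), objset_t, dmu_tx_t */

/* Illustrative wrapper: destroy a ZFS object with filesystem re-entry
 * disabled for any allocation made inside the DMU call. */
static int osd_destroy_object_nofs(objset_t *os, uint64_t oid, dmu_tx_t *tx)
{
	fstrans_cookie_t cookie;
	int rc;

	cookie = spl_fstrans_mark();	/* effectively memalloc_nofs_save() */
	rc = dmu_object_free(os, oid, tx);
	spl_fstrans_unmark(cookie);	/* restore the previous allocation flags */

	return rc;
}

The limitation, as noted above, is that this only protects the specific call site that gets wrapped.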
A more general fix idea: when Lustre's evict_inode operation (ll_delete_inode()) runs from the inline memory-reclaim path, have it return and free its local resources immediately instead of waiting for the RPC; the required close RPC to the MDS can be scheduled and sent asynchronously via ptlrpcd.
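A rough sketch of that direction, not an actual patch: the close request would be prepared as today, but when the caller is in reclaim context it would be handed to ptlrpcd instead of blocking in ptlrpc_queue_wait(). ptlrpcd_add_req() and rq_interpret_reply are existing ptlrpc facilities; the function names below and their placement in the mdc close path are hypothetical, and the sketch assumes local resources are released before the request is queued (fire-and-forget semantics).

/* Reply handler executed in ptlrpcd context.  Local client resources were
 * already released by the caller, so only unexpected failures are logged. */
static int mdc_close_async_interpret(const struct lu_env *env,
				     struct ptlrpc_request *req,
				     void *args, int rc)
{
	if (rc != 0 && rc != -ESTALE)
		CERROR("%s: asynchronous close failed: rc = %d\n",
		       req->rq_import->imp_obd->obd_name, rc);
	return rc;
}

/* Queue an already-packed MDS_CLOSE request without waiting for the reply;
 * a ptlrpcd thread sends it and later runs the interpret callback. */
static void mdc_close_send_async(struct ptlrpc_request *req)
{
	req->rq_interpret_reply = mdc_close_async_interpret;
	ptlrpcd_add_req(req);
}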