[LU-4090] OST unavailable due to possible deadlock - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: Lustre 2.7.0, Lustre 2.5.3
Affects Version/s: Lustre 1.8.8
Labels: None

Severity: 3
Rank (Obsolete): 10994

Description

One OST became unavailable and kept dumping stack traces until its service was taken over by another OSS. This issue has occurred a couple of times on different servers.

After some investigation, we found that many service threads were hung at different places. The following lists where each group of threads was stuck.

ll_ost_01:10226, ll_ost_07:10232, ll_ost_09:10234, ll_ost_11:10236, ll_ost_13:10238, ll_ost_15:10240, ll_ost_18:10243
filter_lvbo_init
--filter_fid2dentry
----filter_parent_lock
------filter_lock_dentry
--------LOCK_INODE_MUTEX(dparent->d_inode);

ll_ost_06:10231, ll_ost_16:10241, ll_ost_484, ll_ost_io_129, ll_ost_io_123, ll_ost_383
fsfilt_ext3_start
--ext3_journal_start
----journal_start
------start_this_handle
----------__jbd2_log_wait_for_space
------------mutex_lock(&journal->j_checkpoint_mutex);

ll_ost_17:10242
filter_lvbo_init
--filter_fid2dentry
----filter_parent_lock
----lookup_one_len
------__lookup_hash
--------inode->i_op->lookup = ext4_lookup
----------ext4_iget
------------iget_locked
--------------ifind_fast
----------------find_inode_fast
------------------__wait_on_freeing_inode
-------------------? ldiskfs_bread... Child dentry's inode __I_LOCK

ll_ost_io_15
ost_brw_write
--filter_commitrw_write
----fsfilt_ext3_commit_wait
------autoremove_wake_function
--------fsfilt_log_wait_commit = jbd2_log_wait_commit

We think this is not necessarily a problem in the Lustre code. We also found a recently merged patch that fixes a similar deadlock in __jbd2_log_wait_for_space(). Could it be the root cause?

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/fs/jbd2/checkpoint.c?id=0ef54180e0187117062939202b96faf04c8673bc
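
As a reading aid, the four groups of hung threads listed above can be tallied by the point at which each group is blocked. The Python sketch below only summarizes the listing in this report: the dictionary layout and the `summarize` helper are hypothetical, and the frame names are copied from the traces rather than checked against the kernel source. It does not show which thread holds which lock.

```python
from collections import Counter

# Hung threads per group and the innermost frame each group is blocked in,
# transcribed from the traces in this report. The dictionary layout and the
# helper name are illustrative assumptions, not part of Lustre.
HUNG_GROUPS = {
    "LOCK_INODE_MUTEX(dparent->d_inode)": [
        "ll_ost_01", "ll_ost_07", "ll_ost_09", "ll_ost_11",
        "ll_ost_13", "ll_ost_15", "ll_ost_18",
    ],
    "mutex_lock(&journal->j_checkpoint_mutex)": [
        "ll_ost_06", "ll_ost_16", "ll_ost_484",
        "ll_ost_io_129", "ll_ost_io_123", "ll_ost_383",
    ],
    "__wait_on_freeing_inode": ["ll_ost_17"],
    "jbd2_log_wait_commit": ["ll_ost_io_15"],
}


def summarize(groups):
    """Return (wait_point, thread_count) pairs, most crowded wait point first."""
    counts = Counter({point: len(threads) for point, threads in groups.items()})
    return counts.most_common()


if __name__ == "__main__":
    for point, count in summarize(HUNG_GROUPS):
        print(f"{count:2d} thread(s) blocked in {point}")
```

Most of the listed threads are waiting on either the parent directory inode mutex or the journal checkpoint mutex, which fits the suspicion that the hang sits in the journal path rather than in one Lustre service.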

Attachments

messages.ALPL402.txt (1.12 MB, 26/Mar/15 8:07 AM)
messages.ALPL401.txt (144 kB, 26/Mar/15 8:07 AM)
ALPL202.messages_20150518.txt (307 kB, 18/May/15 4:26 AM)
0001-LU-4090-fsfilt-don-t-wait-forever-for-stale-tid.patch (6 kB, 20/May/15 7:58 AM)

People

Assignee: Zhenyu Xu

Reporter: Li Xi (Inactive)

Votes: 0

Watchers: 10

Dates

Created: 11/Oct/13 7:25 AM

Updated: 07/Jun/16 3:38 PM