[LU-4090] OST unavailable due to possible deadlock Created: 11/Oct/13  Updated: 07/Jun/16

Status: Reopened
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8
Fix Version/s: Lustre 2.7.0, Lustre 2.5.3

Type: Bug Priority: Major
Reporter: Li Xi (Inactive) Assignee: Zhenyu Xu
Resolution: Unresolved Votes: 0
Labels: None

Attachments: Text File 0001-LU-4090-fsfilt-don-t-wait-forever-for-stale-tid.patch     Text File ALPL202.messages_20150518.txt     Text File messages.ALPL401.txt     Text File messages.ALPL402.txt    
Severity: 3
Rank (Obsolete): 10994

 Description   

One OST became unavailable and kept dumping stack traces until its service was taken over by another OSS. This issue has occurred a couple of times on different servers.

After some investigation, we found that a lot of service threads hang at different places. Here is a list of where they are stuck.

ll_ost_01:10226, ll_ost_07:10232, ll_ost_09:10234, ll_ost_11:10236, ll_ost_13:10238, ll_ost_15:10240, ll_ost_18:10243
filter_lvbo_init
--filter_fid2dentry
----filter_parent_lock
------filter_lock_dentry
--------LOCK_INODE_MUTEX(dparent->d_inode);

ll_ost_06:10231, ll_ost_16:10241, ll_ost_484, ll_ost_io_129, ll_ost_io_123, ll_ost_383
fsfilt_ext3_start
--ext3_journal_start
----journal_start
------start_this_handle
--------__jbd2_log_wait_for_space
----------mutex_lock(&journal->j_checkpoint_mutex);

ll_ost_17:10242
filter_lvbo_init
--filter_fid2dentry
----filter_parent_lock
----lookup_one_len
------__lookup_hash
--------inode->i_op->lookup = ext4_lookup
----------ext4_iget
------------iget_locked
--------------ifind_fast
----------------find_inode_fast
------------------__wait_on_freeing_inode
--------------------? ldiskfs_bread ... child dentry's inode I_LOCK

ll_ost_io_15
ost_brw_write
--filter_commitrw_write
----fsfilt_ext3_commit_wait
------autoremove_wake_function
--------fsfilt_log_wait_commit = jbd2_log_wait_commit

We think this is not necessarily a problem in the Lustre code. We found a recently merged kernel patch which fixes a similar deadlock problem in __jbd2_log_wait_for_space(). Maybe it is the root cause?

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/fs/jbd2/checkpoint.c?id=0ef54180e0187117062939202b96faf04c8673bc
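
To illustrate the suspected cycle, here is a minimal userspace sketch (pthreads; the function and variable names are invented for illustration and are not Lustre or jbd2 code). A thread that holds a checkpoint mutex waits for a "commit" that itself needs the same mutex, so the waiter has to drop the mutex before waiting and re-take it afterwards, which is what the linked patch does in __jbd2_log_wait_for_space():

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t checkpoint_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t commit_done = PTHREAD_COND_INITIALIZER;
static int committed;

/* Stands in for jbd2_journal_commit_transaction(): with JBD2_FLUSHED set,
 * the committer may need the checkpoint mutex before it can finish. */
static void *commit_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&checkpoint_mutex);
	pthread_mutex_unlock(&checkpoint_mutex);

	pthread_mutex_lock(&state_lock);
	committed = 1;
	pthread_cond_broadcast(&commit_done);
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

/* Stands in for __jbd2_log_wait_for_space() waiting on the running commit. */
static void wait_for_space(void)
{
	pthread_mutex_lock(&checkpoint_mutex);
	/* ... not enough space, must wait for the running commit ... */

	/* The fix: temporarily drop the checkpoint mutex while waiting,
	 * otherwise commit_thread() above can never make progress. */
	pthread_mutex_unlock(&checkpoint_mutex);

	pthread_mutex_lock(&state_lock);
	while (!committed)
		pthread_cond_wait(&commit_done, &state_lock);
	pthread_mutex_unlock(&state_lock);

	pthread_mutex_lock(&checkpoint_mutex);
	/* ... re-check the space with the mutex held again ... */
	pthread_mutex_unlock(&checkpoint_mutex);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, commit_thread, NULL);
	wait_for_space();
	pthread_join(t, NULL);
	printf("no deadlock: commit finished and the space check was retried\n");
	return 0;
}

(Build with "gcc -pthread"; keeping the checkpoint mutex held across the wait would instead reproduce the hang.)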



 Comments   
Comment by Li Xi (Inactive) [ 11/Oct/13 ]

The debug messages are uploaded here.
ftp://ftp.whamcloud.com/uploads/LU-4090/

Comment by Peter Jones [ 11/Oct/13 ]

Bobijam

Could you please comment on this one?

Thanks

Peter

Comment by Zhenyu Xu [ 24/Oct/13 ]

yes, it is most likely the cause. I'll try to get a patch for it.

Comment by Li Xi (Inactive) [ 24/Oct/13 ]

Zhenyu,
Thank you very much for the help!

Comment by Zhenyu Xu [ 24/Oct/13 ]

master patch http://review.whamcloud.com/8059
b1_8 patch http://review.whamcloud.com/8060

Comment by Li Xi (Inactive) [ 13/Jun/14 ]

We've seen this problem on three different systems, and there is no way to recover from it without rebooting the system. So it is a critical problem...

Comment by Shuichi Ihara (Inactive) [ 10/Jul/14 ]

This is a really repeatable problem.
New patches cover SLES11 SP3 as well as RHEL6: http://review.whamcloud.com/#/c/11041/

Comment by James A Simmons [ 10/Jul/14 ]

Another patch seems to be floating around - http://review.whamcloud.com/#/c/8059. We need to pick one of them.

Comment by Bob Glossman (Inactive) [ 13/Aug/14 ]

I think http://review.whamcloud.com/#/c/11041 is the better mod as it covers both el6 and sles.

Comment by Peter Jones [ 18/Aug/14 ]

Landed for 2.7

Comment by Jian Yu [ 18/Aug/14 ]

Here is the back-ported patch for Lustre b2_5 branch: http://review.whamcloud.com/11492

Comment by Wang Shilong (Inactive) [ 26/Mar/15 ]

We still hit the issue with the LU-4020 patch applied; see the attachment.

Comment by Zhenyu Xu [ 26/Mar/15 ]

You need to make sure that your server build uses lustre/kernel_patches/series/2.6-rhel6.series to patch the server kernel code so the patch takes effect (since it is a kernel patch).

You can check the patched server code in fs/jbd2/checkpoint.c and see whether the following code is there:

@@ -156,7 +156,15 @@ void __jbd2_log_wait_for_space(journal_t
 				/* We were able to recover space; yay! */
 				;
 			} else if (tid) {
+				/*
+				 * jbd2_journal_commit_transaction() may want
+				 * to take the checkpoint_mutex if JBD2_FLUSHED
+				 * is set. So we need to temporarily drop it.
+				 */
+				mutex_unlock(&journal->j_checkpoint_mutex);
 				jbd2_log_wait_commit(journal, tid);
+				spin_lock(&journal->j_state_lock);
+				continue;

Comment by Peter Jones [ 26/Mar/15 ]

Bobijam

Ihara confirms that this patch has been in place and the issue has still been hit

Peter

Comment by Shuichi Ihara (Inactive) [ 27/Mar/15 ]

Bobijam, we confirmed the kernel includes the following code, but we hit the same issue. Please check https://jira.hpdd.intel.com/secure/attachment/17384/messages.ALPL402.txt again.

Index: linux-2.6.18-348.3.1.el5-b18/fs/jbd2/checkpoint.c
===================================================================
--- linux-2.6.18-348.3.1.el5-b18.orig/fs/jbd2/checkpoint.c
+++ linux-2.6.18-348.3.1.el5-b18/fs/jbd2/checkpoint.c
@@ -155,7 +155,15 @@ void __jbd2_log_wait_for_space(journal_t
 				/* We were able to recover space; yay! */
 				;
 			} else if (tid) {
+				/*
+				 * jbd2_journal_commit_transaction() may want
+				 * to take the checkpoint_mutex if JBD2_FLUSHED
+				 * is set.  So we need to temporarily drop it.
+				 */
+				mutex_unlock(&journal->j_checkpoint_mutex);
 				jbd2_log_wait_commit(journal, tid);
+				spin_lock(&journal->j_state_lock);
+				continue;
 			} else {
 				printk(KERN_ERR "%s: needed %d blocks and "
 				       "only had %d space available\n",

Comment by Zhenyu Xu [ 02/Apr/15 ]

Hi Ihara,

From messages.ALPL402.txt, I mainly saw two backtrace locations, __jbd2_log_wait_for_space+0x5d and jbd2_log_wait_commit+0xa3. Can you figure out what code they correspond to on your build? Here is the lookup I did on my test system:

[12:04 PM]root@test1:~/work/kernel/linux-2.6.32-431.29.2.el6-master/
# gdb fs/jbd2/jbd2.ko
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/work/kernel/linux-2.6.32-431.29.2.el6-master/fs/jbd2/jbd2.ko...done.
(gdb) l *jbd2_log_wait_commit+0xa3
0x85c3 is in jbd2_log_wait_commit (fs/jbd2/journal.c:641).
636		while (tid_gt(tid, journal->j_commit_sequence)) {
637			jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
638					  tid, journal->j_commit_sequence);
639			wake_up(&journal->j_wait_commit);
640			spin_unlock(&journal->j_state_lock);
641			wait_event(journal->j_wait_done_commit,
642					!tid_gt(tid, journal->j_commit_sequence));
643			spin_lock(&journal->j_state_lock);
644		}
645		spin_unlock(&journal->j_state_lock);

(gdb) l *__jbd2_log_wait_for_space+0x5d
0x548d is in __jbd2_log_wait_for_space (fs/jbd2/checkpoint.c:172).
167					continue;
168				} else {
169					printk(KERN_ERR "%s: needed %d blocks and "
170					       "only had %d space available\n",
171					       __func__, nblocks, space_left);
172					printk(KERN_ERR "%s: no way to get more "
173					       "journal space in %s\n", __func__,
174					       journal->j_devname);
175					WARN_ON(1);
176					jbd2_journal_abort(journal, 0);
(gdb) 

The jbd2_log_wait_commit+0xa3 location makes sense; it is waiting for the commit to finish. But most of the other processes hanging at __jbd2_log_wait_for_space+0x5d does not look right, so I want to know which code line that offset maps to on your system.

Comment by Li Xi (Inactive) [ 08/Apr/15 ]

I am not sure, but maybe the following patch fixes a bug similar to our problem:

ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

Tid wraparound can happen on Lustre, and we have seen it happen multiple times on some systems before.
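
For illustration, here is a minimal standalone sketch of the wraparound arithmetic (tid_gt() below mirrors the helper in include/linux/jbd2.h; the tid values are invented):

#include <stdio.h>

typedef unsigned int tid_t;

/* Wraparound-aware comparison, as jbd2 does it: the unsigned difference is
 * interpreted as a signed value. */
static int tid_gt(tid_t x, tid_t y)
{
	int difference = (x - y);
	return difference > 0;
}

int main(void)
{
	tid_t stale_tid = 0x00000010;       /* recorded long ago             */
	tid_t commit_sequence = 0x90000000; /* journal has since committed   */
	                                    /* more than 2^31 transactions   */

	if (tid_gt(stale_tid, commit_sequence))
		printf("stale tid 0x%x now compares as newer than commit "
		       "sequence 0x%x, so a waiter such as "
		       "jbd2_log_wait_commit() would block indefinitely\n",
		       stale_tid, commit_sequence);
	return 0;
}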

Comment by Zhenyu Xu [ 08/Apr/15 ]

So the system still uses the 1.8 version?

Comment by Li Xi (Inactive) [ 08/Apr/15 ]

Zhenyu, yes, it is.

Comment by Zhenyu Xu [ 08/Apr/15 ]

I think what you described could be the reason some processes hang waiting for the transaction commit. Would you give this patch a try? It changes fsfilt_ext3_commit_async() to avoid calling jbd2_log_start_commit() with a stale tid.

Comment by Wang Shilong (Inactive) [ 18/May/15 ]

Hi,

After applying the following patch:
https://jira.hpdd.intel.com/secure/attachment/17458/0001-LU-4090-fsfilt-don-t-wait-forever-for-stale-tid.patch

We still hit this issue; please check the latest uploaded logs (xxx20150518.txt). I am wondering whether the following thread is related to this bug:

https://lkml.org/lkml/2012/7/11/255

Comment by Zhenyu Xu [ 18/May/15 ]

What Linux distro does the OST use?

Comment by Wang Shilong (Inactive) [ 18/May/15 ]

Hi,

It is RHEL5 and the kernel version is 2.6.18-348.3.1-xxx.

Regards,

Comment by Zhenyu Xu [ 20/May/15 ]

I found the following backtrace in ALPL202.messages_20150518.txt:

May 17 11:57:05 ALPL202 kernel: INFO: task ll_ost_io_144:20839 blocked for more than 120 seconds.
May 17 11:57:05 ALPL202 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 11:57:05 ALPL202 kernel: ll_ost_io_144 D ffff810c3e08f7a0     0 20839      1         20840 20838 (L-TLB)
May 17 11:57:05 ALPL202 kernel:  ffff810911f63800 0000000000000046 ffff81034d1e1460 ffff8105ee5276c0
May 17 11:57:05 ALPL202 kernel:  ffff8105ee5276c0 000000000000000a ffff8108d1fee7e0 ffff810c3e08f7a0
May 17 11:57:05 ALPL202 kernel:  00129c413d2f65fb 000000000071882c ffff8108d1fee9c8 000000035a5a5a5a
May 17 11:57:05 ALPL202 kernel: Call Trace:
May 17 11:57:05 ALPL202 kernel:  [<ffffffff8002e4d5>] __wake_up+0x38/0x4f
May 17 11:57:05 ALPL202 kernel:  [<ffffffff88aa0840>] :jbd2:jbd2_log_wait_commit+0xa3/0xf5
...

I checked the Linux kernel patches, and commit 3469a32a1e948c54204b5dd6f7476a7d11349e9e fits the symptom:

    jbd2: don't hold j_state_lock while calling wake_up()
    
    The j_state_lock is one of the hottest locks in the jbd2 layer and
    thus one of its scalability bottlenecks.
    
    We don't need to be holding the j_state_lock while we are calling
    wake_up(&journal->j_wait_commit), so release the lock a little bit
    earlier.

so I included it in this patch; would you mind giving it a try?
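
For illustration only, here is a userspace pthreads analogue of the pattern that commit describes (it is not the actual kernel diff): update the shared state under the hot lock, then drop the lock before waking the waiters so they do not immediately pile up on it.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wait_commit = PTHREAD_COND_INITIALIZER;
static int commit_request;

static void *waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&state_lock);
	while (!commit_request)
		pthread_cond_wait(&wait_commit, &state_lock);
	pthread_mutex_unlock(&state_lock);
	printf("waiter saw commit request %d\n", commit_request);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, waiter, NULL);

	pthread_mutex_lock(&state_lock);
	commit_request = 1;                   /* state change under the lock */
	pthread_mutex_unlock(&state_lock);    /* release the lock first ...  */
	pthread_cond_broadcast(&wait_commit); /* ... then wake the waiters   */

	pthread_join(t, NULL);
	return 0;
}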

Comment by Zhenyu Xu [ 20/May/15 ]

I deleted the old patch and updated it.
