
OST unavailable due to possible deadlock

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.3
    • Affects Version/s: Lustre 1.8.8
    • None
    • 3
    • 10994

    Description

      One OST became unavailable and kept dumping stack traces until its service was taken over by another OSS. This issue occurred a couple of times on different servers.

      After some investigation, we found that a lot of service threads hang at different places. Here is a list of where they are stuck.

      ll_ost_01:10226, ll_ost_07:10232, ll_ost_09:10234, ll_ost_11:10236, ll_ost_13:10238, ll_ost_15:10240, ll_ost_18:10243
      filter_lvbo_init
        filter_fid2dentry
          filter_parent_lock
            filter_lock_dentry
              LOCK_INODE_MUTEX(dparent->d_inode);

      ll_ost_06:10231, ll_ost_16:10241, ll_ost_484, ll_ost_io_129, ll_ost_io_123, ll_ost_383
      fsfilt_ext3_start
        ext3_journal_start
          journal_start
            start_this_handle
              __jbd2_log_wait_for_space
                mutex_lock(&journal->j_checkpoint_mutex);

      ll_ost_17:10242
      filter_lvbo_init
        filter_fid2dentry
          filter_parent_lock
          lookup_one_len
            __lookup_hash
              inode->i_op->lookup = ext4_lookup
                ext4_iget
                  iget_locked
                    ifind_fast
                      find_inode_fast
                        __wait_on_freeing_inode
                          ? ldiskfs_bread ... Child dentry's inode I_LOCK

      ll_ost_io_15
      ost_brw_write
        filter_commitrw_write
          fsfilt_ext3_commit_wait
            autoremove_wake_function
              fsfilt_log_wait_commit = jbd2_log_wait_commit

      We think this is not necessarily a problem in the Lustre code. We also found a recently merged kernel patch which fixes a similar deadlock problem in __jbd2_log_wait_for_space(). Maybe it is the root cause?

      https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/fs/jbd2/checkpoint.c?id=0ef54180e0187117062939202b96faf04c8673bc
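
      To make the suspected pattern concrete, here is a small self-contained userspace sketch (illustrative only, with made-up names; it is not the Lustre or jbd2 code). One thread holds a "checkpoint" mutex while it waits for a "commit" that can only be signalled by a thread which itself needs that mutex, so the wake-up never happens and every later thread that queues on the mutex hangs too, much like the pile-up in the stack traces above:

      /* build: cc -pthread hang_sketch.c   (file name is arbitrary) */
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      /* Illustrative stand-ins, not the real jbd2/Lustre objects. */
      static pthread_mutex_t checkpoint_mutex = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t commit_done = PTHREAD_COND_INITIALIZER;
      static int committed;

      /* Like a thread stuck waiting for journal space: it grabs the
       * "checkpoint" mutex and then sleeps until a commit completes. */
      static void *space_waiter(void *arg)
      {
              (void)arg;
              pthread_mutex_lock(&checkpoint_mutex);
              pthread_mutex_lock(&state_lock);
              while (!committed)
                      pthread_cond_wait(&commit_done, &state_lock);
              pthread_mutex_unlock(&state_lock);
              pthread_mutex_unlock(&checkpoint_mutex);
              return NULL;
      }

      /* The "committer" here (wrongly) needs the checkpoint mutex before it
       * can mark the commit done, so it blocks behind space_waiter() forever
       * and the condition is never signalled. */
      static void *committer(void *arg)
      {
              (void)arg;
              sleep(1);                               /* let space_waiter() win */
              pthread_mutex_lock(&checkpoint_mutex);  /* never succeeds */
              pthread_mutex_lock(&state_lock);
              committed = 1;
              pthread_cond_broadcast(&commit_done);
              pthread_mutex_unlock(&state_lock);
              pthread_mutex_unlock(&checkpoint_mutex);
              return NULL;
      }

      int main(void)
      {
              pthread_t a, b;

              pthread_create(&a, NULL, space_waiter, NULL);
              pthread_create(&b, NULL, committer, NULL);
              printf("joining; this hangs, like the stuck OST service threads\n");
              pthread_join(a, NULL);
              pthread_join(b, NULL);
              return 0;
      }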

      Attachments

        Activity

          [LU-4090] OST unavailable due to possible deadlock
          bobijam Zhenyu Xu added a comment -

          I deleted the old patch and updated it.

          bobijam Zhenyu Xu added a comment -

          Found the following backtrace in ALPL202.messages_20150518.txt:

          May 17 11:57:05 ALPL202 kernel: INFO: task ll_ost_io_144:20839 blocked for more than 120 seconds.
          May 17 11:57:05 ALPL202 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          May 17 11:57:05 ALPL202 kernel: ll_ost_io_144 D ffff810c3e08f7a0     0 20839      1         20840 20838 (L-TLB)
          May 17 11:57:05 ALPL202 kernel:  ffff810911f63800 0000000000000046 ffff81034d1e1460 ffff8105ee5276c0
          May 17 11:57:05 ALPL202 kernel:  ffff8105ee5276c0 000000000000000a ffff8108d1fee7e0 ffff810c3e08f7a0
          May 17 11:57:05 ALPL202 kernel:  00129c413d2f65fb 000000000071882c ffff8108d1fee9c8 000000035a5a5a5a
          May 17 11:57:05 ALPL202 kernel: Call Trace:
          May 17 11:57:05 ALPL202 kernel:  [<ffffffff8002e4d5>] __wake_up+0x38/0x4f
          May 17 11:57:05 ALPL202 kernel:  [<ffffffff88aa0840>] :jbd2:jbd2_log_wait_commit+0xa3/0xf5
          ...
          

          And I checked the Linux kernel patches; commit 3469a32a1e948c54204b5dd6f7476a7d11349e9e fits the symptom:

              jbd2: don't hold j_state_lock while calling wake_up()
              
              The j_state_lock is one of the hottest locks in the jbd2 layer and
              thus one of its scalability bottlenecks.
              
              We don't need to be holding the j_state_lock while we are calling
              wake_up(&journal->j_wait_commit), so release the lock a little bit
              earlier.
          

          So I included it in this patch; would you mind giving it a try?
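
          For illustration, here is a minimal userspace analogue of the pattern that commit message describes (an assumed sketch using pthread primitives, not the upstream jbd2 diff): the shared state is still updated under the lock, but waiters are woken only after the lock has been dropped, so a woken thread is not immediately forced to contend for the lock the waker still holds.

          #include <pthread.h>

          static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
          static pthread_cond_t wait_commit = PTHREAD_COND_INITIALIZER;
          static int commit_requested;

          /* Waiter: because it re-checks the predicate under state_lock, it
           * is safe for the waker to signal after dropping the lock. */
          void wait_for_commit_request(void)
          {
                  pthread_mutex_lock(&state_lock);
                  while (!commit_requested)
                          pthread_cond_wait(&wait_commit, &state_lock);
                  pthread_mutex_unlock(&state_lock);
          }

          /* Waker: update the state under the lock, but issue the wake-up
           * only after releasing it, mirroring "release the lock a little
           * bit earlier" from the quoted commit message. */
          void request_commit(void)
          {
                  pthread_mutex_lock(&state_lock);
                  commit_requested = 1;
                  pthread_mutex_unlock(&state_lock);
                  pthread_cond_signal(&wait_commit);
          }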


          wangshilong Wang Shilong (Inactive) added a comment -

          Hi,

          It is RHEL5 and the kernel version is 2.6.18-348.3.1-xxx

          Regards,

          bobijam Zhenyu Xu added a comment -

          What Linux distro does the OST use?


          wangshilong Wang Shilong (Inactive) added a comment -

          Hi,

          After applying the following patch:
          https://jira.hpdd.intel.com/secure/attachment/17458/0001-LU-4090-fsfilt-don-t-wait-forever-for-stale-tid.patch

          we still hit this issue; please check the latest uploaded logs (xxx20150518.txt). And I am wondering whether the following thread is related to this bug:

          https://lkml.org/lkml/2012/7/11/255

          bobijam Zhenyu Xu added a comment -

          I think what you described could be the reason some processes hang waiting for the transaction commit. Would you give this patch a try? It changes fsfilt_ext3_commit_async() to avoid calling jbd2_log_start_commit() with a stale tid.


          lixi Li Xi (Inactive) added a comment -

          Zhenyu, yes, it is.

          bobijam Zhenyu Xu added a comment -

          So the system still uses the 1.8 version?


          lixi Li Xi (Inactive) added a comment -

          I am not sure; maybe the following patch fixes a bug similar to our problem?

          ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

          Tid wraparound can happen on Lustre, and we have seen it happen multiple times on some systems before.
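
          To make the wraparound point concrete, here is a small standalone sketch (an illustration with made-up numbers, assuming jbd2 compares transaction ids with the usual wrap-safe signed-difference idiom, as in its tid_gt() helper). Once the commit sequence has advanced more than 2^31 past a remembered tid, that stale tid looks like it is still in the future, so a waiter that keeps testing tid_gt(tid, j_commit_sequence) would sleep forever:

          #include <stdint.h>
          #include <stdio.h>

          /* Wrap-safe "t1 is after t2" test, the same idiom as jbd2's tid_gt(). */
          static int tid_gt(uint32_t t1, uint32_t t2)
          {
                  return (int32_t)(t1 - t2) > 0;
          }

          int main(void)
          {
                  uint32_t stale_tid = 10;                /* remembered long ago        */
                  uint32_t commit_sequence = 0x80000010u; /* counter has since advanced */
                                                          /* by more than 2^31          */

                  /* A waiter loops while tid_gt(tid, commit_sequence); with a stale
                   * tid on the far side of the wrap this never becomes false. */
                  printf("still waiting? %s\n",
                         tid_gt(stale_tid, commit_sequence) ? "yes, forever" : "no");
                  return 0;
          }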

          bobijam Zhenyu Xu added a comment -

          Hi Ihara,

          From messages.ALPL402.txt, I saw mainly two backtrace locations, __jbd2_log_wait_for_space+0x5d and jbd2_log_wait_commit+0xa3. Can you figure out what code those offsets correspond to on your kernel? Here is the lookup I did on my test system:

          [12:04 PM]root@test1:~/work/kernel/linux-2.6.32-431.29.2.el6-master/
          # gdb fs/jbd2/jbd2.ko
          GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
          Copyright (C) 2010 Free Software Foundation, Inc.
          License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
          This is free software: you are free to change and redistribute it.
          There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
          and "show warranty" for details.
          This GDB was configured as "x86_64-redhat-linux-gnu".
          For bug reporting instructions, please see:
          <http://www.gnu.org/software/gdb/bugs/>...
          Reading symbols from /root/work/kernel/linux-2.6.32-431.29.2.el6-master/fs/jbd2/jbd2.ko...done.
          (gdb) l *jbd2_log_wait_commit+0xa3
          0x85c3 is in jbd2_log_wait_commit (fs/jbd2/journal.c:641).
          636		while (tid_gt(tid, journal->j_commit_sequence)) {
          637			jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
          638					  tid, journal->j_commit_sequence);
          639			wake_up(&journal->j_wait_commit);
          640			spin_unlock(&journal->j_state_lock);
          641			wait_event(journal->j_wait_done_commit,
          642					!tid_gt(tid, journal->j_commit_sequence));
          643			spin_lock(&journal->j_state_lock);
          644		}
          645		spin_unlock(&journal->j_state_lock);
          
          (gdb) l *__jbd2_log_wait_for_space+0x5d
          0x548d is in __jbd2_log_wait_for_space (fs/jbd2/checkpoint.c:172).
          167					continue;
          168				} else {
          169					printk(KERN_ERR "%s: needed %d blocks and "
          170					       "only had %d space available\n",
          171					       __func__, nblocks, space_left);
          172					printk(KERN_ERR "%s: no way to get more "
          173					       "journal space in %s\n", __func__,
          174					       journal->j_devname);
          175					WARN_ON(1);
          176					jbd2_journal_abort(journal, 0);
          (gdb) 
          

          The jbd2_log_wait_commit+0xa3 location makes sense; it is waiting for the commit to finish. But having most of the other processes hang at __jbd2_log_wait_for_space+0x5d does not look right, so I want to know which code line that offset corresponds to on your system.


          People

            Assignee: bobijam Zhenyu Xu
            Reporter: lixi Li Xi (Inactive)
            Votes: 0
            Watchers: 10
