
OST unavailable due to possible deadlock

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.3
    • Affects Version/s: Lustre 1.8.8
    • None
    • 3
    • 10994

    Description

      One OST became unavailable and kept dumping stack traces until its service was taken over by another OSS. This issue occurred a couple of times on different servers.

      After some investigation, we found that a lot of service threads hang at different places. Here is a list of where they are stuck.

      ll_ost_01:10226, ll_ost_07:10232, ll_ost_09:10234, ll_ost_11:10236, ll_ost_13:10238, ll_ost_15:10240, ll_ost_18:10243
      filter_lvbo_init
        filter_fid2dentry
          filter_parent_lock
            filter_lock_dentry
              LOCK_INODE_MUTEX(dparent->d_inode);

      ll_ost_06:10231, ll_ost_16:10241, ll_ost_484, ll_ost_io_129, ll_ost_io_123, ll_ost_383
      fsfilt_ext3_start
        ext3_journal_start
          journal_start
            start_this_handle
              __jbd2_log_wait_for_space
                mutex_lock(&journal->j_checkpoint_mutex);

      ll_ost_17:10242
      filter_lvbo_init
        filter_fid2dentry
          filter_parent_lock
          lookup_one_len
            __lookup_hash
              inode->i_op->lookup = ext4_lookup
                ext4_iget
                  iget_locked
                    ifind_fast
                      find_inode_fast
                        __wait_on_freeing_inode
                          ? ldiskfs_bread ... Child dentry's inode I_LOCK

      ll_ost_io_15
      ost_brw_write
        filter_commitrw_write
          fsfilt_ext3_commit_wait
            autoremove_wake_function
              fsfilt_log_wait_commit = jbd2_log_wait_commit

      We think this is not necessarily a problem in the Lustre code. We also found a recently merged kernel patch which fixes a similar deadlock problem in __jbd2_log_wait_for_space(). Maybe it is the root cause?

      https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/fs/jbd2/checkpoint.c?id=0ef54180e0187117062939202b96faf04c8673bc
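
      To make the suspected pattern concrete, here is a small self-contained userspace sketch (illustrative only, with made-up names; it is not the Lustre or jbd2 code). One thread holds a "checkpoint" mutex while it waits for a "commit" that can only be signalled by a thread which itself needs that mutex, so the wake-up never happens and every later thread that queues on the mutex hangs too, much like the pile-up in the stack traces above:

      /* build: cc -pthread hang_sketch.c   (file name is arbitrary) */
      #include <pthread.h>
      #include <stdio.h>
      #include <unistd.h>

      /* Illustrative stand-ins, not the real jbd2/Lustre objects. */
      static pthread_mutex_t checkpoint_mutex = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t commit_done = PTHREAD_COND_INITIALIZER;
      static int committed;

      /* Like a thread stuck waiting for journal space: it grabs the
       * "checkpoint" mutex and then sleeps until a commit completes. */
      static void *space_waiter(void *arg)
      {
              (void)arg;
              pthread_mutex_lock(&checkpoint_mutex);
              pthread_mutex_lock(&state_lock);
              while (!committed)
                      pthread_cond_wait(&commit_done, &state_lock);
              pthread_mutex_unlock(&state_lock);
              pthread_mutex_unlock(&checkpoint_mutex);
              return NULL;
      }

      /* The "committer" here (wrongly) needs the checkpoint mutex before it
       * can mark the commit done, so it blocks behind space_waiter() forever
       * and the condition is never signalled. */
      static void *committer(void *arg)
      {
              (void)arg;
              sleep(1);                               /* let space_waiter() win */
              pthread_mutex_lock(&checkpoint_mutex);  /* never succeeds */
              pthread_mutex_lock(&state_lock);
              committed = 1;
              pthread_cond_broadcast(&commit_done);
              pthread_mutex_unlock(&state_lock);
              pthread_mutex_unlock(&checkpoint_mutex);
              return NULL;
      }

      int main(void)
      {
              pthread_t a, b;

              pthread_create(&a, NULL, space_waiter, NULL);
              pthread_create(&b, NULL, committer, NULL);
              printf("joining; this hangs, like the stuck OST service threads\n");
              pthread_join(a, NULL);
              pthread_join(b, NULL);
              return 0;
      }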

      Attachments

        Activity

          [LU-4090] OST unavailable due to possible deadlock
          bobijam Zhenyu Xu added a comment -

          I deleted the old patch and updated it.

          bobijam Zhenyu Xu added a comment -

          Found the following backtrace in ALPL202.messages_20150518.txt:

          May 17 11:57:05 ALPL202 kernel: INFO: task ll_ost_io_144:20839 blocked for more than 120 seconds.
          May 17 11:57:05 ALPL202 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          May 17 11:57:05 ALPL202 kernel: ll_ost_io_144 D ffff810c3e08f7a0     0 20839      1         20840 20838 (L-TLB)
          May 17 11:57:05 ALPL202 kernel:  ffff810911f63800 0000000000000046 ffff81034d1e1460 ffff8105ee5276c0
          May 17 11:57:05 ALPL202 kernel:  ffff8105ee5276c0 000000000000000a ffff8108d1fee7e0 ffff810c3e08f7a0
          May 17 11:57:05 ALPL202 kernel:  00129c413d2f65fb 000000000071882c ffff8108d1fee9c8 000000035a5a5a5a
          May 17 11:57:05 ALPL202 kernel: Call Trace:
          May 17 11:57:05 ALPL202 kernel:  [<ffffffff8002e4d5>] __wake_up+0x38/0x4f
          May 17 11:57:05 ALPL202 kernel:  [<ffffffff88aa0840>] :jbd2:jbd2_log_wait_commit+0xa3/0xf5
          ...
          

          And I checked the Linux kernel patches; commit 3469a32a1e948c54204b5dd6f7476a7d11349e9e fits the symptom:

              jbd2: don't hold j_state_lock while calling wake_up()
              
              The j_state_lock is one of the hottest locks in the jbd2 layer and
              thus one of its scalability bottlenecks.
              
              We don't need to be holding the j_state_lock while we are calling
              wake_up(&journal->j_wait_commit), so release the lock a little bit
              earlier.
          

          So I included it in this patch; would you mind giving it a try?
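
          For illustration, here is a minimal userspace analogue of the pattern that commit message describes (an assumed sketch using pthread primitives, not the upstream jbd2 diff): the shared state is still updated under the lock, but waiters are woken only after the lock has been dropped, so a woken thread is not immediately forced to contend for the lock the waker still holds.

          #include <pthread.h>

          static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
          static pthread_cond_t wait_commit = PTHREAD_COND_INITIALIZER;
          static int commit_requested;

          /* Waiter: because it re-checks the predicate under state_lock, it
           * is safe for the waker to signal after dropping the lock. */
          void wait_for_commit_request(void)
          {
                  pthread_mutex_lock(&state_lock);
                  while (!commit_requested)
                          pthread_cond_wait(&wait_commit, &state_lock);
                  pthread_mutex_unlock(&state_lock);
          }

          /* Waker: update the state under the lock, but issue the wake-up
           * only after releasing it, mirroring "release the lock a little
           * bit earlier" from the quoted commit message. */
          void request_commit(void)
          {
                  pthread_mutex_lock(&state_lock);
                  commit_requested = 1;
                  pthread_mutex_unlock(&state_lock);
                  pthread_cond_signal(&wait_commit);
          }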


          wangshilong Wang Shilong (Inactive) added a comment -

          Hi,

          It is RHEL5 and the kernel version is 2.6.18-348.3.1-xxx

          Regards,

          bobijam Zhenyu Xu added a comment -

          What Linux distro does the OST use?


          wangshilong Wang Shilong (Inactive) added a comment -

          Hi,

          After applying the following patch:
          https://jira.hpdd.intel.com/secure/attachment/17458/0001-LU-4090-fsfilt-don-t-wait-forever-for-stale-tid.patch

          we still hit this issue; please check the latest uploaded logs (xxx20150518.txt). And I am wondering whether the following thread is related to this bug:

          https://lkml.org/lkml/2012/7/11/255

          bobijam Zhenyu Xu added a comment -

          I think what you described could be the reason some processes hang waiting for the transaction commit. Would you give this patch a try? It changes fsfilt_ext3_commit_async() to avoid calling jbd2_log_start_commit() with a stale tid.


          lixi Li Xi (Inactive) added a comment -

          Zhenyu, yes, it is.

          bobijam Zhenyu Xu added a comment -

          So the system still uses the 1.8 version?


          lixi Li Xi (Inactive) added a comment -

          I am not sure; maybe the following patch fixes a bug similar to our problem?

          ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

          Tid wraparound can happen on Lustre, and we have seen it happen multiple times on some systems before.
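
          To make the wraparound point concrete, here is a small standalone sketch (an illustration with made-up numbers, assuming jbd2 compares transaction ids with the usual wrap-safe signed-difference idiom, as in its tid_gt() helper). Once the commit sequence has advanced more than 2^31 past a remembered tid, that stale tid looks like it is still in the future, so a waiter that keeps testing tid_gt(tid, j_commit_sequence) would sleep forever:

          #include <stdint.h>
          #include <stdio.h>

          /* Wrap-safe "t1 is after t2" test, the same idiom as jbd2's tid_gt(). */
          static int tid_gt(uint32_t t1, uint32_t t2)
          {
                  return (int32_t)(t1 - t2) > 0;
          }

          int main(void)
          {
                  uint32_t stale_tid = 10;                /* remembered long ago        */
                  uint32_t commit_sequence = 0x80000010u; /* counter has since advanced */
                                                          /* by more than 2^31          */

                  /* A waiter loops while tid_gt(tid, commit_sequence); with a stale
                   * tid on the far side of the wrap this never becomes false. */
                  printf("still waiting? %s\n",
                         tid_gt(stale_tid, commit_sequence) ? "yes, forever" : "no");
                  return 0;
          }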

          bobijam Zhenyu Xu added a comment -

          Hi Ihara,

          From messages.ALPL402.txt, I saw mainly two backtrace locations, __jbd2_log_wait_for_space+0x5d and jbd2_log_wait_commit+0xa3. Can you figure out what code those offsets correspond to on your kernel? Here is the lookup I did on my test system:

          [12:04 PM]root@test1:~/work/kernel/linux-2.6.32-431.29.2.el6-master/
          # gdb fs/jbd2/jbd2.ko
          GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
          Copyright (C) 2010 Free Software Foundation, Inc.
          License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
          This is free software: you are free to change and redistribute it.
          There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
          and "show warranty" for details.
          This GDB was configured as "x86_64-redhat-linux-gnu".
          For bug reporting instructions, please see:
          <http://www.gnu.org/software/gdb/bugs/>...
          Reading symbols from /root/work/kernel/linux-2.6.32-431.29.2.el6-master/fs/jbd2/jbd2.ko...done.
          (gdb) l *jbd2_log_wait_commit+0xa3
          0x85c3 is in jbd2_log_wait_commit (fs/jbd2/journal.c:641).
          636		while (tid_gt(tid, journal->j_commit_sequence)) {
          637			jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n",
          638					  tid, journal->j_commit_sequence);
          639			wake_up(&journal->j_wait_commit);
          640			spin_unlock(&journal->j_state_lock);
          641			wait_event(journal->j_wait_done_commit,
          642					!tid_gt(tid, journal->j_commit_sequence));
          643			spin_lock(&journal->j_state_lock);
          644		}
          645		spin_unlock(&journal->j_state_lock);
          
          (gdb) l *__jbd2_log_wait_for_space+0x5d
          0x548d is in __jbd2_log_wait_for_space (fs/jbd2/checkpoint.c:172).
          167					continue;
          168				} else {
          169					printk(KERN_ERR "%s: needed %d blocks and "
          170					       "only had %d space available\n",
          171					       __func__, nblocks, space_left);
          172					printk(KERN_ERR "%s: no way to get more "
          173					       "journal space in %s\n", __func__,
          174					       journal->j_devname);
          175					WARN_ON(1);
          176					jbd2_journal_abort(journal, 0);
          (gdb) 
          

          The jbd2_log_wait_commit+0xa3 location makes sense; it is waiting for the commit to finish. But having most of the other processes hang at __jbd2_log_wait_for_space+0x5d does not look right, so I want to know which code line that offset corresponds to on your system.


          People

            Assignee: bobijam Zhenyu Xu
            Reporter: lixi Li Xi (Inactive)
            Votes: 0
            Watchers: 10
