
Some JBD2 journaling deadlock at BULL

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.2.0, Lustre 2.1.2
    • Affects Version/s: Lustre 2.0.0
    • Labels: None
    • Severity: 2
    • 24,438
    • 4793

    Description

      BULL reports in bugzilla that there are some possible deadlock issues on the MDS with jbd2 (or just runaway transactions?):

      At CEA, they have encountered several occurrences of the same scenario where all Lustre activity is
      hung. Each time they live-debug the problem, they end up on the MDS node, where all Lustre
      operations appear to be frozen.

      As a consequence, the MDS has to be rebooted and the Lustre layer has to be restarted on it with recovery.

      The MDS threads which appear to be strongly involved in the frozen situation have the following
      stack traces, taken from one of the forced crash-dumps:
      ==================================

      There are about 234 tasks, all with the same stack:

      PID 5250 mdt_rdpg_143
      schedule()
      start_this_handle()
      jbd2_journal_start()
      ldiskfs_journal_start_sb()
      osd_trans_start()
      mdd_trans_start()
      cml_close()

      One is with:

      Pid: 4990 mdt_395
      schedule()
      jbd2_log_wait_commit()
      jbd2_journal_stop()
      __ldiskfs_journal_stop()
      osd_trans_stop()
      mdd_trans_stop()
      mdd_attr_set()
      cml_attr_set()

      And another with:

      Pid: 4534 "jbd2/sdd-8"
      schedule()
      jbd2_journal_commit_transaction()
      kjournald2()
      kthread()
      kernel_thread()

      ==================================

      Analyzing the crash dump shows that the task hung in jbd2_journal_commit_transaction() has been in
      this state for a very long time.

      This problem looks like bug 16667, but unfortunately that fix is not applicable 'as is', since it
      dates back to 1.6. Here it seems there is a race or deadlock between the Lustre and JBD2 layers
      (see the illustrative sketch at the end of this description).
      As a workaround the customer deactivated the ChangeLog feature, and the problem has not reoccurred
      since. Sadly, ChangeLogs are required by HSM, so this workaround cannot last...

      Can you see the reason for this deadlock?

      I must stress that this bug is critical, as it blocks normal cluster operation (i.e. with HSM).
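
      To make the suspected scenario concrete, here is a minimal illustrative sketch, not the actual
      Lustre code: the mutex "catalog_lock" and both functions are hypothetical; only jbd2_journal_start()
      and jbd2_journal_stop() are the real JBD2 entry points seen in the stacks above. It shows the kind
      of inversion between a lock and the running JBD2 transaction that would produce exactly this picture.
      ==================================

      #include <linux/jbd2.h>
      #include <linux/mutex.h>
      #include <linux/err.h>

      static DEFINE_MUTEX(catalog_lock);      /* hypothetical llog catalog lock */

      /* Thread A (e.g. a changelog-cancel style path): takes the lock first,
       * then asks JBD2 for a handle.  If the running transaction is already
       * full or locked, start_this_handle() sleeps until kjournald2 commits it. */
      static int thread_a(journal_t *journal)
      {
              handle_t *handle;

              mutex_lock(&catalog_lock);                 /* 1: take the lock     */
              handle = jbd2_journal_start(journal, 64);  /* 2: may sleep forever */
              if (IS_ERR(handle)) {
                      mutex_unlock(&catalog_lock);
                      return PTR_ERR(handle);
              }
              /* ... update the catalog ... */
              jbd2_journal_stop(handle);
              mutex_unlock(&catalog_lock);
              return 0;
      }

      /* Thread B (e.g. a changelog-add style path): already holds an open
       * handle on the running transaction, then blocks on the lock held by A.
       * kjournald2 cannot close the transaction until B stops its handle, so
       * A never wakes up, and every later jbd2_journal_start() caller (the
       * ~234 mdt threads above) piles up in start_this_handle() behind it. */
      static int thread_b(journal_t *journal)
      {
              handle_t *handle = jbd2_journal_start(journal, 64); /* 1: open handle */

              if (IS_ERR(handle))
                      return PTR_ERR(handle);
              mutex_lock(&catalog_lock);                 /* 2: blocks behind A   */
              /* ... append a record ... */
              mutex_unlock(&catalog_lock);
              jbd2_journal_stop(handle);
              return 0;
      }
      ==================================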

          Activity


            Bruno - can you post/attach the full set of stack traces for this lockup?

            wpower William Power added a comment

            I understand there are strong assumptions here and that we actually don't have a definitive fix for this quite infrequent problem/deadlock... And by the way, I just got a new occurrence of this same scenario, but on an OSS this time, running Lustre 2.1.1 with kernel version 2.6.32-131.12.1, which contains the JBD2 patch jbd2-commit-timer-no-jiffies-rounding.diff...

            The stacks of the hung threads involved look much the same:
            =======================================================
            PID: 19269 TASK: ffff880470699340 CPU: 1 COMMAND: "ll_ost_io_249"
            #0 [ffff88047069f520] schedule at ffffffff8147bdd9
            #1 [ffff88047069f5e8] jbd2_log_wait_commit at ffffffffa00867a5
            #2 [ffff88047069f678] fsfilt_ldiskfs_commit_wait at ffffffffa07bf25e
            #3 [ffff88047069f6c8] filter_commitrw_write at ffffffffa0a794c9
            #4 [ffff88047069f908] filter_commitrw at ffffffffa0a6b33d
            #5 [ffff88047069f9c8] obd_commitrw at ffffffffa069df5a
            #6 [ffff88047069fa48] ost_brw_write at ffffffffa06a7922
            #7 [ffff88047069fbf8] ost_handle at ffffffffa06abcd5
            #8 [ffff88047069fd68] ptlrpc_main at ffffffffa07103e9
            #9 [ffff88047069ff48] kernel_thread at ffffffff810041aa

            PID: 15704 TASK: ffff88062c52c0c0 CPU: 4 COMMAND: "jbd2/dm-5-8"
            #0 [ffff8804c1467c50] schedule at ffffffff8147bdd9
            #1 [ffff8804c1467d18] jbd2_journal_commit_transaction at ffffffffa0080970
            #2 [ffff8804c1467e68] kjournald2 at ffffffffa0086b48
            #3 [ffff8804c1467ee8] kthread at ffffffff8107ad36
            #4 [ffff8804c1467f48] kernel_thread at ffffffff810041aa

            and many others like this one:

            PID: 15892 TASK: ffff88062c73f4c0 CPU: 4 COMMAND: "ll_ost_io_36"
            #0 [ffff8804bd81b430] schedule at ffffffff8147bdd9
            #1 [ffff8804bd81b4f8] start_this_handle at ffffffffa007f092
            #2 [ffff8804bd81b5b8] jbd2_journal_start at ffffffffa007f510
            #3 [ffff8804bd81b608] ldiskfs_journal_start_sb at ffffffffa0a13758
            #4 [ffff8804bd81b618] fsfilt_ldiskfs_brw_start at ffffffffa07bf792
            #5 [ffff8804bd81b6c8] filter_commitrw_write at ffffffffa0a78cb8
            #6 [ffff8804bd81b908] filter_commitrw at ffffffffa0a6b33d
            #7 [ffff8804bd81b9c8] obd_commitrw at ffffffffa069df5a
            #8 [ffff8804bd81ba48] ost_brw_write at ffffffffa06a7922
            #9 [ffff8804bd81bbf8] ost_handle at ffffffffa06abcd5
            #10 [ffff8804bd81bd68] ptlrpc_main at ffffffffa07103e9
            #11 [ffff8804bd81bf48] kernel_thread at ffffffff810041aa
            =======================================================

            But since we are on an OSS, this cannot be attributed to any "llog" activity; can we just consider that we are back to a "pure" JBD2 issue here?

            bfaccini Bruno Faccini (Inactive) added a comment

            So it seems that the work-around (the "patch which starts the transaction before catalog locking in
            llog_cat_cancel_records()") described in JIRA LU-81 is not sufficient, and we may need a patch to
            implement the "brute-force locking", the alternate solution already described in LU-81.

            What do you think?

            I believe there is a general problem here that is not resolved by simply increasing the journal credits, which really just serves to mask the problem in some cases. We are looking at a case now where cancelling lots of unlink records results in a similar lock inversion, caused by the journal restart in the llog updates. The code really needs to be changed so that the lock inversion can't happen (a minimal ordering sketch follows this comment).

            nrutman Nathan Rutman added a comment
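
            To illustrate the ordering that the "start transaction before catalog locking" work-around
            enforces, here is a minimal sketch under the same assumptions as the one in the description
            (hypothetical catalog_lock and function name, not the real llog_cat_cancel_records() code):
            every path opens its JBD2 handle before taking the catalog lock, so no thread can sit on an
            open handle while waiting for a lock whose owner is itself waiting for the transaction to commit.
            =======================================================
            #include <linux/jbd2.h>
            #include <linux/mutex.h>
            #include <linux/err.h>

            static DEFINE_MUTEX(catalog_lock);   /* hypothetical, as in the description sketch */

            static int cancel_records_ordered(journal_t *journal)
            {
                    handle_t *handle;

                    handle = jbd2_journal_start(journal, 64);   /* 1: reserve credits first   */
                    if (IS_ERR(handle))
                            return PTR_ERR(handle);

                    mutex_lock(&catalog_lock);                  /* 2: only then take the lock */
                    /* ... cancel llog records within the open handle ... */
                    mutex_unlock(&catalog_lock);

                    return jbd2_journal_stop(handle);
            }
            =======================================================
            As the commit title and file list below suggest, the actual fix ("LU-81 deadlock of changelog
            adding vs. changelog cancelling", touching lustre/mds/mds_log.c and lustre/mdd/mdd_device.c)
            reworks the real llog/changelog paths rather than this toy ordering.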

            Integrated in lustre-b2_1 » i686,client,el5,inkernel #41
            LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

            Result = SUCCESS
            Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
            Files :

            • lustre/mds/mds_log.c
            • lustre/mdd/mdd_device.c
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-b2_1 » x86_64,client,el5,ofa #41
            LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

            Result = SUCCESS
            Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
            Files :

            • lustre/mds/mds_log.c
            • lustre/mdd/mdd_device.c
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-b2_1 » i686,server,el5,ofa #41
            LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

            Result = SUCCESS
            Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
            Files :

            • lustre/mds/mds_log.c
            • lustre/mdd/mdd_device.c
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-b2_1 » x86_64,server,el5,inkernel #41
            LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

            Result = SUCCESS
            Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
            Files :

            • lustre/mds/mds_log.c
            • lustre/mdd/mdd_device.c
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-b2_1 » i686,server,el5,inkernel #41
            LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

            Result = SUCCESS
            Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
            Files :

            • lustre/mds/mds_log.c
            • lustre/mdd/mdd_device.c
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-b2_1 » x86_64,client,el5,inkernel #41
            LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

            Result = SUCCESS
            Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
            Files :

            • lustre/mdd/mdd_device.c
            • lustre/mds/mds_log.c
            hudson Build Master (Inactive) added a comment

            Integrated in lustre-b2_1 » i686,server,el6,inkernel #41
            LU-81 deadlock of changelog adding vs. changelog cancelling (Revision d68d301d065296d2769ea2274bff75b21a98f9b6)

            Result = SUCCESS
            Oleg Drokin : d68d301d065296d2769ea2274bff75b21a98f9b6
            Files :

            • lustre/mds/mds_log.c
            • lustre/mdd/mdd_device.c
            hudson Build Master (Inactive) added a comment

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: green Oleg Drokin
              Votes: 0
              Watchers: 10
