
wrong lock ordering in rename leads to deadlocks

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.1.6, Lustre 2.6.0, Lustre 2.5.2, Lustre 2.4.3
    • 3
    • 12984

    Description

      The current rename code locks objects in this order: src parent, dst parent, src child, dst child. However, dst may be a parent of src, in which case this order can lead to a deadlock.
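
To illustrate why a fixed src-then-dst order is fragile, here is a minimal Python sketch of one common alternative: locking ancestors before descendants. It is not the patch that landed for this ticket. The `parent_of` map, the path-like object names, and the `rename_lock_order` helper are all hypothetical, and a real MDT cannot compute an object's depth this cheaply.

```python
def depth(obj, parent_of):
    """Number of parent links between obj and the root of the (hypothetical) tree."""
    d = 0
    while obj in parent_of:
        obj = parent_of[obj]
        d += 1
    return d


def rename_lock_order(src_parent, dst_parent, src_child, dst_child, parent_of):
    """Return the objects to lock, ancestors first, ties broken by name."""
    objs = []
    for obj in (src_parent, dst_parent, src_child, dst_child):
        # Skip a missing dst child and objects that appear in two roles.
        if obj is not None and obj not in objs:
            objs.append(obj)
    return sorted(objs, key=lambda o: (depth(o, parent_of), o))


# Hypothetical layout: /a contains /a/b, which contains /a/b/x.
parent_of = {"/a": "/", "/a/b": "/a", "/a/b/x": "/a/b"}

# Rename /a/b/x into /a: the dst parent (/a) is the parent of the src parent (/a/b).
fixed_order = ["/a/b", "/a", "/a/b/x"]  # src parent, dst parent, src child
safe_order = rename_lock_order("/a/b", "/a", "/a/b/x", None, parent_of)
print(safe_order)
```

With the fixed order, /a/b is locked before its parent /a, which is the opposite of what an operation walking down from /a would do; the depth-based order avoids that inversion in this toy layout.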

      Example from a core dump:
      RES1 - dst parent
      RES2 - dst parent, PDO
      RES3 - src parent

      Thread 1 (T1), rename:
      Has RES3 (CW,0x2)
      Wants RES1 (CW,0x2)

      Thread 2 (T2), getattr:
      Has RES1 (CR,0x2)
      Has RES2 (PR,0x2)
      Wants RES3 (PR,0x2) - blocked by T1

      Thread 3 (T3), create or open|create:
      Has RES1 (CW,0x2)
      Wants RES2 (PW,0x2) - blocked by T2

      Thread 4 (T4), getattr or similar:
      Wants RES1 (PR,0x2) - blocked by T3

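      T1's request conflicts with none of the locks granted on RES1, but it sits in the waiting queue behind T4 and is therefore not granted. This closes the cycle: T1 waits behind T4, T4 waits for T3, T3 waits for T2, and T2 waits for T1.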
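
The core dump state can be replayed as a small wait-for graph. The Python sketch below is only an illustration, not LDLM code: it assumes the standard DLM mode compatibility table, ignores the inodebits (every lock here uses 0x2), and models the waiting list as strict FIFO, so a waiter is blocked by anyone queued ahead of it. The names `COMPAT`, `wait_for_edges`, and `find_cycle` are hypothetical.

```python
# Modes each requested mode can coexist with (standard DLM table, NL omitted).
COMPAT = {
    "EX": set(),
    "PW": {"CR"},
    "PR": {"CR", "PR"},
    "CW": {"CR", "CW"},
    "CR": {"CR", "CW", "PR", "PW"},
}

# Granted locks and waiting queues per resource, as listed in the core dump.
granted = {
    "RES1": [("T2", "CR"), ("T3", "CW")],
    "RES2": [("T2", "PR")],
    "RES3": [("T1", "CW")],
}
waiting = {
    "RES1": [("T4", "PR"), ("T1", "CW")],  # T1 is queued behind T4
    "RES2": [("T3", "PW")],
    "RES3": [("T2", "PR")],
}


def wait_for_edges(granted, waiting):
    """Map each waiting thread to the set of threads it is waiting on."""
    edges = {}
    for res, queue in waiting.items():
        for pos, (thread, mode) in enumerate(queue):
            # Blocked by every granted lock whose mode conflicts with the request.
            blockers = {owner for owner, held in granted.get(res, [])
                        if owner != thread and held not in COMPAT[mode]}
            # Assumed FIFO policy: also blocked by every waiter ahead in the queue.
            blockers |= {owner for owner, _ in queue[:pos] if owner != thread}
            edges.setdefault(thread, set()).update(blockers)
    return edges


def find_cycle(edges):
    """Depth-first search; return one cycle as a list of threads, or None."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):]
        for nxt in sorted(edges.get(node, ())):
            found = visit(nxt, path + [node])
            if found:
                return found
        return None

    for start in sorted(edges):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


edges = wait_for_edges(granted, waiting)
print(edges)
print(find_cycle(edges))
```

In this model T1's only blocker is T4, its queue predecessor, and the search finds the cycle T1, T4, T3, T2, which matches the chain described above.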

          Activity

            [LU-4725] wrong lock ordering in rename leads to deadlocks
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-16589 [ LU-16589 ]
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.5.4 [ 11190 ]
            Labels Original: mq414 patch New: patch
            pjones Peter Jones made changes -
            Labels Original: mq314 patch New: mq414 patch
            yujian Jian Yu added a comment -

            Just combined the above two patches into one: http://review.whamcloud.com/11615

            pjones Peter Jones made changes -
            Labels Original: patch New: mq314 patch
            pjones Peter Jones made changes -
            Link New: This issue is related to LU-5514 [ LU-5514 ]

            rdeshmukh_xyratex Rahul Deshmukh (Inactive) added a comment -

            The result for the first patch shows two tests; this is because the first patch is not complete.
            All 19 tests pass for the second patch. (I was not sure where to mention this in Gerrit, so I am updating here.)

            http://review.whamcloud.com/#/c/10916/
            http://review.whamcloud.com/#/c/10917/

            rdeshmukh_xyratex Rahul Deshmukh (Inactive) added a comment -

            I have backported the LU-4725 patches to b2_5; here are the links:

            http://review.whamcloud.com/#/c/10916/
            http://review.whamcloud.com/#/c/10917/

            I am not sure whether I need to create a new bug for this. Please help.
            jlevi Jodi Levi (Inactive) made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]

            jlevi Jodi Levi (Inactive) added a comment -

            Patches landed on master. Closing this ticket and LU-5144, as the patch that was moved to that ticket has also landed.

            People

              hongchao.zhang Hongchao Zhang
              vitaly_fertman Vitaly Fertman
              Votes: 0
              Watchers: 15
