Lustre / LU-4725

wrong lock ordering in rename leads to deadlocks


    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.1.6, Lustre 2.6.0, Lustre 2.5.2, Lustre 2.4.3
    • 3
    • 12984

      The current rename code locks objects in the order: src parent, dst parent, src child, dst child. However, dst may be a parent of src, in which case this order can lead to a deadlock.
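
      The ordering problem can be illustrated with a small sketch. It is not Lustre code: it uses hypothetical paths in place of FIDs, and it assumes that other operations (such as a path lookup for getattr) lock an ancestor before its descendant. The helper names `rename_lock_order` and `hierarchy_violations` are invented for this illustration.

```python
from pathlib import PurePosixPath


def rename_lock_order(src, dst):
    """Objects in the order the description gives for rename:
    src parent, dst parent, src child, dst child."""
    src, dst = PurePosixPath(src), PurePosixPath(dst)
    return [src.parent, dst.parent, src, dst]


def hierarchy_violations(order):
    """Pairs (earlier, later) where `later` is an ancestor of `earlier`,
    i.e. a descendant is locked before one of its ancestors."""
    return [
        (str(earlier), str(later))
        for i, earlier in enumerate(order)
        for later in order[i + 1:]
        if later in earlier.parents
    ]


# Hypothetical rename where the dst parent (/a) is an ancestor of the
# src parent (/a/b): /a/b is locked first, then /a.
print(hierarchy_violations(rename_lock_order("/a/b/c", "/a/d")))
# [('/a/b', '/a')]

# Rename within one directory: no descendant is locked before an ancestor.
print(hierarchy_violations(rename_lock_order("/a/b", "/a/c")))
# []
```

      A non-empty result means rename takes two locks in the opposite order from a thread that walks the tree top-down, which is the precondition for the deadlock described here.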

      Example from a core dump:
      RES1 - dst parent
      RES2 - dst parent, PDO
      RES3 - src parent
      Thread 1 (T1), rename:
      Has RES3 (CW,0x2)
      Wants RES1 (CW,0x2)

      Thread 2 (T2), getattr:
      Has RES1 (CR,0x2)
      Has RES2 (PR,0x2)
      Wants RES3 (PR,0x2) - blocked by T1

      Thread 3 (T3), create or open|create:
      Has RES1 (CW,0x2)
      Wants RES2 (PW,0x2) - blocked by T2

      Thread 4 (T4), getattr or similar:
      Wants RES1 (PR,0x2) - blocked by T3

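      T1 conflicts with none of the locks granted on RES1, but it sits in the waiting queue behind T4 and is therefore not granted. This closes the cycle: T1 waits for T4, T4 for T3, T3 for T2, and T2 for T1.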
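
      The scenario from the core dump can be replayed as a wait-for graph. The sketch below is a simplified model, not LDLM code. It assumes the classic DLM compatibility rules for the four modes that appear here (CR, CW, PR, PW), that all locks overlap because they share inodebits 0x2, and that a waiter is blocked by every earlier waiter on the same resource. The names `GRANTED`, `WAITING`, `wait_for_edges` and `find_cycle` are made up for the illustration.

```python
# Mode pairs that may be granted together (assumed classic DLM matrix,
# restricted to the modes seen in the dump). frozenset(["CW"]) also
# covers the pair (CW, CW), and likewise for CR and PR.
COMPATIBLE = {
    frozenset(["CR"]),
    frozenset(["CR", "CW"]),
    frozenset(["CR", "PR"]),
    frozenset(["CR", "PW"]),
    frozenset(["CW"]),
    frozenset(["PR"]),
}

# State taken from the core dump description.
GRANTED = {
    "RES1": [("T2", "CR"), ("T3", "CW")],
    "RES2": [("T2", "PR")],
    "RES3": [("T1", "CW")],
}
WAITING = {  # per-resource queues, oldest request first
    "RES1": [("T4", "PR"), ("T1", "CW")],
    "RES2": [("T3", "PW")],
    "RES3": [("T2", "PR")],
}


def compatible(a, b):
    return frozenset([a, b]) in COMPATIBLE


def wait_for_edges(granted, waiting):
    """Map each waiting thread to the threads it is blocked by."""
    edges = {}
    for res, queue in waiting.items():
        for pos, (thread, mode) in enumerate(queue):
            # Holders of granted locks with a conflicting mode.
            blockers = [t for t, m in granted.get(res, [])
                        if t != thread and not compatible(mode, m)]
            # Simplified FIFO rule: earlier waiters also block.
            blockers += [t for t, _ in queue[:pos]]
            edges[thread] = blockers
    return edges


def find_cycle(edges):
    """Depth-first search; returns one cycle as a list, or None."""
    def visit(node, path):
        if node in path:
            return path[path.index(node):] + [node]
        for nxt in edges.get(node, []):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in sorted(edges):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


edges = wait_for_edges(GRANTED, WAITING)
print(find_cycle(edges))  # ['T1', 'T4', 'T3', 'T2', 'T1']
```

      In this model T1 has no conflicting granted lock on RES1; its only blocker is T4, which is ahead of it in the queue, and that single edge is what completes the cycle.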

            Assignee: Hongchao Zhang
            Reporter: Vitaly Fertman