  1. Lustre
  2. LU-11836

DOM read-open resend vs getattr deadlock



    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0
    • None
    • 3


      DOM read-on-open may cause a resend when the reply is larger than the client's reply buffer. That is fine in general: the client just re-allocates the buffer and resends the request. The problem occurs when a new request on the same file, e.g. getattr, arrives between the first reply and the resend.
      The whole scenario in that case:
      1. OPEN takes a WRITE lock on PARENT and a new PR/PW lock on CHILD.
      2. The CHILD lock on the server gets the PARENT lock handle from the client as its remote handle (resource change).
      3. Because of the resend condition in reply_in_callback(), the client did not finish that resource replacement, so the client's lock handle still refers to the PARENT lock, while on the server it is the CHILD lock.
      4. Getattr on the server locks the CHILD, which causes a BL AST to the PR/PW lock from OPEN.
      5. The client gets the BL AST, but the lock handle refers to the PARENT lock, so the CHILD lock on the server will never receive a cancel from that BL AST.
      6. Meanwhile the OPEN resend arrives on the server and tries to take the WRITE lock on PARENT, but it is blocked by the getattr process, which is waiting for the CHILD cancel. The OPEN resend therefore waits on the PARENT lock and cannot complete the OPEN and send the reply carrying the blocked CHILD lock. Deadlock.
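
      The six steps above can be pictured as a wait-for cycle. The following is a toy Python model, not Lustre code: the class, function and node names (ClientLock, client_handles_bl_ast, "getattr", "child_cancel", "open_resend") are all hypothetical, and the rule that the client only cancels when its local resource matches the one the server means is a simplifying assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClientLock:
    handle: int
    resource: str  # what the client believes this handle protects


def client_handles_bl_ast(client_locks, handle, server_resource):
    """Toy rule (assumption): the cancel reaches the right server lock only
    if the client's view of the handle matches the server's resource."""
    lock = client_locks.get(handle)
    return lock is not None and lock.resource == server_resource


def build_wait_graph(reply_processed):
    """Build 'X waits for Y' edges for the scenario.

    reply_processed=False models step 3: the first OPEN reply was dropped
    for resend, so the client handle still names PARENT.
    """
    handle = 0x1  # hypothetical handle value
    resource = "CHILD" if reply_processed else "PARENT"
    client_locks = {handle: ClientLock(handle, resource)}

    # Step 4: getattr needs the CHILD lock cancelled.
    waits_for = {"getattr": "child_cancel"}
    if not client_handles_bl_ast(client_locks, handle, "CHILD"):
        # Step 5: the cancel cannot happen until the OPEN reply is processed.
        waits_for["child_cancel"] = "open_resend"
        # Step 6: the resend needs the PARENT lock that getattr holds.
        waits_for["open_resend"] = "getattr"
    return waits_for


def find_cycle(waits_for, start):
    """Follow single-successor wait-for edges; return the cycle or None."""
    path = []
    node = start
    while node in waits_for and node not in path:
        path.append(node)
        node = waits_for[node]
    if node in path:
        return path[path.index(node):]
    return None


stuck = find_cycle(build_wait_graph(reply_processed=False), "getattr")
fine = find_cycle(build_wait_graph(reply_processed=True), "getattr")
```

      In this model `stuck` holds the three-node cycle and `fine` is `None`: once the client has applied the resource change, the BL AST finds the CHILD lock and no cycle forms.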

      This specific combination exists only with DOM files (PR/PW modes conflict with getattr) and only with the read-on-open feature, because it produces a resend without a reconnect.
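
      The resend without reconnect can be sketched in the same toy style. This is an illustration under stated assumptions, not the ptlrpc implementation: ToyServer, handle_open and open_with_resend are hypothetical names, and the model assumes the server grants the lock before it finds out that the reply does not fit the client buffer.

```python
class ToyServer:
    """Hypothetical stand-in for the server side of an OPEN."""

    def __init__(self, reply_size):
        self.reply_size = reply_size
        self.lock_granted = False

    def handle_open(self, reply_buf_len):
        # Assumption: server-side state (the CHILD lock) is set up first.
        self.lock_granted = True
        if reply_buf_len < self.reply_size:
            # Reply does not fit: report only the size that is needed.
            return None, self.reply_size
        return b"x" * self.reply_size, self.reply_size


def open_with_resend(server, buf_len):
    """Resend on the same connection until the reply fits.

    Returns (attempts, window_seen), where window_seen is True if there was
    a moment when the server held the lock but the client had not yet
    processed any reply.
    """
    attempts = 0
    window_seen = False
    while True:
        attempts += 1
        data, needed = server.handle_open(buf_len)
        if data is not None:
            return attempts, window_seen
        # Reply dropped: client state is stale while the server lock exists.
        window_seen = server.lock_granted
        buf_len = needed  # re-allocate the buffer and resend


result_small_buf = open_with_resend(ToyServer(reply_size=4096), buf_len=1024)
result_big_buf = open_with_resend(ToyServer(reply_size=512), buf_len=1024)
```

      With a 1024-byte buffer and a 4096-byte reply the model needs two attempts and exposes the window in which a concurrent getattr can arrive; with a large enough buffer there is a single attempt and no window.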





              tappro Mikhail Pershin
              tappro Mikhail Pershin