Lustre / LU-5652

client eviction if lock enqueue reply is lost

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      A client will be evicted in the following case:

      • The LDLM lock is granted while the lock enqueue reply is being sent.
      • Another thread tries to enqueue a conflicting lock, which either sets LDLM_FL_AST_SENT in this reply or sends a blocking AST.
      • The lock enqueue reply is lost.
      • The RPC deadline on the client side is longer than the waiting lock deadline.

      If all of these happen, the client will be evicted even with AST resend (LU-5520).
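
      The four conditions above can be written as a single predicate. The following is a toy model only, not Lustre code: the names Timeouts, client_rpc_deadline, lock_wait_deadline and client_evicted are hypothetical, and the deadlines are treated as plain numbers of seconds measured from the moment the reply is sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    # Hypothetical names, in seconds; not Lustre tunables.
    client_rpc_deadline: float   # when the client gives up on the reply and resends
    lock_wait_deadline: float    # how long the server waits on the blocking AST


def client_evicted(t: Timeouts,
                   granted_during_reply: bool,
                   conflicting_enqueue: bool,
                   reply_lost: bool) -> bool:
    """Return True when the toy model predicts an eviction."""
    # All three events from the list must occur; otherwise the client
    # learns about its lock in time and can answer the blocking AST.
    if not (granted_during_reply and conflicting_enqueue and reply_lost):
        return False
    # The client only resends after its own RPC deadline. If the server's
    # lock wait expires first, the client is evicted before it can resend.
    return t.client_rpc_deadline > t.lock_wait_deadline
```

      For example, with a client RPC deadline of 100 s and a waiting lock deadline of 60 s, the model reports an eviction once the reply is lost; swapping the two values does not.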

      This patch is a workaround: it guarantees that the waiting lock deadline is longer than the server RPC deadline, which should be close to the client-side RPC deadline, so the client at least has a chance to resend the RPC.

      This patch cannot help if multiple messages are lost, for example if the resent RPC is lost again. Also, if there is large network latency on the path from client to server, such as router congestion, the client may still be evicted even with this patch, because the server has no knowledge of network latency.
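
      The workaround and its two limitations can be illustrated with a small numeric sketch. It is an assumption-laden model rather than the patch itself: padded_lock_deadline, resend_arrives_in_time, margin, network_delay and lost_resends are hypothetical names, and each lost attempt is assumed to cost one more full client RPC deadline before the next resend.

```python
def padded_lock_deadline(lock_wait_deadline: float,
                         server_rpc_deadline: float,
                         margin: float = 1.0) -> float:
    """Keep the waiting lock deadline longer than the server RPC deadline.

    `margin` is an illustrative value, not something taken from the patch.
    """
    return max(lock_wait_deadline, server_rpc_deadline + margin)


def resend_arrives_in_time(client_rpc_deadline: float,
                           lock_wait_deadline: float,
                           network_delay: float = 0.0,
                           lost_resends: int = 0) -> bool:
    """Return True if a resent enqueue RPC reaches the server before the lock wait expires."""
    # The first resend happens after one client RPC deadline; every resend
    # that is lost again postpones the next attempt by another deadline.
    arrival = client_rpc_deadline * (1 + lost_resends) + network_delay
    return arrival < lock_wait_deadline


# Server and client RPC deadlines are assumed to be close (both 100 s here).
lock_deadline = padded_lock_deadline(60.0, 100.0)

single_loss_ok = resend_arrives_in_time(100.0, lock_deadline)
second_loss_ok = resend_arrives_in_time(100.0, lock_deadline, lost_resends=1)
congested_ok = resend_arrives_in_time(100.0, lock_deadline, network_delay=5.0)
```

      In this model a single lost reply is survivable after padding, while a second loss or a few seconds of delay that the server cannot see still lets the lock wait expire first.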

        Activity

          adilger Andreas Dilger made changes -
          Labels Original: mq414 New: mq414 patch
          liang Liang Zhen (Inactive) made changes -
          Labels Original: lu_st mq414 New: mq414
          Summary Original: client eviction is lock enqueue reply is lost New: client eviction if lock enqueue reply is lost
          pjones Peter Jones made changes -
          Labels Original: lu_st New: lu_st mq414
          liang Liang Zhen (Inactive) made changes -
          Description Original: "If all these patched, this client will be evicted even with AST resend (LU-5520)" New: "If all these happened, this client will be evicted even with AST resend (LU-5520)" (rest of the description unchanged)
          Labels New: lu_st
          liang Liang Zhen (Inactive) created issue -

          People

            Assignee: liang Liang Zhen (Inactive)
            Reporter: liang Liang Zhen (Inactive)
            Votes: 0
            Watchers: 5
