Details

    • New Feature
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • 16881

    Description

      Because most ptlrpc messages are not acknowledged (ACKed), an RPC client cannot distinguish a lost message from a long service time. In addition, in the current implementation a message resend can only be triggered by the RPC client after a service timeout, no matter which message in the RPC lifecycle was lost.
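      To make the ambiguity concrete, here is a minimal C model. It assumes an RPC has only three possible fates: delivered, request lost, or reply lost. The names rpc_fate, client_view_of() and client_should_resend() are hypothetical and are not taken from the Lustre source. Without ACKs, three cases all collapse into the same client-side observation: a lost request, a lost reply, and a service that runs past the timeout. The only recovery left to the client is a resend after the timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what happened to one RPC on the wire. */
enum rpc_fate {
    FATE_DELIVERED,    /* request and reply were both delivered */
    FATE_REQUEST_LOST, /* request never reached the server */
    FATE_REPLY_LOST    /* server handled the request, reply was dropped */
};

/* All the client can observe when no message is ACKed. */
enum client_view {
    VIEW_REPLY_RECEIVED,
    VIEW_TIMED_OUT     /* no reply before the service timeout */
};

/* service_time and timeout are in seconds (illustrative units). */
static enum client_view client_view_of(enum rpc_fate fate,
                                       unsigned int service_time,
                                       unsigned int timeout)
{
    assert(timeout > 0);
    /* A lost request or lost reply means no reply ever arrives. */
    if (fate != FATE_DELIVERED)
        return VIEW_TIMED_OUT;
    /* A delivered RPC whose service outlives the timeout looks identical. */
    return service_time <= timeout ? VIEW_REPLY_RECEIVED : VIEW_TIMED_OUT;
}

/* The client's only recovery: resend after a timeout, whatever was lost. */
static bool client_should_resend(enum client_view view)
{
    return view == VIEW_TIMED_OUT;
}
```

      For example, client_view_of(FATE_REPLY_LOST, 1, 30) and client_view_of(FATE_DELIVERED, 45, 30) both return VIEW_TIMED_OUT. The client therefore resends in both cases, even though only the first one involved a lost message.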

      To make Lustre RAS (reliability, availability and serviceability) more robust against message loss, we should allow a message to be resent at any step of the RPC lifecycle. However, the RPC client already has a request timeout/resend protocol and adaptive timeouts. Adding an ACK for request messages, and using a network timeout instead of the service time to trigger a request resend, may therefore require fundamental changes. That would take considerably more effort and resources, so it is not covered by this document.

      Reply resend is relatively simple and more practical. The RPC server can repeatedly resend the reply at a fixed interval (e.g. 20 seconds), which should be sufficient even for the latency of an environment with routers. Reply resend stops when the reply is ACKed, or when the client is evicted or disconnected.
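      The reply-resend rule above can be sketched as a small per-reply state machine driven by a periodic tick. This is a minimal illustration, not the ptlrpc implementation. The struct reply_state, the functions reply_send(), reply_resend_tick() and reply_resend_done(), and the REPLY_RESEND_INTERVAL constant are all hypothetical, and the 20-second interval is only the example value given in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Example interval from the text; a real value would be tunable. */
#define REPLY_RESEND_INTERVAL 20UL /* seconds */

/* Hypothetical per-reply state kept by the server (not Lustre's struct). */
struct reply_state {
    bool acked;              /* client has ACKed this reply */
    bool client_gone;        /* client was evicted or disconnected */
    unsigned long last_sent; /* time (s) the reply was last sent */
    unsigned int send_count; /* number of times the reply was sent */
};

/* Stand-in for putting the reply on the wire; records when it was sent. */
static void reply_send(struct reply_state *rs, unsigned long now)
{
    rs->last_sent = now;
    rs->send_count++;
}

/* True once resending must stop: ACK received or client gone. */
static bool reply_resend_done(const struct reply_state *rs)
{
    return rs->acked || rs->client_gone;
}

/*
 * Called periodically by the server. Resends the reply if no ACK has
 * arrived and at least REPLY_RESEND_INTERVAL seconds have passed since
 * the last send. Returns true if the reply was resent on this tick.
 * Assumes now >= rs->last_sent.
 */
static bool reply_resend_tick(struct reply_state *rs, unsigned long now)
{
    if (reply_resend_done(rs))
        return false;
    if (now - rs->last_sent < REPLY_RESEND_INTERVAL)
        return false;
    reply_send(rs, now);
    return true;
}
```

      With an initial send at t=0, a tick at t=10 does nothing and a tick at t=20 resends the reply. After the client's ACK sets acked, no further resends happen. A consequence of this design is that the client may receive the same reply more than once. It therefore has to tolerate duplicates, and it should still ACK each copy in case an earlier ACK was itself lost.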

          Activity

            [LU-10275] ptlrpc reply acknowledgement
            adilger Andreas Dilger made changes -
            Labels New: lnet performance
            adilger Andreas Dilger made changes -
            Key Original: INTL-173 New: LU-10275
            Workflow Original: classic default workflow [ 34287 ] New: Sub-task Blocking [ 57098 ]
            Project Original: Intel Internal [ 10117 ] New: Lustre [ 10000 ]
            adilger Andreas Dilger made changes -
            Assignee Original: Zhenyu Xu [ bobijam ] New: Amir Shehata [ ashehata ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to INTL-166 [ INTL-166 ]
            pjones Peter Jones made changes -
            Link New: This issue is related to JFC-11 [ JFC-11 ]
            liang Liang Zhen (Inactive) made changes -
            Assignee Original: Liang Zhen [ liang ] New: Zhenyu Xu [ bobijam ]
            johann Johann Lombardi (Inactive) made changes -
            Labels Original: lu_st
            liang Liang Zhen (Inactive) made changes -
            Attachment New: ack_perf_5.xlsx [ 17117 ]
            liang Liang Zhen (Inactive) made changes -
            Labels New: lu_st
            liang Liang Zhen (Inactive) created issue -

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: liang Liang Zhen (Inactive)
              Votes: 0
              Watchers: 8