
Simulate message loss and high latency in LNet

Details

    • Improvement
    • Resolution: Fixed
    • Blocker
    • Lustre 2.7.0
    • None
    • None
    • 15136

    Description

      Although Lustre has OBD_FAIL hooks to simulate the loss of RPC requests or replies, they are mostly used for unit tests and are not flexible enough to inject random message loss while a workload is running.

      Combining OBD_FAIL_PTLRPC_DROP_RPC with CFS_FAIL_RAND can randomly drop RPCs. However, it always drops the request before it is sent and leaves the RPC in the same state, so it cannot simulate LNet message loss, which may trigger more complex RPC states: for example, loss of the LNet ACK/REPLY of a ptlrpc bulk request, or loss of a ptlrpc reply.

      So we need a new mechanism that supports random, silent message loss in the network of a small test system. A straightforward solution is to let the user control message drops in LNet. This needs new user interfaces to add and remove message Drop Rules, and internal handlers for these rules in core LNet.

      To simplify the implementation, an LNet Drop Rule is applied only on the receive side of a connection (this still covers all message paths). Each Drop Rule has a few attributes:

      • Source NID
      • Destination NID
      • Drop Rate Factor: if the factor is N, LNet randomly drops one out of every N incoming messages that match this rule.
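
The attributes above can be sketched in a few lines of Python. This is only an illustrative model, not the LNet implementation: the names `nid_matches` and `DropRule`, the `seed` parameter, and the exact wildcard handling are assumptions made for the example. It shows one way to get "one random drop per N matching messages": pick a random victim index for each window of N matches.

```python
import random


def nid_matches(pattern, nid):
    """Return True if nid ("addr@net") matches "*", "*@net" or "addr@net"."""
    if pattern == "*":
        return True
    p_addr, _, p_net = pattern.partition("@")
    addr, _, net = nid.partition("@")
    return p_net == net and p_addr in ("*", addr)


class DropRule:
    """Hypothetical model of a Drop Rule (not the LNet data structure)."""

    def __init__(self, source, dest, rate, seed=None):
        if rate < 1:
            raise ValueError("rate must be a positive integer")
        self.source = source
        self.dest = dest
        self.rate = rate
        self._rng = random.Random(seed)
        self._seen = 0                             # matches seen in this window
        self._victim = self._rng.randrange(rate)   # index to drop in this window

    def should_drop(self, src_nid, dst_nid):
        """Decide, on the receive side, whether to drop one incoming message."""
        if not (nid_matches(self.source, src_nid)
                and nid_matches(self.dest, dst_nid)):
            return False
        drop = self._seen == self._victim
        self._seen += 1
        if self._seen == self.rate:                # window done, start a new one
            self._seen = 0
            self._victim = self._rng.randrange(self.rate)
        return drop


# Usage: 10000 matching messages at rate 1000 lose exactly 10 messages.
rule = DropRule("*@o2ib0", "*@tcp2", rate=1000, seed=1)
dropped = sum(rule.should_drop("10.0.0.5@o2ib0", "10.1.0.9@tcp2")
              for _ in range(10000))
```

Choosing a victim per window keeps the drop count exact over any whole number of windows, whereas an independent 1/N coin flip per message would only match the rate on average.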

      A user can add a new Drop Rule by running:

      lctl net_drop_add --source SOURCE_NID --dest DESTINATION_NID --rate DROP_RATE
      

      Here are some examples:

      $ lctl net_drop_add --source *@o2ib0 --dest *@tcp2 --rate 1000
        Randomly drop 1 message in every 1000 messages from o2ib0 to tcp2.
      
      $ lctl net_drop_add --source 192.168.1.100@tcp0 --dest *@o2ib3 --rate 500
        Randomly drop 1 message in every 500 messages from 192.168.1.100@tcp0 to any node on o2ib3.
      
      $ lctl net_drop_add --source *@o2ib2 --dest * --rate 2000
        Randomly drop 1 message in every 2000 incoming messages from o2ib2.
      

      A user can remove a Drop Rule by running:

        lctl net_drop_del --source SOURCE_NID --dest DESTINATION_NID
      

      All rules are removed if the user simply runs “lctl net_drop_del --all”.

      To show all LNet Drop Rules, run:

      lctl net_drop_list
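
To make the add/del/list lifecycle concrete, here is a minimal Python sketch of a rule table. Everything in it is hypothetical: the class name `DropRuleTable`, its methods, and in particular the assumption that deletion matches the source and destination strings exactly are choices made for illustration, not a description of how lctl or LNet store rules.

```python
class DropRuleTable:
    """Hypothetical in-memory rule list mirroring the add/del/list commands."""

    def __init__(self):
        self._rules = []  # (source, dest, rate) tuples in insertion order

    def add(self, source, dest, rate):
        """Counterpart of: lctl net_drop_add --source S --dest D --rate N"""
        self._rules.append((source, dest, rate))

    def delete(self, source=None, dest=None, all_rules=False):
        """Counterpart of net_drop_del; returns the number of rules removed."""
        before = len(self._rules)
        if all_rules:
            self._rules.clear()
        else:
            # Assumption: a rule is removed only on an exact string match.
            self._rules = [r for r in self._rules
                           if not (r[0] == source and r[1] == dest)]
        return before - len(self._rules)

    def rules(self):
        """Counterpart of net_drop_list; returns a copy of the rule list."""
        return list(self._rules)


table = DropRuleTable()
table.add("*@o2ib0", "*@tcp2", 1000)
table.add("192.168.1.100@tcp0", "*@o2ib3", 500)
removed = table.delete(source="*@o2ib0", dest="*@tcp2")
remaining = table.rules()
```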
      

      With LNet Drop Rules, we can simulate an unreliable network with a simple environment and a small number of machines. A user can add Drop Rules on either endpoint of the cluster (client or server), or on LNet routers.

      The major benefit of adding Drop Rules only on LNet routers is that the same router pool can be used to test any Lustre version, because a router only needs LNet, which has no compatibility issues. It also means this feature does not need to be backported.

          Activity

            [LU-5435] Simulate message loss and high latency in LNet

            Jodi Levi (Inactive) added a comment:

            Patches landed to Master.

            Liang Zhen (Inactive) added a comment:

            True. "Network" here is from the perspective of the upper-layer Lustre stack, so this only simulates, e.g., LNet router failure or buffer congestion in the Lustre network. As you said, it does not examine how LNet/LNDs behave when the underlying network fails.

            Doug Oucharek (Inactive) added a comment:

            This makes sense as long as it is understood that this approach is only verifying what Lustre does with dropped and delayed messages. It does not examine what our LNDs do to detect and respond to network faults from the underlying fabric (IB, TCP, or Gemini). Dropping/delaying messages intentionally from LNet is not a network fault and will not trigger any sort of switch/router behaviour.

            Liang Zhen (Inactive) added a comment:

            The third patch is ready; it implements latency simulation: http://review.whamcloud.com/#/c/11409/
            In the current implementation, delayed messages on a "Delay Rule" always have a fixed latency. We should be able to give them a random latency in (0, latency), which means we need to sort the delayed messages somehow. We could use binheap to sort them; it is very scalable and already implemented in libcfs.
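
Editorial illustration of the heap idea in the comment above (not code from the patch): with random latencies, messages no longer become due in arrival order, so a priority queue keyed on release time keeps the earliest one on top. The sketch below uses Python's `heapq` as a stand-in for the libcfs binheap; the `DelayQueue` class, its method names, and the explicit `now` argument are assumptions for the example.

```python
import heapq
import itertools
import random


class DelayQueue:
    """Hypothetical delayed-message queue ordered by release time."""

    def __init__(self, max_latency, seed=None):
        self.max_latency = max_latency
        self._rng = random.Random(seed)
        self._heap = []                  # entries: (release_time, seq, message)
        self._seq = itertools.count()    # tie-breaker; messages are never compared

    def delay(self, message, now):
        """Queue a message with a random latency between 0 and max_latency."""
        release = now + self._rng.uniform(0, self.max_latency)
        heapq.heappush(self._heap, (release, next(self._seq), message))
        return release

    def pop_due(self, now):
        """Return every message whose release time has passed, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


queue = DelayQueue(max_latency=5.0, seed=3)
for name in ("msg-a", "msg-b", "msg-c"):
    queue.delay(name, now=0.0)
ready = queue.pop_due(now=5.0)   # all three, ordered by their release times
```

Insertion and removal both cost O(log n) in the number of pending messages, which is the scalability property the comment refers to.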

            Liang Zhen (Inactive) added a comment:

            Hi Doug, as Johann said, this is for testing Lustre against a simulated unreliable network. Doing this in LNet rather than with a mechanism in the underlying network stack has several benefits:

            • We can use the same approach for different network types (IB, TCP, even Gemini).
            • We have more control over fault simulation, for example dropping or delaying messages only for a specific portal, or a specific LNet message type, between two nodes.
            • Communication between applications is not affected.

            Also, this is a new sub-module of LNet that requires very few changes to LNet itself, so it is a low-risk feature.

            Johann Lombardi (Inactive) added a comment:

            Our intent is to run soak testing with fault injection (e.g. message drop/delay, failover, ...) on a configuration that is as close as possible to a real-life Lustre installation. We are thus going to use IB and LNet routing.
            FYI, a couple of years ago I worked on a Lustre bug (a race) that could only be reproduced from client nodes connected through IB.

            Doug Oucharek (Inactive) added a comment:

            I thought the objective was to determine how LNet (and Lustre) reacts to delays and drops. The underlying transport, be it IB or TCP, should not matter at that point.

            Johann Lombardi (Inactive) added a comment (edited):

            We need something that works with any supported LND, and more particularly IB.

            Doug Oucharek (Inactive) added a comment:

            If the delays and drops are "random", could not a TCP/IP network impairment solution be used here rather than changing LNet source?

            Liang Zhen (Inactive) added a comment:

            I'm working on another mechanism, the "Packet Delay Simulator". It randomly chooses some messages and delays sending them for an arbitrary but finite number of seconds. This way we can simulate network congestion and see how Lustre reacts.
            Unlike the Drop Rule, this sub-module is only functional on LNet routers. The message sender sees the completion event of the send immediately (no latency for the sender), the message is then held on the router for N seconds, and the receiver sees the delayed message after N seconds.

            To guarantee that delayed messages are eventually sent, the router_checker thread will be reused for sending them. LND threads can send delayed messages as well, but there is no guarantee that an LND thread will be woken up to do so.

            Although the implementation will differ from the Drop Rule, the lctl commands will be similar:

            lctl net_delay_add --source NID --dest NID --latency SECONDS
            lctl net_delay_del --source NID --dest NID
            lctl net_delay_list
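
Editorial illustration of the router-side behaviour described in the comment above, under stated assumptions: the `FixedDelayRule` class and its methods are hypothetical, time is passed in explicitly as `now` (assumed non-decreasing), and `check` stands in for the periodic pass that the comment assigns to router_checker. With a single fixed latency, messages become due in arrival order, so a plain FIFO is enough.

```python
from collections import deque


class FixedDelayRule:
    """Hypothetical router-side delay rule with one fixed latency in seconds."""

    def __init__(self, latency):
        self.latency = latency
        self._pending = deque()   # (release_time, message), oldest first

    def enqueue(self, message, now):
        """Hold a message being forwarded; the sender has already seen its
        send complete, so only the receiver observes the delay."""
        self._pending.append((now + self.latency, message))

    def check(self, now):
        """Periodic pass: release every message whose latency has elapsed."""
        released = []
        while self._pending and self._pending[0][0] <= now:
            released.append(self._pending.popleft()[1])
        return released


rule = FixedDelayRule(latency=3)
rule.enqueue("m1", now=0)
rule.enqueue("m2", now=1)
early = rule.check(now=2)    # nothing is due yet
first = rule.check(now=3)    # m1 has waited 3 seconds
later = rule.check(now=10)   # m2 is released on a later pass
```

Because release only happens inside `check`, a message may wait longer than its nominal latency if the checker runs infrequently, which is why a guaranteed periodic caller matters.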

            People

              Assignee: Liang Zhen (Inactive)
              Reporter: Liang Zhen (Inactive)
              Votes: 0
              Watchers: 14
