[LU-5435] Simulate message loss and high latency in LNet Created: 31/Jul/14  Updated: 27/Apr/15  Resolved: 04/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.7.0

Type: Improvement Priority: Blocker
Reporter: Liang Zhen (Inactive) Assignee: Liang Zhen (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 15136

 Description   

Lustre already has OBD_FAIL hooks to fabricate request or reply loss for RPCs, but they are mostly used for unit tests and are not flexible enough to inject random message loss while running a workload.

The combination of OBD_FAIL_PTLRPC_DROP_RPC and CFS_FAIL_RAND can randomly drop RPCs; however, it always drops the request before it is sent and leaves the RPC in the same state. It cannot simulate LNet message loss, which may trigger more complex RPC states, for example loss of the LNet ACK/REPLY of a ptlrpc bulk request, or loss of a ptlrpc reply.

So we need a new mechanism to support random, silent message loss on the network of a small test system. A straightforward solution is to let the user control message dropping in LNet; this requires new user interfaces to add or remove message Drop Rules, and internal handlers for these rules in core LNet.

To simplify the implementation, an LNet Drop Rule is only applied on the receive side of a connection (this still covers all message paths). Each Drop Rule has a few attributes:

  • Source NID
  • Destination NID
  • Drop Rate Factor: if the factor is N, then out of every N incoming messages that match this rule, LNet randomly drops one.
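The attributes above can be modeled as follows. This is an illustrative Python sketch, not the actual LNet code (the real rules live in kernel C, and the NID-matching helper here is a deliberate simplification):

```python
import random

class DropRule:
    """Illustrative model of a Drop Rule: within every window of
    `rate` matching messages, exactly one randomly chosen message
    is dropped (this mirrors the Drop Rate Factor semantics)."""

    def __init__(self, src, dst, rate):
        self.src = src          # source NID pattern, e.g. "*@o2ib0"
        self.dst = dst          # destination NID pattern, e.g. "*@tcp2"
        self.rate = rate        # drop one message per `rate` matches
        self.count = 0          # matching messages seen in this window
        self.drop_at = random.randrange(rate)  # index to drop this window

    @staticmethod
    def _match(pattern, nid):
        # Simplified NID matching: "*" matches anything,
        # "*@net" matches any address on that network.
        if pattern == "*":
            return True
        paddr, pnet = pattern.split("@")
        addr, net = nid.split("@")
        return pnet == net and (paddr == "*" or paddr == addr)

    def should_drop(self, src, dst):
        """Return True if this incoming message should be dropped."""
        if not (self._match(self.src, src) and self._match(self.dst, dst)):
            return False
        drop = self.count == self.drop_at
        self.count += 1
        if self.count == self.rate:            # window done: start a new one
            self.count = 0
            self.drop_at = random.randrange(self.rate)
        return drop
```

With this model, a rule with rate 10 drops exactly one message per window of 10 matching messages, at a random position inside each window, while non-matching traffic is never touched.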

A user can add a new Drop Rule by running:

lctl net_drop_add --source SOURCE_NID --dest DESTINATION_NID --rate DROP_RATE

Here are some examples:

$ lctl net_drop_add --source *@o2ib0 --dest *@tcp2 --rate 1000
  Randomly drop 1 of every 1000 messages from o2ib0 to tcp2

$ lctl net_drop_add --source 192.168.1.100@tcp0 --dest *@o2ib3 --rate 500
  Randomly drop 1 of every 500 messages from 192.168.1.100@tcp0 to any node on o2ib3

$ lctl net_drop_add --source *@o2ib2 --dest * --rate 2000
  Randomly drop 1 of every 2000 incoming messages from o2ib2.

A user can remove a Drop Rule by running:

  lctl net_drop_del --source SOURCE_NID --dest DESTINATION_NID

All rules are removed if the user simply runs “lctl net_drop_del --all”.

All LNet Drop Rules can be listed by running:

lctl net_drop_list

With LNet Drop Rules, we can simulate an unreliable network in a simple environment with a small number of machines. A user can add Drop Rules on either endpoint of the cluster (client or server), or on LNet routers.

The major benefit of adding Drop Rules only on LNet routers is that the same router pool can be used to test any Lustre version, because a router only needs LNet, which does not have compatibility issues. It also means this feature does not need to be backported.



 Comments   
Comment by Liang Zhen (Inactive) [ 31/Jul/14 ]

Links to the patches:
http://review.whamcloud.com/#/c/11313
http://review.whamcloud.com/#/c/11314

Comment by Liang Zhen (Inactive) [ 04/Aug/14 ]

I'm working on another mechanism, a "Packet Delay Simulator": it randomly chooses some messages and delays sending them for an arbitrary (finite) number of seconds. This way, we can simulate network congestion and see how Lustre reacts.
Unlike the "Drop Rule", this sub-module is only functional on an LNet router, which means the message sender sees the send completion event immediately (no latency on the sender side); the message is then held on the router for N seconds, so the receiver sees the delayed message N seconds late.

To guarantee that a delayed message is eventually sent out, the router_checker thread will be reused to send delayed messages. LND threads could send delayed messages as well, but there is no guarantee that an LND thread will be woken up to send them.
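The guarantee above can be sketched as a dedicated sender loop. This is a hypothetical user-space model, not the router_checker implementation; `delayed_sender`, its queue protocol, and the shutdown sentinel are all invented for illustration:

```python
import queue
import threading
import time

def delayed_sender(inbox, send, poll_interval=0.01):
    """Hypothetical model of a dedicated sender thread (standing in
    for router_checker): it polls for newly delayed messages and
    forwards every message whose deadline has passed, so delivery
    never depends on an LND thread happening to wake up."""
    pending = []                               # list of (deadline, msg)
    while True:
        try:
            item = inbox.get(timeout=poll_interval)
            if item is None:                   # shutdown sentinel
                break                          # (sketch: drops leftovers)
            pending.append(item)
        except queue.Empty:
            pass                               # nothing new; just re-scan
        now = time.monotonic()
        still_delayed = []
        for deadline, msg in pending:
            if deadline <= now:
                send(msg)                      # deadline passed: forward it
            else:
                still_delayed.append((deadline, msg))
        pending = still_delayed
```

The design point being modeled is that one always-running thread owns the delayed queue, rather than relying on opportunistic wake-ups elsewhere in the stack.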

Although the implementation will be different from Drop Rules, the lctl commands will look like the drop rule commands:

lctl net_delay_add --source NID --dest NID --latency SECONDS
lctl net_delay_del --source NID --dest NID
lctl net_delay_list
Comment by Doug Oucharek (Inactive) [ 07/Aug/14 ]

If the delays and drops are "random", could not a TCP/IP network impairment solution be used here rather than changing LNet source?

Comment by Johann Lombardi (Inactive) [ 07/Aug/14 ]

We need something that can work with any supported LND, and more particularly with IB.

Comment by Doug Oucharek (Inactive) [ 07/Aug/14 ]

I thought the objective was to determine how LNet (and Lustre) reacts to delays and drops. The underlying transport, be it IB or TCP, should not matter at that point.

Comment by Johann Lombardi (Inactive) [ 07/Aug/14 ]

Our intent is to run soak testing with fault injection (i.e. message drop/delay, failover, ...) on a configuration that is as close as possible to a real-life Lustre installation. We are thus going to use IB and LNet routing.
FYI, I worked on a Lustre bug (a race) a couple of years ago that could only be reproduced from client nodes connected through IB.

Comment by Liang Zhen (Inactive) [ 12/Aug/14 ]

Hi Doug, as Johann said, this is for testing Lustre against a simulated unreliable network. There are several benefits to having this in LNet instead of using mechanisms of the underlying network stack:

  • we can use the same approach for different network types (IB, TCP, even Gemini)
  • we have finer control over fault simulation, for example dropping/delaying messages only for a specific portal, or only for a specific LNet message type between two nodes
  • communication between applications will not be affected

Also, this is a new sub-module of LNet that requires only very small changes to core LNet itself, so it is a low-risk feature.

Comment by Liang Zhen (Inactive) [ 12/Aug/14 ]

The third patch is ready, it's implementation of latency simulation: http://review.whamcloud.com/#/c/11409/
In the current implementation, delayed messages on a "Delay Rule" always have a fixed latency. We should be able to support a random latency in (0, latency), which means we need to sort delayed messages somehow; we could use a binheap for this, which is very scalable and already implemented in libcfs.
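The sorting idea can be sketched as follows: a minimal Python model (an assumption for illustration, not the libcfs binheap) that assigns each message a random latency in (0, latency) and keeps delayed messages in a min-heap keyed on delivery deadline:

```python
import heapq
import random

class DelayQueue:
    """Illustrative model of a Delay Rule queue with random latency:
    each delayed message gets a random latency in (0, max_latency)
    and messages are kept in a min-heap ordered by deadline, so the
    earliest-due message is always released first, regardless of
    arrival order."""

    def __init__(self, max_latency):
        self.max_latency = max_latency
        self.heap = []                 # entries: (deadline, seq, msg)
        self.seq = 0                   # tie-breaker for equal deadlines

    def delay(self, msg, now):
        latency = random.uniform(0.0, self.max_latency)
        heapq.heappush(self.heap, (now + latency, self.seq, msg))
        self.seq += 1

    def pop_ready(self, now):
        """Return all messages whose deadline has passed, earliest first."""
        ready = []
        while self.heap and self.heap[0][0] <= now:
            ready.append(heapq.heappop(self.heap)[2])
        return ready
```

The heap keeps release in deadline order at O(log n) per message, which is why a binheap scales better than re-scanning an unsorted list once latencies are no longer uniform across messages.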

Comment by Doug Oucharek (Inactive) [ 12/Aug/14 ]

This makes sense as long as it is understood that this approach is only verifying what Lustre does with dropped and delayed messages. It does not examine what our LNDs do to detect and respond to network faults from the underlying fabric (IB, TCP, or Gemini). Dropping/delaying messages intentionally from LNet is not a network fault and will not trigger any sort of switch/router behaviour.

Comment by Liang Zhen (Inactive) [ 13/Aug/14 ]

True; "network" here is from the perspective of the upper-layer Lustre stack, so it will only simulate things such as LNet router failure or buffer congestion in a Lustre network. As you said, it does not examine how LNet/LNDs behave under underlying network failures.

Comment by Jodi Levi (Inactive) [ 04/Nov/14 ]

Patches landed to Master.

Generated at Sat Feb 10 01:51:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.