[LU-4027] Network segregation and prioritization Created: 30/Sep/13  Updated: 02/Oct/13

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Josh Fryman (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Rank (Obsolete): 10822

 Description   

In a training class, it came up that we have machines with multiple network interfaces (multiple GigE, plus multiple QDR or FDR IB). We would like Lustre to "not be stupid" about mixing traffic. We would like to be able to prioritize traffic, as well as control how it is shaped.

More specifically, Lustre should never take network packets from the microsecond-latency IB system and drop them onto the multiple-millisecond Ethernet system. When possible, it should upgrade Ethernet traffic to IB traffic. Shaping rules, prioritization rules, and some way to describe the topology so that priorities can be worked out would be very useful.
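One way to read the rule above, as a minimal Python sketch and not anything LNET is known to implement: rank each fabric by speed and allow traffic to move only to an equal or faster fabric. The `Fabric` ranks and the `pick_fabric` helper are hypothetical names invented for this illustration.

```python
from enum import IntEnum


class Fabric(IntEnum):
    """Hypothetical fabric ranks; a higher value means a faster fabric."""
    ETHERNET = 1
    INFINIBAND = 2


def pick_fabric(origin, available):
    """Return the fastest available fabric that is not slower than origin.

    Returns None when only slower fabrics are up, so traffic is never
    downgraded (for example from IB to Ethernet).
    """
    candidates = [f for f in available if f >= origin]
    return max(candidates) if candidates else None
```

Under this sketch, Ethernet traffic is upgraded to IB whenever IB is available, while IB traffic with only Ethernet available gets no route at all; whether that should be an error or a queued retry is a policy choice the request leaves open.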

A related request is to let users choose which network carries the heartbeat/liveness traffic that confirms clients and disks are still operating. Since heartbeat is not interesting, in a multi-network environment we would like to force heartbeat to always use (for example) Ethernet unless the Ethernet has failed.
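The heartbeat preference could be expressed roughly as follows. This is only a sketch of the requested behavior: `heartbeat_network`, the network labels, and the `is_up` callback are all hypothetical and do not correspond to an existing Lustre or LNET interface.

```python
def heartbeat_network(preferred, fallbacks, is_up):
    """Return the preferred network if it is up, else the first live fallback.

    `is_up` is a caller-supplied function (an assumption of this sketch)
    that reports whether a named network is currently usable.
    Returns None when no listed network is up.
    """
    for net in [preferred, *fallbacks]:
        if is_up(net):
            return net
    return None
```

For example, with Ethernet preferred and IB as the only fallback, heartbeat stays on Ethernet until `is_up("eth")` reports failure, and only then moves to IB.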

On a related note, "bonding" of interfaces should be specifiable in forms such as: "Bond for perf/bandwidth interfaces A, B, C into virtual interface V1"; "bond interfaces Q, R into V2"; then "bond for failover V1 then V2".
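The bonding forms quoted above can be modeled with a small data structure. The following is an illustrative sketch only: the `Bond` class and `active_bond` function are hypothetical, and the interface names A, B, C, Q, R and bonds V1, V2 are taken from the request itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Bond:
    """A virtual interface that aggregates member interfaces for bandwidth."""
    name: str
    members: List[str]

    def live_members(self, up: Set[str]) -> List[str]:
        # Members that are currently up and can carry traffic.
        return [m for m in self.members if m in up]


def active_bond(failover_order: List[Bond], up: Set[str]) -> Optional[Bond]:
    """Return the first bond in failover order with at least one live member."""
    for bond in failover_order:
        if bond.live_members(up):
            return bond
    return None


# "Bond A, B, C into V1"; "bond Q, R into V2"; "failover V1 then V2".
v1 = Bond("V1", ["A", "B", "C"])
v2 = Bond("V2", ["Q", "R"])
```

In this model V1 remains active as long as any of A, B, or C is up, and traffic fails over to V2 only when all three are down. A real design would also need to decide whether a degraded V1 should still outrank a fully healthy V2.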



 Comments   
Comment by Keith Mannthey (Inactive) [ 02/Oct/13 ]

Feel free to contribute code to LNET for the more advanced traffic shaping suggestions. Generally LNET runs on a very controlled private network with careful design. Do you have any examples where LNET performance was degraded due to the issues you are concerned about?

If the Lustre system is properly designed, it will never take fast-fabric traffic and route it over a slow fabric. If this is happening, it is due to some geographical reason.
