[LU-250] Router random shuffling support. Created: 29/Apr/11  Updated: 05/Jan/12  Resolved: 05/Jan/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.2.0

Type: Improvement Priority: Minor
Reporter: James A Simmons Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre routers


Bugzilla ID: 21,103
Rank (Obsolete): 4778

 Description   

This is the last of the router improvements that were developed at ORNL. This feature allows routes to be randomly placed. Our results show a 20 percent improvement with random shuffling.
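
As a rough illustration of what "randomly placed" could mean, the sketch below inserts each new route at a pseudo-random position in a list instead of always appending it. The struct, the function name, the fixed capacity, and the use of rand() are assumptions made for this example; none of them are taken from the patch.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_ROUTES 16                   /* hypothetical capacity for this sketch */

struct route_list {
        int gateways[MAX_ROUTES];       /* hypothetical gateway identifiers */
        int count;                      /* number of entries currently stored */
};

/*
 * Insert gw at a pseudo-random position in [0, count].
 * rand() % n has a small modulo bias, which this sketch ignores.
 * Returns 0 on success, -1 if the list is full.
 */
int route_insert_random(struct route_list *rl, int gw)
{
        int pos;

        if (rl->count >= MAX_ROUTES)
                return -1;

        pos = rand() % (rl->count + 1);

        /* Shift the tail up by one slot to open a hole at pos. */
        memmove(&rl->gateways[pos + 1], &rl->gateways[pos],
                (size_t)(rl->count - pos) * sizeof(rl->gateways[0]));
        rl->gateways[pos] = gw;
        rl->count++;
        return 0;
}
```

If every node seeds its generator differently, nodes that add the same routes in the same order would be expected to end up with differently ordered lists, which is presumably what spreads traffic across the routers.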



 Comments   
Comment by James A Simmons [ 29/Apr/11 ]

Original patch at

http://review.whamcloud.com/#change,249

There was a comment by Zhen about which part of the nid address space is used to merge with the random seed.
The current code does

if (LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) != LOLND)
        seed[0] ^= LNET_NIDADDR(ni->ni_nid);

The issue is that you could end up with a system such as

NODE_1: 192.168.1.100@o2ib and 192.168.2.100@tcp;

NODE_2: 192.168.1.101@o2ib and 192.168.2.101@tcp;

Both nodes would thus generate the same seed. Zhen, what would you recommend merging with the seed value?

Comment by Andreas Dilger [ 13/Dec/11 ]

James, one option is to use the lnet network number and the LND type to disambiguate the IP addresses.
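
A minimal sketch of the collision and of one possible way to fold the LND type into the seed follows. It assumes a 64-bit NID with the LND type in the top 16 bits, the network number in the next 16, and the address in the low 32; it also assumes type values of 1 for loopback, 2 for tcp, and 5 for o2ib. All helper names are hypothetical, and the OR-based mixing is only one option, not necessarily what the landed patch does.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOLND 1u         /* assumed loopback LND type */

/* Low 32 bits of the NID: the address part (assumed layout). */
static inline uint32_t sketch_nid_addr(uint64_t nid)
{
        return (uint32_t)(nid & 0xffffffffu);
}

/* Top 16 bits of the NID: the LND type (assumed layout). */
static inline uint32_t sketch_nid_type(uint64_t nid)
{
        return (uint32_t)(nid >> 48) & 0xffffu;
}

/* Build a NID from LND type, network number, and address. */
uint64_t sketch_make_nid(uint32_t lnd_type, uint32_t net_num, uint32_t addr)
{
        uint32_t net = ((lnd_type & 0xffffu) << 16) | (net_num & 0xffffu);

        return ((uint64_t)net << 32) | addr;
}

/* XOR only the address parts, skipping loopback, as in the quoted snippet. */
uint32_t seed_addr_only(const uint64_t *nids, size_t n)
{
        uint32_t seed = 0;
        size_t i;

        for (i = 0; i < n; i++)
                if (sketch_nid_type(nids[i]) != SKETCH_LOLND)
                        seed ^= sketch_nid_addr(nids[i]);
        return seed;
}

/* OR the LND type into each address before XOR-ing it into the seed. */
uint32_t seed_with_lnd_type(const uint64_t *nids, size_t n)
{
        uint32_t seed = 0;
        size_t i;

        for (i = 0; i < n; i++) {
                uint32_t type = sketch_nid_type(nids[i]);

                if (type != SKETCH_LOLND)
                        seed ^= sketch_nid_addr(nids[i]) | type;
        }
        return seed;
}
```

With the two example nodes above (192.168.1.100 is 0xC0A80164, and so on), the address-only seed is 0x300 on both nodes because the host bits cancel under XOR. Under the assumed type values, OR-ing in the LND type gives 0x303 for NODE_1 and 0x302 for NODE_2. The OR is lossy, so this only shows that type information can break the symmetry; the network number could be folded in along the same lines.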

Comment by James A Simmons [ 13/Dec/11 ]

I have posted a new patch with your suggestion at http://review.whamcloud.com/#change,249

Comment by Peter Jones [ 13/Dec/11 ]

Lai

Could you please work with Liang to inspect and land this patch from ORNL?

Thanks

Peter

Comment by Isaac Huang (Inactive) [ 22/Dec/11 ]

Hi James,

You mentioned that "Our results show a 20 percent improvement with random shuffling."

Can you point me to more details about it?

Comment by James A Simmons [ 28/Dec/11 ]

The problem we were seeing at the lab was a very large number of lnet hash collisions. Adding a random number to the selection process helped reduce the collisions.

Comment by Isaac Huang (Inactive) [ 03/Jan/12 ]

I was quite surprised by the 20% number, and was wondering how it was calculated, e.g. what kind of tests were done and when (i.e. right after lnet initialization or long after). The 20% seems quite significant even when comparing the new shuffler with no shuffling at all.

Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = FAILURE
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/router.c
  • lnet/lnet/lib-move.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/router.c
  • lnet/lnet/lib-move.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/router.c
  • lnet/lnet/lib-move.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/router.c
  • lnet/lnet/lib-move.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » i686,server,el5,ofa #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/router.c
  • lnet/lnet/lib-move.c
Comment by Build Master (Inactive) [ 05/Jan/12 ]

Integrated in lustre-master » i686,client,el5,ofa #407
LU-250 lnet: Router random shuffling support (Revision cd46a96e518722547d4b34a393e0e87bf9828e4c)

Result = SUCCESS
Oleg Drokin : cd46a96e518722547d4b34a393e0e87bf9828e4c
Files :

  • lnet/lnet/lib-move.c
  • lnet/lnet/router.c
Generated at Sat Feb 10 01:05:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.