[LU-250] Router random shuffling support. Created: 29/Apr/11 Updated: 05/Jan/12 Resolved: 05/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.2.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | James A Simmons | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre routers |
||
| Bugzilla ID: | 21103 |
| Rank (Obsolete): | 4778 |
| Description |
|
This is the last of the router improvements that were developed at ORNL. This feature allows routes to be randomly placed. Our results show a 20 percent improvement with random shuffling. |
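A minimal sketch of what "randomly placed" could look like, assuming routes are kept in a simple ordered table and each new route is inserted at a random offset instead of being appended. The ticket does not describe the actual LNet data structures, so `struct sketch_route_table`, `sketch_add_route()` and `SKETCH_MAX_ROUTES` are hypothetical names, and user-space `rand()` stands in for whatever random source the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ROUTES 16 /* hypothetical capacity for this sketch */

/* Hypothetical route table: an ordered array of gateway ids. */
struct sketch_route_table {
	int    gateways[SKETCH_MAX_ROUTES];
	size_t count;
};

/*
 * Insert gw at a randomly chosen offset in [0, count] rather than at the
 * tail, so the table order does not simply mirror configuration order.
 * Returns the chosen offset, or -1 if the table is full.
 */
static int sketch_add_route(struct sketch_route_table *t, int gw)
{
	size_t pos;

	assert(t != NULL);
	if (t->count >= SKETCH_MAX_ROUTES)
		return -1;

	pos = (size_t)rand() % (t->count + 1);

	/* Shift the tail up by one slot to open a gap at pos. */
	memmove(&t->gateways[pos + 1], &t->gateways[pos],
		(t->count - pos) * sizeof(t->gateways[0]));
	t->gateways[pos] = gw;
	t->count++;
	return (int)pos;
}
```

If every node seeds its random source differently, nodes that load the same route configuration end up with different table orders, which is the property the seed discussion in the comments is concerned with.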
| Comments |
| Comment by James A Simmons [ 29/Apr/11 ] |
|
Original patch at http://review.whamcloud.com/#change,249
There was a comment by Zhen about which part of the NID address space is merged with the random seed:
if (LNET_NETTYP(LNET_NIDNET(ni->ni_nid)) != LOLND)
The issue is that you could end up with a system such as NODE_1: 192.168.1.100@o2ib and 192.168.2.100@tcp; NODE_2: 192.168.1.101@o2ib and 192.168.2.101@tcp, which would generate the same seed on both nodes.
Zhen, what would you recommend merging with the seed value? |
| Comment by Andreas Dilger [ 13/Dec/11 ] |
|
James, one option is to use the lnet network number and the LND type to disambiguate the IP addresses. |
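The collision described above and the suggested disambiguation can be sketched as follows. This is an illustration only, not the code from the patch: `struct sketch_nid`, the `SKETCH_*` constants and both helper functions are hypothetical, and the sketch assumes the address-only seed is formed by XOR-ing the NID addresses, which the ticket does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a NID: LND type, network number, address. */
struct sketch_nid {
	uint16_t lnd_type;
	uint16_t net_num;
	uint32_t addr; /* IPv4 address in host byte order */
};

/* Illustrative LND type tags, used by this sketch only. */
enum { SKETCH_SOCKLND = 2, SKETCH_O2IBLND = 5, SKETCH_LOLND = 9 };

#define SKETCH_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Assumed problem case: fold only the address bits together with XOR. */
static uint32_t sketch_seed_addr_only(const struct sketch_nid *nids, size_t n)
{
	uint32_t seed = 0;
	size_t i;

	assert(nids != NULL);
	for (i = 0; i < n; i++) {
		if (nids[i].lnd_type == SKETCH_LOLND)
			continue; /* skip loopback, as in the quoted check */
		seed ^= nids[i].addr;
	}
	return seed;
}

/*
 * Fold the address together with the LND type and network number, using an
 * FNV-1a style XOR-then-multiply step so the result is order dependent and
 * the same IP on two different networks contributes differently.
 */
static uint32_t sketch_seed_with_net(const struct sketch_nid *nids, size_t n)
{
	uint32_t seed = 0;
	size_t i;

	assert(nids != NULL);
	for (i = 0; i < n; i++) {
		uint32_t net;

		if (nids[i].lnd_type == SKETCH_LOLND)
			continue;
		net = ((uint32_t)nids[i].lnd_type << 16) | nids[i].net_num;
		seed = (seed ^ nids[i].addr) * 16777619u; /* FNV 32-bit prime */
		seed = (seed ^ net) * 16777619u;
	}
	return seed;
}
```

For the two nodes in James's example, `sketch_seed_addr_only()` returns 0x300 for both, because 192.168.1.100 XOR 192.168.2.100 equals 192.168.1.101 XOR 192.168.2.101. `sketch_seed_with_net()` returns different values for the two nodes. Note that XOR-ing the network bits into the same accumulator would not help on its own, since both nodes sit on the same two networks and the extra bits would cancel identically; the multiply step is what breaks that symmetry in this sketch.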
| Comment by James A Simmons [ 13/Dec/11 ] |
|
Posted a new patch with your suggestion at http://review.whamcloud.com/#change,249 |
| Comment by Peter Jones [ 13/Dec/11 ] |
|
Lai, could you please work with Liang to inspect and land this patch from ORNL? Thanks, Peter |
| Comment by Isaac Huang (Inactive) [ 22/Dec/11 ] |
|
Hi James, You mentioned that "Our results show a 20 percent improvement with random shuffling." Can you point me to more details about it? |
| Comment by James A Simmons [ 28/Dec/11 ] |
|
The problem we were seeing at the lab was a very large number of lnet hash collisions. Adding a random number to the selection process helped reduce the collisions. |
| Comment by Isaac Huang (Inactive) [ 03/Jan/12 ] |
|
I was quite surprised by the 20% number, and was wondering how it was calculated, e.g. what kind of tests were run and when (i.e. right after lnet initialization or long after). The 20% seems quite significant, even when comparing the new shuffler against no shuffling at all. |
| Comment by Build Master (Inactive) [ 05/Jan/12 ] |
|
Integrated in Result = FAILURE
|
| Comment by Build Master (Inactive) [ 05/Jan/12 ] |
|
Integrated in Result = SUCCESS
|