[LU-1622] hash MEs on wildcard portal when it's possible
Created: 11/Jul/12  Updated: 08/Jan/13  Resolved: 28/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0

Type: Bug
Priority: Minor
Reporter: Liang Zhen (Inactive)
Assignee: Liang Zhen (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Rank (Obsolete): 4539

 Description   

In current LNet, MEs on a unique portal are hashed into different lists, while MEs on a wildcard portal are all attached to a single list. This is fine for most use cases, but some users need a single portal for different request types, with the buffers for each request type identified by match bits. LNet selftest is one such use case, because both ping and BRW tests share SRPC_REQUEST_PORTAL.
This is normally fine, but it can be a performance issue on the server side. For example, if we create a session with both a BRW test and a PING test, and post a thousand buffers for BRW requests and 10 buffers for PING requests, then with all of these buffers attached to the single list, searching for a PING buffer would be very expensive.
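The cost described above can be illustrated with a small model. This is only a sketch, not LNet code: the match-bit values and the `linear_match` helper are hypothetical, and it assumes the BRW buffers were posted before the PING buffers so that they sit ahead of them on the list.

```python
# Illustrative model only, not LNet code. Match-bit values are made up.
BRW_MATCHBITS = 0x1   # hypothetical match bits for BRW buffers
PING_MATCHBITS = 0x2  # hypothetical match bits for PING buffers


def linear_match(me_list, mbits):
    """Walk the list from head to tail.

    Returns (index, comparisons) for the first entry whose match bits
    equal mbits, or (None, comparisons) if nothing matches.
    """
    comparisons = 0
    for index, me_bits in enumerate(me_list):
        comparisons += 1
        if me_bits == mbits:
            return index, comparisons
    return None, comparisons


# Assumption: 1000 BRW buffers posted first, then 10 PING buffers,
# all attached to the same single list.
single_list = [BRW_MATCHBITS] * 1000 + [PING_MATCHBITS] * 10

index, comparisons = linear_match(single_list, PING_MATCHBITS)
print(comparisons)  # 1001
```

Under these assumptions, every incoming PING request has to step over all of the BRW buffers before it reaches a usable entry.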

This can be fixed by hashing MEs on wildcard portals too, but we can't do this for MEs with "ignore bits", because an incoming message has no way to hash its match information without knowing the "ignore bits" of the sink buffer. So we should still attach all MEs with ignore bits to the single list, and always check this list first, regardless of whether it's a wildcard portal or a unique portal.
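A minimal sketch of the proposed lookup follows. It is a model under stated assumptions rather than the patch itself: the `ME` and `MatchTable` classes, the bucket count, and the modulo hash are all hypothetical, and the match test assumes that bits set in the ignore mask are simply excluded from the comparison.

```python
from collections import defaultdict

NBUCKETS = 64  # hypothetical bucket count


class ME:
    """Hypothetical match entry: match bits plus an optional ignore mask."""

    def __init__(self, match_bits, ignore_bits=0):
        self.match_bits = match_bits
        self.ignore_bits = ignore_bits

    def matches(self, mbits):
        # Bits set in ignore_bits are excluded from the comparison.
        return ((self.match_bits ^ mbits) & ~self.ignore_bits) == 0


class MatchTable:
    def __init__(self):
        self.ignore_list = []             # MEs with ignore bits: one list
        self.buckets = defaultdict(list)  # exact-match MEs: hashed

    def attach(self, me):
        if me.ignore_bits:
            # Cannot be hashed: the sender does not know the ignore mask.
            self.ignore_list.append(me)
        else:
            self.buckets[me.match_bits % NBUCKETS].append(me)

    def match(self, mbits):
        """Check the ignore-bits list first, then only one hash bucket."""
        comparisons = 0
        candidates = self.ignore_list + self.buckets.get(mbits % NBUCKETS, [])
        for me in candidates:
            comparisons += 1
            if me.matches(mbits):
                return me, comparisons
        return None, comparisons


# Same workload as in the description: 1000 BRW and 10 PING buffers,
# using made-up match bits 0x1 (BRW) and 0x2 (PING).
table = MatchTable()
for _ in range(1000):
    table.attach(ME(0x1))
for _ in range(10):
    table.attach(ME(0x2))

me, comparisons = table.match(0x2)
print(comparisons)  # 1
```

In this model the PING lookup no longer walks past the BRW buffers, because they hash to a different bucket. Any ME with ignore bits is still scanned linearly on every lookup, and because that list is checked first, such an ME is returned ahead of an exact-match ME that would also match.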

This means that if a user mixes MEs with and without "ignore bits" on one portal, or sets "ignore bits" for MEs on a unique portal, there could still be a performance issue, but such usage should be deprecated, and we don't have such a use case.



 Comments   
Comment by Liang Zhen (Inactive) [ 11/Jul/12 ]

Patch is here http://review.whamcloud.com/3376

Comment by Liang Zhen (Inactive) [ 13/Jul/12 ]

We also need to handle the enabled/disabled status of the match table in this patch, so it's a little more complex than I thought. I will update it soon.

Comment by Cory Spitz [ 16/Jul/12 ]

Liang, are you looking at expanding lnet-selftest.sh to demonstrate? Or, are you just using the existing acc-sm test?

Comment by Liang Zhen (Inactive) [ 17/Jul/12 ]

No, I will use my own scripts. selftest is only part of the demonstration, because it can only demonstrate LNet performance; we will also use mdtest to show the performance of the whole server stack.

Comment by Liang Zhen (Inactive) [ 28/Aug/12 ]

Patch landed

Comment by Wally Wang (Inactive) [ 08/Jan/13 ]

Cray's DVS does have a use case that mixes MEs with and without "ignore bits" on one portal. The ignore bits are used as a 'catch all' for spurious messages when the ME bits do not match. DVS will temporarily work around this by giving all MEs ignore bits so that they are all on one list. But DVS would prefer hashed ME lists for performance. Please consider this use case for current testing and future design.

Generated at Sat Feb 10 01:18:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.