[LU-35] Make portals_handle use the generic cfs_hash_t Created: 05/Jan/11  Updated: 17/Apr/13  Resolved: 17/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.1.0

Type: Improvement Priority: Minor
Reporter: John Spray (Inactive) Assignee: John Spray (Inactive)
Resolution: Incomplete Votes: 0
Labels: None

Rank (Obsolete): 7777

 Description   

Despite the existence of the cfs_hash_t hash table class, the global hash table of portals_handle has its own fixed-bucket-count hash table implementation, using RCU for the lists of handles.

This issue relates to a patch to be pushed to gerrit, which will:

  • Remove the RCU support from the portals_handle hash table (RCU is less portable, requires extra complexity in code using portals_handle, and makes the portals_handle struct larger)
  • Convert the portals_handle hash table to be a standard cfs_hash_t (reducing code size and getting advantage of hash resizing)
  • Introduce per-cpu variables for the portals_handle cookie generation to reduce contention on a global lock when allocating cookies (potentially improving SMP performance)


 Comments   
Comment by John Spray (Inactive) [ 05/Jan/11 ]

See http://review.whamcloud.com/#change,167

Comment by Andreas Dilger [ 05/Jan/11 ]

Changes to lustre_handle can have a significant impact on SMP performance, which was the original reason to add RCU to this structure (see http://bugzilla.lustre.org/show_bug.cgi?id=10706 for the original problem that was resolved by this change). We need to validate this new patch by running it on a many-core SMP system under load. Liang should comment on the testing method and data analysis used during the development of the SMP scaling patches.

Also, please remove the "#if !defined(HAVE_RCU)" compatibility code at the top of lustre/obdclass/lustre_handle.c as part of this patch, since it looks to be unnecessary with the removal of RCU usage.

Comment by Liang Zhen (Inactive) [ 05/Jan/11 ]

Andreas, it's cool to see you here,

there are a few reasons we might want to change lustre_handle:

  • some handles are not rarely changed, i.e: the most commonly used one is ldlm_lock handle, I'm not sure but probably RCU is not quite good choice for that case
  • the RCU related stuff increased about 40 bytes for each handle, so we have more memory footprint.
  • it would be good to have a few bits as "type" information of "cookie"(or have different handle types in different hash-table?), so we got less chance to screw up memory if remote peer passing a totally wrong handle to local which is a different data type. I'm not sure whether it's the thing we should consider about

Anyway, I totally agree we do need to measure performance of the patch when we got environment...

Comment by John Spray (Inactive) [ 17/Apr/13 ]

Cleaning up: this has been idle for >2yrs.

Generated at Sat Feb 10 01:03:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.