Andreas, yes, currently the hash table can only grow, and that has been on master for a while.
I suspect that growing a hash table with a million entries and millions of elements will take a very long time (probably a few seconds on a busy SMP server) and be too expensive. For example, if we want to increase the number of hash entries from 512K to 1M, we have to:
1) allocate a few megabytes for the new hash heads
2) initialize one million hash heads
3) move millions of elements from the old hash lists to the new ones
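The three steps above amount to a "big rehash" done in one pass. A minimal sketch of that pass is below; the structure and function names (`struct node`, `struct table`, `table_grow`) are illustrative stand-ins, not the actual cfs_hash code, and locking is omitted entirely:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical singly-linked hash node; not the real cfs_hash structures. */
struct node {
	unsigned long key;
	struct node  *next;
};

struct table {
	struct node **heads;
	unsigned int  nbuckets;	/* power of two */
};

static void table_insert(struct table *t, struct node *n)
{
	unsigned int b = n->key & (t->nbuckets - 1);

	n->next = t->heads[b];
	t->heads[b] = n;
}

/* Naive "big rehash": the three steps listed above, done in one pass. */
static int table_grow(struct table *t, unsigned int new_nbuckets)
{
	struct node **heads;
	unsigned int  i;

	/* 1) allocate the new array of hash heads */
	heads = malloc(new_nbuckets * sizeof(*heads));
	if (heads == NULL)
		return -1;

	/* 2) initialize every new hash head */
	memset(heads, 0, new_nbuckets * sizeof(*heads));

	/* 3) move every element from the old lists to the new ones;
	 * with millions of elements this loop is the expensive part */
	for (i = 0; i < t->nbuckets; i++) {
		struct node *n = t->heads[i];

		while (n != NULL) {
			struct node *next = n->next;
			unsigned int b = n->key & (new_nbuckets - 1);

			n->next  = heads[b];
			heads[b] = n;
			n = next;
		}
	}

	free(t->heads);
	t->heads    = heads;
	t->nbuckets = new_nbuckets;
	return 0;
}
```

Step 3 is linear in the total element count, which is why the concern below about per-bucket locking during the move matters.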
It will be even more expensive if we don't have the rwlock and just lock individual buckets while moving elements; in the worst case we have to lock/unlock a different target bucket for every element moved. Although we do relax the CPU while rehashing, so other threads can still access the hash table, I'm still a little nervous about running such heavy operations on servers.
Another thing to notice is that lu_site does not use the high-level cfs_hash APIs like cfs_hash_find/add/del, which hide the cfs_hash locks. Instead, lu_site refers directly to the cfs_hash locks and the low-level bucket APIs, so it can use those hash locks to protect its own data, for example counters, the LRU for the shrinker, some waitqs, etc. That means we would need to make some changes to lu_site if we want to enable rehash.
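To illustrate why exposed bucket locks matter here: one critical section can cover both the hash-list operation and the site-private state, which is impossible if the lock lives inside a find/add/del API. The sketch below is purely illustrative (fake lock, made-up `site_add` name); it is not the real lu_site or cfs_hash code:

```c
#include <stddef.h>

struct obj {
	unsigned long key;
	struct obj  *next;
};

struct bucket {
	int          locked;	/* stand-in for a real per-bucket spinlock */
	struct obj  *head;
	unsigned int nr;	/* site-private counter kept under the lock */
};

static void bucket_lock(struct bucket *b)   { b->locked = 1; }
static void bucket_unlock(struct bucket *b) { b->locked = 0; }

/* One critical section covers both the list insert and the private
 * counter (e.g. accounting that feeds an LRU shrinker).  This is only
 * possible because the bucket lock is exposed to the caller rather
 * than hidden inside the hash API. */
static void site_add(struct bucket *b, struct obj *o)
{
	bucket_lock(b);
	o->next = b->head;
	b->head = o;
	b->nr++;
	bucket_unlock(b);
}
```

A rehash that moves elements between buckets behind lu_site's back would break exactly this kind of invariant, which is why lu_site changes would be needed.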
I think there is another option to support growing of lu_site: we can have multiple cfs_hash tables per lu_site, e.g. 64 of them, and hash objects to different tables. Any of these tables can grow independently when necessary, so we never face a "big rehash" over millions of elements, and a global lock wouldn't be an issue either because the load is spread across many tables.
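The sharding idea above can be sketched as follows. The shard count (64) comes from the comment; everything else (`struct shard`, `site_shard`) is a hypothetical illustration, not a proposed interface:

```c
#include <stddef.h>

#define NR_SHARDS 64	/* shard count suggested above; illustrative */

/* Hypothetical shard: each would hold its own lock and its own small
 * hash table, and each can rehash on its own.  Growing one shard only
 * touches roughly 1/64th of the objects, so there is no big rehash. */
struct shard {
	unsigned int nbuckets;	/* lock and table omitted for brevity */
};

struct sharded_site {
	struct shard shards[NR_SHARDS];
};

/* Route an object to a shard first, then to a bucket inside it.
 * Contention is spread across NR_SHARDS independent locks. */
static struct shard *site_shard(struct sharded_site *s, unsigned long hash)
{
	return &s->shards[hash % NR_SHARDS];
}
```

With this layout, a grow operation is confined to one shard's table while the other 63 shards stay fully available.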
btw: shouldn't the caller of lu_site_init() know which stack (server/client) the lu_site is created for? If so, can we just pass in a flag (or similar) to indicate that the client stack should use a smaller hash table?
Integrated in lustre-master » x86_64,server,el5,ofa #285
LU-569: Make lu_object cache size adjustable
Oleg Drokin : c8d7c99ec50c81a33eea43ed1c535fa4d65cef23
Files :