[LU-8130] Migrate from libcfs hash to rhashtable Created: 12/May/16 Updated: 05/Dec/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Upstream |
| Type: | Improvement | Priority: | Minor |
| Reporter: | James A Simmons | Assignee: | James A Simmons |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The Linux kernel has its own resizable hash table, rhashtable, which can be used in place of the libcfs hash code. rhashtable was developed for the networking layer, which has very demanding performance requirements. Its benefits include low latency and lockless (RCU-protected) lookups. Migrating Lustre to rhashtable should therefore bring significant performance gains. |
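To illustrate the design being adopted, here is a single-threaded userspace model of how rhashtable is parameterized: each object embeds its own linkage, and the table is told where the key and the linkage live through offsets, much like the key_offset, key_len and head_offset fields of struct rhashtable_params. This is a sketch under stated assumptions, not the kernel API: all model_* names are hypothetical, the model takes no locks and provides no RCU protection, it hashes with FNV-1a rather than jhash, and it rehashes synchronously once the load exceeds 75%.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; NOT the kernel's <linux/rhashtable.h>. */
struct model_head { struct model_head *next; };	/* plays the role of struct rhash_head */

struct model_params {		/* loosely mirrors three rhashtable_params fields */
	size_t key_offset;	/* where the key lives inside each object */
	size_t key_len;		/* key length in bytes */
	size_t head_offset;	/* where the embedded model_head lives */
};

struct model_table {
	struct model_head **buckets;
	size_t size;		/* bucket count, always a power of two */
	size_t nelems;
	struct model_params p;
};

/* FNV-1a over the raw key bytes (the kernel defaults to jhash instead). */
static size_t model_bucket(const struct model_table *t, const void *key, size_t size)
{
	const unsigned char *k = key;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < t->p.key_len; i++)
		h = (h ^ k[i]) * 16777619u;
	return h & (size - 1);
}

/* Recover the key address from the embedded linkage. */
static const void *model_key(const struct model_table *t, const struct model_head *h)
{
	return (const char *)h - t->p.head_offset + t->p.key_offset;
}

int model_init(struct model_table *t, const struct model_params *p)
{
	assert(p->key_len > 0);		/* keys must be non-empty */
	t->p = *p;
	t->size = 4;
	t->nelems = 0;
	t->buckets = calloc(t->size, sizeof(*t->buckets));
	return t->buckets ? 0 : -ENOMEM;
}

/* Frees only the buckets; objects stay caller-owned, as with rhashtable_destroy(). */
void model_destroy(struct model_table *t) { free(t->buckets); }

void *model_lookup(const struct model_table *t, const void *key)
{
	for (struct model_head *h = t->buckets[model_bucket(t, key, t->size)]; h; h = h->next)
		if (memcmp(model_key(t, h), key, t->p.key_len) == 0)
			return (char *)h - t->p.head_offset;	/* back to the object */
	return NULL;
}

/* Synchronous rehash into twice as many buckets. The kernel normally defers
 * this to a worker while RCU readers keep using the old table. */
static void model_grow(struct model_table *t)
{
	size_t nsize = t->size * 2;
	struct model_head **nb = calloc(nsize, sizeof(*nb)), *h, *next;

	if (!nb)
		return;		/* table stays correct, just more heavily loaded */
	for (size_t i = 0; i < t->size; i++)
		for (h = t->buckets[i]; h; h = next) {
			size_t b = model_bucket(t, model_key(t, h), nsize);

			next = h->next;
			h->next = nb[b];
			nb[b] = h;
		}
	free(t->buckets);
	t->buckets = nb;
	t->size = nsize;
}

int model_insert(struct model_table *t, void *obj)
{
	struct model_head *h = (struct model_head *)((char *)obj + t->p.head_offset);
	const void *key = (const char *)obj + t->p.key_offset;
	size_t b = model_bucket(t, key, t->size);

	if (model_lookup(t, key))
		return -EEXIST;		/* duplicate keys are rejected */
	h->next = t->buckets[b];
	t->buckets[b] = h;
	if (++t->nelems > t->size / 4 * 3)	/* 75% load, as in rht_grow_above_75() */
		model_grow(t);
	return 0;
}
```

A caller embeds a struct model_head in each object (for example one keyed by a 64-bit FID) and fills model_params with offsetof() values. With the real kernel API the same three offsets would go into struct rhashtable_params, and the table would be driven through rhashtable_init(), rhashtable_insert_fast() and rhashtable_lookup_fast(); the model drops that concurrency machinery so the layout can be compiled and stepped through in userspace.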
| Comments |
| Comment by Peter Jones [ 12/May/16 ] |
|
I've copied Liang so he is aware of this activity in the upstream client |
| Comment by Gerrit Updater [ 07/Jul/17 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/27967 |
| Comment by Gerrit Updater [ 17/Apr/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32036 |
| Comment by Gerrit Updater [ 17/Apr/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32038 |
| Comment by Gerrit Updater [ 20/Apr/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32102 |
| Comment by Gerrit Updater [ 21/Apr/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32111 |
| Comment by Gerrit Updater [ 15/May/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32408 |
| Comment by Gerrit Updater [ 29/May/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/32408/ |
| Comment by Gerrit Updater [ 07/Jun/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/32662 |
| Comment by Gerrit Updater [ 04/Sep/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32102/ |
| Comment by Gerrit Updater [ 01/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32036/ |
| Comment by Gerrit Updater [ 25/Oct/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33485 |
| Comment by Gerrit Updater [ 30/Oct/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33518 |
| Comment by Gerrit Updater [ 06/Nov/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33595 |
| Comment by Gerrit Updater [ 10/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33595/ |
| Comment by Gerrit Updater [ 05/Dec/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33789 |
| Comment by Gerrit Updater [ 14/Dec/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33873 |
| Comment by Gerrit Updater [ 13/Jan/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34020 |
| Comment by Gerrit Updater [ 15/Jan/19 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34036 |
| Comment by Gerrit Updater [ 11/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33789/ |
| Comment by Gerrit Updater [ 27/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34036/ |
| Comment by Gerrit Updater [ 15/Mar/19 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/34429 |
| Comment by Gerrit Updater [ 19/Mar/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34020/ |
| Comment by Gerrit Updater [ 01/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33485/ |
| Comment by Gerrit Updater [ 10/Jun/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35143 |
| Comment by Gerrit Updater [ 11/Jun/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35179 |
| Comment by Gerrit Updater [ 20/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35143/ |
| Comment by Gerrit Updater [ 18/Jul/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35565 |
| Comment by Gerrit Updater [ 04/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35565/ |
| Comment by Gerrit Updater [ 17/Sep/19 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36216 |
| Comment by Gerrit Updater [ 17/Sep/19 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36218 |
| Comment by Gerrit Updater [ 17/Sep/19 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36219 |
| Comment by Gerrit Updater [ 17/Sep/19 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36220 |
| Comment by Gerrit Updater [ 23/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35179/ |
| Comment by Gerrit Updater [ 11/Oct/19 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36432 |
| Comment by Gerrit Updater [ 07/Nov/19 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36707 |
| Comment by James A Simmons [ 27/Nov/19 ] |
|
I just gathered some performance numbers for the lu_object cache with a single MDS. Without this work I get:
Operation Max Min Mean Std Dev
With the lu_object rhashtable patches I get:
Operation Max Min Mean Std Dev
This is with a single MDS, but I expect similar scaling with a DNE setup. It will be interesting to see the impact of moving the ldlm locks to rhashtables. |
| Comment by Gerrit Updater [ 06/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36432/ |
| Comment by Gerrit Updater [ 14/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36218/ |
| Comment by Gerrit Updater [ 20/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34429/ |
| Comment by Gerrit Updater [ 03/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36219/ |
| Comment by Gerrit Updater [ 10/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36220/ |
| Comment by Gerrit Updater [ 18/Jan/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36216/ |
| Comment by Gerrit Updater [ 14/Feb/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32662/ |
| Comment by Gerrit Updater [ 18/Mar/20 ] |
|
Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/37965 |
| Comment by Gerrit Updater [ 31/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37965/ |
| Comment by Shuichi Ihara [ 03/Jun/20 ] |
|
Patch https://review.whamcloud.com/#/c/36707 (patchset 27) has a regression: mdtest fails on a DoM (Data-on-MDT) configuration.
# lfs setstripe -L mdt -E 1M /ai400/dom/
# salloc -p 40n --nodes=40 --ntasks-per-node=16 mpirun --allow-run-as-root /work/tools/bin/mdtest -t -F -P -w 3901 -e 3901 -d /ai400/dom/mdt_hard -n 10000 -a POSIX -N 1 -i 3 -p 10
-- started at 06/03/2020 13:24:08 --
mdtest-3.3.0+dev was launched with 640 total task(s) on 40 node(s)
Command line used: /work/tools/bin/mdtest '-t' '-F' '-P' '-w' '3901' '-e' '3901' '-d' '/ai400/dom/mdt_hard' '-n' '10000' '-a' 'POSIX' '-N' '1' '-i' '3' '-p' '10'
Path: /ai400/dom
FS: 47.4 TiB Used FS: 0.1% Inodes: 371.8 Mi Used Inodes: 0.0%
Nodemap: 1111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
V-0: Rank 0 Line 2137 Shifting ranks by 16 for each phase.
640 tasks, 6400000 files
[ec01:08604] 639 more processes have sent help message help-mpi-btl-openib.txt / error in device init
[ec01:08604] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
ior ERROR: open64("/ai400/dom/mdt_hard/test-dir.1-0/mdtest_tree.0/file.mdtest.24.9377", 66, 0664) failed, errno 16, Device or resource busy (aiori-POSIX.c:413)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 24 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI communication peer process has unexpectedly disconnected. This
usually indicates a failure in the peer process (e.g., a crash or
otherwise exiting without calling MPI_FINALIZE first).
Although this local MPI process will likely now behave unpredictably
(it may even hang or crash), the root cause of this problem is the
failure of the peer -- that is what you need to investigate. For
example, there may be a core file that you can examine. More
generally: such peer hangups are frequently caused by application bugs
or other external events.
Local host: ec40
Local PID: 4108
Peer host: ec02
--------------------------------------------------------------------------
[ec01:08604] 47 more processes have sent help message help-mpi-btl-tcp.txt / peer hung up
salloc: Relinquishing job allocation 4561
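The failing call in the log can be decoded with a small userspace helper. On x86_64 Linux, O_RDWR is 2 and O_CREAT is 0100 octal (64), so the flags value 66 is O_RDWR|O_CREAT, and errno 16 is EBUSY: the create-open of the mdtest file was refused as busy. The helpers decode_open_flags() and errno_name() below are hypothetical and handle only a few flags and errno values; the numeric values are architecture-specific.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>

/* Decode the flags argument that IOR/mdtest prints in its open64() errors. */
const char *decode_open_flags(int flags)
{
	static const struct { int bit; const char *name; } extra[] = {
		{ O_CREAT, "|O_CREAT" }, { O_EXCL, "|O_EXCL" },
		{ O_TRUNC, "|O_TRUNC" }, { O_APPEND, "|O_APPEND" },
	};
	static char buf[64];	/* not reentrant; fine for a one-off log decoder */
	const char *mode = "O_RDONLY";	/* O_RDONLY is 0 */

	if ((flags & O_ACCMODE) == O_WRONLY)
		mode = "O_WRONLY";
	else if ((flags & O_ACCMODE) == O_RDWR)
		mode = "O_RDWR";
	else if ((flags & O_ACCMODE) != O_RDONLY)
		mode = "O_ACCMODE(invalid)";
	strcpy(buf, mode);
	for (size_t i = 0; i < sizeof(extra) / sizeof(extra[0]); i++)
		if (flags & extra[i].bit)
			strcat(buf, extra[i].name);
	return buf;
}

/* Map a handful of errno values to their symbolic names. */
const char *errno_name(int err)
{
	switch (err) {
	case EBUSY:	return "EBUSY";
	case ENOENT:	return "ENOENT";
	case EEXIST:	return "EEXIST";
	default:	return "unknown";
	}
}
```

For example, decode_open_flags(66) yields "O_RDWR|O_CREAT" and errno_name(16) yields "EBUSY" on x86_64 Linux.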
|
| Comment by Gerrit Updater [ 16/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33518/ |
| Comment by Gerrit Updater [ 28/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33616/ |
| Comment by Gerrit Updater [ 28/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36707/ |
| Comment by Jian Yu [ 01/Jul/20 ] |
|
Hi James,
In an old kernel (like kernel-3.10.0-327.36.1.el7), the code in linux/rhashtable.h (see rhashtable.h) differs from what the libcfs compat code expects, so the build fails:
CC [M] /root/lustre-release/libcfs/libcfs/linux/linux-hash.o
In file included from /root/lustre-release/libcfs/libcfs/linux/linux-hash.c:33:0:
/root/lustre-release/libcfs/include/libcfs/linux/linux-hash.h:71:13: error: ‘struct rhashtable_iter’ declared inside parameter list [-Werror]
struct rhashtable_iter *iter)
^
/root/lustre-release/libcfs/include/libcfs/linux/linux-hash.h:71:13: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
/root/lustre-release/libcfs/include/libcfs/linux/linux-hash.h: In function ‘rhashtable_walk_enter’:
/root/lustre-release/libcfs/include/libcfs/linux/linux-hash.h:76:2: error: implicit declaration of function ‘rhashtable_walk_init’ [-Werror=implicit-function-declaration]
return rhashtable_walk_init(ht, iter);
^
/root/lustre-release/libcfs/include/libcfs/linux/linux-hash.h: In function ‘rhltable_init’:
/root/lustre-release/libcfs/include/libcfs/linux/linux-hash.h:98:2: error: passing argument 2 of ‘rhashtable_init’ discards ‘const’ qualifier from pointer target type [-Werror]
return rhashtable_init(&hlt->ht, params);
<~snip~>
Here is a log that contains all of the build failures: make.log
Is there a proper way to fix the code so that it is compatible with old kernels? |
| Comment by Jian Yu [ 06/Jul/20 ] |
This can be resolved by using HAVE_SERVER_SUPPORT to disable the rhashtable code when building the client. The rhashtable code is only used by lu_env in lustre/obdclass/lu_object.c on the server. |
| Comment by James A Simmons [ 06/Jul/20 ] |
|
That's RHEL 7.2, correct? Do we even test the RHEL 7.2 client? I know that for 2.13.X you can't even build server RPMs for RHEL 7.2. Is this a 2.12 issue? |
| Comment by Jian Yu [ 06/Jul/20 ] |
|
Yes, James. That's a 2.12 issue on a RHEL 7.2 client. After adding a HAVE_SERVER_SUPPORT check around the rhashtable code in libcfs/include/libcfs/linux/linux-hash.h, the issue is resolved. |
| Comment by James A Simmons [ 06/Jul/20 ] |
|
Ah, that makes sense. rhashtable is barely used in 2.12, so it should be an easy fix. |
| Comment by Gerrit Updater [ 01/Oct/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/40113 |
| Comment by Gerrit Updater [ 26/Dec/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41092 |
| Comment by Gerrit Updater [ 17/Dec/21 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/45882 |
| Comment by Gerrit Updater [ 01/Aug/23 ] |
|
"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51840 |
| Comment by Gerrit Updater [ 19/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51840/ |
| Comment by Gerrit Updater [ 19/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/40113/ |
| Comment by Gerrit Updater [ 06/Sep/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/32038/ |