[LU-7689] limit lu_site hash table size on clients Created: 20/Jan/16  Updated: 14/Mar/16  Resolved: 14/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.9.0

Type: Improvement Priority: Minor
Reporter: Li Dongyang (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch

Rank (Obsolete): 9223372036854775807

 Description   

lu_site_init() allocates a hash table during setup on both the client and the OSD, and both use the same default sizing formula: it assumes the lu_site cache can take up to 20% of total memory.

That makes sense for the OSD, but on a client it means allocating a ~128M hash table per mount on a 32G box. To make it worse, we mount multiple Lustre filesystems on a box, so the tables can consume ~128M multiplied by the number of mounts.

We already have lu_cache_percent as a module parameter, but the hash table size should be limited by default on clients.
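For reference, until a default limit lands, the hash table footprint can already be reduced by lowering lu_cache_percent before the Lustre modules load. This is only a sketch, assuming the parameter is exposed by the obdclass module and that a 1% budget is acceptable for the client in question:

# sketch only: shrink the lu_site cache budget from the default 20% to 1%
# (assumes lu_cache_percent belongs to obdclass and is read at module load time)
echo "options obdclass lu_cache_percent=1" > /etc/modprobe.d/lu_cache.conf
# unmount, reload the Lustre modules (or reboot), then mount the filesystems again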



 Comments   
Comment by Gerrit Updater [ 20/Jan/16 ]

Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/18048
Subject: LU-7689 obdclass: limit lu_site hash table size on clients
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d715c3ad1be054f52c9787472fae9d16b5e6f94e

Comment by Li Dongyang (Inactive) [ 28/Jan/16 ]

I've run mdtest with 16 processes, 262144 objects each, 3 iterations.
That gives us roughly 4.2 million objects per iteration.

without the patch:

mdtest-1.9.3 was launched with 16 total task(s) on 1 node(s)
Command line used: /system/Benchmarks/mdtest/1.9.3/mdtest -d /mnt/testfs/z00/dyl900/mdtest -n 262144 -i 3
Path: /mnt/testfs/z00/dyl900
FS: 854.4 TiB Used FS: 0.6% Inodes: 854.5 Mi Used Inodes: 0.0%

16 tasks, 4194304 files/directories

SUMMARY: (of 3 iterations)
Operation            Max       Min       Mean      Std Dev
---------            ---       ---       ----      -------
Directory creation:  4169.451  4049.008  4113.665   49.569
Directory stat:      8553.647  7678.810  8243.482  399.929
Directory removal:   2312.643  1684.909  1941.188  268.901
File creation:       3714.316  3666.937  3686.580   20.171
File stat:           7936.584  7511.260  7787.747  195.697
File read:           8772.322  7397.213  8055.384  562.922
File removal:        2814.176  2177.893  2579.797  285.497
Tree creation:       3472.106  2173.215  2956.409  562.996
Tree removal:           7.550     6.808     7.093    0.326

with the patch applied:
mdtest-1.9.3 was launched with 16 total task(s) on 1 node(s)
Command line used: /system/Benchmarks/mdtest/1.9.3/mdtest -d /mnt/testfs/z00/dyl900/mdtest -n 262144 -i 3
Path: /mnt/testfs/z00/dyl900
FS: 854.4 TiB Used FS: 0.6% Inodes: 854.5 Mi Used Inodes: 0.0%

16 tasks, 4194304 files/directories

SUMMARY: (of 3 iterations)
Operation            Max       Min       Mean      Std Dev
---------            ---       ---       ----      -------
Directory creation:  4258.673  4128.302  4178.702   57.184
Directory stat:      8203.775  8060.866  8147.118   61.982
Directory removal:   2356.021  1868.107  2173.262  217.180
File creation:       3683.373  3608.999  3655.445   33.067
File stat:           8058.992  7698.923  7898.330  149.529
File read:           8838.700  8575.945  8680.637  113.714
File removal:        2857.897  2262.444  2657.039  279.036
Tree creation:       4132.319  2755.784  3630.101  620.513
Tree removal:           7.673     6.568     7.151    0.453

The hash_bd_depmax is 130 without the patch and 221 with it applied.

Is there any other particular benchmark I should run?
Thanks

Comment by nasf (Inactive) [ 29/Jan/16 ]

mdtest cannot exercise the client-side cache properly since it is a general test tool, not Lustre-specific. I would suggest the following test for directory-traversal performance:

1) Assume the test directory is TDIR. Under TDIR, generate 10M regular files; you can do that via 'touch' (slow) or via the Lustre test tool 'createmany' (faster than touch; see the command sketch below).
2) After generating the test data set, remount the client to drop the client-side cache, then run "ls -l TDIR" to measure the performance without cache.
3) Then run "ls -l TDIR" again to measure the performance with cache.

You can measure the performance with and without your patch, then compare the results. Usually we use multiple clients to generate the test data set in parallel to increase the create speed. 0-stripe files are the fastest case. Depending on your test environment and time, you can also consider measuring the 1-stripe and 4-stripe cases.
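For reference, the procedure above could be scripted roughly as follows. This is only a sketch: the mount point, MGS NID, test directory and file prefix are placeholders, and createmany is assumed to be available from the Lustre test suite.

TDIR=/mnt/testfs/tdir
# 1) generate 10M 0-stripe regular files (createmany names them f0, f1, ...)
createmany -o $TDIR/f 10000000
# 2) remount the client to drop its cache, then traverse without cache
umount /mnt/testfs
mount -t lustre mgs@tcp:/testfs /mnt/testfs
time ls -l $TDIR > /dev/null
# 3) traverse again, this time with the client cache populated
time ls -l $TDIR > /dev/null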

Comment by Li Dongyang (Inactive) [ 01/Feb/16 ]

Hi all,
I followed the suggestion from nasf, created 10M files with createmany, and here are the results for 0-stripe files:
without the patch:
[12:04:58 root@r3:z00] # time ls -l dyl900 > /dev/null

real 26m57.756s
user 1m7.876s
sys 15m39.102s
[12:32:54 root@r3:z00] # time ls -l dyl900 > /dev/null

real 27m19.028s
user 1m6.301s
sys 15m55.731s

with the patch:
[14:15:29 root@r3:z00] # time ls -l dyl900 > /dev/null

real 25m10.092s
user 1m6.833s
sys 13m53.716s
[14:40:51 root@r3:z00] # time ls -l dyl900 > /dev/null

real 28m52.916s
user 1m8.438s
sys 16m9.032s

I'm looking into testing with 1-stripe and 4-stripe files, but I do have a question: how can I create 1-stripe and 4-stripe files using 'createmany'?

Comment by nasf (Inactive) [ 01/Feb/16 ]

I'm looking into testing with 1-stripe and 4-stripe files, but I do have a question: how can I create 1-stripe and 4-stripe files using 'createmany'?

"lfs setstripe -c 1 $TDIR", then all the regular files created under $TDIR (after the setstripe) will be 1-striped unless you specify the file layout manually. Similarly for 4-striped case, specify "-c 4".

Comment by Li Dongyang (Inactive) [ 04/Feb/16 ]

Here are the results for 1 and 4 stripes:

1-stripe
baseline:
bash-4.1# time ls -l testdir > /dev/null

real 25m22.670s
user 1m4.831s
sys 13m41.840s
bash-4.1# time ls -l testdir > /dev/null

real 27m5.056s
user 1m5.908s
sys 15m2.135s

patched:
bash-4.1# time ls -l testdir > /dev/null

real 25m42.354s
user 1m9.786s
sys 13m57.352s
bash-4.1# time ls -l testdir > /dev/null

real 27m3.739s
user 1m10.196s
sys 14m59.850s

4-stripes:

baseline:
bash-4.1# time ls -l testdir > /dev/null

real 41m45.328s
user 1m14.652s
sys 32m12.239s
bash-4.1# time ls -l testdir > /dev/null

real 46m57.302s
user 1m15.295s
sys 36m12.470s

patched:
bash-4.1# time ls -l testdir > /dev/null

real 39m48.663s
user 1m15.712s
sys 29m54.248s
bash-4.1# time ls -l testdir > /dev/null

real 42m28.805s
user 1m13.754s
sys 33m8.527s

It beats me that with 4 stripes the patched client is actually faster; I did multiple runs to verify that.

Comment by nasf (Inactive) [ 04/Feb/16 ]

Dongyang, thanks for sharing the test results.

Comment by Gerrit Updater [ 14/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18048/
Subject: LU-7689 obdclass: limit lu_site hash table size on clients
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 522c1eb4d2f5faf1fa87be07d9617df1439fc0d6
