[LU-7689] limit lu_site hash table size on clients Created: 20/Jan/16 Updated: 14/Mar/16 Resolved: 14/Mar/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Li Dongyang (Inactive) | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
lu_site_init() allocates a hash table during setup on both the client and the OSD. That makes sense for an OSD, but on a client we end up allocating a ~128M hash table per mount on a 32G box. To make it worse, we mount multiple Lustre filesystems on a box, so it can take ~128M * number of mounts of memory. We already have lu_cache_percent as a module parameter, but we should limit the hash table size by default on clients. |
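As a per-node stopgap before a default limit lands, the existing lu_cache_percent parameter mentioned above can shrink the table at module load time. A hedged sketch, assuming the parameter lives in the obdclass module (check with modinfo) and with a purely illustrative value:

```
# /etc/modprobe.d/lustre.conf  (illustrative; verify the owning module
# and the current default with: modinfo obdclass | grep lu_cache_percent)
options obdclass lu_cache_percent=5
```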
| Comments |
| Comment by Gerrit Updater [ 20/Jan/16 ] |
|
Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/18048 |
| Comment by Li Dongyang (Inactive) [ 28/Jan/16 ] |
|
I've done mdtest with 16 processes, 262144 objects each, 3 iterations.
Without the patch:
mdtest-1.9.3 was launched with 16 total task(s) on 1 node(s)
16 tasks, 4194304 files/directories
SUMMARY: (of 3 iterations)
With the patch applied:
16 tasks, 4194304 files/directories
SUMMARY: (of 3 iterations)
The hash_bd_depmax is 130 without the patch and 221 with it applied. Is there any other particular benchmark I should run? |
| Comment by nasf (Inactive) [ 29/Jan/16 ] |
|
mdtest cannot exercise the client-side cache properly, since it is a general test tool, not Lustre-specific. I would suggest the following test of directory-traversal performance: 1) Assume the test directory is TDIR. Under TDIR, generate 10M regular files; you can do that via 'touch' (slow) or via the Lustre test tool 'createmany' (faster than touch). Measure the performance with and without your patch, then compare. Usually we use multiple clients to generate the test data set in parallel to increase the create speed. 0-stripe files are the fastest case. Depending on your test environment and time, you can also consider measuring the 1-stripe and 4-stripe cases. |
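The suggested procedure can be sketched as a script. A minimal, scaled-down version that runs on any filesystem; the Lustre-specific stripe setup and the faster 'createmany' path are shown as comments, and the 100-file count is illustrative only (the real test uses 10M files under a Lustre mount):

```shell
# Sketch of the traversal benchmark (illustrative, not the exact harness).
# On a real Lustre client you would first set the stripe count, e.g.:
#   lfs setstripe -c 1 "$TDIR"
# and create the files with the Lustre test tool instead of touch:
#   createmany -o "$TDIR/f" 10000000
TDIR=$(mktemp -d)
for i in $(seq 1 100); do
    touch "$TDIR/f$i"
done
# The traversal pass whose wall time is compared with/without the patch:
time ls -f "$TDIR" > /dev/null
count=$(ls -f "$TDIR" | grep -c '^f')
echo "created $count files"
rm -rf "$TDIR"
```

The idea is simply to time the same traversal on an identical data set with and without the patch; only the wall-clock 'real' times are compared.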
| Comment by Li Dongyang (Inactive) [ 01/Feb/16 ] |
|
Hi all, here are the results:
without the patch:
real 26m57.756s
real 27m19.028s
with the patch:
real 25m10.092s
real 28m52.916s
I'm looking into testing with 1-stripe and 4-stripe files, but I do have a question: how can I create 1-stripe and 4-stripe files using 'createmany'? |
| Comment by nasf (Inactive) [ 01/Feb/16 ] |
Run "lfs setstripe -c 1 $TDIR"; then all the regular files created under $TDIR (after the setstripe) will be 1-striped unless you specify a file layout manually. Similarly, for the 4-striped case specify "-c 4". |
| Comment by Li Dongyang (Inactive) [ 04/Feb/16 ] |
|
Here are the results for 1 and 4 stripes:
1-stripe:
baseline: real 25m22.670s, real 27m5.056s
patched: real 25m42.354s, real 27m3.739s
4-stripes:
baseline: real 41m45.328s, real 46m57.302s
patched: real 39m48.663s, real 42m28.805s
It beats me that with 4 stripes the patched client is actually faster; I did multiple runs to verify that. |
| Comment by nasf (Inactive) [ 04/Feb/16 ] |
|
Dongyang, thanks for sharing the test results. |
| Comment by Gerrit Updater [ 14/Mar/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18048/ |