[LU-6770] use per_cpu request pool osc_rq_pools Created: 27/Jun/15 Updated: 04/Aug/17 Resolved: 13/Aug/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Wang Shilong (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
With many OSCs, each OSC will pre-allocate request-pool memory at start. We can solve this by using a global per_cpu pool 'osc_rq_pools' rather than per-OSC pools. Also, the administrator can use a module parameter to limit the memory used by the pools. |
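Since the description only sketches the idea, below is a minimal, hedged illustration of what a global per-CPU request pool capped by a module parameter could look like. The struct layout, the field names, and the parameter name osc_reqpool_mem_max_mb are assumptions for illustration only, not taken from the landed patch.

/*
 * Minimal sketch of the idea described above: one global per-CPU pool
 * shared by all OSCs instead of one pool per OSC, with a module
 * parameter limiting the total memory the pools may use.  All names
 * and numbers are illustrative.
 */
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/spinlock.h>
#include <linux/list.h>

/* Administrator-tunable cap on total pool memory, in MiB (assumed unit). */
static unsigned int osc_reqpool_mem_max_mb = 100;
module_param(osc_reqpool_mem_max_mb, uint, 0444);
MODULE_PARM_DESC(osc_reqpool_mem_max_mb,
		 "maximum total memory used by the OSC request pools (MiB)");

struct osc_rq_pool {
	spinlock_t		orp_lock;
	struct list_head	orp_reqs;	/* pre-allocated requests */
	unsigned int		orp_count;	/* requests currently pooled */
};

static struct osc_rq_pool __percpu *osc_rq_pools;

static int __init osc_rq_pools_init(void)
{
	int cpu;

	osc_rq_pools = alloc_percpu(struct osc_rq_pool);
	if (osc_rq_pools == NULL)
		return -ENOMEM;

	for_each_possible_cpu(cpu) {
		struct osc_rq_pool *pool = per_cpu_ptr(osc_rq_pools, cpu);

		spin_lock_init(&pool->orp_lock);
		INIT_LIST_HEAD(&pool->orp_reqs);
		pool->orp_count = 0;
	}

	/*
	 * Requests would then be pre-allocated into the per-CPU lists,
	 * stopping once the combined size across all CPUs reaches
	 * osc_reqpool_mem_max_mb.
	 */
	return 0;
}

static void __exit osc_rq_pools_exit(void)
{
	free_percpu(osc_rq_pools);
}

module_init(osc_rq_pools_init);
module_exit(osc_rq_pools_exit);
MODULE_LICENSE("GPL");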
| Comments |
| Comment by Gerrit Updater [ 27/Jun/15 ] | |||||||||||||||
|
Wang Shilong (wshilong@ddn.com) uploaded a new patch: http://review.whamcloud.com/15422 | |||||||||||||||
| Comment by Peter Jones [ 27/Jun/15 ] | |||||||||||||||
|
Yang Sheng, could you please take care of this patch? Thanks, Peter | |||||||||||||||
| Comment by Yang Sheng [ 30/Jun/15 ] | |||||||||||||||
|
Hello Wang, I have reviewed the patch you provided and have started running Lustre with this patch on my local system to verify that it works well. Once I have completed my testing, I will work with you to get the patch reviewed by others so that it can be landed. Thanks, | |||||||||||||||
| Comment by Yang Sheng [ 07/Jul/15 ] | |||||||||||||||
|
The patch has passed the test & review process, but Oleg has some comments, as below:
[CST 1:53:29 PM] yang sheng: could you please give me some pointers about http://review.whamcloud.com/#/c/15422/
[CST 1:54:01 PM] yang sheng: can it be landed or does it still need to wait a while?
[CST 1:54:07 PM] Oleg Drokin: why do we need a percpu pool there?
[CST 1:54:28 PM] Oleg Drokin: I mean it's still an improvement, but what if I have 260 CPUs?
[CST 1:54:40 PM] Oleg Drokin: I would imagine having a static pool of a fixed size is probably best of all
[CST 1:57:45 PM] Oleg Drokin: I think the pool does not need to be super big. Just a fixed number of requests, something like 50 (or 100, need to see how big they are) should be enough. We only expect to use them during severe OOM anyway
[CST 1:58:00 PM] Oleg Drokin: with perhaps a module parameter if somebody wants an override
[CST 1:59:05 PM] yang sheng: Yes, it is reasonable.
[CST 1:59:44 PM] yang sheng: as this patch stands, the limit is 1% of total memory.
[CST 2:00:16 PM] yang sheng: that seems bigger than what you suggest.
[CST 2:01:19 PM] Oleg Drokin: Yes. I feel it's really excessive. But the initial reasoning was that every OSC could have up to 32M of dirty pages and can send up to 8 (default) RPCs in flight.
[CST 2:01:40 PM] Oleg Drokin: so every OSC had this pool in order to send the many RPCs even in OOM
[CST 2:02:02 PM] Oleg Drokin: in reality if you have 2000 OSTs, it's unlikely you'd have dirty pages in all of them at the same time
[CST 2:02:11 PM] Oleg Drokin: so we need to be reasonable here
[CST 2:02:38 PM] Oleg Drokin: 1% of 1T of memory is still a cool 10G
[CST 2:03:34 PM] yang sheng: so a fixed size is enough to handle such a situation.
[CST 2:03:46 PM] Oleg Drokin: finding a proper number is going to be tricky, but I feel it should be on the lower side, somewhere in the tens or low hundreds for most cases except perhaps the most extreme ones
[CST 2:04:10 PM] Oleg Drokin: that's why having an override is important of course, with good documentation about it like I explained above
[CST 2:05:02 PM] yang sheng: I see. Got it. Thank you very much, Oleg. | |||||||||||||||
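To make the sizing argument above concrete, the small program below contrasts the two policies discussed in the chat: 1% of total RAM versus a fixed count of roughly 100 requests. The 64 KiB per-request size is an assumed figure for illustration, not a value stated in the ticket.

/*
 * Back-of-the-envelope comparison of the two pool-sizing policies
 * discussed above.  The request size is an assumption.
 */
#include <stdio.h>

int main(void)
{
	const unsigned long long req_size = 64ULL * 1024;  /* assumed 64 KiB per pooled request */
	const unsigned long long ram = 1ULL << 40;          /* 1 TiB client, Oleg's example */

	unsigned long long pct_pool = ram / 100;             /* 1% of total memory */
	unsigned long long fixed_pool = 100 * req_size;      /* fixed count of 100 requests */

	printf("1%% of 1 TiB RAM : %llu MiB\n", pct_pool >> 20);   /* ~10485 MiB, the "cool 10G" */
	printf("100 requests    : %llu MiB\n", fixed_pool >> 20);  /* ~6 MiB */
	return 0;
}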
| Comment by Wang Shilong (Inactive) [ 07/Jul/15 ] | |||||||||||||||
|
Hi Yang Sheng, So maybe we can set @osc_reqpool_mem_max=100MB or so by default, and it will try to allocate memory by checking against that limit. Best Regards, | |||||||||||||||
| Comment by Yang Sheng [ 08/Jul/15 ] | |||||||||||||||
|
Hi, Shilong, I think we still need to consider the OST number when we decide the mem_max parameter, unless it is bigger than a fixed number or overridden by a module parameter. What do you think about it? Thanks, | |||||||||||||||
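A hedged sketch of the sizing rule being discussed here: scale the pool with the number of OSTs, bounded by a fixed default unless the administrator overrides the cap with osc_reqpool_mem_max. The per-OST contribution and the default cap are assumptions for illustration, not values from the landed patch.

/*
 * Illustrative sizing rule only: the pool grows with the OST count,
 * bounded by a fixed default unless an explicit osc_reqpool_mem_max
 * override is given.
 */
#include <linux/module.h>
#include <linux/kernel.h>

/* 0 means "not set"; otherwise an explicit cap in MiB (assumed unit). */
static unsigned int osc_reqpool_mem_max;
module_param(osc_reqpool_mem_max, uint, 0444);
MODULE_PARM_DESC(osc_reqpool_mem_max,
		 "override for the OSC request pool memory cap (MiB)");

#define OSC_REQPOOL_DEFAULT_MAX_MB	100	/* default cap, per the comment above */
#define OSC_REQPOOL_MB_PER_OST		1	/* assumed per-OST contribution */

static unsigned int osc_reqpool_size_mb(unsigned int ost_count)
{
	unsigned int mb = ost_count * OSC_REQPOOL_MB_PER_OST;

	/* An explicit override wins over any computed value. */
	if (osc_reqpool_mem_max != 0)
		return osc_reqpool_mem_max;

	/* Otherwise scale with the OST count, bounded by the fixed default. */
	return min_t(unsigned int, mb, OSC_REQPOOL_DEFAULT_MAX_MB);
}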
| Comment by Gerrit Updater [ 13/Jul/15 ] | |||||||||||||||
|
Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/15585 | |||||||||||||||
| Comment by Shuichi Ihara (Inactive) [ 05/Aug/15 ] | |||||||||||||||
|
Patch http://review.whamcloud.com/15585 has been abandoned. The new patch is http://review.whamcloud.com/#/c/15422/ | |||||||||||||||
| Comment by Shuichi Ihara (Inactive) [ 05/Aug/15 ] | |||||||||||||||
|
Here are quick benchmark results on master with/without http://review.whamcloud.com/#/c/15422
Here is the IOR syntax used in this test:
# mpirun -np 10 /work/ihara/IOR -w -e -t 1m -b 26g -k -F -o /scratch1/file
# pdsh -g oss,client "sync; echo 3 > /proc/sys/vm/drop_caches"
# mpirun -np 10 /work/ihara/IOR -r -e -t 1m -b 26g -F -o /scratch1/file
For IOR under stress testing, I generated memory pressure with the "stress" command and ran IOR under it:
# stress --vm 10 --vm-bytes 10G
No performance regression with patch 15422 so far. | |||||||||||||||
| Comment by Shuichi Ihara (Inactive) [ 05/Aug/15 ] | |||||||||||||||
|
This is the memory usage on a client with a 200-OST configuration.
Mem usage = MemFree(before mount) - MemFree(after mount)
Slab allocation = Slab(after mount) - Slab(before mount)
Patch 15422 significantly reduces memory usage. | |||||||||||||||
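For completeness, a small user-space sketch of how the MemFree and Slab readings used in these deltas can be captured from /proc/meminfo (run once before and once after mount); this helper is not part of the ticket.

/*
 * Hedged helper (not from the ticket): print the MemFree and Slab
 * lines of /proc/meminfo so the before/after-mount deltas above can
 * be computed.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *fp = fopen("/proc/meminfo", "r");

	if (fp == NULL) {
		perror("/proc/meminfo");
		return 1;
	}
	while (fgets(line, sizeof(line), fp) != NULL) {
		/* Lines look like "MemFree:  123456 kB". */
		if (strncmp(line, "MemFree:", 8) == 0 ||
		    strncmp(line, "Slab:", 5) == 0)
			fputs(line, stdout);
	}
	fclose(fp);
	return 0;
}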
| Comment by Gerrit Updater [ 12/Aug/15 ] | |||||||||||||||
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15422/ | |||||||||||||||
| Comment by Yang Sheng [ 13/Aug/15 ] | |||||||||||||||
|
Patch landed, closing this ticket. | |||||||||||||||
| Comment by Gerrit Updater [ 08/Sep/15 ] | |||||||||||||||
|
Wrong ticket number. |