[LU-6770] use per_cpu request pool osc_rq_pools Created: 27/Jun/15  Updated: 04/Aug/17  Resolved: 13/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Improvement Priority: Minor
Reporter: Wang Shilong (Inactive) Assignee: Yang Sheng
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related

 Description   

With many OSCs, each OSC pre-allocates a request pool at startup.
That memory is taken away from applications, especially when the
client needs to interact with hundreds of OSTs.

We can solve this by using a global per-CPU pool, osc_rq_pools, rather
than a local pool per OSC. The upper limit on the total size of
requests in the pools is about 1 percent of total memory.

In addition, an administrator can limit the memory usage with a module
parameter:
options osc osc_reqpool_mem_max=num
The unit of num is MB, and the effective upper limit is:
min(num, 1% of total memory)
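
As a rough illustration (not the actual Lustre code), the cap described above could be computed as in the sketch below. The function and variable names are made up for this sketch, and the 100 MB value is only the default proposed later in the comments.

#include <stdio.h>

/* Sketch: pool cap = min(osc_reqpool_mem_max in MB, 1% of total RAM). */
static unsigned long long
reqpool_cap_bytes(unsigned long long mem_max_mb,
                  unsigned long long total_ram_bytes)
{
        unsigned long long cap_bytes = mem_max_mb << 20;       /* MB to bytes */
        unsigned long long one_pct   = total_ram_bytes / 100;  /* 1% of RAM  */

        return cap_bytes < one_pct ? cap_bytes : one_pct;
}

int main(void)
{
        /* Example: a 128 GB client with a hypothetical 100 MB parameter. */
        unsigned long long total = 128ULL << 30;

        printf("cap = %llu MB\n", reqpool_cap_bytes(100, total) >> 20);
        return 0;
}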



 Comments   
Comment by Gerrit Updater [ 27/Jun/15 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: http://review.whamcloud.com/15422
Subject: LU-6770 osc: use per_cpu request pool osc_rq_pools
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8fec66ecc9871d9b0c52f7f5ce65bda3a130cc38

Comment by Peter Jones [ 27/Jun/15 ]

Yang Sheng

Could you please take care of this patch?

Thanks

Peter

Comment by Yang Sheng [ 30/Jun/15 ]

Hello Wang,

I have reviewed the patch you provided and have started running Lustre with this patch on my local system to verify that it works well.

Once I have completed my testing I will work with you to get the patch reviewed by others, so that it can be landed.

Thanks,
Yang Sheng.

Comment by Yang Sheng [ 07/Jul/15 ]

The patch has passed the test & review process, but Oleg has some comments, quoted below:

[CST 1:53:29 PM] yang sheng: could you please give me some pointers about http://review.whamcloud.com/#/c/15422/
[CST 1:54:01 PM] yang sheng: can it be landed, or does it still need to wait a while?
[CST 1:54:07 PM] Oleg Drokin: why do we need a percpu pool there?
[CST 1:54:28 PM] Oleg Drokin: I mean it's still an improvement, but what if I have 260 CPUs?
[CST 1:54:40 PM] Oleg Drokin: I would imagine having a static pool of a fixed size is probably best of all
[CST 1:57:45 PM] Oleg Drokin: I think the pool does not need to be super big. Just a fixed number of requests, something like 50 (or 100, need to see how big they are) should be enough. We only expect to use them during severe OOM anyway
[CST 1:58:00 PM] Oleg Drokin: with perhaps a module parameter if somebody wants an override
[CST 1:59:05 PM] yang sheng: Yes, it is reasonable.
[CST 1:59:44 PM] yang sheng: in this patch the given limit is 1% of total memory.
[CST 2:00:16 PM] yang sheng: that seems bigger than what you are suggesting.
[CST 2:01:19 PM] Oleg Drokin: Yes. I feel it's really excessive. But the initial reasoning was that every OSC could have up to 32M of dirty pages and can send up to 8 (default) RPCs in flight.
[CST 2:01:40 PM] Oleg Drokin: so every OSC had this pool in order to send the many RPCs even in OOM
[CST 2:02:02 PM] Oleg Drokin: in reality if you have 2000 OSTs, it's unlikely you'd have dirty pages in all of them at the same time
[CST 2:02:11 PM] Oleg Drokin: so we need to be reasonable here
[CST 2:02:38 PM] Oleg Drokin: 1% of 1T of memory is still a cool 10G
[CST 2:03:34 PM] yang sheng: so a fixed size is enough to handle such a situation.
[CST 2:03:46 PM] Oleg Drokin: finding a proper number is going to be tricky, but I feel it should be on the lower side, somewhere in the tens or low hundreds for most cases except perhaps the most extreme ones
[CST 2:04:10 PM] Oleg Drokin: that's why having an override is important of course, with good documentation about it like I explained above
[CST 2:05:02 PM] yang sheng: I see, got it. Thank you very much, Oleg.
Comment by Wang Shilong (Inactive) [ 07/Jul/15 ]

Hi Yang Sheng,

So maybe we can set @osc_reqpool_mem_max=100MB or so by default, and it will try to allocate memory by checking
min(100M, 1% of total memory). Does this make sense to you?

Best Regards,
Shilong

Comment by Yang Sheng [ 08/Jul/15 ]

Hi, Shilong,

I think we still need to consider the OST number when we decide the mem_max parameter, unless it is bigger than a fixed number or overridden by the module parameter. What do you think about it?

Thanks,
Yang Sheng
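
A hypothetical sketch of the sizing idea discussed above: scale the global pool with the OST count while keeping both the osc_reqpool_mem_max clamp and the 1% of RAM clamp. This is not the code that eventually landed; the function name and the per-OST estimate are assumptions for illustration.

#include <stdio.h>

/* Sketch: want a few requests' worth of memory per OST, clamped twice. */
static unsigned long long
reqpool_cap_scaled(unsigned int ost_count,
                   unsigned long long per_ost_bytes,
                   unsigned long long mem_max_mb,
                   unsigned long long total_ram_bytes)
{
        unsigned long long want = (unsigned long long)ost_count * per_ost_bytes;
        unsigned long long cap  = mem_max_mb << 20;       /* module parameter */
        unsigned long long pct  = total_ram_bytes / 100;  /* 1% of RAM */

        if (want > cap)
                want = cap;
        if (want > pct)
                want = pct;
        return want;
}

int main(void)
{
        /* 2000 OSTs, ~64 KB of request buffers each, 100 MB cap, 128 GB RAM. */
        printf("%llu MB\n",
               reqpool_cap_scaled(2000, 64ULL << 10, 100, 128ULL << 30) >> 20);
        return 0;
}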

Comment by Gerrit Updater [ 13/Jul/15 ]

Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/15585
Subject: LU-6770 osc: use global osc_rq_pool to reduce memory usage
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c22aa4d8c553e974214a27e516728d88df73663c

Comment by Shuichi Ihara (Inactive) [ 05/Aug/15 ]

Patch http://review.whamcloud.com/15585 was abandoned. The new patch is http://review.whamcloud.com/#/c/15422/

Comment by Shuichi Ihara (Inactive) [ 05/Aug/15 ]

Here are quick benchmark results on master with/without http://review.whamcloud.com/#/c/15422
4 x OSS and a client (2 x E5-2660v3, 20 CPU cores, 128GB memory and 1 x FDR InfiniBand)

                 master (w/o stress)   master (w/ stress)   master+15422 (w/o stress)   master+15422 (w/ stress)
Write (MB/sec)   5604                  4838                 5702                        4846
Read (MB/sec)    4218                  3703                 4261                        3939

Here is the IOR syntax for this test.

# mpirun -np 10 /work/ihara/IOR -w -e -t 1m -b 26g -k -F -o /scratch1/file
# pdsh -g oss,client "sync; echo 3 > /proc/sys/vm/drop_caches"
# mpirun -np 10 /work/ihara/IOR -r -e -t 1m -b 26g -F -o /scratch1/file

For IOR with stress testing, I generated memory pressure with the "stress" command and ran IOR under it.

# stress --vm 10 --vm-bytes 10G

No performance regression with patch 15422 so far.

Comment by Shuichi Ihara (Inactive) [ 05/Aug/15 ]

This is the memory usage on a client with a 200 OST configuration.

               Mem usage   Slab allocation
master         444616      136412
master+15422   91324       64412

Mem usage       = MemFree(before mount) - MemFree(after mount)
Slab allocation = Slab(after mount) - Slab(before mount)
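
For reference, a minimal sketch (not part of any Lustre tooling) of reading the /proc/meminfo counters these deltas are presumably based on; run it before and after mounting and subtract the readings by hand. The function name is made up for this sketch.

#include <stdio.h>
#include <string.h>

/* Return a /proc/meminfo field (e.g. "MemFree" or "Slab") in kB, or -1. */
static long meminfo_kb(const char *field)
{
        char line[256];
        size_t len = strlen(field);
        long val = -1;
        FILE *fp = fopen("/proc/meminfo", "r");

        if (fp == NULL)
                return -1;
        while (fgets(line, sizeof(line), fp) != NULL) {
                if (strncmp(line, field, len) == 0 && line[len] == ':') {
                        sscanf(line + len + 1, "%ld", &val);
                        break;
                }
        }
        fclose(fp);
        return val;
}

int main(void)
{
        printf("MemFree: %ld kB\n", meminfo_kb("MemFree"));
        printf("Slab:    %ld kB\n", meminfo_kb("Slab"));
        return 0;
}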

Patch 15422 significantly reduces memory usage.

Comment by Gerrit Updater [ 12/Aug/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15422/
Subject: LU-6770 osc: use global osc_rq_pool to reduce memory usage
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 44c4f47c4d1f185831d4629cc9ca5ae5f50a8e07

Comment by Yang Sheng [ 13/Aug/15 ]

Patch landed. Closing this ticket.

Comment by Gerrit Updater [ 08/Sep/15 ]

Wrong ticket number.
