[LU-11897] ineffective memory allocation in ptlrpc Created: 29/Jan/19  Updated: 19/Feb/19  Resolved: 19/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Major
Reporter: Andrew Perepechko Assignee: Andrew Perepechko
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

For a few ptlrpc services, rqbd buffers are allocated with sizes that are not powers of two (2^n). The memory allocator rounds such requests up, which wastes memory and in some cases even leads to OOM.
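
To illustrate the rounding cost, here is a minimal sketch. It assumes the allocator serves requests from power-of-two size classes, as the comments in this ticket describe for the 17408-byte case. The helpers `round_up_pow2` and `pow2_waste` are hypothetical names used only for illustration; they are not Lustre or kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest power of two that is >= size.
 * Assumption: the allocator rounds requests up to a power of two. */
static size_t round_up_pow2(size_t size)
{
	size_t p = 1;

	assert(size > 0);
	while (p < size)
		p <<= 1;
	return p;
}

/* Bytes left unused when a request of `size` bytes is rounded up. */
static size_t pow2_waste(size_t size)
{
	return round_up_pow2(size) - size;
}
```

Under that assumption, `round_up_pow2(17408)` gives 32768 and `pow2_waste(17408)` gives 15360, while a request of exactly 16384 bytes wastes nothing.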

A patch will be uploaded shortly.



 Comments   
Comment by Gerrit Updater [ 29/Jan/19 ]

Andrew Perepechko (c17827@cray.com) uploaded a new patch: https://review.whamcloud.com/34127
Subject: LU-11897 ost: improve memory allocation for ost
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c3c7734466d1b0b6ca248037e8eb4546371739f4

Comment by Andrew Perepechko [ 29/Jan/19 ]

How this bug led to an OOM event:

[12:19 PM] Alexey Lyashkov: we have about 470k rqbd buffers of 16k-32k size which hold just 660k RPCs of 383 bytes each.
[12:20 PM] Alexey Lyashkov: hm.. it looks like we have a trivial bug in ptlrpc code
[12:20 PM] Alexey Lyashkov: #define OST_MAXREQSIZE          (16 * 1024)
[12:20 PM] Alexey Lyashkov: but Panda says
[12:21 PM] Alexey Lyashkov: `srv_max_req_size = 17408,`
[12:21 PM] Alexey Lyashkov: so MM uses a 32k allocation for a 17k buffer
[12:21 PM] Alexey Lyashkov: and 15k is useless..
[12:21 PM] Alexey Lyashkov: and overhead..
...
[12:42 PM] Alexey Lyashkov: the next catch is close to it - one more "17k" "leak"
[12:42 PM] Alexey Lyashkov: srv_buf_size = 17408,  /   srv_max_req_size = 17408,
[12:43 PM] Alexey Lyashkov: so the 17k buffer will be unlinked after 328 bytes have arrived.

Comment by Andrew Perepechko [ 29/Jan/19 ]

Originally, a whole 17 KiB buffer (in fact, 32 KiB after rounding up) was used for a single 328-byte RPC. With the patch, an additional (32768-17408)/328-1=45 RPCs (using integer division) fit in the same buffer, with no additional memory required.
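
The arithmetic in this comment can be reproduced with a short sketch. The function name `extra_rpcs` and its parameters are hypothetical and exist only to mirror the formula as written; the ticket does not explain the final `- 1`, so it is kept as-is rather than interpreted.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the formula from the comment:
 *   (alloc_size - max_req_size) / rpc_size - 1
 * using integer division. Returns 0 when no extra RPC fits. */
static size_t extra_rpcs(size_t alloc_size, size_t max_req_size,
			 size_t rpc_size)
{
	size_t n;

	if (rpc_size == 0 || alloc_size < max_req_size)
		return 0;

	/* Spare bytes beyond one maximum-size request, in whole RPCs. */
	n = (alloc_size - max_req_size) / rpc_size;

	/* The "- 1" is taken from the comment; guard against underflow. */
	return n > 0 ? n - 1 : 0;
}
```

With the values from this ticket, `extra_rpcs(32768, 17408, 328)` evaluates to 45: the spare 15360 bytes hold 46 whole 328-byte RPCs, minus one.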

Comment by Gerrit Updater [ 18/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34127/
Subject: LU-11897 ost: improve memory allocation for ost
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3a90458bd84d43cc75c5a80f8c02f30d6412690a

Comment by Peter Jones [ 19/Feb/19 ]

Landed for 2.13
