[LU-13680] large allocations in osd_bufs_get() failing Created: 15/Jun/20  Updated: 30/Nov/23  Resolved: 04/Jul/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-11347 Do not use pagecache for SSD I/O when... Resolved
is related to LU-12071 bypass pagecache for large files Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Large allocations in osd_bufs_get() can fail when OSS memory has become fragmented after use, if the service thread is newly started or has not serviced a cacheless I/O since startup:

kernel: ll_ost_io04_052: page allocation failure: order:5, mode:0x10c050
kernel: CPU: 4 PID: 1980 Comm: ll_ost_io04_052 
Call Trace:
dump_stack+0x19/0x1b
warn_alloc_failed+0x110/0x180
__alloc_pages_slowpath+0x6b6/0x724
__alloc_pages_nodemask+0x404/0x420
alloc_pages_current+0x98/0x110
__get_free_pages+0xe/0x40
kmalloc_order_trace+0x2e/0xa0
osd_bufs_get+0x7b7/0x870 [osd_ldiskfs]
ofd_preprw_read+0x2ea/0x1110 [ofd]
ofd_preprw+0x499/0x8c0 [ofd]
tgt_brw_read+0x9e3/0x1e40 [ptlrpc]
tgt_request_handle+0xada/0x1570 [ptlrpc]
ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
ptlrpc_main+0xb34/0x1470 [ptlrpc]

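An order-5 allocation is 2^5 physically contiguous pages, i.e. 128KB with 4KB pages. osd_bufs_get() should therefore use OBD_ALLOC_LARGE() or an equivalent that can fall back to vmalloc() instead of requiring physically contiguous memory.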
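The following minimal userspace C sketch illustrates why the request above fails and what routing it through a "large" allocation path changes. It assumes 4KB pages. It models kmalloc() and vmalloc() with hypothetical helpers (sketch_contig_alloc(), sketch_virt_alloc()) and uses a hypothetical fragmentation limit and size threshold. It shows the general idea behind OBD_ALLOC_LARGE(), not its actual definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 4KB pages, as on x86_64. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical: above this size the wrapper avoids contiguous allocation. */
#define SKETCH_ALLOC_BIG (4 * SKETCH_PAGE_SIZE)

/* Hypothetical: highest order a fragmented buddy allocator can still satisfy. */
static unsigned int sketch_max_free_order = 3;

/* Smallest order such that (PAGE_SIZE << order) >= size, i.e. the same
 * value the kernel's get_order() computes for a 128KB request (order 5). */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Models kmalloc(): needs 2^order physically contiguous pages, so it
 * fails once fragmentation leaves no free block of that order. */
static void *sketch_contig_alloc(size_t size)
{
	if (sketch_get_order(size) > sketch_max_free_order)
		return NULL; /* simulated "page allocation failure" */
	return calloc(1, size);
}

/* Models vmalloc(): built from order-0 pages, so fragmentation of
 * higher-order blocks does not cause it to fail. */
static void *sketch_virt_alloc(size_t size)
{
	return calloc(1, size);
}

/* Large-allocation wrapper: big requests take the virtually contiguous
 * path. *used_virt records which path was taken, because in the kernel
 * the buffer must be released with the matching free routine. */
static void *sketch_alloc_large(size_t size, bool *used_virt)
{
	assert(used_virt != NULL);
	*used_virt = size > SKETCH_ALLOC_BIG;
	return *used_virt ? sketch_virt_alloc(size) : sketch_contig_alloc(size);
}
```

With the simulated fragmentation, sketch_contig_alloc(128 * 1024) returns NULL, matching the order:5 failure in the trace. sketch_alloc_large() satisfies the same request through the virtual path. In this userspace model both paths use calloc(), so free() releases either one. In the kernel, a vmalloc()ed buffer must be freed with vfree() (or kvfree()).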



 Comments   
Comment by Gerrit Updater [ 15/Jun/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38943
Subject: LU-13680 osd-ldiskfs: handle large allocations
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5cab7541253e2f83bba1e6725f97708bfc1679eb

Comment by Gerrit Updater [ 04/Jul/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38943/
Subject: LU-13680 osd-ldiskfs: handle large allocations
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bbb14d40a4be6a9172b80ed3208f81be2f1d1b66

Comment by Peter Jones [ 04/Jul/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 21/Dec/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49478
Subject: LU-13680 osd-ldiskfs: handle large allocations
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a7cfc5ffe18db69f63d4261526b0dda1914dc9bb

Generated at Sat Feb 10 03:03:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.