[LU-13680] large allocations in osd_bufs_get() failing Created: 15/Jun/20 Updated: 30/Nov/23 Resolved: 04/Jul/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Large allocations in osd_bufs_get() can fail if the OSS memory is fragmented after use, if that thread has not serviced a cacheless IO since startup or is newly started:

kernel: ll_ost_io04_052: page allocation failure: order:5, mode:0x10c050
kernel: CPU: 4 PID: 1980 Comm: ll_ost_io04_052
Call Trace:
 dump_stack+0x19/0x1b
 warn_alloc_failed+0x110/0x180
 __alloc_pages_slowpath+0x6b6/0x724
 __alloc_pages_nodemask+0x404/0x420
 alloc_pages_current+0x98/0x110
 __get_free_pages+0xe/0x40
 kmalloc_order_trace+0x2e/0xa0
 osd_bufs_get+0x7b7/0x870 [osd_ldiskfs]
 ofd_preprw_read+0x2ea/0x1110 [ofd]
 ofd_preprw+0x499/0x8c0 [ofd]
 tgt_brw_read+0x9e3/0x1e40 [ptlrpc]
 tgt_request_handle+0xada/0x1570 [ptlrpc]
 ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
 ptlrpc_main+0xb34/0x1470 [ptlrpc]

The order 5 allocation is 128KB, so osd_bufs_get() should be using OBD_ALLOC_LARGE() or equivalent. |
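For reference, the "order 5 is 128KB" arithmetic can be checked with a small userspace sketch. It is not Lustre or kernel code: it assumes 4 KiB pages (as on x86_64), and `SKETCH_PAGE_SIZE`, `order_to_bytes()` and `bytes_to_order()` are hypothetical names invented for illustration. The second helper only approximates what the kernel's `get_order()` computes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86_64; other architectures may differ. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Bytes covered by a page-allocator request of the given order:
 * 2^order physically contiguous pages.
 */
static unsigned long order_to_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/*
 * Smallest order whose block can hold `size` bytes.
 * Hypothetical helper, similar in spirit to the kernel's get_order().
 */
static unsigned int bytes_to_order(unsigned long size)
{
	unsigned int order = 0;

	assert(size > 0);
	while (order_to_bytes(order) < size)
		order++;
	return order;
}
```

Under these assumptions, `order_to_bytes(5)` is 32 pages of 4096 bytes, i.e. 131072 bytes (128KB). A request of that size needs 32 physically contiguous free pages, which is what becomes hard to find once memory is fragmented; an allocator that can fall back to virtually contiguous memory does not have that requirement.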
| Comments |
| Comment by Gerrit Updater [ 15/Jun/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38943 |
| Comment by Gerrit Updater [ 04/Jul/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38943/ |
| Comment by Peter Jones [ 04/Jul/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 21/Dec/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49478 |