[LU-14607] osd xattr cache wasting memory Created: 12/Apr/21  Updated: 18/Feb/23  Resolved: 09/Jun/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

In some cases we pass a 64K xattr value buffer to osp_oac_xattr_find_or_add() (see below). This turns into a 128K allocation because we allocate a single memory block for the oxe, the xattr name, and the xattr value. If the oxe used a separate memory block for the xattr value, the allocation would drop to 64K plus change.
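
For illustration, a rough sketch of the current single-block layout and the resulting size (field names follow the namelen/buflen/value fields discussed in the comments below; the real struct osp_xattr_entry may differ):

struct osp_xattr_entry {                   /* simplified, illustrative only */
        struct list_head         oxe_list;
        atomic_t                 oxe_ref;
        void                    *oxe_value;    /* currently points into oxe_buf */
        size_t                   oxe_namelen;
        size_t                   oxe_buflen;
        char                     oxe_buf[0];   /* name + value stored inline */
};

/* sizeof(*oxe) + namelen + 65536 is just over 64K, so the page allocator
 * rounds the kmalloc() request up to the next power of two: 128K, i.e. the
 * order-5 allocation that fails in the trace below. */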

Apr  9 16:21:54 mds-0 kernel: mdt06_014: page allocation failure: order:5, mode:0xc050
Apr  9 16:21:54 mds-0 kernel: CPU: 12 PID: 94381 Comm: mdt06_014 Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1062.18.1.el7_lustre.ddn12.x86_64 #1
Apr  9 16:21:54 mds-0 kernel: Hardware name:    /0XFK4K, BIOS 2.7.7 05/04/2020
Apr  9 16:21:54 mds-0 kernel: Call Trace:
Apr  9 16:21:54 mds-0 kernel: [<ffffffff8697b416>] dump_stack+0x19/0x1b
Apr  9 16:21:54 mds-0 kernel: [<ffffffff863c3fc0>] warn_alloc_failed+0x110/0x180
Apr  9 16:21:54 mds-0 kernel: [<ffffffff8697698a>] __alloc_pages_slowpath+0x6bb/0x729
Apr  9 16:21:54 mds-0 kernel: [<ffffffff863c8636>] __alloc_pages_nodemask+0x436/0x450
Apr  9 16:21:54 mds-0 kernel: [<ffffffff86416c58>] alloc_pages_current+0x98/0x110
Apr  9 16:21:54 mds-0 kernel: [<ffffffff863e3658>] kmalloc_order+0x18/0x40
Apr  9 16:21:54 mds-0 kernel: [<ffffffff86422216>] kmalloc_order_trace+0x26/0xa0
Apr  9 16:21:54 mds-0 kernel: [<ffffffff864261a1>] __kmalloc+0x211/0x230
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1ac43f2>] osp_oac_xattr_find_or_add+0x72/0x270 [osp]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1ac8639>] osp_xattr_get+0xd29/0x1140 [osp]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1ac7f61>] ? osp_xattr_get+0x651/0x1140 [osp]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc17b828a>] ? ldiskfs_xattr_trusted_get+0x2a/0x30 [ldiskfs]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1a47b8e>] lod_xattr_get+0xee/0x700 [lod]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc16f294c>] __mdd_permission_internal+0x71c/0x9a0 [mdd]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc16cc96f>] __mdd_lookup.isra.17+0x19f/0x440 [mdd]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc16cccbf>] mdd_lookup+0xaf/0x170 [mdd]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1970332>] mdt_lookup_version_check+0x72/0x2c0 [mdt]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1971efb>] mdt_reint_rename+0xddb/0x28a0 [mdt]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc12dd826>] ? null_alloc_rs+0x186/0x340 [ptlrpc]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc197a533>] mdt_reint_rec+0x83/0x210 [mdt]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1956483>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc1961e37>] mdt_reint+0x67/0x140 [mdt]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc130ff9e>] tgt_request_handle+0xaee/0x15f0 [ptlrpc]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc12e70a1>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc0e73bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc12b237b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc12af195>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
Apr  9 16:21:54 mds-0 kernel: [<ffffffff862d3a33>] ? __wake_up+0x13/0x20
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc12b5ce4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Apr  9 16:21:54 mds-0 kernel: [<ffffffffc12b51b0>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
Apr  9 16:21:54 mds-0 kernel: [<ffffffff862c6321>] kthread+0xd1/0xe0
Apr  9 16:21:54 mds-0 kernel: [<ffffffff862c6250>] ? insert_kthread_work+0x40/0x40
Apr  9 16:21:54 mds-0 kernel: [<ffffffff8698ed1d>] ret_from_fork_nospec_begin+0x7/0x21
Apr  9 16:21:54 mds-0 kernel: [<ffffffff862c6250>] ? insert_kthread_work+0x40/0x40
Apr  9 16:21:54 mds-0 kernel: Mem-Info:


 Comments   
Comment by Peter Jones [ 13/Apr/21 ]

Lai

Could you please look into this one?

Peter

Comment by Andreas Dilger [ 08/May/21 ]

I was looking into this briefly, and it makes sense to split the cache allocation into two parts when "size > PAGE_SIZE". For the common small-xattr case a single allocation is more efficient, but for a large xattr only the value part should be allocated separately, with OBD_ALLOC_LARGE(). The data struct already tracks namelen and buflen separately, and has a separate pointer to the value (which can point inline for small xattrs and to a separate buffer for large xattrs).
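
A minimal sketch of that approach (hypothetical helper name, not the actual patch; assumes the oxe_value/oxe_buf fields described above):

static struct osp_xattr_entry *oxe_alloc(size_t namelen, size_t vallen)
{
        struct osp_xattr_entry *oxe;
        size_t size = sizeof(*oxe) + namelen + 1;

        if (vallen <= PAGE_SIZE)               /* common case: keep one block */
                size += vallen;

        OBD_ALLOC(oxe, size);
        if (oxe == NULL)
                return NULL;

        oxe->oxe_namelen = namelen;
        oxe->oxe_buflen = vallen;

        if (vallen > PAGE_SIZE) {
                /* large xattr: value gets its own (possibly vmalloc'd) buffer */
                OBD_ALLOC_LARGE(oxe->oxe_value, vallen);
                if (oxe->oxe_value == NULL) {
                        OBD_FREE(oxe, size);
                        return NULL;
                }
        } else {
                oxe->oxe_value = oxe->oxe_buf + namelen + 1;
        }

        return oxe;
}

The free path would mirror this: OBD_FREE_LARGE() on oxe_value when the value was allocated separately, then free the entry itself.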

Comment by Gerrit Updater [ 19/May/21 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43736
Subject: LU-14607 osp: separate buffer for XATTR cache
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ba60344d5996e2545f8dedef5981d6a48c16c93b

Comment by Gerrit Updater [ 08/Jun/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43736/
Subject: LU-14607 osp: separate buffer for large XATTR
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a1c5adf7f466cce5b9abae46704c126b1f11d6da

Comment by Peter Jones [ 09/Jun/21 ]

Landed for 2.15
