[LU-14469] lmv_rmfid() does 128K kmalloc() Created: 23/Feb/21  Updated: 19/Sep/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: John Hammond
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: easy, lad23dd

Issue Links:
Related
is related to LU-16427 'lfs rmfid' does not print anything o... Resolved
is related to LU-15058 replace critical vmalloc allocations ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When ll_rmfid() is called with 4096 FIDs (the maximum), lmv_rmfid() will kmalloc() a 128K fid_array. On a loaded or fragmented system an order-5 allocation like this can fail. Seen while running sanity test 421d locally:

[89441.204480] Lustre: DEBUG MARKER: == sanity test 421d: rmfid en masse ================================================================== 09:24:51 (1614093891)
[89445.855469] lt-lfs: page allocation failure: order:5, mode:0xc050
[89445.857280] CPU: 2 PID: 31000 Comm: lt-lfs Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-1062.9.1.el7.x86_64.debug #1
[89445.860587] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[89445.862335] Call Trace:
[89445.862774]  [<ffffffffa68704ae>] dump_stack+0x19/0x1b
[89445.863489]  [<ffffffffa620d86a>] warn_alloc_failed+0x11a/0x190
[89445.864314]  [<ffffffffa686b1ad>] __alloc_pages_slowpath+0x708/0x776
[89445.865171]  [<ffffffffa6213410>] __alloc_pages_nodemask+0x660/0x680
[89445.866104]  [<ffffffffa6269b2e>] alloc_pages_current+0x10e/0x1a0
[89445.866938]  [<ffffffffa6232115>] ? kmalloc_order+0x25/0x70
[89445.867705]  [<ffffffffa6232115>] kmalloc_order+0x25/0x70
[89445.868413]  [<ffffffffa62762f4>] kmalloc_order_trace+0x24/0x170
[89445.869234]  [<ffffffffa627a279>] __kmalloc+0x409/0x430
[89445.869991]  [<ffffffffc141f0c8>] ? lmv_fld_lookup+0x258/0x430 [lmv]
[89445.870872]  [<ffffffffc14119bb>] lmv_rmfid+0x71b/0xc20 [lmv]
[89445.871661]  [<ffffffffc2072e81>] ll_rmfid+0x581/0x780 [lustre]
[89445.872572]  [<ffffffffc2074841>] ll_dir_ioctl+0x17c1/0x61c0 [lustre]
[89445.873550]  [<ffffffffa62d1a94>] ? mntput+0x24/0x40
[89445.874349]  [<ffffffffa62b4dc1>] ? terminate_walk+0xb1/0xc0
[89445.875228]  [<ffffffffa642a45b>] ? debug_check_no_obj_freed+0xfb/0x270
[89445.876216]  [<ffffffffa61409fd>] ? trace_hardirqs_on+0xd/0x10
[89445.877098]  [<ffffffffa642a4d8>] ? debug_check_no_obj_freed+0x178/0x270
[89445.878880]  [<ffffffffa62be880>] do_vfs_ioctl+0x410/0x6c0
[89445.880163]  [<ffffffffa61787d4>] ? __audit_syscall_entry+0xb4/0x110
[89445.881656]  [<ffffffffa62cc6c0>] ? fget_light+0x2b0/0x550
[89445.882968]  [<ffffffffa62bebd1>] SyS_ioctl+0xa1/0xc0
[89445.884175]  [<ffffffffa6887a9e>] system_call_fastpath+0x25/0x2a
[89445.885552] Mem-Info:
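
The order:5 in the trace is consistent with a buffer just over 64K being rounded up to 128K. A minimal user-space sketch of that arithmetic follows. The struct layouts are assumptions modelled on a 16-byte FID and a 16-byte array header, the names fid_sketch, fid_array_hdr_sketch, fid_array_bytes() and alloc_order() are hypothetical, and a 4K page size is assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one FID: 8 + 4 + 4 = 16 bytes. */
struct fid_sketch {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Assumed header in front of the FIDs, padded to the size of one FID. */
struct fid_array_hdr_sketch {
	uint32_t fa_nr;
	uint32_t fa_padding0;
	uint64_t fa_padding1;
};

/* Bytes needed for a header plus nr_fids FIDs. */
static size_t fid_array_bytes(size_t nr_fids)
{
	return sizeof(struct fid_array_hdr_sketch) +
	       nr_fids * sizeof(struct fid_sketch);
}

/* Smallest order such that (page_size << order) >= bytes. */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	while ((page_size << order) < bytes)
		order++;
	return order;
}
```

Under these assumptions, 4096 FIDs need 65552 bytes, which is 16 bytes more than 16 pages, so the request rounds up to 32 contiguous pages (128K, order 5).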


 Comments   
Comment by John Hammond [ 25/Feb/21 ]

lmv_rmfid() also leaks the request set in some error cases.
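
The ticket does not say which error paths leak, so the following is only a generic user-space sketch of the usual remedy: route every exit taken after the set is created through one cleanup label. All names here (req_set_sketch, set_new(), set_destroy(), rmfid_sketch(), live_sets) are hypothetical and are not the Lustre API:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request set. */
struct req_set_sketch {
	int nr_queued;
};

/* Number of sets currently allocated, so a leak is observable. */
static int live_sets;

static struct req_set_sketch *set_new(void)
{
	struct req_set_sketch *set = calloc(1, sizeof(*set));

	if (set != NULL)
		live_sets++;
	return set;
}

static void set_destroy(struct req_set_sketch *set)
{
	if (set == NULL)
		return;
	free(set);
	live_sets--;
	assert(live_sets >= 0);
}

/* fail_at == 1 simulates an error after the set has been created. */
static int rmfid_sketch(size_t nr_fids, int fail_at)
{
	struct req_set_sketch *set;
	void *buf = NULL;
	int rc = 0;

	set = set_new();
	if (set == NULL)
		return -ENOMEM;

	buf = malloc(nr_fids * 16);
	if (buf == NULL) {
		rc = -ENOMEM;
		goto out;	/* not a bare return: the set must be freed */
	}

	if (fail_at == 1) {
		rc = -EIO;
		goto out;
	}

	set->nr_queued = 1;
out:
	free(buf);
	set_destroy(set);
	return rc;
}
```

With a single exit label, both the success path and the simulated failure leave live_sets at zero.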

Comment by James A Simmons [ 20/Dec/21 ]

Wrong LU number. For another ticket.

Generated at Sat Feb 10 03:10:02 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.