[LU-10085] Memory leak from memory cache lnet_small_mds_cachep
Created: 05/Oct/17  Updated: 10/Oct/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Jinshan Xiong (Inactive)
Assignee: Sonia Sharma (Inactive)
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I recently hit the following issue while unloading the Lustre modules with rmmod:

[50094.228576] LNet: Removed LNI 10.8.1.68@tcp                                  
[50094.236330] =============================================================================
[50094.247508] BUG kmalloc-128 (Tainted: G           OE  ------------  ): Objects remaining in kmalloc-128 on kmem_cache_close()
[50094.262048] -----------------------------------------------------------------------------
                                                                                
[50094.276429] Disabling lock debugging due to kernel taint                     
[50094.284021] INFO: Slab 0xffffea00216f7900 objects=64 used=4 fp=0xffff88085bde4e00 flags=0x2fffff00004080
[50094.296314] CPU: 67 PID: 92869 Comm: rmmod Tainted: G    B      OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
[50094.309978] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
[50094.323386]  ffffea00216f7900 0000000073dedcc2 ffff88072bf4fc98 ffffffff8168729f
[50094.333183]  ffff88072bf4fd70 ffffffff811da714 ffff880100000020 ffff88072bf4fd80
[50094.342911]  ffff88072bf4fd30 656a624f000000c0 616d657220737463 6e6920676e696e69
[50094.352564] Call Trace:                                                      
[50094.356605]  [<ffffffff8168729f>] dump_stack+0x19/0x1b                       
[50094.363583]  [<ffffffff811da714>] slab_err+0xb4/0xe0                         
[50094.370353]  [<ffffffff81002928>] ? calibrate_delay+0x208/0x8e0              
[50094.378162]  [<ffffffff811d8cf0>] ? arch_local_irq_save+0x20/0x20            
[50094.386167]  [<ffffffff81318729>] ? free_cpumask_var+0x9/0x10                
[50094.393769]  [<ffffffff810fa1bd>] ? on_each_cpu_cond+0xcd/0x180              
[50094.401556]  [<ffffffff811dc150>] ? kmem_cache_alloc_bulk+0x140/0x140        
[50094.409938]  [<ffffffff811dda13>] ? __kmalloc+0x1f3/0x240                    
[50094.417155]  [<ffffffff811e00eb>] ? kmem_cache_close+0x12b/0x2f0             
[50094.425038]  [<ffffffff811e010c>] kmem_cache_close+0x14c/0x2f0               
[50094.432723]  [<ffffffff811e02c4>] __kmem_cache_shutdown+0x14/0x80            
[50094.440705]  [<ffffffff811a5e14>] kmem_cache_destroy+0x44/0xf0               
[50094.448391]  [<ffffffffa09fd4f1>] lnet_unprepare+0x161/0x2f0 [lnet]          
[50094.456544]  [<ffffffffa0a00abd>] LNetNIFini+0x8d/0x110 [lnet]               
[50094.464264]  [<ffffffffa0cb293d>] ptlrpc_ni_fini+0x15d/0x1e0 [ptlrpc]        
[50094.472607]  [<ffffffffa0ccdd25>] ? ptlrpcd_free+0x145/0x2d0 [ptlrpc]        
[50094.480915]  [<ffffffffa0cb2c73>] ptlrpc_exit_portals+0x13/0x20 [ptlrpc]     
[50094.489763]  [<ffffffffa0d433e3>] ptlrpc_exit+0x22/0xc3f [ptlrpc]            
[50094.497874]  [<ffffffff810fe3db>] SyS_delete_module+0x16b/0x2d0              
[50094.505766]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b             
[50094.513724] INFO: Object 0xffff88085bde4000 @offset=0                        
[50094.520585] INFO: Object 0xffff88085bde4200 @offset=512                      
[50094.527597] INFO: Object 0xffff88085bde5400 @offset=5120                     
[50094.534679] INFO: Object 0xffff88085bde5a80 @offset=6784    

The backtrace shows that the kernel found objects still allocated from the memory cache lnet_small_mds_cachep when that cache was destroyed:

(gdb) l *(lnet_unprepare+0x161)
0x1521 is in lnet_unprepare (/home/jinxiong/work/flr/lnet/lnet/api-ni.c:253).
248	lnet_descriptor_cleanup(void)
249	{
250	
251		if (lnet_small_mds_cachep) {
252			kmem_cache_destroy(lnet_small_mds_cachep);
253			lnet_small_mds_cachep = NULL;
254		}
255	
256		if (lnet_mes_cachep) {
257			kmem_cache_destroy(lnet_mes_cachep);

Please investigate whether LNet is leaking memory from this cache.
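
For anyone reproducing this, the check that fires in the log above can be illustrated with a small userspace sketch. The slab report shows used=4, i.e. four objects were never freed before kmem_cache_destroy() ran. The code below is only an analogy under stated assumptions: the toy_cache_* names and the struct are hypothetical, it uses malloc/free rather than the kernel slab allocator, and it does not reflect how LNet or SLUB actually track objects.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a slab cache; NOT the kernel API. */
struct toy_cache {
	const char *name;
	size_t      obj_size;
	long        in_use;	/* objects allocated but not yet freed */
};

static struct toy_cache *toy_cache_create(const char *name, size_t obj_size)
{
	struct toy_cache *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	c->name = name;
	c->obj_size = obj_size;
	c->in_use = 0;
	return c;
}

static void *toy_cache_alloc(struct toy_cache *c)
{
	void *obj = malloc(c->obj_size);

	if (obj != NULL)
		c->in_use++;
	return obj;
}

static void toy_cache_free(struct toy_cache *c, void *obj)
{
	/* Freeing more objects than were allocated is a caller bug. */
	assert(c->in_use > 0);
	free(obj);
	c->in_use--;
}

/*
 * Destroy the cache and return the number of objects that were still
 * outstanding. A non-zero result corresponds to the "Objects remaining"
 * complaint in the kernel log: some allocation path has no matching free.
 */
static long toy_cache_destroy(struct toy_cache *c)
{
	long leaked = c->in_use;

	if (leaked != 0)
		fprintf(stderr, "BUG %s: %ld objects remaining on destroy\n",
			c->name, leaked);
	free(c);
	return leaked;
}
```

In this sketch, allocating two objects, freeing one, and then calling toy_cache_destroy() reports one remaining object. The investigation in LNet amounts to finding the allocation from lnet_small_mds_cachep that has no matching free before lnet_unprepare() runs.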



 Comments   
Comment by Joseph Gmitter (Inactive) [ 09/Oct/17 ]

Hi Sonia,

Is this something you can look into when you have time?

Thanks.
Joe

Comment by Sonia Sharma (Inactive) [ 10/Oct/17 ]

It might have cropped up after the fix for LU-9203. Investigating.
