Lustre / LU-10085

Memory leak from memory cache lnet_small_mds_cachep

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      I recently hit the following issue while unloading the Lustre kernel modules with rmmod:

      [50094.228576] LNet: Removed LNI 10.8.1.68@tcp                                  
      [50094.236330] =============================================================================
      [50094.247508] BUG kmalloc-128 (Tainted: G           OE  ------------  ): Objects remaining in kmalloc-128 on kmem_cache_close()
      [50094.262048] -----------------------------------------------------------------------------
                                                                                      
      [50094.276429] Disabling lock debugging due to kernel taint                     
      [50094.284021] INFO: Slab 0xffffea00216f7900 objects=64 used=4 fp=0xffff88085bde4e00 flags=0x2fffff00004080
      [50094.296314] CPU: 67 PID: 92869 Comm: rmmod Tainted: G    B      OE  ------------   3.10.0-514.26.2.el7_lustre.x86_64 #1
      [50094.309978] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
      [50094.323386]  ffffea00216f7900 0000000073dedcc2 ffff88072bf4fc98 ffffffff8168729f
      [50094.333183]  ffff88072bf4fd70 ffffffff811da714 ffff880100000020 ffff88072bf4fd80
      [50094.342911]  ffff88072bf4fd30 656a624f000000c0 616d657220737463 6e6920676e696e69
      [50094.352564] Call Trace:                                                      
      [50094.356605]  [<ffffffff8168729f>] dump_stack+0x19/0x1b                       
      [50094.363583]  [<ffffffff811da714>] slab_err+0xb4/0xe0                         
      [50094.370353]  [<ffffffff81002928>] ? calibrate_delay+0x208/0x8e0              
      [50094.378162]  [<ffffffff811d8cf0>] ? arch_local_irq_save+0x20/0x20            
      [50094.386167]  [<ffffffff81318729>] ? free_cpumask_var+0x9/0x10                
      [50094.393769]  [<ffffffff810fa1bd>] ? on_each_cpu_cond+0xcd/0x180              
      [50094.401556]  [<ffffffff811dc150>] ? kmem_cache_alloc_bulk+0x140/0x140        
      [50094.409938]  [<ffffffff811dda13>] ? __kmalloc+0x1f3/0x240                    
      [50094.417155]  [<ffffffff811e00eb>] ? kmem_cache_close+0x12b/0x2f0             
      [50094.425038]  [<ffffffff811e010c>] kmem_cache_close+0x14c/0x2f0               
      [50094.432723]  [<ffffffff811e02c4>] __kmem_cache_shutdown+0x14/0x80            
      [50094.440705]  [<ffffffff811a5e14>] kmem_cache_destroy+0x44/0xf0               
      [50094.448391]  [<ffffffffa09fd4f1>] lnet_unprepare+0x161/0x2f0 [lnet]          
      [50094.456544]  [<ffffffffa0a00abd>] LNetNIFini+0x8d/0x110 [lnet]               
      [50094.464264]  [<ffffffffa0cb293d>] ptlrpc_ni_fini+0x15d/0x1e0 [ptlrpc]        
      [50094.472607]  [<ffffffffa0ccdd25>] ? ptlrpcd_free+0x145/0x2d0 [ptlrpc]        
      [50094.480915]  [<ffffffffa0cb2c73>] ptlrpc_exit_portals+0x13/0x20 [ptlrpc]     
      [50094.489763]  [<ffffffffa0d433e3>] ptlrpc_exit+0x22/0xc3f [ptlrpc]            
      [50094.497874]  [<ffffffff810fe3db>] SyS_delete_module+0x16b/0x2d0              
      [50094.505766]  [<ffffffff81697989>] system_call_fastpath+0x16/0x1b             
      [50094.513724] INFO: Object 0xffff88085bde4000 @offset=0                        
      [50094.520585] INFO: Object 0xffff88085bde4200 @offset=512                      
      [50094.527597] INFO: Object 0xffff88085bde5400 @offset=5120                     
      [50094.534679] INFO: Object 0xffff88085bde5a80 @offset=6784    
      

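      The backtrace shows that objects were still allocated from the memory cache lnet_small_mds_cachep when it was destroyed. The return address lnet_unprepare+0x161 resolves to the line immediately after the kmem_cache_destroy() call for that cache: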

      (gdb) l *(lnet_unprepare+0x161)
      0x1521 is in lnet_unprepare (/home/jinxiong/work/flr/lnet/lnet/api-ni.c:253).
      248	lnet_descriptor_cleanup(void)
      249	{
      250	
      251		if (lnet_small_mds_cachep) {
      252			kmem_cache_destroy(lnet_small_mds_cachep);
      253			lnet_small_mds_cachep = NULL;
      254		}
      255	
      256		if (lnet_mes_cachep) {
      257			kmem_cache_destroy(lnet_mes_cachep);
      

      Please investigate whether LNet leaks objects allocated from this memory cache.
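
To make the failure mode concrete, here is a minimal userspace sketch of the kind of check that fires in the log above: a cache that counts outstanding objects and complains if any remain when it is destroyed. This is not kernel or LNet code. The `toy_cache` type and the `toy_cache_*` functions are hypothetical names invented for illustration, and the real slab allocator tracks objects per slab page rather than with a single counter.

```c
/* Userspace illustration only; all toy_cache names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_cache {
	const char *name;     /* label used in the leak report */
	size_t      obj_size; /* size of each object in bytes */
	long        in_use;   /* objects allocated but not yet freed */
};

/* Create an empty cache for objects of obj_size bytes. */
static struct toy_cache *toy_cache_create(const char *name, size_t obj_size)
{
	struct toy_cache *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	c->name = name;
	c->obj_size = obj_size;
	c->in_use = 0;
	return c;
}

/* Allocate one zeroed object and count it as outstanding. */
static void *toy_cache_alloc(struct toy_cache *c)
{
	void *obj = calloc(1, c->obj_size);

	if (obj != NULL)
		c->in_use++;
	return obj;
}

/* Return one object to the cache; a NULL object is ignored. */
static void toy_cache_free(struct toy_cache *c, void *obj)
{
	if (obj == NULL)
		return;
	assert(c->in_use > 0);
	c->in_use--;
	free(obj);
}

/* Destroy the cache and return the number of objects never freed
 * (0 means a clean shutdown). The leaked objects themselves are not
 * released here; the caller still owns them. */
static long toy_cache_destroy(struct toy_cache *c)
{
	long leaked = c->in_use;

	if (leaked != 0)
		fprintf(stderr, "BUG %s: %ld objects remaining on destroy\n",
			c->name, leaked);
	free(c);
	return leaked;
}
```

In this sketch, any code path that calls `toy_cache_alloc()` without a matching `toy_cache_free()` before `toy_cache_destroy()` triggers the report. The analogous question for this ticket is which path allocates from lnet_small_mds_cachep without freeing the object before lnet_unprepare() runs.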

          People

            sharmaso Sonia Sharma (Inactive)
            jay Jinshan Xiong (Inactive)