  1. Lustre
  2. LU-12341

Make test-framework aware of kmemleak

Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0
    • Lustre 2.13.0
    • None

    Description

      Modern kernels include a memory leak tracing facility called kmemleak.

      It is controlled through the /sys/kernel/debug/kmemleak file. If that file is absent, kmemleak is disabled.

      The file's contents list potential memory leaks. The rest of the system might have leaks of its own, but if we clear the list before the Lustre modules load, we can then query it before and after the modules unload.

      One potential problem with unloaded modules is that backtrace symbol resolution stops working. To work around that, we also need to know where the modules were loaded, which we can get from /proc/modules.
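
The workflow described above can be sketched roughly as follows. This is only an illustration, not the code that went into test-framework: the helper names are hypothetical, and it assumes root access with debugfs mounted. The `clear` and `scan` commands are the ones documented for the kmemleak control file.

```python
import os

KMEMLEAK = "/sys/kernel/debug/kmemleak"  # control file named in the description


def kmemleak_available(path=KMEMLEAK):
    """kmemleak is treated as disabled when the control file is absent."""
    return os.path.exists(path)


def kmemleak_command(command, path=KMEMLEAK):
    """Write a control command such as 'clear' or 'scan' to the file."""
    with open(path, "w") as f:
        f.write(command)


def kmemleak_report(path=KMEMLEAK):
    """Read the current list of suspected leaks."""
    with open(path) as f:
        return f.read()


def snapshot_modules(dest, src="/proc/modules"):
    """Save module load addresses before the modules are unloaded."""
    with open(src) as f, open(dest, "w") as out:
        out.write(f.read())
```

A test run would then call `kmemleak_command("clear")` before loading the Lustre modules, `snapshot_modules(...)` just before unloading them, and `kmemleak_command("scan")` followed by `kmemleak_report()` on either side of the unload.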

      Here's how to resolve an unknown symbol after a module has been unloaded, provided the /proc/modules output was saved before the unload:

      unreferenced object 0xffff8803308e6e00 (size 512):
       comm "mount.lustre", pid 10718, jiffies 4294871396 (age 55.613s)
       hex dump (first 32 bytes):
         04 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00  ................
         10 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00  ........ .......
       backtrace:
         [<ffffffff81211684>] kmem_cache_alloc_trace+0x134/0x620
         [<ffffffffa0a25584>] 0xffffffffa0a25584
         [<ffffffffa0a572b6>] 0xffffffffa0a572b6
         [<ffffffff8123a690>] mount_bdev+0x1b0/0x1f0
         [<ffffffffa0a4f0d5>] 0xffffffffa0a4f0d5
         [<ffffffff8123aff9>] mount_fs+0x39/0x1b0
         [<ffffffff81258b27>] vfs_kern_mount+0x67/0x110
         [<ffffffffa0af605c>] 0xffffffffa0af605c
         [<ffffffffa0af6c9a>] 0xffffffffa0af6c9a
         [<ffffffffa033f7f3>] 0xffffffffa033f7f3
         [<ffffffffa0340486>] 0xffffffffa0340486
      

      We'll use the first unresolved entry in the trace as the demo here: 0xffffffffa0a25584

      Judging by the address range, it matches this line in the saved modules output:

      ldiskfs 636444 1 osd_ldiskfs, Live 0xffffffffa0a1d000
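
The address-range match can also be done mechanically. The sketch below uses hypothetical helper names and assumes the usual /proc/modules field order (name, size, refcount, dependencies, state, load address). It treats the load address plus the size as one contiguous range, which is an approximation.

```python
def parse_modules(text):
    """Return (name, base, size) tuples from saved /proc/modules output."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete /proc/modules line
        modules.append((fields[0], int(fields[5], 16), int(fields[1])))
    return modules


def find_module(addr, modules):
    """Return (name, offset) for the module whose range covers addr, else None."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return name, addr - base
    return None


saved = "ldiskfs 636444 1 osd_ldiskfs, Live 0xffffffffa0a1d000\n"
print(find_module(0xffffffffa0a25584, parse_modules(saved)))
```

For the address in this trace, the lookup lands in ldiskfs at offset 0x8584 from the module's load address.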
      

      So now to resolve the symbol we will use gdb:

      [root@centos6-16 tests]# gdb
      GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
      Copyright (C) 2013 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-redhat-linux-gnu".
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>.
      (gdb) add-symbol-file ../../ldiskfs/ldiskfs.o 0xffffffffa0a1d000
      add symbol table from file "../../ldiskfs/ldiskfs.o" at
      	.text_addr = 0xa0a1d000
      (y or n) y
      Reading symbols from /home/green/git/lustre-releas1/ldiskfs/ldiskfs.o...done.
      (gdb) l *(0xffffffffa0a25584)
      
      0xffffffffa0a25584 is in ldiskfs_mb_init (/home/green/git/lustre-releas1/ldiskfs/mballoc.c:2810).
      2805		 */
      2806	
      2807		/* Allocate table once */
      2808		sbi->s_mb_prealloc_table = kzalloc(
      2809			LDISKFS_MAX_PREALLOC_TABLE * sizeof(unsigned long), GFP_NOFS);
      2810		if (sbi->s_mb_prealloc_table == NULL) {
      2811			ret = -ENOMEM;
      2812			goto out;
      2813		}
      2814	
      

      This tells us that the leaked allocation was the one for s_mb_prealloc_table (remember that addresses in a backtrace often point to the instruction after the actual call, which is why gdb reports line 2810 rather than 2808).
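
To avoid the interactive gdb session, the unresolved frames could be collected from the report and turned into a batch invocation. This is a sketch with hypothetical helper names. It assumes unresolved frames appear exactly as in the trace above (the bracketed address followed by the same raw address), and it relies on gdb's `-batch` mode not asking for the `add-symbol-file` confirmation.

```python
import re

# An unresolved frame looks like "[<ffffffffa0a25584>] 0xffffffffa0a25584".
UNRESOLVED = re.compile(r"\[<([0-9a-f]+)>\]\s+0x([0-9a-f]+)\s*$")


def unresolved_addresses(report):
    """Return the addresses of backtrace frames that have no symbol name."""
    addrs = []
    for line in report.splitlines():
        m = UNRESOLVED.search(line)
        if m and m.group(1) == m.group(2):
            addrs.append(int(m.group(1), 16))
    return addrs


def gdb_command(object_file, base, addr):
    """Build a non-interactive gdb command line mirroring the session above."""
    return [
        "gdb", "-batch",
        "-ex", f"add-symbol-file {object_file} {base:#x}",
        "-ex", f"list *({addr:#x})",
    ]
```

The resulting list can be passed to `subprocess.run`; the object file path and load address come from the build tree and the saved /proc/modules output respectively.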

          People

            green Oleg Drokin