Details
Improvement
Resolution: Fixed
Minor
Lustre 2.13.0
Description
Modern kernels include a memory leak tracing facility called kmemleak.
It is controlled through the /sys/kernel/debug/kmemleak file when that file is present (kmemleak is disabled when it is not).
Reading the file lists potential memory leaks. The system may have leaks of its own, but if we clear the list before the Lustre modules are loaded, we can then query it before and after the modules are unloaded and attribute any new reports to Lustre.
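The clear/query cycle can be scripted. The sketch below assumes debugfs is mounted at /sys/kernel/debug and that it runs as root; the helper names (kmemleak_command, split_reports, read_reports) are hypothetical. Only the file path and the "clear" and "scan" commands belong to the kernel's kmemleak interface.

```python
import re

# Assumption: debugfs is mounted here and the caller has root privileges.
KMEMLEAK = "/sys/kernel/debug/kmemleak"


def kmemleak_command(cmd, path=KMEMLEAK):
    """Send a command such as "clear" or "scan" to kmemleak (hypothetical helper)."""
    with open(path, "w") as f:
        f.write(cmd)


def split_reports(text):
    """Split raw kmemleak output into one string per suspected leak."""
    # Each report starts with an "unreferenced object" header line, so split
    # just before every such line (zero-width match, Python 3.7+).
    parts = re.split(r"(?m)^(?=unreferenced object )", text)
    return [p.rstrip() for p in parts if p.strip()]


def read_reports(path=KMEMLEAK):
    """Read the current list of suspected leaks, one string per object."""
    with open(path) as f:
        return split_reports(f.read())
```

Typical use: call kmemleak_command("clear") before loading Lustre, then after the test run and module unload call kmemleak_command("scan") and inspect read_reports().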
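One problem with unloaded modules is that backtrace symbol resolution stops working for them: kmemleak prints raw addresses instead of symbol names. To work around that, we also need to know where each module was loaded, which we can get from /proc/modules, so a copy of it has to be saved before the modules are unloaded.
Here is how to resolve an unknown symbol after the module has been unloaded, given /proc/modules output saved before the unload. Consider this kmemleak report: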
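Saving the file is a plain copy (for example `cp /proc/modules /tmp/modules.saved`, run as root, since unprivileged readers may see zeroed addresses). Reading the saved copy back can be sketched as follows; the Module type and parse_proc_modules are hypothetical names. Each line has the layout `name size refcount deps state address [taint]`, and the dependency list is printed without spaces (e.g. `osd_ldiskfs,`), so the load address is always the sixth field.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    size: int  # size in bytes, printed in decimal
    base: int  # load address of the module


def parse_proc_modules(text):
    """Parse saved /proc/modules output into Module records."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # fields[5] is the address; an optional taint flag such as "(OE)"
        # may follow it and is ignored here.
        modules.append(Module(fields[0], int(fields[1]), int(fields[5], 16)))
    return modules
```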
unreferenced object 0xffff8803308e6e00 (size 512):
  comm "mount.lustre", pid 10718, jiffies 4294871396 (age 55.613s)
  hex dump (first 32 bytes):
    04 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00  ................
    10 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00  ........ .......
  backtrace:
    [<ffffffff81211684>] kmem_cache_alloc_trace+0x134/0x620
    [<ffffffffa0a25584>] 0xffffffffa0a25584
    [<ffffffffa0a572b6>] 0xffffffffa0a572b6
    [<ffffffff8123a690>] mount_bdev+0x1b0/0x1f0
    [<ffffffffa0a4f0d5>] 0xffffffffa0a4f0d5
    [<ffffffff8123aff9>] mount_fs+0x39/0x1b0
    [<ffffffff81258b27>] vfs_kern_mount+0x67/0x110
    [<ffffffffa0af605c>] 0xffffffffa0af605c
    [<ffffffffa0af6c9a>] 0xffffffffa0af6c9a
    [<ffffffffa033f7f3>] 0xffffffffa033f7f3
    [<ffffffffa0340486>] 0xffffffffa0340486
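As a demonstration, we'll look at the first unresolved entry in the trace: 0xffffffffa0a25584.
It falls within the address range of this line in the saved /proc/modules output (ldiskfs is loaded at 0xffffffffa0a1d000 and is 636444 bytes long, so it spans up to, but not including, 0xffffffffa0ab861c):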
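Picking out the unresolved frames can be automated. This is a sketch that assumes the `[<address>] symbol` frame layout shown in the report above (other kernel versions may format backtraces differently); unresolved_frames is a hypothetical name. A frame counts as unresolved when the symbol column is just the raw address.

```python
import re

# One backtrace frame: "[<ffffffffa0a25584>] 0xffffffffa0a25584"
# or "[<ffffffff81211684>] kmem_cache_alloc_trace+0x134/0x620".
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)")


def unresolved_frames(report):
    """Return, in trace order, the addresses kmemleak could not symbolize."""
    addrs = []
    for addr, symbol in FRAME.findall(report):
        # Once the owning module is gone, the raw address replaces
        # the usual "symbol+offset/size" text.
        if symbol.startswith("0x"):
            addrs.append(int(addr, 16))
    return addrs
```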
ldiskfs 636444 1 osd_ldiskfs, Live 0xffffffffa0a1d000
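The range check above is simple enough to script; locate is a hypothetical helper that takes (name, size, base) tuples from the saved listing. It treats a module as covering [base, base + size), which matches how the example works; if a module's .text does not start at its base address, reading /sys/module/<name>/sections/.text before the unload (on kernels that expose it) gives the exact text address instead.

```python
def locate(address, modules):
    """Return (name, base, offset) for the module containing address, else None.

    modules is an iterable of (name, size, base) tuples taken from the
    saved /proc/modules output.
    """
    for name, size, base in modules:
        if base <= address < base + size:
            return name, base, address - base
    return None


saved = [("ldiskfs", 636444, 0xffffffffa0a1d000)]
name, base, offset = locate(0xffffffffa0a25584, saved)
print(name, hex(base), hex(offset))  # ldiskfs 0xffffffffa0a1d000 0x8584
```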
To resolve the symbol, load the module's object file into gdb at that base address:
[root@centos6-16 tests]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) add-symbol-file ../../ldiskfs/ldiskfs.o 0xffffffffa0a1d000
add symbol table from file "../../ldiskfs/ldiskfs.o" at
        .text_addr = 0xa0a1d000
(y or n) y
Reading symbols from /home/green/git/lustre-releas1/ldiskfs/ldiskfs.o...done.
(gdb) l *(0xffffffffa0a25584)
0xffffffffa0a25584 is in ldiskfs_mb_init (/home/green/git/lustre-releas1/ldiskfs/mballoc.c:2810).
2805             */
2806
2807            /* Allocate table once */
2808            sbi->s_mb_prealloc_table = kzalloc(
2809                    LDISKFS_MAX_PREALLOC_TABLE * sizeof(unsigned long), GFP_NOFS);
2810            if (sbi->s_mb_prealloc_table == NULL) {
2811                    ret = -ENOMEM;
2812                    goto out;
2813            }
2814
This tells us that the leaked allocation is the one for s_mb_prealloc_table. gdb reports line 2810 rather than the kzalloc() call on lines 2808-2809 because the addresses in a backtrace are return addresses, which usually point to the instruction after the actual call.
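A common symbolizer convention, not something this ticket relies on, is to look up the return address minus 1 so the lookup lands inside the call instruction itself. The sketch below builds a gdb command list that way; gdb_commands is a hypothetical helper, and it assumes the module's .text sits at the saved base address as in the session above. With optimized code the reported line can still be approximate.

```python
def gdb_commands(obj_path, text_addr, return_addrs):
    """Build gdb commands mapping each return address to its call site.

    Subtracting 1 moves the lookup from the instruction after the call
    back into the call instruction, so gdb reports the calling line.
    """
    cmds = [f"add-symbol-file {obj_path} {text_addr:#x}"]
    for ra in return_addrs:
        cmds.append(f"info line *{ra - 1:#x}")
    return cmds


for c in gdb_commands("../../ldiskfs/ldiskfs.o", 0xffffffffa0a1d000,
                      [0xffffffffa0a25584]):
    print(c)
# add-symbol-file ../../ldiskfs/ldiskfs.o 0xffffffffa0a1d000
# info line *0xffffffffa0a25583
```

Writing these lines to a file and running `gdb -batch -x <file>` avoids the interactive "(y or n)" prompt, since batch mode behaves as if confirmation were turned off.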