[LU-12341] Make test-framework aware of kmemleak Created: 26/May/19  Updated: 04/Jun/19  Resolved: 04/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.13.0

Type: Improvement Priority: Minor
Reporter: Oleg Drokin Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

Modern kernels include a memory leak tracing facility called kmemleak.

It's controlled through the /sys/kernel/debug/kmemleak file; if that file is absent, kmemleak is disabled.

Its contents list potential memory leaks. While the system might have leaks of its own, if we clear the list before the Lustre modules load, we can then query it before and after the modules unload.
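
A minimal sketch of that clear-then-query flow, in Python for illustration only (this is not the code from the landed patch). The helper names are hypothetical; "clear" and "scan" are the control commands described in the kernel's kmemleak documentation, and the path parameter exists so the helpers can be pointed at an ordinary file when experimenting.

```python
import os

KMEMLEAK = "/sys/kernel/debug/kmemleak"  # control file named in the description


def kmemleak_available(path=KMEMLEAK):
    """kmemleak is usable only when the control file exists."""
    return os.path.exists(path)


def kmemleak_command(cmd, path=KMEMLEAK):
    """Write a control command (for example "clear" or "scan") to kmemleak."""
    with open(path, "w") as f:
        f.write(cmd)


def kmemleak_report(path=KMEMLEAK):
    """Return the current list of suspected leaks as text."""
    with open(path) as f:
        return f.read()
```

Intended use under these assumptions: call kmemleak_command("clear") before loading the Lustre modules, then kmemleak_command("scan") followed by kmemleak_report() before and after unloading them.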

One potential problem with unloaded modules is that backtrace symbol resolution stops working. To combat that, we also need to know where the modules were loaded, which we can get from /proc/modules.

Here's how to resolve an unknown symbol after module unload, having saved the /proc/modules output before the unload:

unreferenced object 0xffff8803308e6e00 (size 512):
 comm "mount.lustre", pid 10718, jiffies 4294871396 (age 55.613s)
 hex dump (first 32 bytes):
   04 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00  ................
   10 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00  ........ .......
 backtrace:
   [<ffffffff81211684>] kmem_cache_alloc_trace+0x134/0x620
   [<ffffffffa0a25584>] 0xffffffffa0a25584
   [<ffffffffa0a572b6>] 0xffffffffa0a572b6
   [<ffffffff8123a690>] mount_bdev+0x1b0/0x1f0
   [<ffffffffa0a4f0d5>] 0xffffffffa0a4f0d5
   [<ffffffff8123aff9>] mount_fs+0x39/0x1b0
   [<ffffffff81258b27>] vfs_kern_mount+0x67/0x110
   [<ffffffffa0af605c>] 0xffffffffa0af605c
   [<ffffffffa0af6c9a>] 0xffffffffa0af6c9a
   [<ffffffffa033f7f3>] 0xffffffffa033f7f3
   [<ffffffffa0340486>] 0xffffffffa0340486
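
Frames that kmemleak could not symbolize appear in the report as a bare address repeated after the bracketed one. A small sketch for pulling those addresses out of a saved report might look like this; the function name and the regular expression are assumptions based only on the trace format shown above.

```python
import re

# Matches lines such as "[<ffffffffa0a25584>] 0xffffffffa0a25584"; resolved
# frames have a symbol name after the bracket and so do not match.
UNRESOLVED = re.compile(r"^\s*\[<([0-9a-f]+)>\]\s+0x[0-9a-f]+\s*$", re.MULTILINE)


def unresolved_frames(report):
    """Return the addresses (as ints) of frames that have no symbol name."""
    return [int(m.group(1), 16) for m in UNRESOLVED.finditer(report)]
```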

We'll use the first unknown item in the trace as the demo here: 0xffffffffa0a25584

It falls within the address range of this line in the saved modules output:

ldiskfs 636444 1 osd_ldiskfs, Live 0xffffffffa0a1d000
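
The range check can be sketched as follows, assuming the usual /proc/modules field order (name, size, reference count, dependents, state, load address) and treating the size field as the extent of the module's address range. The function names are hypothetical.

```python
def parse_modules(text):
    """Parse saved /proc/modules text into (name, base, size) tuples."""
    mods = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        mods.append((fields[0], int(fields[5], 16), int(fields[1])))
    return mods


def find_module(addr, mods):
    """Return (module name, offset from load address) for addr, or None."""
    for name, base, size in mods:
        if base <= addr < base + size:
            return name, addr - base
    return None


saved = "ldiskfs 636444 1 osd_ldiskfs, Live 0xffffffffa0a1d000"
name, offset = find_module(0xffffffffa0a25584, parse_modules(saved))
print(f"{name}+{offset:#x}")  # prints "ldiskfs+0x8584"
```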

So now to resolve the symbol we will use gdb:

[root@centos6-16 tests]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) add-symbol-file ../../ldiskfs/ldiskfs.o 0xffffffffa0a1d000
add symbol table from file "../../ldiskfs/ldiskfs.o" at
	.text_addr = 0xa0a1d000
(y or n) y
Reading symbols from /home/green/git/lustre-releas1/ldiskfs/ldiskfs.o...done.
(gdb) l *(0xffffffffa0a25584)

0xffffffffa0a25584 is in ldiskfs_mb_init (/home/green/git/lustre-releas1/ldiskfs/mballoc.c:2810).
2805		 */
2806	
2807		/* Allocate table once */
2808		sbi->s_mb_prealloc_table = kzalloc(
2809			LDISKFS_MAX_PREALLOC_TABLE * sizeof(unsigned long), GFP_NOFS);
2810		if (sbi->s_mb_prealloc_table == NULL) {
2811			ret = -ENOMEM;
2812			goto out;
2813		}
2814	

This tells us that the leaked allocation was the one for s_mb_prealloc_table. Remember that addresses in a backtrace often point to the instruction after the actual call, which is why gdb reports line 2810 rather than the kzalloc() call on line 2808.
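
The interactive gdb session above could probably be scripted. The sketch below only builds the command line. It assumes gdb's -batch and -ex options, and it passes the load address from /proc/modules to add-symbol-file exactly as the session does. The function name is hypothetical.

```python
def gdb_resolve_argv(object_file, load_addr, addr):
    """Build a gdb command line that loads symbols and lists the source at addr."""
    return [
        "gdb", "-batch",  # batch mode: run the commands, then exit
        "-ex", f"add-symbol-file {object_file} {load_addr:#x}",
        "-ex", f"list *({addr:#x})",
    ]


argv = gdb_resolve_argv("../../ldiskfs/ldiskfs.o",
                        0xffffffffa0a1d000, 0xffffffffa0a25584)
```

The resulting list can be handed to subprocess.run() on a machine that has gdb and the module's object file available.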



 Comments   
Comment by Gerrit Updater [ 04/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34959/
Subject: LU-12341 tests: Add kmemleak awareness to test-framework
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 15c0a21ea9a6e98d642e6d16898e46ba9e9b2fa9

Comment by Peter Jones [ 04/Jun/19 ]

Landed for 2.13

Generated at Sat Feb 10 02:51:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.