[LU-10085] Memory leak from memory cache lnet_small_mds_cachep Created: 05/Oct/17 Updated: 10/Oct/17 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jinshan Xiong (Inactive) | Assignee: | Sonia Sharma (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Recently I discovered the following issue when I was trying to clean up modules:
[50094.228576] LNet: Removed LNI 10.8.1.68@tcp
[50094.236330] =============================================================================
[50094.247508] BUG kmalloc-128 (Tainted: G OE ------------ ): Objects remaining in kmalloc-128 on kmem_cache_close()
[50094.262048] -----------------------------------------------------------------------------
[50094.276429] Disabling lock debugging due to kernel taint
[50094.284021] INFO: Slab 0xffffea00216f7900 objects=64 used=4 fp=0xffff88085bde4e00 flags=0x2fffff00004080
[50094.296314] CPU: 67 PID: 92869 Comm: rmmod Tainted: G B OE ------------ 3.10.0-514.26.2.el7_lustre.x86_64 #1
[50094.309978] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
[50094.323386] ffffea00216f7900 0000000073dedcc2 ffff88072bf4fc98 ffffffff8168729f
[50094.333183] ffff88072bf4fd70 ffffffff811da714 ffff880100000020 ffff88072bf4fd80
[50094.342911] ffff88072bf4fd30 656a624f000000c0 616d657220737463 6e6920676e696e69
[50094.352564] Call Trace:
[50094.356605] [<ffffffff8168729f>] dump_stack+0x19/0x1b
[50094.363583] [<ffffffff811da714>] slab_err+0xb4/0xe0
[50094.370353] [<ffffffff81002928>] ? calibrate_delay+0x208/0x8e0
[50094.378162] [<ffffffff811d8cf0>] ? arch_local_irq_save+0x20/0x20
[50094.386167] [<ffffffff81318729>] ? free_cpumask_var+0x9/0x10
[50094.393769] [<ffffffff810fa1bd>] ? on_each_cpu_cond+0xcd/0x180
[50094.401556] [<ffffffff811dc150>] ? kmem_cache_alloc_bulk+0x140/0x140
[50094.409938] [<ffffffff811dda13>] ? __kmalloc+0x1f3/0x240
[50094.417155] [<ffffffff811e00eb>] ? kmem_cache_close+0x12b/0x2f0
[50094.425038] [<ffffffff811e010c>] kmem_cache_close+0x14c/0x2f0
[50094.432723] [<ffffffff811e02c4>] __kmem_cache_shutdown+0x14/0x80
[50094.440705] [<ffffffff811a5e14>] kmem_cache_destroy+0x44/0xf0
[50094.448391] [<ffffffffa09fd4f1>] lnet_unprepare+0x161/0x2f0 [lnet]
[50094.456544] [<ffffffffa0a00abd>] LNetNIFini+0x8d/0x110 [lnet]
[50094.464264] [<ffffffffa0cb293d>] ptlrpc_ni_fini+0x15d/0x1e0 [ptlrpc]
[50094.472607] [<ffffffffa0ccdd25>] ? ptlrpcd_free+0x145/0x2d0 [ptlrpc]
[50094.480915] [<ffffffffa0cb2c73>] ptlrpc_exit_portals+0x13/0x20 [ptlrpc]
[50094.489763] [<ffffffffa0d433e3>] ptlrpc_exit+0x22/0xc3f [ptlrpc]
[50094.497874] [<ffffffff810fe3db>] SyS_delete_module+0x16b/0x2d0
[50094.505766] [<ffffffff81697989>] system_call_fastpath+0x16/0x1b
[50094.513724] INFO: Object 0xffff88085bde4000 @offset=0
[50094.520585] INFO: Object 0xffff88085bde4200 @offset=512
[50094.527597] INFO: Object 0xffff88085bde5400 @offset=5120
[50094.534679] INFO: Object 0xffff88085bde5a80 @offset=6784
The backtrace shows that the kernel found objects still allocated while destroying the memory cache lnet_small_mds_cachep:
(gdb) l *(lnet_unprepare+0x161)
0x1521 is in lnet_unprepare (/home/jinxiong/work/flr/lnet/lnet/api-ni.c:253).
248 lnet_descriptor_cleanup(void)
249 {
250
251 if (lnet_small_mds_cachep) {
252 kmem_cache_destroy(lnet_small_mds_cachep);
253 lnet_small_mds_cachep = NULL;
254 }
255
256 if (lnet_mes_cachep) {
257 kmem_cache_destroy(lnet_mes_cachep);
Please investigate whether there are any memory leaks in LNet for this memory cache. |
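For readers unfamiliar with this class of warning, here is a minimal userspace sketch of the pattern the report suspects: a cache is destroyed while objects allocated from it are still outstanding. It is not LNet or kernel code. The `toy_cache` type, its functions, and the message format are hypothetical and only imitate the bookkeeping idea behind the "Objects remaining" check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache; NOT the kernel's struct kmem_cache. */
struct toy_cache {
	const char *name;  /* label used in diagnostics */
	size_t obj_size;   /* size of each object in bytes */
	long live;         /* objects allocated but not yet freed */
};

void toy_cache_init(struct toy_cache *c, const char *name, size_t obj_size)
{
	c->name = name;
	c->obj_size = obj_size;
	c->live = 0;
}

/* Allocate one object and count it as outstanding. */
void *toy_cache_alloc(struct toy_cache *c)
{
	void *obj = malloc(c->obj_size);

	if (obj)
		c->live++;
	return obj;
}

/* Return one object to the cache and drop it from the outstanding count. */
void toy_cache_free(struct toy_cache *c, void *obj)
{
	if (!obj)
		return;
	assert(c->live > 0);
	free(obj);
	c->live--;
}

/*
 * Tear down the cache. Returns the number of objects that were never
 * freed; a non-zero result corresponds to the leak scenario above.
 */
long toy_cache_destroy(struct toy_cache *c)
{
	long leaked = c->live;

	if (leaked)
		fprintf(stderr, "BUG %s: %ld object(s) remaining on destroy\n",
			c->name, leaked);
	c->live = 0;
	return leaked;
}
```

In this sketch, allocating two objects, freeing only one, and then calling `toy_cache_destroy()` returns 1. By analogy, the four `INFO: Object` lines in the log would correspond to four allocations with no matching free before the module was unloaded, assuming the kernel's report is accurate.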
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 09/Oct/17 ] |
|
Hi Sonia, Is this something you can look into when you have time? Thanks. |
| Comment by Sonia Sharma (Inactive) [ 10/Oct/17 ] |
|
It might have cropped up after the fix for |