[LU-9909] memory leak in lnet_discover() Created: 24/Aug/17  Updated: 31/Aug/17  Resolved: 31/Aug/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

Description

In lnet_discover() the buffer allocated by LIBCFS_ALLOC(buf, n_ids * sizeof(*buf)) is never freed.
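
The pattern described above can be illustrated with a minimal userspace sketch. This is not the code from lnet_discover() or from the eventual patch. The names tracked_alloc(), tracked_free(), struct fake_id, and discover_sketch() are hypothetical stand-ins, and the byte counter only imitates the kind of accounting that lets a "memory leaked: N bytes" check fire at module exit. In the kernel tree the fix would presumably pair the allocation with LIBCFS_FREE(buf, n_ids * sizeof(*buf)) on every exit path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocation accounting counter. */
static size_t bytes_outstanding;

/* Hypothetical stand-in for a size-tracked allocator. */
static void *tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p != NULL)
		bytes_outstanding += size;
	return p;
}

/* Hypothetical stand-in for the matching size-tracked free. */
static void tracked_free(void *p, size_t size)
{
	if (p == NULL)
		return;
	assert(bytes_outstanding >= size);
	free(p);
	bytes_outstanding -= size;
}

/* Illustrative element type; not the real LNet ID structure. */
struct fake_id {
	unsigned long long nid;
	unsigned int pid;
};

/*
 * Sketch of a function that allocates a temporary array of n_ids
 * entries. With free_buf == 0 it returns without releasing the
 * array, which is the leak described in this ticket. With
 * free_buf != 0 it releases the array before returning.
 */
static int discover_sketch(int n_ids, int free_buf)
{
	struct fake_id *buf;
	size_t size;

	if (n_ids <= 0)
		return -1;

	size = (size_t)n_ids * sizeof(*buf);
	buf = tracked_alloc(size);
	if (buf == NULL)
		return -1;

	/* Placeholder for the work that fills and uses the array. */
	buf[0].pid = 1;

	if (free_buf)
		tracked_free(buf, size);
	return 0;
}
```

Calling discover_sketch(4, 1) leaves bytes_outstanding unchanged, while discover_sketch(4, 0) leaves it larger by 4 * sizeof(struct fake_id), which is the kind of residue an exit-time leak check would report.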



Comments
Comment by Amir Shehata (Inactive) [ 24/Aug/17 ]

Thanks for catching that. I will address it.

Comment by James Nunez (Inactive) [ 24/Aug/17 ]

I'm seeing several of our tests failing with:

[13455.203067] LNetError: 14901:0:(module.c:689:libcfs_exit()) Portals memory leaked: 184 bytes
mv: cannot stat '/tmp/debug': No such file or directory
Memory leaks detected

Is this the same issue?

Logs for this failure can be found at
https://testing.hpdd.intel.com/test_sets/972a44c6-87e0-11e7-b4b0-5254006e85c2

Comment by Amir Shehata (Inactive) [ 24/Aug/17 ]

Did that just start happening after the latest landing, or has it been happening for a while?

Comment by James Nunez (Inactive) [ 24/Aug/17 ]

So far, the earliest date I see the 'Portals memory leaked' error is on August 22. So, yes, they started after the last batch of patch landings to master.

Comment by John Hammond [ 24/Aug/17 ]

I bisected this locally by running conf-sanity 35a. The leak was introduced by commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 (https://review.whamcloud.com/25789, "LU-9480 lnet: implement Peer Discovery"). Unfortunately, the leak finder doesn't work for LNet allocations, but the leak is most likely an LNet peer, since the 184 leaked bytes match the size of struct lnet_peer:

m:lustre-release# gdb lnet/lnet/lnet.ko
...
(gdb) p sizeof(struct lnet_peer)
$1 = 184

Comment by Olaf Weber [ 24/Aug/17 ]

If the leak is an lnet_peer then it differs from the problem for which this LU was opened.

Comment by James Nunez (Inactive) [ 24/Aug/17 ]

I created LU-9913 to capture the lnet_peer memory leak error.

Comment by Gerrit Updater [ 25/Aug/17 ]

Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/28702
Subject: LU-9909 lnet: fix memory leak and lnet_interfaces_max
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 52ea572d5e828584ba49f703f0161407384323a2

Comment by Gerrit Updater [ 31/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28702/
Subject: LU-9909 lnet: fix memory leak and lnet_interfaces_max
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 81d4f7a253193ebfe559f675d3c0975c0899d592

Comment by Peter Jones [ 31/Aug/17 ]

Landed for 2.11
