[LU-7441] Memory leak in nrs_tbf_*_startup Created: 17/Nov/15 Updated: 07/Jul/17 Resolved: 07/Jul/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Li Xi (Inactive) | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
nrs_tbf_*_startup should free the hash table if further processing fails. |
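
For illustration, here is a minimal, self-contained sketch of the error-path pattern the description calls for. It is not the actual Lustre patch: the helpers (tbf_hash_create, tbf_hash_put, tbf_rule_start) are hypothetical stand-ins for the cfs_hash and rule-setup calls used by nrs_tbf_*_startup().

```c
/*
 * Illustrative sketch only -- not the actual Lustre code.  The helpers
 * below are hypothetical stand-ins for the hash-table and rule-setup
 * steps performed by nrs_tbf_*_startup().
 */
#include <errno.h>
#include <stdlib.h>

struct tbf_head {
	void *th_hash;			/* stand-in for the client hash table */
};

static void *tbf_hash_create(void)	{ return malloc(64); }
static void  tbf_hash_put(void *hash)	{ free(hash); }

static int tbf_rule_start(struct tbf_head *head)
{
	(void)head;
	return -ENOMEM;			/* simulate the further setup failing */
}

static int tbf_startup(struct tbf_head *head)
{
	int rc;

	head->th_hash = tbf_hash_create();
	if (head->th_hash == NULL)
		return -ENOMEM;

	rc = tbf_rule_start(head);
	if (rc != 0) {
		/* The point of the ticket: release the hash table on the
		 * error path, otherwise it is leaked. */
		tbf_hash_put(head->th_hash);
		head->th_hash = NULL;
	}
	return rc;
}

int main(void)
{
	struct tbf_head head = { NULL };

	/* Expect -ENOMEM back and no leaked hash table afterwards. */
	return tbf_startup(&head) == -ENOMEM ? 0 : 1;
}
```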
| Comments |
| Comment by Gerrit Updater [ 17/Nov/15 ] |
|
Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/17224 |
| Comment by Peter Jones [ 17/Nov/15 ] |
|
Emoly, could you please take care of this patch? Thanks, Peter |
| Comment by Li Xi (Inactive) [ 20/Nov/15 ] |
|
This problem is easy to reproduce. With the following patch applied:

    memset(&start, 0, sizeof(start));
    start.tc_rpc_rate = tbf_rate;

[root@QYJ tests]# lctl set_param ost.OSS.ost_io.nrs_policies="tbf jobid"
LNetError: 19587:0:(module.c:412:exit_libcfs_module()) Portals memory leaked: 131600 bytes |
| Comment by Emoly Liu [ 20/Nov/15 ] |
|
Lixi, thanks for your reproducer! As we discussed, this problem exists in the code of all NRS policies, not only TBF, so I think it's related to how the NRS state is set. Your patch does work, but we need to find the root cause and then fix it. |
| Comment by Li Xi (Inactive) [ 20/Nov/15 ] |
|
I don't think it is an NRS state setting problem. If every implementation of op_policy_start() frees all the memory it allocated when a failure happens, no memory will be leaked. I checked all the nrs_*_start functions and didn't find any problem. |
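
A hedged sketch of the contract described in this comment, from the caller's side: only the name op_policy_start comes from the comment above, everything else is a hypothetical stand-in for the NRS core. The core only sees a return code, so it cannot undo allocations it never sees; each start callback must clean up after itself before returning an error.

```c
/* Hypothetical caller-side sketch; only op_policy_start is taken from the
 * ticket discussion, the rest stands in for the NRS core. */
struct nrs_policy;

struct nrs_policy_ops {
	/* Contract: must free everything it allocated if it fails. */
	int (*op_policy_start)(struct nrs_policy *policy, char *arg);
};

static int nrs_policy_start(struct nrs_policy *policy,
			    const struct nrs_policy_ops *ops, char *arg)
{
	int rc = ops->op_policy_start(policy, arg);

	/*
	 * The core only sees the return code.  It does not know which hash
	 * tables or rule lists the policy allocated, so it cannot clean
	 * them up here; the callback itself must do so on failure.
	 */
	return rc;
}
```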
| Comment by Gerrit Updater [ 08/Feb/17 ] |
|
Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/25319 |
| Comment by Gerrit Updater [ 01/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/17224/ |
| Comment by Joseph Gmitter (Inactive) [ 07/Jul/17 ] |
|
The issue here is resolved and the fix landed for 2.10.0. The code cleanup patch has been moved to a separate ticket. |