[LU-7441] Memory leak in nrs_tbf_*_startup Created: 17/Nov/15  Updated: 07/Jul/17  Resolved: 07/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: Li Xi (Inactive) Assignee: Emoly Liu
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-9750 misc code cleanups in nrs policy code Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

nrs_tbf_*_startup should free the hash table if further processing fails.
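The cleanup being asked for can be sketched as a startup function that unwinds its own allocation when a later step fails. This is a minimal, self-contained illustration and not the Lustre code: every `demo_*` name, the bucket count, and the allocation counter are hypothetical stand-ins, and the `fail` flag only imitates a later step returning an error.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the Lustre NRS data structures. */
struct demo_hash {
	int	*buckets;
	size_t	 nbuckets;
};

struct demo_head {
	struct demo_hash *hash;
};

/* Number of allocations that have not been freed yet. */
static long demo_live_allocs;

static void *demo_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p != NULL)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p != NULL) {
		demo_live_allocs--;
		free(p);
	}
}

static struct demo_hash *demo_hash_create(size_t nbuckets)
{
	struct demo_hash *h = demo_alloc(sizeof(*h));

	if (h == NULL)
		return NULL;
	h->buckets = demo_alloc(nbuckets * sizeof(*h->buckets));
	if (h->buckets == NULL) {
		demo_free(h);
		return NULL;
	}
	h->nbuckets = nbuckets;
	return h;
}

static void demo_hash_destroy(struct demo_hash *h)
{
	demo_free(h->buckets);
	demo_free(h);
}

/* Stand-in for a later startup step; a non-zero 'fail' forces an error. */
static int demo_rule_start(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Startup that frees the hash table if the later step fails. */
static int demo_startup(struct demo_head *head, int fail)
{
	int rc;

	head->hash = demo_hash_create(16);
	if (head->hash == NULL)
		return -ENOMEM;

	rc = demo_rule_start(fail);
	if (rc != 0)
		goto out_free_hash;
	return 0;

out_free_hash:
	/* Without this unwind the hash table would be leaked. */
	demo_hash_destroy(head->hash);
	head->hash = NULL;
	return rc;
}
```

Calling `demo_startup(&head, 1)` returns `-EINVAL` and leaves `demo_live_allocs` at 0; dropping the `out_free_hash` path would leave two allocations outstanding.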



 Comments   
Comment by Gerrit Updater [ 17/Nov/15 ]

Li Xi (lixi@ddn.com) uploaded a new patch: http://review.whamcloud.com/17224
Subject: LU-7441 nrs: fix memory leak in nrs_tbf_*_startup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f1b173201d137a8acd924084d03a76d2bdcab0a1

Comment by Peter Jones [ 17/Nov/15 ]

Emoly

Could you please take care of this patch?

Thanks

Peter

Comment by Li Xi (Inactive) [ 20/Nov/15 ]

This problem is easy to reproduce with the following patch:

memset(&start, 0, sizeof(start));
start.tc_jobids_str = "*";

start.tc_rpc_rate = tbf_rate;
start.tc_rule_flags = NTRS_DEFAULT;
start.tc_name = NRS_TBF_DEFAULT_RULE;
INIT_LIST_HEAD(&start.tc_jobids);
//rc = nrs_tbf_rule_start(policy, head, &start);
rc = -EINVAL;
return rc;

[root@QYJ tests]# lctl set_param ost.OSS.ost_io.nrs_policies="tbf jobid"
ost.OSS.ost_io.nrs_policies=tbf jobid
error: set_param: setting /proc/fs/lustre/ost/OSS/ost_io/nrs_policies=tbf jobid: Invalid argument
[root@QYJ tests]# sh llmountcleanup.sh
Stopping clients: QYJ /mnt/lustre (opts:-f)
Stopping client QYJ /mnt/lustre opts:-f
Stopping clients: QYJ /mnt/lustre2 (opts:-f)
Stopping /mnt/mds1 (opts:-f) on QYJ
Stopping /mnt/ost1 (opts:-f) on QYJ
Stopping /mnt/ost2 (opts:-f) on QYJ

LNetError: 19587:0:(module.c:412:exit_libcfs_module()) Portals memory leaked: 131600 bytes
Memory leaks detected
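For readers unfamiliar with how a leak like the one above gets reported at module exit, the general idea is byte accounting: add to a counter on every allocation, subtract on every free, and check the counter during teardown. The sketch below only illustrates that idea under stated assumptions; the `acct_*` names are hypothetical and this is not the libcfs implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical running total of bytes allocated and not yet freed. */
static long acct_bytes;

/* Allocate 'size' bytes and charge them to the counter. */
static void *acct_alloc(size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		acct_bytes += (long)size;
	return p;
}

/* Free 'p' and credit 'size' bytes back; the caller supplies the size. */
static void acct_free(void *p, size_t size)
{
	if (p == NULL)
		return;
	acct_bytes -= (long)size;
	free(p);
}

/* Called at teardown: returns the leaked byte count, 0 when clean. */
static long acct_check_at_exit(void)
{
	return acct_bytes;
}
```

Any allocation made during a failed startup and never passed to `acct_free()` keeps the counter above zero, which is the kind of condition an exit-time check would report.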

Comment by Emoly Liu [ 20/Nov/15 ]

Lixi, thanks for your reproducer!

As we discussed, this problem exists in all NRS policy code, not only TBF, so I think it is related to NRS state setting. Your patch does work, but we need to find the root cause and then fix it.

Comment by Li Xi (Inactive) [ 20/Nov/15 ]

I don't think it is an NRS state setting problem. If every op_policy_start() implementation frees all the memory it allocated when a failure happens, no memory will be leaked. I checked all the nrs_*_start functions and didn't find any problem.

Comment by Gerrit Updater [ 08/Feb/17 ]

Emoly Liu (emoly.liu@intel.com) uploaded a new patch: https://review.whamcloud.com/25319
Subject: LU-7441 nrs: some code cleanup in NRS policies
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 740df75fe6952bada2b505a7bd751aff09a07e94

Comment by Gerrit Updater [ 01/Mar/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/17224/
Subject: LU-7441 nrs: Free hash table if failed to start a nrs policy
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cd362fa9186a3e4de34c7c68908e6d3d429bb087

Comment by Joseph Gmitter (Inactive) [ 07/Jul/17 ]

Issue here is resolved and landed for 2.10.0. The code cleanup patch is moved to a separate ticket: LU-9750.
