[LU-15440] Memory leak in discovery Created: 11/Jan/22 Updated: 29/Jul/23 Resolved: 07/Feb/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Chris Horn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Description |
|
This is unlikely to be hit in the real world, but there is a potential memory leak in lnet_peer_data_present(). If the ping buffer has nnis <= 1, the function exits without dropping its reference on the ping buffer, so that memory is leaked:
    if (pbuf->pb_info.pi_nnis <= 1)
        goto out;
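
The pattern described above can be illustrated with a small user-space sketch. This is not the Lustre code: struct ping_buffer, pbuf_alloc(), pbuf_decref(), live_buffers, and the two data_present_*() functions are hypothetical names invented for illustration, and the "fixed" variant shows only one possible shape of a fix (dropping the reference on the shared exit path), which may differ from the patch that actually landed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted ping buffer. */
struct ping_buffer {
	int refcount;
	int nnis;	/* number of network interfaces reported */
};

/* Number of buffers allocated and not yet freed. */
static int live_buffers;

static struct ping_buffer *pbuf_alloc(int nnis)
{
	struct ping_buffer *pbuf = malloc(sizeof(*pbuf));

	if (!pbuf)
		return NULL;
	pbuf->refcount = 1;	/* the caller owns one reference */
	pbuf->nnis = nnis;
	live_buffers++;
	return pbuf;
}

static void pbuf_decref(struct ping_buffer *pbuf)
{
	assert(pbuf->refcount > 0);
	if (--pbuf->refcount == 0) {
		free(pbuf);
		live_buffers--;
	}
}

/* Leaky variant: the early exit jumps past the decref. */
static int data_present_leaky(struct ping_buffer *pbuf)
{
	if (pbuf->nnis <= 1)
		goto out;

	/* ... the buffer contents would be processed here ... */
	pbuf_decref(pbuf);
out:
	return 0;
}

/* Assumed fix: every exit path drops the reference. */
static int data_present_fixed(struct ping_buffer *pbuf)
{
	if (pbuf->nnis <= 1)
		goto out;

	/* ... the buffer contents would be processed here ... */
out:
	pbuf_decref(pbuf);
	return 0;
}
```

In this sketch, calling data_present_leaky() on a buffer allocated with nnis of 1 leaves live_buffers at 1, while data_present_fixed() on the same kind of buffer brings it back to 0.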
|
| Comments |
| Comment by Gerrit Updater [ 11/Jan/22 ] |
|
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46052 |
| Comment by Chris Horn [ 31/Jan/22 ] |
|
Test report
Build and execute the test case from the patch, with the change to lnet/lnet/peer.c reverted:
[hornc@ct7-adm lustre-filesystem]$ git le HEAD^..HEAD
fbbc1258a0 (HEAD) LU-15478 lnet: Check LNET_NID_IS_ANY in LNET_NID_NET
[hornc@ct7-adm lustre-filesystem]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/52/46052/2 && git cherry-pick FETCH_HEAD
From https://review.whamcloud.com/fs/lustre-release
* branch refs/changes/52/46052/2 -> FETCH_HEAD
Auto-merging lnet/lnet/peer.c
[detached HEAD 6c7815e9e1] LU-15440 lnet: lnet_peer_data_present() memory leak
Date: Tue Jan 11 16:19:16 2022 -0600
2 files changed, 16 insertions(+), 1 deletion(-)
[hornc@ct7-adm lustre-filesystem]$ git reset --soft HEAD^
[hornc@ct7-adm lustre-filesystem]$ git status
HEAD detached from fb5d7036ec
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: lnet/lnet/peer.c
modified: lustre/tests/sanity-lnet.sh
Untracked files:
(use "git add <file>..." to include in what will be committed)
lustre/tests/lutf/Makefile.in
lustre/tests/lutf/src/Makefile.in
[hornc@ct7-adm lustre-filesystem]$ git reset HEAD lnet/lnet/peer.c
Unstaged changes after reset:
M lnet/lnet/peer.c
[hornc@ct7-adm lustre-filesystem]$ g co lnet/lnet/peer.c
Updated 1 path from the index
[hornc@ct7-adm lustre-filesystem]$ git --no-pager diff --cached
diff --git a/lustre/tests/sanity-lnet.sh b/lustre/tests/sanity-lnet.sh
index 860f712da7..72e28eb497 100755
--- a/lustre/tests/sanity-lnet.sh
+++ b/lustre/tests/sanity-lnet.sh
@@ -2335,6 +2335,19 @@ test_216() {
}
run_test 216 "Failed send to peer NI owned by local host should not trigger peer NI recovery"
+test_217() {
+ reinit_dlc || return $?
+
+ [[ $($LNETCTL net show | grep -c nid) -ne 1 ]] &&
+ error "Unexpected number of NIs after initalizing DLC"
+
+ do_lnetctl discover 0@lo ||
+ error "Failed to discover 0@lo"
+
+ unload_modules
+}
+run_test 217 "Don't leak memory when discovering peer with nnis <= 1"
+
test_230() {
# LU-12815
echo "Check valid values; Should succeed"
[hornc@ct7-adm lustre-filesystem]$ make -j 32
...
[root@ct7-adm tests]# ./auster -N -v sanity-lnet --only 217
Started at Sat Jan 29 01:55:10 UTC 2022
ct7-adm: executing check_logdir /tmp/test_logs/2022-01-29/015510
Logging to shared log directory: /tmp/test_logs/2022-01-29/015510
ct7-adm: executing yml_node
IOC_LIBCFS_GET_NI error 22: Invalid argument
Client: 2.14.57.60
MDS: 2.14.57.60
OSS: 2.14.57.60
running: sanity-lnet ONLY=217
run_suite sanity-lnet /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
-----============= acceptance-small: sanity-lnet ============----- Sat Jan 29 01:55:12 UTC 2022
Running: bash /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
excepting tests:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
Stopping clients: ct7-adm /mnt/lustre (opts:-f)
Stopping clients: ct7-adm /mnt/lustre2 (opts:-f)
modules unloaded.
ip netns exec test_ns ip addr add 10.1.2.3/31 dev test1pg
ip netns exec test_ns ip link set test1pg up
Loading modules from /home/hornc/lustre-filesystem/lustre
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
quota/lquota options: 'hash_lqs_cur_bits=3'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net show
net:
- net type: lo
local NI(s):
- nid: 0@lo
status: up
- net type: tcp
local NI(s):
- nid: 10.73.10.10@tcp
status: up
interfaces:
0: eth1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:4d:77:d3 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
valid_lft 73891sec preferred_lft 73891sec
inet6 fe80::5054:ff:fe4d:77d3/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:27:de:86 brd ff:ff:ff:ff:ff:ff
inet 10.73.10.10/24 brd 10.73.10.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe27:de86/64 scope link
valid_lft forever preferred_lft forever
6: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 92:fb:87:4d:80:ea brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::90fb:87ff:fe4d:80ea/64 scope link
valid_lft forever preferred_lft forever
Cleaning up LNet
modules unloaded.
== sanity-lnet test 217: Don't leak memory when discovering peer with nnis <= 1 ========================================================== 01:55:17 (1643421317)
Loading LNet and configuring DLC
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl lnet configure
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl discover 0@lo
discover:
- primary nid: 0@lo
Multi-Rail: True
peer ni:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
[13367.438649] LNetError: 12268:0:(module.c:919:libcfs_exit()) Portals memory leaked: 322 bytes
mv: cannot stat '/tmp/debug': No such file or directory
Memory leaks detected
sanity-lnet test_217: @@@@@@ FAIL: test_217 failed with 254
Trace dump:
= /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6386:error()
= /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6690:run_one()
= /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6737:run_one_logged()
= /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6563:run_test()
= /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh:2349:main()
Dumping lctl log to /tmp/test_logs/2022-01-29/015510/sanity-lnet.test_217.*.1643421319.log
Dumping logs only on local client.
test_217 returned 1
FAIL 217 (2s)
Cleaning up LNet
[13367.438649] LNetError: 12268:0:(module.c:919:libcfs_exit()) Portals memory leaked: 322 bytes
mv: cannot stat '/tmp/debug': No such file or directory
Memory leaks detected
sanity-lnet returned 254
Finished at Sat Jan 29 01:55:19 UTC 2022 in 9s
./auster: completed with rc 0
[root@ct7-adm tests]#
Apply fix and re-test:
[hornc@ct7-adm lustre-filesystem]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/52/46052/2 && git reset --hard FETCH_HEAD
From https://review.whamcloud.com/fs/lustre-release
* branch refs/changes/52/46052/2 -> FETCH_HEAD
HEAD is now at 1eecd524de LU-15440 lnet: lnet_peer_data_present() memory leak
[hornc@ct7-adm lustre-filesystem]$ make -j 32
...
[root@ct7-adm tests]# ./auster -N -v sanity-lnet --only 217
Started at Sat Jan 29 02:00:49 UTC 2022
ct7-adm: executing check_logdir /tmp/test_logs/2022-01-29/020048
Logging to shared log directory: /tmp/test_logs/2022-01-29/020048
ct7-adm: executing yml_node
IOC_LIBCFS_GET_NI error 22: Invalid argument
Client: 2.14.57.60
MDS: 2.14.57.60
OSS: 2.14.57.60
running: sanity-lnet ONLY=217
run_suite sanity-lnet /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
-----============= acceptance-small: sanity-lnet ============----- Sat Jan 29 02:00:51 UTC 2022
Running: bash /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
excepting tests:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
Stopping clients: ct7-adm /mnt/lustre (opts:-f)
Stopping clients: ct7-adm /mnt/lustre2 (opts:-f)
modules unloaded.
ip netns exec test_ns ip addr add 10.1.2.3/31 dev test1pg
ip netns exec test_ns ip link set test1pg up
Loading modules from /home/hornc/lustre-filesystem/lustre
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
quota/lquota options: 'hash_lqs_cur_bits=3'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net show
net:
- net type: lo
local NI(s):
- nid: 0@lo
status: up
- net type: tcp
local NI(s):
- nid: 10.73.10.10@tcp
status: up
interfaces:
0: eth1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:4d:77:d3 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
valid_lft 73552sec preferred_lft 73552sec
inet6 fe80::5054:ff:fe4d:77d3/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:27:de:86 brd ff:ff:ff:ff:ff:ff
inet 10.73.10.10/24 brd 10.73.10.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe27:de86/64 scope link
valid_lft forever preferred_lft forever
7: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 5e:ee:6e:41:3f:ff brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::5cee:6eff:fe41:3fff/64 scope link
valid_lft forever preferred_lft forever
Cleaning up LNet
modules unloaded.
== sanity-lnet test 217: Don't leak memory when discovering peer with nnis <= 1 ========================================================== 02:00:55 (1643421655)
Loading LNet and configuring DLC
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl lnet configure
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl discover 0@lo
discover:
- primary nid: 0@lo
Multi-Rail: True
peer ni:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
modules unloaded.
PASS 217 (2s)
== sanity-lnet test complete, duration 6 sec ============= 02:00:57 (1643421657)
Cleaning up LNet
modules unloaded.
sanity-lnet returned 0
Finished at Sat Jan 29 02:00:58 UTC 2022 in 10s
./auster: completed with rc 0
[root@ct7-adm tests]#
|
| Comment by Gerrit Updater [ 07/Feb/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46052/ |
| Comment by Peter Jones [ 07/Feb/22 ] |
|
Landed for 2.15 |