Details
Bug
Resolution: Fixed
Minor
Description
Unlikely to be hit in the real world, but there is a potential memory leak in lnet_peer_data_present(). If the ping buffer has nnis <= 1, the function exits without dropping its reference on the ping buffer, so the buffer's memory is leaked:
if (pbuf->pb_info.pi_nnis <= 1)
        goto out;
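The early-exit pattern described here can be sketched in isolation. The following is a minimal illustration only, not the LNet code: `struct ping_buffer`, `pbuf_alloc()`, `pbuf_decref()`, `live_buffers`, and both `data_present_*()` functions are hypothetical stand-ins, and the placement of the reference drop in the "fixed" variant is an assumption about one reasonable shape of a fix rather than a copy of the landed patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted ping buffer. */
struct ping_buffer {
	int refcount;
	int nnis;	/* number of network interfaces reported */
};

static int live_buffers;	/* buffers allocated and not yet freed */

static struct ping_buffer *pbuf_alloc(int nnis)
{
	struct ping_buffer *pbuf = malloc(sizeof(*pbuf));

	if (!pbuf)
		abort();
	pbuf->refcount = 1;	/* the reference handed to the caller */
	pbuf->nnis = nnis;
	live_buffers++;
	return pbuf;
}

static void pbuf_decref(struct ping_buffer *pbuf)
{
	assert(pbuf->refcount > 0);
	if (--pbuf->refcount == 0) {
		free(pbuf);
		live_buffers--;
	}
}

/* Leaky shape: the early exit jumps past the reference drop. */
static int data_present_leaky(struct ping_buffer *pbuf)
{
	int rc = 0;

	if (pbuf->nnis <= 1)
		goto out;	/* the reference is never dropped */

	/* ... process the additional interfaces ... */
	pbuf_decref(pbuf);
out:
	return rc;
}

/* One possible fix: every exit path passes through the drop. */
static int data_present_fixed(struct ping_buffer *pbuf)
{
	int rc = 0;

	if (pbuf->nnis <= 1)
		goto out;

	/* ... process the additional interfaces ... */
out:
	pbuf_decref(pbuf);
	return rc;
}

/* Returns how many buffers remain allocated after fn consumes one. */
static int buffers_leaked_by(int (*fn)(struct ping_buffer *), int nnis)
{
	int before = live_buffers;

	fn(pbuf_alloc(nnis));
	return live_buffers - before;
}
```

In this sketch, `buffers_leaked_by(data_present_leaky, 1)` leaves one buffer allocated, while the fixed variant leaves none for any value of `nnis`.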
Attachments
Activity
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46052/
Subject: LU-15440 lnet: lnet_peer_data_present() memory leak
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 56384a4fc39ff99c8abb3538f93d303f2be6ab45
Test report for LU-15440:
Build/execute test case from patch:
[hornc@ct7-adm lustre-filesystem]$ git le HEAD^..HEAD
fbbc1258a0 (HEAD) LU-15478 lnet: Check LNET_NID_IS_ANY in LNET_NID_NET
[hornc@ct7-adm lustre-filesystem]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/52/46052/2 && git cherry-pick FETCH_HEAD
From https://review.whamcloud.com/fs/lustre-release
 * branch            refs/changes/52/46052/2 -> FETCH_HEAD
Auto-merging lnet/lnet/peer.c
[detached HEAD 6c7815e9e1] LU-15440 lnet: lnet_peer_data_present() memory leak
 Date: Tue Jan 11 16:19:16 2022 -0600
 2 files changed, 16 insertions(+), 1 deletion(-)
[hornc@ct7-adm lustre-filesystem]$ git reset --soft HEAD^
[hornc@ct7-adm lustre-filesystem]$ git status
HEAD detached from fb5d7036ec
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   lnet/lnet/peer.c
        modified:   lustre/tests/sanity-lnet.sh

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        lustre/tests/lutf/Makefile.in
        lustre/tests/lutf/src/Makefile.in

[hornc@ct7-adm lustre-filesystem]$ git reset HEAD lnet/lnet/peer.c
Unstaged changes after reset:
M       lnet/lnet/peer.c
[hornc@ct7-adm lustre-filesystem]$ g co lnet/lnet/peer.c
Updated 1 path from the index
[hornc@ct7-adm lustre-filesystem]$ git --no-pager diff --cached
diff --git a/lustre/tests/sanity-lnet.sh b/lustre/tests/sanity-lnet.sh
index 860f712da7..72e28eb497 100755
--- a/lustre/tests/sanity-lnet.sh
+++ b/lustre/tests/sanity-lnet.sh
@@ -2335,6 +2335,19 @@ test_216() {
 }
 run_test 216 "Failed send to peer NI owned by local host should not trigger peer NI recovery"

+test_217() {
+       reinit_dlc || return $?
+
+       [[ $($LNETCTL net show | grep -c nid) -ne 1 ]] &&
+               error "Unexpected number of NIs after initalizing DLC"
+
+       do_lnetctl discover 0@lo ||
+               error "Failed to discover 0@lo"
+
+       unload_modules
+}
+run_test 217 "Don't leak memory when discovering peer with nnis <= 1"
+
 test_230() {
        # LU-12815
        echo "Check valid values; Should succeed"
[hornc@ct7-adm lustre-filesystem]$ make -j 32
...
[root@ct7-adm tests]# ./auster -N -v sanity-lnet --only 217
Started at Sat Jan 29 01:55:10 UTC 2022
ct7-adm: executing check_logdir /tmp/test_logs/2022-01-29/015510
Logging to shared log directory: /tmp/test_logs/2022-01-29/015510
ct7-adm: executing yml_node
IOC_LIBCFS_GET_NI error 22: Invalid argument
Client: 2.14.57.60
MDS: 2.14.57.60
OSS: 2.14.57.60
running: sanity-lnet ONLY=217
run_suite sanity-lnet /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
-----============= acceptance-small: sanity-lnet ============----- Sat Jan 29 01:55:12 UTC 2022
Running: bash /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
excepting tests:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
Stopping clients: ct7-adm /mnt/lustre (opts:-f)
Stopping clients: ct7-adm /mnt/lustre2 (opts:-f)
modules unloaded.
ip netns exec test_ns ip addr add 10.1.2.3/31 dev test1pg
ip netns exec test_ns ip link set test1pg up
Loading modules from /home/hornc/lustre-filesystem/lustre
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
quota/lquota options: 'hash_lqs_cur_bits=3'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.73.10.10@tcp
          status: up
          interfaces:
              0: eth1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:4d:77:d3 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
       valid_lft 73891sec preferred_lft 73891sec
    inet6 fe80::5054:ff:fe4d:77d3/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:27:de:86 brd ff:ff:ff:ff:ff:ff
    inet 10.73.10.10/24 brd 10.73.10.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe27:de86/64 scope link
       valid_lft forever preferred_lft forever
6: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 92:fb:87:4d:80:ea brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::90fb:87ff:fe4d:80ea/64 scope link
       valid_lft forever preferred_lft forever
Cleaning up LNet
modules unloaded.
== sanity-lnet test 217: Don't leak memory when discovering peer with nnis <= 1 ========================================================== 01:55:17 (1643421317)
Loading LNet and configuring DLC
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl lnet configure
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl discover 0@lo
discover:
    - primary nid: 0@lo
      Multi-Rail: True
      peer ni:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
[13367.438649] LNetError: 12268:0:(module.c:919:libcfs_exit()) Portals memory leaked: 322 bytes
mv: cannot stat '/tmp/debug': No such file or directory
Memory leaks detected
 sanity-lnet test_217: @@@@@@ FAIL: test_217 failed with 254
  Trace dump:
  = /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6386:error()
  = /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6690:run_one()
  = /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6737:run_one_logged()
  = /home/hornc/lustre-filesystem/lustre/tests/test-framework.sh:6563:run_test()
  = /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh:2349:main()
Dumping lctl log to /tmp/test_logs/2022-01-29/015510/sanity-lnet.test_217.*.1643421319.log
Dumping logs only on local client.
test_217 returned 1
FAIL 217 (2s)
Cleaning up LNet
[13367.438649] LNetError: 12268:0:(module.c:919:libcfs_exit()) Portals memory leaked: 322 bytes
mv: cannot stat '/tmp/debug': No such file or directory
Memory leaks detected
sanity-lnet returned 254
Finished at Sat Jan 29 01:55:19 UTC 2022 in 9s
./auster: completed with rc 0
[root@ct7-adm tests]#
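The "Portals memory leaked: 322 bytes" message in the run above is reported when the modules are unloaded, which is why test_217 ends with unload_modules. A plausible reading is that allocations are accounted in a byte counter that must return to zero by module exit. The sketch below illustrates that general technique only; `tracked_bytes`, `tracked_alloc()`, `tracked_free()`, and `report_leaks_at_exit()` are hypothetical names, and the actual libcfs accounting may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static long long tracked_bytes;	/* bytes allocated but not yet freed */

/* Allocate memory and add its size to the running total. */
static void *tracked_alloc(size_t size)
{
	void *ptr = malloc(size);

	if (ptr)
		tracked_bytes += (long long)size;
	return ptr;
}

/* Free memory and subtract its size; the caller supplies the size. */
static void tracked_free(void *ptr, size_t size)
{
	if (!ptr)
		return;
	assert(tracked_bytes >= (long long)size);
	free(ptr);
	tracked_bytes -= (long long)size;
}

/* Returns 1 and prints a report if any tracked bytes are outstanding. */
static int report_leaks_at_exit(void)
{
	if (tracked_bytes != 0) {
		fprintf(stderr, "memory leaked: %lld bytes\n", tracked_bytes);
		return 1;
	}
	return 0;
}
```

With this kind of accounting, a single missed free (such as a reference that is never dropped) shows up as a nonzero byte count at exit, even if the leak is otherwise harmless at runtime.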
Apply fix and re-test:
[hornc@ct7-adm lustre-filesystem]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/52/46052/2 && git reset --hard FETCH_HEAD
From https://review.whamcloud.com/fs/lustre-release
 * branch            refs/changes/52/46052/2 -> FETCH_HEAD
HEAD is now at 1eecd524de LU-15440 lnet: lnet_peer_data_present() memory leak
[hornc@ct7-adm lustre-filesystem]$ make -j 32
...
[root@ct7-adm tests]# ./auster -N -v sanity-lnet --only 217
Started at Sat Jan 29 02:00:49 UTC 2022
ct7-adm: executing check_logdir /tmp/test_logs/2022-01-29/020048
Logging to shared log directory: /tmp/test_logs/2022-01-29/020048
ct7-adm: executing yml_node
IOC_LIBCFS_GET_NI error 22: Invalid argument
Client: 2.14.57.60
MDS: 2.14.57.60
OSS: 2.14.57.60
running: sanity-lnet ONLY=217
run_suite sanity-lnet /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
-----============= acceptance-small: sanity-lnet ============----- Sat Jan 29 02:00:51 UTC 2022
Running: bash /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
excepting tests:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
Stopping clients: ct7-adm /mnt/lustre (opts:-f)
Stopping clients: ct7-adm /mnt/lustre2 (opts:-f)
modules unloaded.
ip netns exec test_ns ip addr add 10.1.2.3/31 dev test1pg
ip netns exec test_ns ip link set test1pg up
Loading modules from /home/hornc/lustre-filesystem/lustre
detected 2 online CPUs by sysfs
Force libcfs to create 2 CPU partitions
../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
quota/lquota options: 'hash_lqs_cur_bits=3'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.73.10.10@tcp
          status: up
          interfaces:
              0: eth1
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:4d:77:d3 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
       valid_lft 73552sec preferred_lft 73552sec
    inet6 fe80::5054:ff:fe4d:77d3/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:27:de:86 brd ff:ff:ff:ff:ff:ff
    inet 10.73.10.10/24 brd 10.73.10.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe27:de86/64 scope link
       valid_lft forever preferred_lft forever
7: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:ee:6e:41:3f:ff brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::5cee:6eff:fe41:3fff/64 scope link
       valid_lft forever preferred_lft forever
Cleaning up LNet
modules unloaded.
== sanity-lnet test 217: Don't leak memory when discovering peer with nnis <= 1 ========================================================== 02:00:55 (1643421655)
Loading LNet and configuring DLC
../lnet/lnet/lnet options: 'networks=tcp(eth1) accept=all'
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl lnet configure
/home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl discover 0@lo
discover:
    - primary nid: 0@lo
      Multi-Rail: True
      peer ni:
opening /dev/obd failed: No such file or directory
hint: the kernel modules may not be loaded
modules unloaded.
PASS 217 (2s)
== sanity-lnet test complete, duration 6 sec ============= 02:00:57 (1643421657)
Cleaning up LNet
modules unloaded.
sanity-lnet returned 0
Finished at Sat Jan 29 02:00:58 UTC 2022 in 10s
./auster: completed with rc 0
[root@ct7-adm tests]#
"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46052
Subject: LU-15440 lnet: lnet_peer_data_present() memory leak
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b7a11fde4d687a92d31313f4c815599c6bcdbf2a
Landed for 2.15