Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.11.0
    • Fix Version/s: Lustre 2.11.0
    • Labels: None

    Description

      In lnet_discover(), the buffer allocated with LIBCFS_ALLOC(buf, n_ids * sizeof(*buf)) is never freed.

          Activity

            [LU-9909] memory leak in lnet_discover()
            pjones Peter Jones added a comment -

            Landed for 2.11

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28702/
            Subject: LU-9909 lnet: fix memory leak and lnet_interfaces_max
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 81d4f7a253193ebfe559f675d3c0975c0899d592

            gerrit Gerrit Updater added a comment -

            Amir Shehata (amir.shehata@intel.com) uploaded a new patch: https://review.whamcloud.com/28702
            Subject: LU-9909 lnet: fix memory leak and lnet_interfaces_max
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 52ea572d5e828584ba49f703f0161407384323a2

            jamesanunez James Nunez (Inactive) added a comment - I created LU-9913 to capture the lnet_peer memory leak error.

            olaf Olaf Weber (Inactive) added a comment - If the leak is an lnet_peer then it differs from the problem for which this LU was opened.
            jhammond John Hammond added a comment -

            I bisected this locally by running conf-sanity test 35a. The leak was introduced by commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 (https://review.whamcloud.com/25789, LU-9480 lnet: implement Peer Discovery). Unfortunately, the leak finder doesn't work for LNet allocations, but the leak is most likely an LNet peer, since sizeof(struct lnet_peer) matches the 184 leaked bytes:

            m:lustre-release# gdb lnet/lnet/lnet.ko
            ...
            (gdb) p sizeof(struct lnet_peer)
            $1 = 184
            
            jhammond John Hammond added a comment - I bisected this locally by running conf-sanity 35a. This was introduced by commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 https://review.whamcloud.com/25789 LU-9480 lnet: implement Peer Discovery. Unfortunately leak finder doesn't work for LNet allocations. But the leak is most likely an LNet peer: m:lustre-release# gdb lnet/lnet/lnet.ko ... (gdb) p sizeof(struct lnet_peer) $1 = 184

            jamesanunez James Nunez (Inactive) added a comment - So far, the earliest date I see the 'Portals memory leaked' error is on August 22. So, yes, they started after the last batch of patch landings to master.

            ashehata Amir Shehata (Inactive) added a comment - Did that just start happening after the latest landing? or has it been happening for a while?

            jamesanunez James Nunez (Inactive) added a comment -

            I'm seeing several of our tests failing with

            [13455.203067] LNetError: 14901:0:(module.c:689:libcfs_exit()) Portals memory leaked: 184 bytes
            mv: cannot stat '/tmp/debug': No such file or directory
            Memory leaks detected

            Is this the same issue?

            Logs for this failure can be found at
            https://testing.hpdd.intel.com/test_sets/972a44c6-87e0-11e7-b4b0-5254006e85c2

            ashehata Amir Shehata (Inactive) added a comment - thanks for catching that. Will address it.

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: jhammond John Hammond
              Votes: 0
              Watchers: 8
