Lustre / LU-9913

conf-sanity tests 31 and 35a fail with “LNetError: 8653:0:(module.c:689:libcfs_exit()) Portals memory leaked: 184 bytes”

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.11.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Several conf-sanity tests are failing with an LNet memory leak.

      conf-sanity test_31 fails with the following error found in the test_log:

      [13800.637071] LNetError: 14072:0:(module.c:689:libcfs_exit()) Portals memory leaked: 184 bytes
      mv: cannot stat '/tmp/debug': No such file or directory
      Memory leaks detected
       conf-sanity test_31: @@@@@@ FAIL: cleanup failed with rc 203 
      

      We also see conf-sanity tests 0, 35a, and 78 fail with the same memory leak error.
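
When triaging many test logs for this failure, it can help to pull the leak reports out mechanically. The following is a minimal sketch, not part of the Lustre test framework: the regular expression is inferred only from the single log line quoted above, and the names `LEAK_RE` and `find_leaks` are hypothetical.

```python
import re

# Pattern inferred from the one log line quoted in this ticket (an assumption,
# not an official description of the LNet log format).
LEAK_RE = re.compile(
    r"LNetError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"Portals memory leaked: (?P<bytes>\d+) bytes"
)


def find_leaks(log_text):
    """Return one dict per 'Portals memory leaked' report found in log_text."""
    leaks = []
    for m in LEAK_RE.finditer(log_text):
        leaks.append({
            "pid": int(m["pid"]),
            "file": m["file"],
            "line": int(m["line"]),
            "func": m["func"],
            "bytes": int(m["bytes"]),
        })
    return leaks
```

Applied to the test_31 excerpt above, this would report a single 184-byte leak logged from `libcfs_exit()`; lines that do not match the pattern are ignored.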

      These tests started failing on August 22, 2017. Logs for the first few failures:
      https://testing.hpdd.intel.com/test_sets/b644f444-87ab-11e7-b4b0-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/17c3eaf0-87c9-11e7-b3ca-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/7c038e52-87ca-11e7-b4b0-5254006e85c2

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.11

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28695/
          Subject: LU-9913 lnet: balance references in lnet_discover_peer_locked()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 1c45d9051764e0637ba90b3db06ba8fa37722916

          bougetq Quentin Bouget (Inactive) added a comment -

          It seems some test suites cannot even launch because of this same bug: https://testing.hpdd.intel.com/test_sets/e7473514-8915-11e7-b94a-5254006e85c2. The same LNetError occurs and fails the test suite early (e.g. https://testing.hpdd.intel.com/test_logs/ebb5656c-8915-11e7-b94a-5254006e85c2/show_text).

          bougetq Quentin Bouget (Inactive) added a comment -

          test_90a seems affected too: https://testing.hpdd.intel.com/test_sets/c0ce9d7c-8916-11e7-b50a-5254006e85c2

          olaf Olaf Weber (Inactive) added a comment -

          I agree with your diagnosis.
          jhammond John Hammond added a comment -

          Copied over from LU-9909.

          I bisected this locally by running conf-sanity 35a. The leak was introduced by commit 0f1aaad4c1b4447ee5097b8bb79a49d09eaa23c2 (https://review.whamcloud.com/25789, "LU-9480 lnet: implement Peer Discovery"). Unfortunately, the leak finder doesn't work for LNet allocations, but the leaked object is an LNet peer.
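
The bisection mentioned above amounts to a binary search over the commit history. Here is a minimal sketch of that idea, assuming the commits are ordered oldest to newest, the first one is known good, the last one is known bad, and the failure persists once introduced. The function `first_bad` and the placeholder commit names are hypothetical; in practice the `is_bad` check would build that tree and run conf-sanity 35a.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Illustrative run with placeholder commit names; "c5" plays the culprit.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad(history, lambda c: history.index(c) >= 5)
```

Each step halves the remaining range, so even a few hundred candidate commits need fewer than ten test runs.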

          gerrit Gerrit Updater added a comment -

          John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/28695
          Subject: LU-9913 lnet: balance references in lnet_discover_peer_locked()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 966e33c077f504b32df3687b103173f6eb2fb35f
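
To illustrate why unbalanced references surface as a byte count at module unload, here is a toy model. It is not the actual patch and makes no claim about how lnet_discover_peer_locked() is written: the classes `Tracker` and `Peer`, both `discover_*` functions, and the 184-byte object size (chosen only to echo the log message) are assumptions made for this sketch.

```python
class Tracker:
    """Stands in for a global counter of bytes currently allocated."""

    def __init__(self):
        self.allocated = 0


class Peer:
    SIZE = 184  # hypothetical size, picked to match the leaked byte count

    def __init__(self, tracker):
        self.tracker = tracker
        self.refcount = 1  # the owner's initial reference
        tracker.allocated += self.SIZE

    def addref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:  # last reference gone: free the object
            self.tracker.allocated -= self.SIZE


def discover_unbalanced(peer, attempts):
    for _ in range(attempts):
        peer.addref()  # a reference is taken on every pass...
    peer.decref()      # ...but only one is dropped


def discover_balanced(peer, attempts):
    for _ in range(attempts):
        peer.addref()
        peer.decref()  # each reference taken is dropped again


def leaked_bytes(discover, attempts=2):
    tracker = Tracker()
    peer = Peer(tracker)
    discover(peer, attempts)
    peer.decref()  # owner drops its reference at shutdown
    return tracker.allocated
```

In this model the unbalanced variant leaks nothing when it makes a single pass, and leaks the whole object as soon as it makes a second one, which is the kind of path-dependent imbalance that can go unnoticed until a shutdown-time accounting check reports it.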

          People

            Assignee: ashehata Amir Shehata (Inactive)
            Reporter: jamesanunez James Nunez (Inactive)