Lustre / LU-13923

LNet: lnetctl "lnet unconfigure" or "net del" hangs if executed on a gateway

Details

    • Bug
    • Resolution: Duplicate
    • Minor

    Description

      Steps to reproduce:

      Configure 3 nodes: PeerA (tcp), PeerB (tcp1) and GW1 (tcp, tcp1).

      Configure GW1 to act as a router and add corresponding routes to PeerA/PeerB.

      Verify connectivity between PeerA and PeerB by executing "lnetctl ping" back and forth.

      Execute "lnetctl lnet unconfigure" on GW1. The command hangs.

      The issue affects tag 2.13.55.

          Activity

            hornc Chris Horn added a comment -

            This ticket is a duplicate of LU-13896.

            gerrit Gerrit Updater added a comment -

            Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39731
            Subject: LU-13923 lnet: Add missing lnet_peer_ni_decref call
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8564e47bc2d37102e742f8e688d87291ff39780a

            ssmirnov Serguei Smirnov added a comment - - edited

            The issue was introduced by LU-13606:

            https://review.whamcloud.com/38798

            That change added a call to lnet_nid2peerni_locked, which takes a reference on a peer_ni object. The reference is never released. As a result, LNet cannot clean up properly on an "unconfigure" or "net del" operation, and lnetctl hangs.
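
            The leak pattern described above can be illustrated with a toy model. This is only a sketch, not LNet code: struct toy_peer_ni and every toy_* function are hypothetical names invented for the example, and the assumption that teardown can finish only once the count drops back to the table's single reference is a simplification of what LNet actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an LNet peer_ni; NOT the real struct lnet_peer_ni. */
struct toy_peer_ni {
	int refcount;	/* references currently held on this object */
};

/* Hypothetical lookup that returns the object with one extra reference
 * held, mirroring how the comment describes lnet_nid2peerni_locked. */
static struct toy_peer_ni *toy_lookup(struct toy_peer_ni *pni)
{
	pni->refcount++;
	return pni;
}

/* Drops one reference; every toy_lookup() must be paired with this. */
static void toy_decref(struct toy_peer_ni *pni)
{
	assert(pni->refcount > 0);
	pni->refcount--;
}

/* Assumed teardown rule for this model: shutdown can complete only when
 * the single reference owned by the peer table is the last one left. */
static bool toy_can_shutdown(const struct toy_peer_ni *pni)
{
	return pni->refcount == 1;
}

/* Leaky pattern: the caller looks the peer up and never drops the
 * reference, so the count stays elevated forever. */
static void toy_use_peer_leaky(struct toy_peer_ni *pni)
{
	struct toy_peer_ni *p = toy_lookup(pni);

	(void)p;	/* ... use p ... */
}

/* Balanced pattern: the lookup is paired with a decref. */
static void toy_use_peer_fixed(struct toy_peer_ni *pni)
{
	struct toy_peer_ni *p = toy_lookup(pni);

	/* ... use p ... */
	toy_decref(p);
}
```

            In this model, a peer that starts with refcount 1 and goes through toy_use_peer_leaky() ends at 2, so toy_can_shutdown() never becomes true and a teardown that waits on it would block indefinitely. The same peer passed through toy_use_peer_fixed() returns to 1 and teardown can proceed.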

            People

              ssmirnov Serguei Smirnov
              ssmirnov Serguei Smirnov