
LNet: lock primary NID only on a peer constructed by Lustre

Details

    Description

      Primary NID locking is useful when a client connects to a server and Lustre provides the peer representation to LNet via the LNetAddPeer API, as interpreted from the mount command string or llog.

      When the server responds to a client, there's no need to lock the client's primary NID.

        Activity

          [LU-17665] LNet: lock primary NID only on a peer constructed by Lustre
          pjones Peter Jones added a comment -

          Merged for 2.16

          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/54539/
          Subject: LU-17665 lnet: lock primary NID only on lustre-built peer
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 90ec7361b756e685572bd168643d61bb2f4a85c4

          ssmirnov Serguei Smirnov added a comment - - edited

          The client can go away and come back later with a different set of NIDs. If the server locks client NIDs, a NID that used to belong to the client and was locked as primary on the server may now belong to a different client. The server will attempt to merge the peer records, but may be unable to do so properly because deleting the primary NID is not allowed.

          Let square brackets designate the actual node configuration and round brackets the LNet view of the peer. For example, suppose a client with NIDs [A,B] connects to the server. The server creates the peer record (pA,B), where NID A is primary. Then the same client goes away and comes back with only NID B configured. This results in "[B] - (pA,B)" because the server won't delete the locked NID A for the client. If another client then comes up with NID A configured, there is a conflict: the server deletes NID B from the existing peer record, so the original peer is effectively hijacked by the new client: "[A] - (pA), [B] - ?".

          I'm not sure exactly what this may cause from the FS point of view, but it looks wrong. Patch 54539 tries to avoid such confusion by making sure that the server discovers the client NIDs before using them.
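
          The scenario above can be replayed with a small toy model. This is only a sketch for illustration and is not LNet code: the names PeerRecord, PeerTable, connect and view are hypothetical, and the update rule is an assumed simplification (the record is rewritten to match the NIDs the node reports, except that a locked primary NID is never deleted). The unlocked case only approximates the idea of the server following what the client reports; it does not model discovery itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PeerRecord:
    """Toy stand-in for the server's view of one peer (hypothetical)."""
    primary: str
    nids: List[str]
    locked: bool  # True: the primary NID may not be deleted


class PeerTable:
    """Toy peer table; the update rule is an assumption, not LNet's logic."""

    def __init__(self, lock_primary: bool):
        self.lock_primary = lock_primary
        self.records: List[PeerRecord] = []

    def find(self, nid: str) -> Optional[PeerRecord]:
        for rec in self.records:
            if nid in rec.nids:
                return rec
        return None

    def connect(self, reported: List[str]) -> PeerRecord:
        """A node shows up with its currently configured NIDs."""
        rec = next((r for r in map(self.find, reported) if r is not None), None)
        if rec is None:
            # No existing record mentions these NIDs: create a new one.
            rec = PeerRecord(reported[0], list(reported), self.lock_primary)
            self.records.append(rec)
        elif rec.locked and rec.primary not in reported:
            # The locked primary cannot be deleted, so it stays in the record.
            rec.nids = [rec.primary] + list(reported)
        else:
            # Rewrite the record to match what the node reports.
            if rec.primary not in reported:
                rec.primary = reported[0]
            rec.nids = list(reported)
        return rec


def view(rec: Optional[PeerRecord]) -> str:
    """Render a record in the (pA,B) notation used above; '?' if none."""
    if rec is None:
        return "?"
    others = [n for n in rec.nids if n != rec.primary]
    return "(" + ",".join(["p" + rec.primary] + others) + ")"


if __name__ == "__main__":
    for lock in (True, False):
        table = PeerTable(lock_primary=lock)
        table.connect(["A", "B"])   # client [A,B] connects
        table.connect(["B"])        # same client returns with only [B]
        table.connect(["A"])        # a different client shows up with [A]
        print("locked" if lock else "unlocked",
              "[A] -", view(table.find("A")),
              "[B] -", view(table.find("B")))
```

          In this model, with the primary NID locked the final state is "[A] - (pA), [B] - ?", matching the conflict described above. Without locking, the returning client's record becomes (pB) and the new client gets its own record (pA).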

          gerrit Gerrit Updater added a comment -

          "Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/54539
          Subject: LU-17665 lnet: lock primary NID only on lustre-built peer
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 39595d9fbf64d8b2d16e3ddd9db7dbd05abc5e40

          People

            ssmirnov Serguei Smirnov