Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.15.0

    Description

      When I mount Lustre with GSS (SSK) enabled on a multi-rail client, I receive checksum errors that I do not see when using only a single interface.  My guess is that the NID is encoded in the checksum, though I haven't dug into the cause yet.  I also saw many errors when using GSS on multi-rail servers, although those errors were different.

      [154311.786639] LustreError: 194908:0:(gss_sk_mech.c:388:sk_verify_hmac()) checksum mismatch
      [154311.798154] LustreError: 194908:0:(sec_gss.c:242:gss_verify_msg()) mic verify error: 00060000
      [154311.810015] LustreError: 194908:0:(sec_gss.c:2125:gss_svc_verify_request()) failed to verify request: 60000
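      The status value 00060000 in the log above can be unpacked by hand. This is a minimal sketch, assuming the value follows the GSS-API major status layout from RFC 2744 (calling error in bits 24-31, routine error in bits 16-23, supplementary info in bits 0-15); the helper name decode_major_status is hypothetical and not part of Lustre.

```python
# Bit offsets of the GSS-API major status fields, as defined in RFC 2744.
GSS_C_CALLING_ERROR_OFFSET = 24
GSS_C_ROUTINE_ERROR_OFFSET = 16

# Only the routine error relevant to this ticket is listed here.
ROUTINE_ERRORS = {6: "GSS_S_BAD_SIG (alias GSS_S_BAD_MIC)"}


def decode_major_status(status: int) -> dict:
    """Split a GSS major status word into its three fields."""
    return {
        "calling_error": (status >> GSS_C_CALLING_ERROR_OFFSET) & 0xFF,
        "routine_error": (status >> GSS_C_ROUTINE_ERROR_OFFSET) & 0xFF,
        "supplementary": status & 0xFFFF,
    }


fields = decode_major_status(0x00060000)
routine_name = ROUTINE_ERRORS.get(fields["routine_error"], "unknown")
print(fields, routine_name)
```

      Under that assumption the routine error is 6, a bad signature/MIC, which is consistent with the "checksum mismatch" reported by sk_verify_hmac().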
      

        Activity

          [LU-15047] GSS and multi-rail incompatibility

          pjones Peter Jones added a comment - Landed for 2.15

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45277/
          Subject: LU-15047 gss: gss integrity check with multi-rail
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c8301a65c5672a1d081669343466746df983eabc

          jfilizetti Jeremy Filizetti added a comment - I think it would be triggered if the primary NID were deleted by a user command (lnet_peer_del_nid()).  In the code it looks like it could replace lp_primary_nid, but when I ran "lnetctl peer del" it seemed to delete the whole peer, not just the primary NID.  I'm not sure whether that is the intended behavior, but this was my concern.

          sebastien Sebastien Buisson added a comment - ssmirnov, ashehata: in patch https://review.whamcloud.com/45277 Jeremy is asking whether the primary NID of a node can change. What events could trigger such a change?

          "Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45277
          Subject: LU-15047 gss: gss integrity check with multi-rail
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: bcc1cc38a2286b39c78464f7fd34f237a66fd2be

          sebastien Sebastien Buisson added a comment - Indeed, GSS must make use of primary NIDs on both ends of the communication channel, so that the computed HMAC is based on these unique identifiers rather than the actual NIDs being used for the current request.
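          The reasoning in this comment can be illustrated with a small model. This is only a sketch, not the Lustre SSK code: the key, the NIDs, the message layout and the PRIMARY_NID table are all hypothetical, and HMAC-SHA256 is assumed purely for illustration. It models a two-interface client whose request is signed with one NID but seen by the verifier as arriving from the other.

```python
import hashlib
import hmac

KEY = b"hypothetical-shared-key"          # stand-in for the shared secret
SERVER_NID = "192.168.1.1@tcp"            # hypothetical NIDs
CLIENT_NIDS = ["192.168.1.10@tcp", "192.168.2.10@tcp"]

# Every NID of a peer maps to that peer's single primary NID.
PRIMARY_NID = {nid: CLIENT_NIDS[0] for nid in CLIENT_NIDS}
PRIMARY_NID[SERVER_NID] = SERVER_NID


def compute_hmac(key: bytes, src_nid: str, dst_nid: str, payload: bytes) -> bytes:
    """HMAC over both NIDs and the payload (illustrative layout only)."""
    msg = b"\0".join([src_nid.encode(), dst_nid.encode(), payload])
    return hmac.new(key, msg, hashlib.sha256).digest()


def roundtrip(signing_nid: str, arrival_nid: str, use_primary: bool) -> bool:
    """Sign with one client NID, verify with the NID the receiver observed."""
    def resolve(nid: str) -> str:
        return PRIMARY_NID[nid] if use_primary else nid

    payload = b"request body"
    mac = compute_hmac(KEY, resolve(signing_nid), resolve(SERVER_NID), payload)
    expected = compute_hmac(KEY, resolve(arrival_nid), resolve(SERVER_NID), payload)
    return hmac.compare_digest(mac, expected)


# Actual NIDs differ between signer and verifier, so the check fails.
actual_ok = roundtrip(CLIENT_NIDS[0], CLIENT_NIDS[1], use_primary=False)
# Both sides resolve to the primary NID first, so the check passes.
primary_ok = roundtrip(CLIENT_NIDS[0], CLIENT_NIDS[1], use_primary=True)
print(actual_ok, primary_ok)
```

          In this model the HMAC only verifies when both ends normalize to the primary NID before computing it, which is the property the comment asks GSS to rely on.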

          People

            sebastien Sebastien Buisson
            jfilizetti Jeremy Filizetti