[LU-9073] SSK: lgss_sk generates keys with invalid HMAC and Crypto algorithms

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • 3

    Description

      With the landing of commit c6f5e8121366be05765dabe0008165166d3f431c for LU-8602, lgss_sk now generates keys with invalid HMAC and Crypto algorithms. The HMAC and Crypto algorithms are being swapped.

      == Master HEAD at c6f5e8121366be05765dabe0008165166d3f431c ==

      # lgss_sk -t server -f testfs -w testfs_test_with_LU-8602.key -d /dev/urandom
        Reading random data for shared key from '/dev/urandom'
      # lgss_sk -r testfs_test_with_LU-8602.key
        warning: secret key 'testfs_test_with_LU-8602.key' has insecure file mode 0100400
        Version: 1
        Type: server
        HMAC alg: AES-256-CTR
        Crypto alg: sha256
        Ctx Expiration: 604800 seconds
        Shared keylen: 256 bits
        Prime length: 2048 bits
        File system: testfs
        MGS NIDs:
        Nodemap name: default

      == LU-8602 reverted ==

      # lgss_sk -t server -f testfs -w testfs_test_without_LU-8602.key -d /dev/urandom
        Reading random data for shared key from '/dev/urandom'
      # lgss_sk -r testfs_test_without_LU-8602.key
        warning: secret key 'testfs_test_without_LU-8602.key' has insecure file mode 0100400
        Version: 1
        Type: server
        HMAC alg: SHA256
        Crypto alg: AES-256-CTR
        Ctx Expiration: 604800 seconds
        Shared keylen: 256 bits
        Prime length: 2048 bits
        File system: testfs
        MGS NIDs:
        Nodemap name: default
      # lgss_sk -r testfs_test_with_LU-8602.key
        warning: secret key 'testfs_test_with_LU-8602.key' has insecure file mode 0100400
        Invalid HMAC algorithm
        error: key configuration failed validation
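
      The swap shown in the two dumps above can be detected mechanically. The following is a minimal sketch, not part of lgss_sk: it parses the "Name: value" lines printed by `lgss_sk -r` and reports whether the HMAC and Crypto fields look swapped. The function names and the two name sets are assumptions made for illustration; the sets contain only the algorithm names that appear in this report.

```python
# Illustrative only: these sets hold just the names seen in this report,
# not the full lists supported by Lustre.
KNOWN_HMACS = {"sha256"}
KNOWN_CIPHERS = {"aes-256-ctr"}


def parse_key_dump(text):
    """Collect 'Name: value' lines from `lgss_sk -r` style output into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def check_algorithms(fields):
    """Return 'ok', 'swapped', or 'unknown' for the HMAC/Crypto pair."""
    hmac_alg = fields.get("HMAC alg", "").lower()
    crypto_alg = fields.get("Crypto alg", "").lower()
    if hmac_alg in KNOWN_HMACS and crypto_alg in KNOWN_CIPHERS:
        return "ok"
    if hmac_alg in KNOWN_CIPHERS and crypto_alg in KNOWN_HMACS:
        return "swapped"
    return "unknown"


# Field values copied from the two key dumps in this report.
with_lu8602 = "Version: 1\nType: server\nHMAC alg: AES-256-CTR\nCrypto alg: sha256\n"
without_lu8602 = "Version: 1\nType: server\nHMAC alg: SHA256\nCrypto alg: AES-256-CTR\n"

print(check_algorithms(parse_key_dump(with_lu8602)))
print(check_algorithms(parse_key_dump(without_lu8602)))
```

      The comparison is case-insensitive because the two dumps print the hash name as both "sha256" and "SHA256".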

      The problem manifests itself by logging the following when secure contexts are being instantiated:

      kernel: LustreError: 2559:0:(gss_sk_mech.c:172:sk_fill_context()) Invalid hmac type: 65541
      kernel: LustreError: 2559:0:(gss_sk_mech.c:172:sk_fill_context()) Skipped 1 previous similar message
      kernel: LustreError: 2559:0:(gss_svc_upcall.c:668:rsc_parse()) parse rsc error -22
      kernel: LustreError: 2559:0:(gss_svc_upcall.c:668:rsc_parse()) Skipped 1 previous similar message
      kernel: LustreError: 2450:0:(gss_svc_upcall.c:1018:gss_svc_upcall_handle_init()) authentication failed
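
      When triaging logs like the ones above, it can help to pull out the rejected value from each sk_fill_context() message. The following is a rough sketch under the assumption that the message format matches the lines quoted in this report; the helper name is hypothetical, and the sketch does not attempt to interpret what the value means.

```python
import re

# Pattern based on the sk_fill_context() message quoted in this report.
HMAC_ERR = re.compile(r"sk_fill_context\(\)\) Invalid hmac type: (\d+)")


def invalid_hmac_types(log_lines):
    """Return the numeric hmac type from every matching log line."""
    values = []
    for line in log_lines:
        match = HMAC_ERR.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


# Two of the log lines from this report.
sample = [
    "kernel: LustreError: 2559:0:(gss_sk_mech.c:172:sk_fill_context()) Invalid hmac type: 65541",
    "kernel: LustreError: 2559:0:(gss_svc_upcall.c:668:rsc_parse()) parse rsc error -22",
]

for value in invalid_hmac_types(sample):
    # Print the value in decimal and hexadecimal for easier comparison.
    print(value, hex(value))
```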

          Activity

            pjones Peter Jones added a comment -

            After some discussion we elected to land the reversion option for 2.10.0 and then aim to sort out GSS support for new kernel versions post-2.10.0. The rationale is that this keeps consistent behaviour with 2.9 for the officially supported distros.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27823/
            Subject: LU-9073 gss: remove newer kernel support
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2c27b194121665061cc0527e8bef35886ec7fea8

            sbuisson Sebastien Buisson (Inactive) added a comment -

            Chris, did you +1 on the right patch?

            adilger Andreas Dilger added a comment -

            Sebastien, Chris, could you please mark +1 on the patch if it is working for you, so that we can land it?

            hannac Chris Hanna (Inactive) added a comment -

            After spending some time with the latest reversion from James, it appears to have fixed the issue. We were able to run sanity-sec and sanity for some time. There were some errors, but SSK was engaged and many tests did pass. I would proceed with the reversion if it is holding you back, and the errors can be diagnosed when time permits.

            sbuisson Sebastien Buisson (Inactive) added a comment -

            With the patch 'LU-9073 gss: remove newer kernel support' at https://review.whamcloud.com/27823, I do not get any error message on the server side when a client running the krb5n flavor is unmounted:

            juin 27 05:30:51 ltest-vm4 kernel: Lustre: 11812:0:(sec_gss.c:2323:gss_svc_handle_destroy()) destroy svc ctx ffff8803fbd6bc40 idx 0x31cb7c0f1b298198 (0->10.128.11.159@tcp)
            juin 27 05:30:56 ltest-vm4 kernel: Lustre: 11764:0:(sec_gss.c:1222:gss_cli_ctx_fini_common()) reverse sec ffff8800364a4500: destroy ctx ffff880036603780

            That being said, the problem mentioned earlier is not blocking, and could be tackled in another Jira ticket. I would support landing the patch at https://review.whamcloud.com/25199, in order not to block support for GSS with newer kernels, knowing that it gives 'checksum mismatch' errors on the server side when clients are unmounted.

            What do you think?

            pjones Peter Jones added a comment -

            hannac could you please check the behaviour on RHEL7.x with James's latest reversion patch? We don't want to release 2.10.0 with a drop in functionality compared to 2.9.

            hannac Chris Hanna (Inactive) added a comment -

            James, Nodemap is needed for SSK (see Lustre manual 24.5) in a multinode setup. It does not necessarily need to be an activated feature, however. The lgssc.conf file should be created by the test-framework.sh script. If the nodemap is not correctly set up, you definitely will see strange issues.

            SSK did function in a client-to-server mode prior to the initial GSS patches. In the most recent version of James' patch it appeared to work in sanity-sec up to test_15, until it ran into some issues related to fileop in test_16 and did not recover.

            I can't speak to the most recent reversion James just uploaded. I don't expect this feature needs to get in the way of releasing the many other improvements in 2.10.

            simmonsja James A Simmons added a comment -

            I just pushed a revert in the hope that we are back to the state of Lustre 2.9. For the revert, I made it so that GSS is disabled with newer kernels instead, so this will not hinder newer kernel support. This revert is just a very poor band-aid. IMHO, based on my testing, I don't think GSS is ready for production systems. Even with the 2.9 client I found I couldn't get hmac support going with multiple nodes. Sebastien reported for 2.9 that gss null is unstable and tends to kernel panic. Also, I could never get GSS working without nodemap. Is that supposed to be the case? I found that issues like a missing lgssc.conf file will cause phantom keys in the kernel, which caused problems.
            So the revert is far from a solution. A lot more work needs to be done for proper GSS support so it can be used in production environments.

            gerrit Gerrit Updater added a comment -

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/27823
            Subject: LU-9073 gss: remove newer kernel support
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9629cf20033982f49ab327203e5efa8578616872

            People

              simmonsja James A Simmons
              nblavend Nathan Lavender (Inactive)