[LU-9073] SSK: lgss_sk generates keys with invalid HMAC and Crypto algorithms

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • 3

    Description

      With the landing of commit c6f5e8121366be05765dabe0008165166d3f431c for LU-8602, lgss_sk now generates keys with invalid HMAC and Crypto algorithms. The HMAC and Crypto algorithms are being swapped.

      == Master HEAD at c6f5e8121366be05765dabe0008165166d3f431c ==

      # lgss_sk -t server -f testfs -w testfs_test_with_LU-8602.key -d /dev/urandom
        Reading random data for shared key from '/dev/urandom'
      # lgss_sk -r testfs_test_with_LU-8602.key
        warning: secret key 'testfs_test_with_LU-8602.key' has insecure file mode 0100400
        Version: 1
        Type: server
        HMAC alg: AES-256-CTR
        Crypto alg: sha256
        Ctx Expiration: 604800 seconds
        Shared keylen: 256 bits
        Prime length: 2048 bits
        File system: testfs
        MGS NIDs:
        Nodemap name: default

      == LU-8602 reverted ==

      # lgss_sk -t server -f testfs -w testfs_test_without_LU-8602.key -d /dev/urandom
        Reading random data for shared key from '/dev/urandom'
      # lgss_sk -r testfs_test_without_LU-8602.key
        warning: secret key 'testfs_test_without_LU-8602.key' has insecure file mode 0100400
        Version: 1
        Type: server
        HMAC alg: SHA256
        Crypto alg: AES-256-CTR
        Ctx Expiration: 604800 seconds
        Shared keylen: 256 bits
        Prime length: 2048 bits
        File system: testfs
        MGS NIDs:
        Nodemap name: default
      # lgss_sk -r testfs_test_with_LU-8602.key
        warning: secret key 'testfs_test_with_LU-8602.key' has insecure file mode 0100400
        Invalid HMAC algorithm
        error: key configuration failed validation

      The problem manifests itself by logging the following when secure contexts are being instantiated:

      kernel: LustreError: 2559:0:(gss_sk_mech.c:172:sk_fill_context()) Invalid hmac type: 65541
      kernel: LustreError: 2559:0:(gss_sk_mech.c:172:sk_fill_context()) Skipped 1 previous similar message
      kernel: LustreError: 2559:0:(gss_svc_upcall.c:668:rsc_parse()) parse rsc error -22
      kernel: LustreError: 2559:0:(gss_svc_upcall.c:668:rsc_parse()) Skipped 1 previous similar message
      kernel: LustreError: 2450:0:(gss_svc_upcall.c:1018:gss_svc_upcall_handle_init()) authentication failed
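
A quick way to spot an affected key file is to check whether the two algorithm fields in the `lgss_sk -r` output have traded places. The sketch below is only an illustration: it parses text in the format shown above, and the helper names and the two name sets (`KNOWN_HASHES`, `KNOWN_CIPHERS`) are assumptions made for this example, not part of lgss_sk.

```python
# Assumed name sets for illustration only; adjust to what your lgss_sk build supports.
KNOWN_HASHES = {"sha256", "sha512"}
KNOWN_CIPHERS = {"aes-256-ctr"}


def parse_key_dump(text):
    """Collect 'Field: value' pairs from `lgss_sk -r` style output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def algorithms_swapped(text):
    """True if the HMAC field holds a cipher name and the Crypto field holds a hash name."""
    fields = parse_key_dump(text)
    hmac_alg = fields.get("HMAC alg", "").lower()
    crypto_alg = fields.get("Crypto alg", "").lower()
    return hmac_alg in KNOWN_CIPHERS and crypto_alg in KNOWN_HASHES


# Excerpts copied from the two dumps in the description.
BAD_DUMP = "Version: 1\nType: server\nHMAC alg: AES-256-CTR\nCrypto alg: sha256\n"
GOOD_DUMP = "Version: 1\nType: server\nHMAC alg: SHA256\nCrypto alg: AES-256-CTR\n"

if __name__ == "__main__":
    print("key generated at c6f5e81 swapped:", algorithms_swapped(BAD_DUMP))
    print("key generated with revert swapped:", algorithms_swapped(GOOD_DUMP))
```

In practice the text would come from running `lgss_sk -r <keyfile>` and capturing its output; the embedded strings are there only so the sketch runs on its own.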

          Activity


            hannac Chris Hanna (Inactive) added a comment -

            After spending some time with the latest reversion from James, it appears to have fixed the issue. We were able to run sanity-sec and sanity for some time. There were some errors, but SSK was engaged and many tests did pass. I would proceed with the reversion if it is holding you back; the errors can be diagnosed when time permits.

            sbuisson Sebastien Buisson (Inactive) added a comment -

            With the patch 'LU-9073 gss: remove newer kernel support' at https://review.whamcloud.com/27823, I do not get any error message on the server side when a client running the krb5n flavor is unmounted:

            juin 27 05:30:51 ltest-vm4 kernel: Lustre: 11812:0:(sec_gss.c:2323:gss_svc_handle_destroy()) destroy svc ctx ffff8803fbd6bc40 idx 0x31cb7c0f1b298198 (0->10.128.11.159@tcp)
            juin 27 05:30:56 ltest-vm4 kernel: Lustre: 11764:0:(sec_gss.c:1222:gss_cli_ctx_fini_common()) reverse sec ffff8800364a4500: destroy ctx ffff880036603780

            That being said, the problem mentioned earlier is not blocking and could be tackled in another Jira ticket. I would support landing the patch at https://review.whamcloud.com/25199 so as not to block support for GSS with newer kernels, knowing that it gives 'checksum mismatch' errors on the server side when clients are unmounted.

            What do you think?
            pjones Peter Jones added a comment -

            hannac could you please check the behaviour on RHEL7.x with James's latest reversion patch? We don't want to release 2.10.0 with a drop in functionality compared to 2.9.

            hannac Chris Hanna (Inactive) added a comment -

            James, nodemap is needed for SSK (see Lustre manual 24.5) in a multinode setup. It does not necessarily need to be an activated feature, however. The lgssc.conf file should be created by the test-framework.sh script. If the nodemap is not set up correctly, you definitely will see strange issues.

            SSK did function in a client-to-server mode prior to the initial GSS patches. In the most recent version of James' patch it appeared to work in sanity-sec up to test_15, until it ran into some issues related to fileop in test_16 and did not recover.

            I can't speak to the most recent reversion James just uploaded. I don't expect this feature needs to get in the way of releasing the many other improvements in 2.10.

            simmonsja James A Simmons added a comment -

            I just pushed a revert in the hope that we are back to the state of Lustre 2.9. For the revert I made it so GSS is disabled with newer kernels instead, so this will not hinder newer kernel support. This revert is just a very poor band-aid. IMHO, based on my testing, I don't think GSS is ready for production systems. Even with the 2.9 client I found I couldn't get hmac support going with multiple nodes. Sebastien reported for 2.9 that gss null is unstable and tends to kernel panic. Also, I could never get GSS working without nodemap. Is that supposed to be the case? I found that issues like a missing lgssc.conf file will cause phantom keys in the kernel, which caused problems.
            So the revert is far from a solution. A lot more work needs to be done for proper GSS support so it can be used in production environments.

            gerrit Gerrit Updater added a comment -

            James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/27823
            Subject: LU-9073 gss: remove newer kernel support
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9629cf20033982f49ab327203e5efa8578616872

            adilger Andreas Dilger added a comment -

            What is the current state of affairs with and without James' patch for Kerberos and SSK?

            Sebastien, you report problems at unmount, but does Kerberos work for normal usage before that time with James' patch applied? Is it "more" broken without the patch, or is it working without the patch and the patch breaks Kerberos?

            Chris Hanna or Nathan Lavender, does James' patch fix SSK again?

            I'm trying to figure out whether we should land James' patch to fix the problem(s) found so far and then get a later patch to fix the remaining problem(s). It seems like there are multiple issues here, and I'd like to move forward with getting fixes landed, since this is the last problem holding up the 2.10 release.

            simmonsja James A Simmons added a comment -

            I have to ask: does it work in 2.9 for you? I found that with hmac the 2.9 client doesn't work for me, so I'm wondering if the GSS code works at all for anyone.

            sbuisson Sebastien Buisson (Inactive) added a comment -

            Still 'checksum mismatch' errors on the server side at client unmount when testing patchset 21 with the krb5n flavor.

            juin 26 16:39:58 ltest-vm4 kernel: LustreError: 6544:0:(gss_krb5_mech.c:640:gss_verify_mic_kerberos()) checksum mismatch
            juin 26 16:39:58 ltest-vm4 kernel: LustreError: 6544:0:(sec_gss.c:243:gss_verify_msg()) mic verify error: 00060000
            juin 26 16:39:58 ltest-vm4 kernel: LustreError: 6544:0:(sec_gss.c:2122:gss_svc_verify_request()) failed to verify request: 60000
            juin 26 16:40:04 ltest-vm4 kernel: Lustre: 6494:0:(sec_gss.c:1222:gss_cli_ctx_fini_common()) reverse sec ffff8803f9680d00: destroy ctx

            It seems patchset 21 only fixes sk_utils.c, so it is not related to krb5.
            simmonsja James A Simmons added a comment - edited

            Found the reason for your error. The bug was in the userland utility, in sk_session_kdf() and sk_compute_keys(). Before, we had an entry for "ctr(aes)" in the libcfs crypto abstraction, but I removed it. That caused the mentioned functions to generate incorrectly sized keys for the keyring. Currently I just hard-coded the key size since we only use "ctr(aes)". I just discovered /proc/crypto has all the info we need; I will create a later patch to handle that. Please give it a try. Thanks for your patience. It was a crash course in krb5 the last few days.
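
As a rough illustration of the /proc/crypto idea mentioned above: the file lists one block of `field : value` lines per registered algorithm, separated by blank lines, and cipher entries carry `min keysize` and `max keysize` fields in bytes. The sketch below reads those fields for "ctr(aes)". The function names and the embedded sample are assumptions made for this example (the sample is hand-written, not captured from a real node), and this is not the follow-up patch itself.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto style text into one dict per algorithm entry."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def key_sizes(entries, name):
    """Return (min, max) key size in bytes for the first cipher entry named `name`, else None."""
    for entry in entries:
        if entry.get("name") == name and "min keysize" in entry and "max keysize" in entry:
            return int(entry["min keysize"]), int(entry["max keysize"])
    return None


# Hand-written sample in the /proc/crypto layout, for illustration only.
SAMPLE = """\
name         : ctr(aes)
driver       : ctr(aes-generic)
module       : kernel
priority     : 100
type         : blkcipher
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16

name         : sha256
driver       : sha256-generic
module       : kernel
type         : shash
blocksize    : 64
digestsize   : 32
"""

if __name__ == "__main__":
    print(key_sizes(parse_proc_crypto(SAMPLE), "ctr(aes)"))
```

On a live system the input would be the contents of /proc/crypto (for example `open("/proc/crypto").read()`); an algorithm only appears there once the kernel has registered it, so a `None` result does not by itself mean the cipher is unsupported.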

            sbuisson Sebastien Buisson (Inactive) added a comment -

            With patchset 20, on the server side when a client is unmounted:

            juin 23 03:27:19 ltest-vm4 kernel: LustreError: 3437:0:(gss_krb5_mech.c:470:krb5_make_checksum()) cksum->len 20, ke_hash_size 12 for ke_hash_name sha1
            juin 23 03:27:19 ltest-vm4 kernel: LustreError: 3437:0:(gss_krb5_mech.c:643:gss_verify_mic_kerberos()) checksum mismatch
            juin 23 03:27:19 ltest-vm4 kernel: LustreError: 3437:0:(sec_gss.c:243:gss_verify_msg()) mic verify error: 00060000
            juin 23 03:27:19 ltest-vm4 kernel: LustreError: 3437:0:(sec_gss.c:2122:gss_svc_verify_request()) failed to verify request: 60000
            juin 23 03:27:25 ltest-vm4 kernel: Lustre: 3390:0:(sec_gss.c:1222:gss_cli_ctx_fini_common()) reverse sec ffff8804058cf300: destroy ctx ffff8804062433c0
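
The two sizes in the first log line above (`cksum->len 20`, `ke_hash_size 12`) line up with the 20-byte SHA-1 digest and the 96-bit (12-byte) truncation used by the Kerberos hmac-sha1-96 checksum types. Whether that is what actually went wrong in the kernel code is not established in this thread, so treat the sketch below purely as an illustration of the size relationship; the key, message, and function name are made up for the example.

```python
import hashlib
import hmac


def hmac_sha1_96(key, msg):
    """HMAC-SHA1 truncated to its first 96 bits (12 bytes)."""
    return hmac.new(key, msg, hashlib.sha1).digest()[:12]


# Made-up inputs, for illustration only.
key = b"\x00" * 16
msg = b"example message"

full = hmac.new(key, msg, hashlib.sha1).digest()
short = hmac_sha1_96(key, msg)

if __name__ == "__main__":
    print("full HMAC-SHA1 length:", len(full))
    print("truncated length:", len(short))
    # A verifier expecting 12 bytes cannot match an untruncated 20-byte value.
    print("full matches truncated:", hmac.compare_digest(full, short))
    print("prefix matches truncated:", hmac.compare_digest(full[:12], short))
```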

            People

              simmonsja James A Simmons
              nblavend Nathan Lavender (Inactive)