Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.3.0
    • None
    • None
    • 4572

    Description

      Lustre uses crypto hash algorithms in two places: PTLRPC (ptlrpc/sec_bulk.c) and OST (ost/ost_handler.c)/OSC (osc/osc_request.c).
      OST and OSC use crc32, crc32c, and adler for checksumming (the compute_checksum() function).
      PTLRPC uses crc32, adler, md5, and sha1-512 (through the kernel crypto API for all of them except crc32 and adler).
      All of these subsystems walk the bulk pages and update the checksum.
      To resolve the conflicts between these different checksumming implementations, a new crypto hash interface is needed in libcfs. It should use the kernel crypto API for hash calculation in kernel modules, and a Lustre implementation in user mode. The previous checksum calculations should be changed to use the new libcfs crypto hash API. Adding a new hash algorithm would then be a simple task.
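
      As a rough illustration of the idea, here is a minimal user-space sketch of a table-driven hash interface that walks a list of page fragments and updates one running checksum. All names here (hash_alg, page_frag, hash_ops, hash_bulk) are hypothetical and are not the libcfs API. The CRC-32 variant follows the common zlib convention (initial value and final XOR of 0xFFFFFFFF), which is an assumption and may not match the seed Lustre uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm IDs for this sketch (not the libcfs names). */
enum hash_alg { HASH_ADLER32, HASH_CRC32, HASH_ALG_MAX };

/* One contiguous piece of a bulk transfer, e.g. a page or part of one. */
struct page_frag {
	const void *data;
	size_t len;
};

/* Per-algorithm operations: initial state, incremental update, final XOR. */
struct hash_ops {
	uint32_t init;
	uint32_t (*update)(uint32_t state, const unsigned char *buf, size_t len);
	uint32_t final_xor;
};

/* Adler-32: two 16-bit sums modulo 65521, packed as (b << 16) | a. */
static uint32_t adler32_update(uint32_t adler, const unsigned char *buf, size_t len)
{
	uint32_t a = adler & 0xffff, b = (adler >> 16) & 0xffff;

	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320), no lookup table. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Adding an algorithm means adding one enum value and one table entry. */
static const struct hash_ops hash_table[HASH_ALG_MAX] = {
	[HASH_ADLER32] = { 1u, adler32_update, 0u },
	[HASH_CRC32]   = { 0xFFFFFFFFu, crc32_update, 0xFFFFFFFFu },
};

/* Walk the fragments in order and fold each one into the running state. */
static uint32_t hash_bulk(enum hash_alg alg, const struct page_frag *frags,
			  size_t nfrags)
{
	assert(alg < HASH_ALG_MAX);
	const struct hash_ops *ops = &hash_table[alg];
	uint32_t state = ops->init;

	for (size_t i = 0; i < nfrags; i++)
		state = ops->update(state, frags[i].data, frags[i].len);
	return state ^ ops->final_xor;
}
```

      Because every caller goes through the same init/update/final sequence, the checksum does not depend on how the data is split across pages. In a kernel build, the update callbacks could be backed by the kernel crypto API instead of these software loops.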

      Attachments

        Issue Links

          Activity

            [LU-1201] Lustre crypto hash cleanup

            aboyko Alexander Boyko added a comment - Ihara, do you have checksumming benchmark results without this patch (for the same hardware and configuration)?

            adilger Andreas Dilger added a comment - It looks like there is already a bug, LU-744, for tracking the 2.x performance degradation.

            adilger Andreas Dilger added a comment - Ihara, yes, this spreadsheet should be attached to a new bug, along with details of the test being run. Are these results from testing with 24x ramdisk OSTs, as before? Jinshan is looking at this problem, so please assign it to him.

            The new results are contrary to the previous test results in this bug, which showed 2.2 being faster than 2.1 for 8x OSTs.

            ihara Shuichi Ihara (Inactive) added a comment - Andreas, attached is a checksum comparison across various Lustre versions. Most of the 2.2 numbers (whether checksums are enabled or not) were lower than 2.1 and 1.8, except for read performance. We might have some regressions in the 2.2 client. Should I open a new ticket for this?

            ihara Shuichi Ihara (Inactive) added a comment - OK, the problem is fixed with the correct config.h.

            aboyko Alexander Boyko added a comment (edited) - Can you attach your config.h? It looks like an invalid configure; I can't reproduce the issue on the same kernel.

            ihara Shuichi Ihara (Inactive) added a comment - Hit a kernel panic with these patches on a Sandy Bridge server. I just filed it as LU-1379.

            ihara Shuichi Ihara (Inactive) added a comment - I ran a couple of benchmarks on 1.8 and 2.2 for comparison, but not yet with these patches. I'm planning to run the same benchmarks on 1.8 and 2.2 with these patches in a couple of days.
            By the way, we are seeing single-client performance regressions on 2.x due to LU-744. So, at this moment, we might need to run with file size < client memory size for a fairer comparison. However, in the case of file size > client memory size, we still see lower client performance regardless of checksum=on/off.

            adilger Andreas Dilger added a comment - Alexander, Ihara, have you run any performance tests with checksums enabled on 1.8 for comparison?

            aboyko Alexander Boyko added a comment - I tested on RH5 with kernel version 2.6.18-238.19.1. It supports crc32c hw (kernel compile), and has better crc32c performance than the 2.6.32-... kernels.

            adilger Andreas Dilger added a comment - Ihara, while it is true that there may be some minor degradation in the case of RHEL 5 clients, based on the test results you posted this performance loss will be minor (or still an improvement over earlier versions of Lustre) due to the multi-threaded ptlrpcd speedups.

            Also, users who want to stick with RHEL5 for stability are not very likely to move to a new development version of Lustre.

            People

              wc-triage WC Triage
              aboyko Alexander Boyko
              Votes:
              0
              Watchers:
              6