Details

    • Improvement
    • Resolution: Fixed
    • Minor
    • Lustre 2.3.0
    • None
    • None
    • 4572

    Description

      Lustre uses crypto hash algorithms in two places: PTLRPC (ptlrpc/sec_bulk.c) and OST (ost/ost_handler.c)/OSC (osc/osc_request.c).
      OST and OSC use crc32, crc32c and adler for checksumming (the compute_checksum() function).
      PTLRPC uses crc32, adler, md5 and sha1-512 (through the kernel crypto API for all of them except crc32 and adler).
      All of these subsystems walk through the bulk pages and update the checksum.
      To resolve the conflicts between the different checksumming implementations, a new crypto hash interface is needed in libcfs. It should use the kernel crypto API for hash calculation in kernel modules, and a Lustre implementation in user mode. The previous checksum calculations should be changed to use the new libcfs crypto hash API. Adding a new hash algorithm would then be a simple task.
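A minimal user-space sketch of the kind of table-driven interface described above, to make the init/update/final flow concrete. All names here (`sketch_hash_init`, `sketch_hash_update`, `sketch_hash_final`, the `SKETCH_ALG_*` IDs) are hypothetical and are not the actual libcfs API; the crc32 variant shown is the common zlib-style CRC-32, which is an assumption and may differ from the variant Lustre uses on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm IDs; not the real libcfs enum. */
enum sketch_hash_alg { SKETCH_ALG_ADLER32, SKETCH_ALG_CRC32, SKETCH_ALG_MAX };

/* One table entry per algorithm: adding an algorithm means adding a row. */
struct sketch_hash_type {
    const char *name;
    uint32_t    init;   /* initial state */
    uint32_t  (*update)(uint32_t state, const unsigned char *buf, size_t len);
    uint32_t  (*final)(uint32_t state);
};

static uint32_t adler32_update(uint32_t s, const unsigned char *buf, size_t len)
{
    uint32_t a = s & 0xffff, b = s >> 16;

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;   /* 65521: largest prime below 2^16 */
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Bitwise, reflected CRC-32 (polynomial 0xEDB88320); slow but compact. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

static uint32_t identity_final(uint32_t s) { return s; }
static uint32_t invert_final(uint32_t s)   { return ~s; }

static const struct sketch_hash_type sketch_hash_types[SKETCH_ALG_MAX] = {
    [SKETCH_ALG_ADLER32] = { "adler32", 1u,          adler32_update, identity_final },
    [SKETCH_ALG_CRC32]   = { "crc32",   0xFFFFFFFFu, crc32_update,   invert_final   },
};

/* Per-request state, so callers never touch algorithm internals. */
struct sketch_hash_desc {
    const struct sketch_hash_type *type;
    uint32_t                       state;
};

void sketch_hash_init(struct sketch_hash_desc *d, enum sketch_hash_alg alg)
{
    assert(alg < SKETCH_ALG_MAX);
    d->type  = &sketch_hash_types[alg];
    d->state = d->type->init;
}

/* Called once per bulk page (or page fragment). */
void sketch_hash_update(struct sketch_hash_desc *d,
                        const unsigned char *buf, size_t len)
{
    d->state = d->type->update(d->state, buf, len);
}

uint32_t sketch_hash_final(const struct sketch_hash_desc *d)
{
    return d->type->final(d->state);
}
```

A caller would run `sketch_hash_init()` once, call `sketch_hash_update()` for each bulk page, and read the result with `sketch_hash_final()`. In a kernel build the table's `update` hooks would presumably dispatch to the kernel crypto API instead of these local routines.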

          Activity

            [LU-1201] Lustre crypto hash cleanup

            jay Jinshan Xiong (Inactive) added a comment - Hi Ihara, in LU-744 you saw that 2.2 clients performed better than 2.1; however, here you have seen the opposite. Is this only because the file size is less than the memory size?

            ihara Shuichi Ihara (Inactive) added a comment - There are no big performance differences between 2.2 and 2.2 with these patches, but both are still lower than 2.1.2.

            I filed LU-744 earlier, but I think what we saw here is another regression.
            As far as I tested on 2.2, performance goes down when the write/read size exceeds the client's memory size. This is LU-744, and the same problem happens on 2.1.x.

            To avoid that regression in this checksum comparison among 1.8.7, 2.1.2 and 2.2, I used a file size < client's memory size.

            Anyway, I don't think this regression is related to LU-1201; I will open a new ticket.

            aboyko Alexander Boyko added a comment - Ihara, do you have checksumming benchmark results without this patch (for the same hardware and configuration)?

            adilger Andreas Dilger added a comment - It looks like there is already a bug, LU-744, for tracking the 2.x performance degradation.

            adilger Andreas Dilger added a comment - Ihara, yes, this spreadsheet should be attached to a new bug, along with details of the test being run. Are these results from testing with 24x ramdisk OSTs, as before? Jinshan is looking at this problem, please assign it to him.

            The new results are contrary to the previous test results in this bug, which showed 2.2 being faster than 2.1 for 8x OSTs.

            ihara Shuichi Ihara (Inactive) added a comment - Andreas,
            attached is a checksum comparison across various Lustre versions. Most of the 2.2 numbers (whether checksums are enabled or not) were lower than 2.1 and 1.8, except for read performance. We might have some regressions on the 2.2 client. Do we need to open a new ticket for this?

            ihara Shuichi Ihara (Inactive) added a comment - OK, the problem is fixed with the correct config.h.
            aboyko Alexander Boyko added a comment (edited) - Can you attach your config.h? It looks like an invalid configure; I can't reproduce the issue on the same kernel.

            ihara Shuichi Ihara (Inactive) added a comment - Hit a kernel panic with these patches on a Sandy Bridge server. I just filed it as LU-1379.

            ihara Shuichi Ihara (Inactive) added a comment - I ran a couple of benchmarks on 1.8 and 2.2 for comparison, but not yet with these patches. I plan to run the same benchmarks on 1.8 and 2.2 with these patches in a couple of days.
            By the way, we are seeing single-client performance regressions on 2.x due to LU-744. So, at the moment, we might need to run with file size < client's memory size for a fairer comparison. However, when file size > client's memory size, we still see lower client performance regardless of checksum=on/off.

            adilger Andreas Dilger added a comment - Alexander, Ihara,
            have you done any performance testing with checksums enabled on 1.8 for comparison?

            People

              wc-triage WC Triage
              aboyko Alexander Boyko
              Votes: 0
              Watchers: 6
