[LU-1201] Lustre crypto hash cleanup - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.3.0
Affects Version/s: None
Labels:
None

Rank (Obsolete):
4572

Description

Lustre uses crypto hash algorithms in two places: PTLRPC (ptlrpc/sec_bulk.c) and OST (ost/ost_handler.c)/OSC (osc/osc_request.c).
OST and OSC use crc32, crc32c, and adler for checksumming (the compute_checksum() function).
PTLRPC uses crc32, adler, md5, and sha1-512 (through the kernel crypto API for all of them except crc32 and adler).
All of these subsystems walk the bulk pages and update the checksum.
To resolve the conflicts between the different checksumming implementations, a new crypto hash interface is needed in libcfs. It should use the kernel crypto API for hash calculation in kernel modules, and a Lustre implementation in user mode. The previous checksum calculations should be converted to the new libcfs crypto hash API. Adding a new hash algorithm would then be a simple task.
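As a rough illustration of the init/update/final shape such a unified interface could take, here is a minimal user-mode sketch in C. All names (`sketch_hash_init`, `sketch_hash_update`, `sketch_hash_final`, the `SKETCH_ALG_*` IDs) are hypothetical and are not the libcfs API from the patch. Only Adler-32 and CRC-32 are implemented, with simple byte-wise routines chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm IDs for this sketch; not Lustre's identifiers. */
enum sketch_hash_alg { SKETCH_ALG_ADLER32, SKETCH_ALG_CRC32 };

/* One table entry per algorithm: initial state, update step, final step. */
struct sketch_hash_type {
	uint32_t init;
	uint32_t (*update)(uint32_t state, const unsigned char *buf, size_t len);
	uint32_t (*final)(uint32_t state);
};

/* Adler-32: two running sums modulo 65521, packed as (b << 16) | a. */
static uint32_t adler32_update(uint32_t state, const unsigned char *buf, size_t len)
{
	uint32_t a = state & 0xffffu, b = state >> 16;

	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

/* CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit. */
static uint32_t crc32_update(uint32_t state, const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		state ^= buf[i];
		for (int k = 0; k < 8; k++)
			state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
	}
	return state;
}

static uint32_t final_identity(uint32_t state) { return state; }
static uint32_t final_invert(uint32_t state) { return state ^ 0xffffffffu; }

/* Adding a new algorithm means adding one enum value and one table entry. */
static const struct sketch_hash_type sketch_types[] = {
	[SKETCH_ALG_ADLER32] = { 1u, adler32_update, final_identity },
	[SKETCH_ALG_CRC32]   = { 0xffffffffu, crc32_update, final_invert },
};

struct sketch_hash_desc {
	const struct sketch_hash_type *type;
	uint32_t state;
};

static void sketch_hash_init(struct sketch_hash_desc *d, enum sketch_hash_alg alg)
{
	assert(d != NULL);
	d->type = &sketch_types[alg];
	d->state = d->type->init;
}

/* Called once per buffer (for example, once per bulk page). */
static void sketch_hash_update(struct sketch_hash_desc *d,
			       const unsigned char *buf, size_t len)
{
	assert(d != NULL && buf != NULL);
	d->state = d->type->update(d->state, buf, len);
}

static uint32_t sketch_hash_final(const struct sketch_hash_desc *d)
{
	assert(d != NULL);
	return d->type->final(d->state);
}
```

A caller would initialize a descriptor for the negotiated algorithm, call the update function for each page of the bulk transfer, and read the digest with the final function. In kernel modules the table entries would presumably wrap the kernel crypto API instead of local routines.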

Attachments

lustre-LU1201.xlsx (48 kB, 13/May/12 12:13 AM)
lustre-singleclient-comparison.xlsx (55 kB, 07/May/12 1:39 AM)

Issue Links

is related to

LU-744 Single client's performance degradation on 2.1

Resolved

Activity

Jinshan Xiong (Inactive) added a comment - 14/May/12 4:00 PM

Hi Ihara, in LU-744 you saw 2.2 clients perform better than 2.1; here, however, you have seen the opposite. Is this only because the file size is less than the memory size?

Shuichi Ihara (Inactive) added a comment - 13/May/12 12:13 AM

No big performance difference between 2.2 and 2.2 with these patches, but both are still lower than 2.1.2.

I filed LU-744 earlier, but I think what we see here is a different regression.
As far as I tested on 2.2, performance goes down when the write/read size exceeds the client's memory size. That is LU-744, and the same problem happens on 2.1.x.

To avoid that regression in this checksum comparison among 1.8.7, 2.1.2 and 2.2, I used file size < client's memory size.

Anyway, I don't think this regression is related to LU-1201; I will open a new ticket.

Alexander Boyko added a comment - 11/May/12 12:56 AM

Ihara, do you have checksumming benchmark results without this patch (for the same hardware and configuration)?

Andreas Dilger added a comment - 10/May/12 4:48 PM

It looks like there is already a bug, LU-744, for tracking the 2.x performance degradation.

Andreas Dilger added a comment - 07/May/12 2:29 AM

Ihara, yes this spreadsheet should be attached to a new bug, along with details of the test being run. Are these results from testing with 24x ramdisk OSTs, as before? Jinshan is looking at this problem, please assign it to him.

The new results are contrary to the previous test results in this bug, which showed 2.2 being faster than 2.1 for 8x OSTs.

Shuichi Ihara (Inactive) added a comment - 07/May/12 1:39 AM

Andreas,
attached is a checksum comparison across various Lustre versions. Most of the 2.2 numbers (whether checksums are enabled or not) were lower than 2.1 and 1.8, except for read performance. We might have some regressions on the 2.2 client. Should I open a new ticket for this?

Shuichi Ihara (Inactive) added a comment - 05/May/12 9:26 PM

OK, the problem is fixed with the correct config.h.

Alexander Boyko added a comment - 05/May/12 5:19 PM - edited

Can you attach your config.h? It looks like an invalid configure; I can't reproduce the issue on the same kernel.

Shuichi Ihara (Inactive) added a comment - 05/May/12 11:12 AM

Hit a kernel panic with these patches on a Sandy Bridge server. I just filed it as LU-1379.

Shuichi Ihara (Inactive) added a comment - 01/May/12 4:39 AM

I ran a couple of benchmarks on 1.8 and 2.2 for comparison, but not yet with these patches. I plan to run the same benchmarks on 1.8 and 2.2 with these patches in a couple of days.
By the way, we are seeing single-client performance regressions on 2.x due to LU-744. So, for now, we may need to run with file size < client's memory size for a fairer comparison. However, when file size > client's memory size, we still see lower client performance regardless of checksum=on/off.

Andreas Dilger added a comment - 30/Apr/12 4:34 PM

Alexander, Ihara,
have you done any performance testing with checksums enabled on 1.8 for comparison?

People

Assignee: WC Triage

Reporter: Alexander Boyko

Votes: 0

Watchers: 6

Dates

Created: 09/Mar/12 10:48 AM

Updated: 21/Nov/12 7:30 PM

Resolved: 01/Jul/12 3:00 AM