[LU-2488] crc t10 dif pclmulqdq implementation Created: 13/Dec/12 Updated: 26/Sep/13 Resolved: 26/Sep/13 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Minor |
| Reporter: | Alexander Boyko | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | patch |
| Issue Links: |
|
| Rank (Obsolete): | 5841 |
| Comments |
| Comment by Alexander Boyko [ 13/Dec/12 ] |
|
This patch adds a CRC T10 DIF PCLMULQDQ implementation to libcfs. Results from the t10 unit test:

Lustre: t10crc UT
Lustre: Checking block size 512 loops 4096 ...
Lustre: Speed: libcfs 1351MB/s kernel 259MB/s
Lustre: PASS
Lustre: Checking block size 4096 loops 4096 ...
Lustre: Speed: libcfs 2111MB/s kernel 251MB/s
Lustre: PASS

Here "kernel" is the Linux kernel table implementation. |
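For readers unfamiliar with the checksum being benchmarked, the following is a minimal Python sketch of CRC-16/T10-DIF (polynomial 0x8BB7, initial value 0, no bit reflection, no final XOR). It shows a bit-at-a-time version and a table-driven version of the kind the "kernel" figure refers to. It is not the patch's code and not the kernel's code: the function names are hypothetical, and the PCLMULQDQ variant from the patch is not reproduced here.

```python
POLY = 0x8BB7  # CRC-16/T10-DIF generator polynomial


def crc_t10dif_bitwise(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC, MSB first: init 0, no reflection, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# 256-entry lookup table: entry i is the CRC of the single byte i.
TABLE = [crc_t10dif_bitwise(bytes([i])) for i in range(256)]


def crc_t10dif_table(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC: one lookup per input byte."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc


if __name__ == "__main__":
    # Standard check string for CRC catalogues.
    print(hex(crc_t10dif_table(b"123456789")))
```

Both functions compute the same value; the table version trades 512 bytes of table for one lookup per byte, while a carry-less-multiply implementation folds many bytes per instruction, which is where a speedup like the one reported above would come from.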
| Comment by Andreas Dilger [ 13/Dec/12 ] |
|
Presumably the intent of this is to add support for a t10 DIF mode for bulk RPCs? It would be good to include some background in the bug about how this code is planned to be used. |
| Comment by Alexander Boyko [ 13/Dec/12 ] |
|
The client will use this code to calculate the CRC T10 DIF checksum and pass it to a server with T10 DIF/DIX-capable storage. |
| Comment by Andreas Dilger [ 21/Dec/12 ] |
|
So presumably you are going to add a new T10 bulk data checksum mode to the BRW protocol? I recall seeing some discussion to this effect, but I don't recall if there was a conclusion about what method would be used to send all of the T10 information in the RPC? This would increase the size of the BRW RPC by 256 * 8 bytes = 2kB for a 1MB RPC, and by 8kB for a 4MB RPC, which is pretty significant. Do you have any kind of HLD for the T10 DIF changes to the RPC? |
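The size estimate above can be checked with a short calculation. This sketch assumes 4096-byte sectors (which is what 256 sectors per 1MB RPC implies) and an 8-byte T10 DIF tuple per sector (2-byte guard tag, 2-byte application tag, 4-byte reference tag). The function name and its parameters are hypothetical and are not part of the BRW protocol.

```python
def dif_overhead_bytes(rpc_bytes: int, sector_size: int = 4096,
                       tuple_bytes: int = 8) -> int:
    """Extra bytes needed to carry one protection tuple per sector."""
    if rpc_bytes % sector_size:
        raise ValueError("RPC size must be a whole number of sectors")
    return (rpc_bytes // sector_size) * tuple_bytes


if __name__ == "__main__":
    MiB = 1 << 20
    for size in (1 * MiB, 4 * MiB):
        print(size // MiB, "MB RPC:", dif_overhead_bytes(size), "bytes")
```

Under the same assumptions, 512-byte sectors would need eight times as many tuples for the same RPC size, and sending only the 2-byte guard CRC per sector would cut the overhead to a quarter.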
| Comment by Keith Mannthey (Inactive) [ 04/Jan/13 ] |
|
A good number of kernels have T10 support at this point. Is the kernel's T10 implementation not usable for your needs? |
| Comment by Alexander Boyko [ 05/Jan/13 ] |
|
Keith, if you are talking about the T10 CRC, the kernel implementation is very slow and consumes a lot of CPU. Andreas, you are right: the size of the BRW RPC will increase by sizeof(crc) * number of sectors in the bulk. |
| Comment by Nathan Rutman [ 07/Jan/13 ] |
|
Xyratex MRP-511 |
| Comment by Nathan Rutman [ 07/Jan/13 ] |
|
While this particular patch provides hardware support for the T10-DIF CRC algorithm for general use in the libcfs library, and is not concerned with any actual T10-DIF/DIX usage, I understand the reluctance to use apparently unnecessary code. |
| Comment by Nathan Rutman [ 07/Jan/13 ] |
|
|
| Comment by Keith Mannthey (Inactive) [ 26/Apr/13 ] |
|
Is it possible to accelerate the kernel's T10 rather than re-implement the T10 framework in Lustre? Do we know why T10 is not ASM-accelerated in the kernel, or what the status of mainline is in this regard? |
| Comment by Alexander Boyko [ 26/Apr/13 ] |
|
Keith, it will take time. The current kernel version is 3.9, but Lustre is based on 2.6.32-279. Maybe I need to create the T10 code and push it to the kernel in parallel. T10 for Lustre has specific restrictions: only 512- and 4096-byte sector sizes and aligned data. Because of this, the code is more efficient than the general implementation. |
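The restrictions mentioned above (only 512- and 4096-byte sectors, whole-sector buffers) can be illustrated with a small sketch that computes one guard CRC per sector of a bulk buffer. This is an assumption-laden illustration, not the patch: the function names are hypothetical, buffer length stands in for the alignment requirement, and a plain bitwise CRC-16/T10-DIF replaces the PCLMULQDQ routine.

```python
ALLOWED_SECTOR_SIZES = (512, 4096)  # the only sizes named in the comment
POLY = 0x8BB7                       # CRC-16/T10-DIF polynomial


def crc_t10dif(data: bytes) -> int:
    """Bitwise software stand-in for the accelerated CRC routine."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ POLY if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def sector_guards(buf: bytes, sector_size: int) -> list:
    """Return one 16-bit guard CRC per sector of buf."""
    if sector_size not in ALLOWED_SECTOR_SIZES:
        raise ValueError("sector size must be 512 or 4096")
    if len(buf) % sector_size:
        raise ValueError("buffer must be a whole number of sectors")
    return [crc_t10dif(buf[off:off + sector_size])
            for off in range(0, len(buf), sector_size)]


if __name__ == "__main__":
    guards = sector_guards(bytes(range(256)) * 16, 512)  # 4096 bytes
    print(len(guards), [hex(g) for g in guards])
```

Rejecting other sizes up front is what lets a specialized routine skip the head and tail handling that a general-purpose CRC has to do for arbitrary lengths.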
| Comment by Keith Mannthey (Inactive) [ 21/Jun/13 ] |
|
It seems the feature freeze for 2.5 is the end of July. It would be good to refresh this patch if the T10 code is targeted for the 2.5 release. |
| Comment by Alexander Boyko [ 28/Aug/13 ] |
|
Please close this issue. This feature is not needed without the main T10 code. I don't see any progress on T10, and I will not spend time updating this patch. |
| Comment by Andreas Dilger [ 26/Sep/13 ] |
|
Close bug per request. |