[LU-2212] 2.3RC2 don't set the checksum algorithm correctly Created: 20/Oct/12 Updated: 18/Jun/13 Resolved: 18/Jun/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.4.1, Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Supporto Lustre Jnet2000 (Inactive) | Assignee: | Peter Jones |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Nehalem / Sandy Bridge CPU |
||
| Severity: | 3 |
| Rank (Obsolete): | 5270 |
| Comments |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 20/Oct/12 ] |
|
cat /proc/cpuinfo
processor : 15
This is the /var/log/messages output:
Oct 20 09:57:53 virgo1 kernel: LNet: HW CPU cores: 16, npartitions: 4
How can I select the crc32c algorithm with sse4_2 hardware acceleration? Thanks in advance. |
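On x86, the CRC32 instruction that hardware CRC32C relies on is part of SSE4.2, which /proc/cpuinfo advertises as the sse4_2 flag. The helper below is a minimal sketch, assuming you pass it the "flags" line read from /proc/cpuinfo; the function name cpuinfo_has_flag is hypothetical and not part of Lustre. Note that the flag only shows the CPU can do it. As this ticket shows, the kernel's crc32c module still has to be loaded before Lustre can use it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if `flag` appears as a whole
 * whitespace-separated token in a /proc/cpuinfo "flags" line such as
 * "flags : fpu vme ... sse4_1 sse4_2 ...". Whole-token matching avoids a
 * false hit on a longer flag that merely begins with the same characters. */
static bool cpuinfo_has_flag(const char *line, const char *flag)
{
    size_t flen = strlen(flag);
    const char *p = strchr(line, ':');  /* skip the "flags" label if present */

    if (flen == 0)
        return false;
    p = p ? p + 1 : line;

    while (*p) {
        while (*p == ' ' || *p == '\t' || *p == '\n')   /* skip separators */
            p++;
        const char *end = p;
        while (*end && *end != ' ' && *end != '\t' && *end != '\n')
            end++;                                      /* end of this token */
        if ((size_t)(end - p) == flen && strncmp(p, flag, flen) == 0)
            return true;
        p = end;
    }
    return false;
}
```

In practice you would read /proc/cpuinfo line by line with fgets, pick the line starting with "flags", and call cpuinfo_has_flag(line, "sse4_2").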
| Comment by Alexander Boyko [ 20/Oct/12 ] |
|
You need to run modprobe crc32c before loading the Lustre modules. Then, once the libcfs hash self-test has completed, you will get valid performance results for each algorithm. |
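The speed figures come from a self-test that libcfs runs when it loads, which is why crc32c has to be registered first. The sketch below shows the general idea of such a measurement in user space. It is not libcfs code: crc32c_sw and hash_speed_mbps are hypothetical names, and the CRC is a plain bitwise software CRC32C (Castagnoli, reflected polynomial 0x82F63B78), i.e. the slow fallback that the SSE4.2 instruction is meant to beat.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Bitwise software CRC32C (Castagnoli), reflected polynomial 0x82F63B78,
 * initial value and final XOR of 0xFFFFFFFF. */
static uint32_t crc32c_sw(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical self-test helper: hash `buf` repeatedly for about `seconds`
 * of CPU time and return the throughput in MB/s (10^6 bytes per second). */
static double hash_speed_mbps(const unsigned char *buf, size_t len, double seconds)
{
    clock_t start = clock();
    clock_t limit = start + (clock_t)(seconds * CLOCKS_PER_SEC);
    size_t bytes = 0;
    volatile uint32_t sink = 0;   /* keeps the compiler from dropping the work */

    do {
        sink = crc32c_sw(sink, buf, len);
        bytes += len;
    } while (clock() < limit);

    double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
    return elapsed > 0 ? ((double)bytes / 1e6) / elapsed : 0.0;
}
```

Running the same timing loop over each registered implementation and keeping the per-algorithm results is what makes a "fastest available" comparison meaningful; if the hardware implementation is not registered yet, it simply is not in the comparison.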
| Comment by Peter Jones [ 20/Oct/12 ] |
|
Thanks Alex. Could you please open an LUDOC ticket to get information about this added to the Lustre manual? |
| Comment by Andreas Dilger [ 20/Oct/12 ] |
|
The module loading should be done automatically; otherwise few users will notice that this optimization is missing. If there is no symbol dependency that forces the crc32c module to be loaded automatically, then libcfs/linux-crypto.c should call cfs_request_module("crc32c") once only, or mount_lustre.c should run system("modprobe crc32c 2> /dev/null").

If hardware CRC32C is not available by default for users with appropriate CPUs, this could be a fairly serious single-client IO performance regression in Lustre 2.3 compared with the default behaviour of Lustre 2.2 (which had hardware crc32c support available on all supported CPUs).

My local testing with a 2.3.53 client reports "[adler] crc32c" for the checksum_type file without pre-loading the crc32c module. That CPU is old enough that it lacks hardware crc32c support, but it would not make sense for crc32c support to be unavailable altogether on newer kernels. The above logs also reported errors with the crc32-table, crc32-pclmul, and adler32 algorithms.

I also see the following warnings on my console at every boot, though I guess they do not directly reflect a problem, only that these crypto modules were loaded and do not contain a .test method:
alg: No test for crc32 (crc32-table)
alg: No test for adler32 (adler32-zlib)

It would be great if these crypto modules could be updated to include a test to quiet the warnings for these kernels (RHEL 6.2 2.6.32-279.5.1.el6), or is this only available for in-kernel crypto algorithms? |
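The "call it once only" suggestion can be sketched in user space with a C11 atomic flag. This is an illustration under stated assumptions, not the patch that landed: request_crc32c_once and crc32c_loader are hypothetical names, and the loader is a counting stub standing in for system("modprobe crc32c 2> /dev/null") or cfs_request_module("crc32c") so the sketch runs without touching the real module loader.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real request (modprobe or
 * cfs_request_module). This stub only counts how often it is invoked. */
static unsigned crc32c_requests;

static int crc32c_loader_stub(void)
{
    crc32c_requests++;
    return 0;
}

static int (*crc32c_loader)(void) = crc32c_loader_stub;

static atomic_flag crc32c_requested = ATOMIC_FLAG_INIT;

/* Safe to call from every setup or mount path: only the first caller
 * issues the request. Returns true for that caller, false for everyone
 * else. A failed request is ignored because the software checksums
 * (adler32, crc32, software crc32c) still work. */
static bool request_crc32c_once(void)
{
    if (atomic_flag_test_and_set(&crc32c_requested))
        return false;
    (void)crc32c_loader();
    return true;
}
```

One design caveat: a concurrent second caller returns immediately and may proceed before the first caller's load has finished. That is harmless for an optional speed-up, but code that must wait for the result would need something like pthread_once instead.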
| Comment by Alexander Boyko [ 22/Oct/12 ] |
|
> It would be great if these crypto modules could be updated to include a test to quiet the warnings for these kernels (RHEL 6.2 2.6.32-279.5.1.el6), or is this only available for in-kernel crypto algorithms?
No, the tests for all algorithms live in the kernel itself (linux/crypto/testmgr.{c,h}), so we would need to patch the kernel to add a new test. |
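The kernel's tests in crypto/testmgr.h are known-answer vectors: an input paired with its expected digest. The sketch below shows that idea for Adler-32 in user space. It does not use the kernel's testmgr data structures; adler32_ref, struct kat and adler32_selftest are hypothetical names, and the checksum is plain Adler-32 as defined in RFC 1950.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain Adler-32 (RFC 1950): A starts at 1, B at 0, both modulo 65521. */
static uint32_t adler32_ref(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Known-answer vectors: an input and the digest it must produce. */
struct kat { const char *input; uint32_t expected; };

static const struct kat adler32_kats[] = {
    { "",          0x00000001u },
    { "a",         0x00620062u },
    { "Wikipedia", 0x11E60398u },
};

/* Returns the number of failing vectors (0 means the self-test passed). */
static int adler32_selftest(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof adler32_kats / sizeof adler32_kats[0]; i++) {
        const struct kat *t = &adler32_kats[i];
        if (adler32_ref((const unsigned char *)t->input, strlen(t->input)) != t->expected)
            failures++;
    }
    return failures;
}
```

A check like this, run once at registration time, is what the "alg: No test for ..." messages report as missing.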
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ] |
|
Hi Alexander,
[root@virgo1 ~]# modprobe crc32c
[root@virgo1 ~]# lsmod |grep crc32c
[root@virgo1 ~]# modprobe lnet
Oct 22 11:26:15 virgo1 kernel: LNet: HW CPU cores: 16, npartitions: 4 |
| Comment by Andreas Dilger [ 22/Oct/12 ] |
|
The console error messages are just misleading messages from the kernel. Check lctl get_param osc.*.checksum_type to see which checksum algorithms are available/used. |
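In the checksum_type output, every name after the "=" is an available algorithm and the one in square brackets is the one in use (for example "adler [crc32c]" means crc32c is active). The parser below is a minimal sketch for scripting that check; parse_checksum_type is a hypothetical name, and it only handles the plain "name [name]" form, not a variant with speed figures appended.

```c
#include <stddef.h>
#include <string.h>

/* Parse one line of "lctl get_param osc.*.checksum_type" output, e.g.
 *   osc.test-OST0000-osc-ffff881cc8b5a400.checksum_type=adler [crc32c]
 * Copies the bracketed (active) name into `active` (capacity `cap`) and
 * returns the number of available algorithms, or -1 on malformed input. */
static int parse_checksum_type(const char *line, char *active, size_t cap)
{
    const char *p = strchr(line, '=');
    int count = 0;

    if (p == NULL || cap == 0)
        return -1;
    active[0] = '\0';
    p++;

    while (*p) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            break;
        int bracketed = (*p == '[');
        if (bracketed)
            p++;
        size_t n = strcspn(p, " ]\n");   /* length of this algorithm name */
        if (bracketed) {
            if (n >= cap)
                return -1;
            memcpy(active, p, n);
            active[n] = '\0';
        }
        count++;
        p += n;
        if (*p == ']')
            p++;
    }
    return count;
}
```

For the OST0002 line reported later in this ticket ("checksum_type=[adler]") it returns 1 with "adler" active, which is how a script can flag a client where crc32c was never registered.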
| Comment by Alexander Boyko [ 22/Oct/12 ] |
|
Andreas, along with cfs_request_module, I can add something like this if it would be useful:
lctl get_param osc.*.checksum_type
osc.lustre-OST0000-osc-ffff88003cec3400.checksum_type= crc32 2104MB/s adler 2117MB/s [crc32c 2352MB/s]
What do you think? |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ] |
|
This is the strange output:
[root@virgo1 /]# lctl get_param osc.*.checksum_type |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ] |
|
These two OSTs are on an Intel Nehalem server:
osc.test-OST0000-osc-ffff881cc8b5a400.checksum_type=adler [crc32c]
These two OSTs are on another Intel Nehalem server:
osc.test-OST0002-osc-ffff881cc8b5a400.checksum_type=[adler]
If I mount all 4 OSTs on the first server, I see this:
osc.test-OST0000-osc-ffff880fd1842800.checksum_type=adler [crc32c]
Oops..... |
| Comment by Alexander Boyko [ 22/Oct/12 ] |
|
As my previous comment shows, the speed of each algorithm is reported. Adler and hardware crc32c show similar numbers. Now, about your OST with adler only. |
| Comment by Andreas Dilger [ 22/Oct/12 ] |
|
Alexander, checksum_stats:
- crc32: { speed: 3568 MB/s, active: no }
- adler32: { speed: 4356 MB/s, active: yes }
- crc32c: { speed: 3668 MB/s, active: no }
This allows scripts/tools to parse the data in a structured way, but it is still human readable. |
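The structured checksum_stats format proposed above can be produced with a few snprintf calls. The sketch below is a hypothetical user-space illustration, not Lustre's procfs handler: struct cksum_stat and format_checksum_stats are invented names, and the speeds would come from the module-load self-test.

```c
#include <stdbool.h>
#include <stdio.h>

/* One entry per checksum algorithm (illustrative names). */
struct cksum_stat {
    const char *name;
    unsigned speed_mbps;
    bool active;
};

/* Write the structured listing into `buf` (capacity `cap`). Like snprintf,
 * returns the full length that would have been written, so the caller can
 * detect truncation by checking whether the result is >= cap. */
static int format_checksum_stats(char *buf, size_t cap,
                                 const struct cksum_stat *s, size_t n)
{
    int total = snprintf(buf, cap, "checksum_stats:\n");
    if (total < 0)
        return total;
    for (size_t i = 0; i < n; i++) {
        /* Append after what fits so far; once full, keep only counting. */
        size_t used = (size_t)total < cap ? (size_t)total : cap;
        int w = snprintf(buf + used, cap - used,
                         "- %s: { speed: %u MB/s, active: %s }\n",
                         s[i].name, s[i].speed_mbps, s[i].active ? "yes" : "no");
        if (w < 0)
            return w;
        total += w;
    }
    return total;
}
```

With the three entries from the example above (crc32 3568, adler32 4356 active, crc32c 3668) this reproduces the listing line for line: one "key: { ... }" entry per algorithm, which a YAML parser or a simple line-oriented script can read.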
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ] |
|
Okay, the correct procedure to enable crc32c in my lab is this:
and finally:
[root@virgo1 mdtest-1.8.3]# lctl get_param osc.*.checksum_type |
| Comment by Andreas Dilger [ 22/Oct/12 ] |
|
I've uploaded a simple patch which will hopefully resolve this problem:
http://review.whamcloud.com/4371
It isn't necessarily the most elegant solution, but it would be great if you could give it a try. Please note that the patch is completely untested (not even compiled yet), since I'm just getting onto a plane. |
| Comment by Alexander Boyko [ 22/Oct/12 ] |
|
patch |
| Comment by Alexander Boyko [ 22/Oct/12 ] |
|
no need to load libcrc32c |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 24/Oct/12 ] |
|
Okay, thank you for the support. Please close the ticket. |
| Comment by Peter Jones [ 24/Oct/12 ] |
|
JNET2000, are you confirming that the proposed fix works in your tests, or did you just use the workaround? |
| Comment by Supporto Lustre Jnet2000 (Inactive) [ 24/Oct/12 ] |
|
I used the workaround; I'm not able to test the patch right now. |
| Comment by Peter Jones [ 24/Oct/12 ] |
|
ok well then I suggest that we keep this ticket open until the proposed patch has been validated and landed to master. It's quite alright if you are too busy to spend any further time on this but keeping the ticket open helps us for tracking purposes. |
| Comment by James A Simmons [ 10/Jun/13 ] |
|
I did test this patch. Recently I have been working on adding checksumming support to the Gemini LND driver. On Cray compute nodes, the modules are loaded only once and then removed so that more memory can be used for applications, which means modules cannot be loaded later. The crc32 module does not load automatically, so we had to use a painful workaround, but this patch solves the problem by loading crc32c automatically at boot time, saving us a lot of pain. |
| Comment by James A Simmons [ 17/Jun/13 ] |
|
This patch has landed to 2.4 and master so I think we can close this ticket now. |
| Comment by Peter Jones [ 18/Jun/13 ] |
|
Thanks James! |