[LU-2212] 2.3RC2 don't set the checksum algorithm correctly Created: 20/Oct/12  Updated: 18/Jun/13  Resolved: 18/Jun/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.1, Lustre 2.5.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Major
Reporter: Supporto Lustre Jnet2000 (Inactive) Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: None
Environment:

Nehalem / Sandy Bridge CPU


Severity: 3
Rank (Obsolete): 5270

 Comments   
Comment by Supporto Lustre Jnet2000 (Inactive) [ 20/Oct/12 ]

cat /proc/cpuinfo

processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
stepping : 7
cpu MHz : 1200.000
cache size : 20480 KB
physical id : 1
siblings : 8
core id : 7
cpu cores : 8
apicid : 46
initial apicid : 46
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 4802.69
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

# cat /proc/fs/lustre/osc/prova-OST0000-osc-ffff881056290400/checksums
1
# cat /proc/fs/lustre/osc/prova-OST0000-osc-ffff881056290400/checksum_type
[adler]

This is the /var/log/messages output:

Oct 20 09:57:53 virgo1 kernel: LNet: HW CPU cores: 16, npartitions: 4
Oct 20 09:57:53 virgo1 kernel: alg: No test for crc32 (crc32-table)
Oct 20 09:57:53 virgo1 kernel: alg: No test for adler32 (adler32-zlib)
Oct 20 09:57:53 virgo1 kernel: alg: No test for crc32 (crc32-pclmul)
Oct 20 09:57:57 virgo1 kernel: padlock: VIA PadLock Hash Engine not detected.
Oct 20 09:57:57 virgo1 modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-279.5.1.el6.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device
Oct 20 09:58:06 virgo1 kernel: Lustre: Lustre: Build Version: 2.3.0-RC2--PRISTINE-2.6.32-279.5.1.el6.x86_64

How can I select the crc32c algorithm with sse4_2 hardware acceleration?

Thanks in advance.

Comment by Alexander Boyko [ 20/Oct/12 ]

You need to run modprobe crc32c before loading the Lustre modules; then, once the libcfs hash self-test has run, you will get a valid performance result for each algorithm.
In any case, you can switch to crc32c like this:
for i in /proc/fs/lustre/osc/*/checksum_type; do echo crc32c > "$i"; done
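The loop above writes crc32c to every OSC, whether or not that OST offers it. A slightly more defensive sketch only switches the OSCs whose checksum_type file actually lists crc32c. It assumes, as the outputs later in this ticket suggest, that the file lists the available algorithms with the active one in square brackets (e.g. "adler [crc32c]"). The helper name has_algo is hypothetical.

```shell
#!/bin/sh
# has_algo LIST ALGO: succeed if ALGO appears in a checksum_type list such as
# "adler [crc32c]" or "[adler]" (square brackets mark the active algorithm).
# Whole-word matching, so "crc32" does not match "crc32c".
has_algo() {
    case " $1 " in
        *" $2 "*|*"[$2]"*) return 0 ;;
    esac
    return 1
}

# Switch each OSC that offers crc32c (needs root and a mounted client);
# report the ones that do not offer it instead of writing blindly.
for f in /proc/fs/lustre/osc/*/checksum_type; do
    [ -e "$f" ] || continue     # glob did not match: no Lustre client mounted
    if has_algo "$(cat "$f")" crc32c; then
        echo crc32c > "$f"
    else
        echo "crc32c not offered by $f" >&2
    fi
done
```

OSCs reported on stderr are the ones whose server did not advertise crc32c, which is the situation investigated further down in this ticket.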

Comment by Peter Jones [ 20/Oct/12 ]

Thanks Alex. Could you please open an LUDOC ticket to get information about this added to the Lustre manual?

Comment by Andreas Dilger [ 20/Oct/12 ]

The module loading should be done automatically, otherwise few users will detect that this optimization is not present. If there is not a symbol dependency that forces the crc32c module to be loaded automatically, then the libcfs/linux-crypto.c should call cfs_request_module("crc32c") once only, or in mount_lustre.c system("modprobe crc32c 2> /dev/null"). If the hardware CRC32C is not available by default for users with appropriate CPUs, then this could represent a fairly serious performance regression for single-client IO for users of Lustre 2.3 over the default behaviour of Lustre 2.2 (which had hardware crc32c support available on all supported CPUs).

In my local testing, a 2.3.53 client reports "[adler] crc32c" in the checksum_type file without the crc32c module having been pre-loaded. That CPU is old enough that it lacks hardware crc32c support, but it doesn't make sense that crc32c support would be unavailable altogether on newer kernels.

The above logs also reported errors with the crc32-table, crc32-pclmul, and adler32 algorithms. I also see on my console at every boot the following warnings, though I guess they do not directly reflect a problem, only that these crypto modules were loaded and do not contain a .test method:

alg: No test for crc32 (crc32-table)
alg: No test for adler32 (adler32-zlib)

It would be great if these crypto modules could be updated to include a test to quiet the warnings for these kernels (RHEL 6.2 2.6.32-279.5.1.el6), or is this only available for in-kernel crypto algorithms?

Comment by Alexander Boyko [ 22/Oct/12 ]

>It would be great if these crypto modules could be updated to include a test to quiet the warnings for these kernels (RHEL 6.2 2.6.32-279.5.1.el6), or is this only available for in-kernel crypto algorithms?

No, the tests for all algorithms live in the kernel, in linux/crypto/testmgr.{c|h}, so we would need to patch the kernel to add a new test.
I agree with loading the module automatically.

Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ]

Hi Alexander,
Same errors in /var/log/messages:

[root@virgo1 ~]# modprobe crc32c

[root@virgo1 ~]# lsmod |grep crc32c
crc32c_intel 2068 0
libcrc32c 1246 1 iw_nes

[root@virgo1 ~]# modprobe lnet

Oct 22 11:26:15 virgo1 kernel: LNet: HW CPU cores: 16, npartitions: 4
Oct 22 11:26:15 virgo1 kernel: alg: No test for crc32 (crc32-table)
Oct 22 11:26:15 virgo1 kernel: alg: No test for adler32 (adler32-zlib)
Oct 22 11:26:15 virgo1 kernel: alg: No test for crc32 (crc32-pclmul)
Oct 22 11:26:19 virgo1 kernel: padlock: VIA PadLock Hash Engine not detected.
Oct 22 11:26:19 virgo1 modprobe: FATAL: Error inserting padlock_sha (/lib/modules/2.6.32-279.5.1.el6.x86_64/kernel/drivers/crypto/padlock-sha.ko): No such device
Oct 22 11:26:23 virgo1 kernel: Lustre: Lustre: Build Version: 2.3.0-RC2--PRISTINE-2.6.32-279.5.1.el6.x86_64

Comment by Andreas Dilger [ 22/Oct/12 ]

The console error messages are just misleading messages from the kernel. Check lctl get_param osc.*.checksum_type to see which checksum algorithms are available/used.
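With many OSTs the lctl output gets long. A minimal sketch that summarizes it per OSC, assuming the "osc.<name>.checksum_type=<list>" line format shown in the outputs below, with the active algorithm in square brackets. The function name summarize_cksum is hypothetical.

```shell
#!/bin/sh
# summarize_cksum: read "lctl get_param osc.*.checksum_type" output on stdin
# and print "<osc> active=<algo> crc32c=<yes|no>" for each OSC line.
summarize_cksum() {
    while IFS= read -r line; do
        case $line in
            *.checksum_type=*) ;;
            *) continue ;;              # skip anything that is not a param line
        esac
        osc=${line%%.checksum_type=*}   # strip the parameter name and value
        osc=${osc#osc.}                 # strip the "osc." prefix
        list=${line#*=}                 # e.g. "adler [crc32c]"
        # The active algorithm is the one wrapped in square brackets.
        active=${list#*"["}
        active=${active%%"]"*}
        case " $list " in
            *" crc32c "*|*"[crc32c]"*) has=yes ;;
            *) has=no ;;
        esac
        echo "$osc active=$active crc32c=$has"
    done
}
```

Usage: lctl get_param osc.*.checksum_type | summarize_cksum. Any line ending in crc32c=no points at an OST whose server did not advertise crc32c.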

Comment by Alexander Boyko [ 22/Oct/12 ]

Andreas, along with the cfs_request_module change, I can add something like this if it would be useful:

lctl get_param osc.*.checksum_type
osc.lustre-OST0000-osc-ffff88003cec3400.checksum_type=
crc32 2104MB/s adler 2117MB/s [crc32c 2352MB/s]

What do you think?

Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ]

This is the strange output:

[root@virgo1 /]# lctl get_param osc.*.checksum_type
osc.test-OST0000-osc-ffff881cc8b5a400.checksum_type=adler [crc32c]
osc.test-OST0001-osc-ffff881cc8b5a400.checksum_type=adler [crc32c]
osc.test-OST0002-osc-ffff881cc8b5a400.checksum_type=[adler]
osc.test-OST0003-osc-ffff881cc8b5a400.checksum_type=[adler]

Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ]

These two OSTs are on one Intel Nehalem server:

osc.test-OST0000-osc-ffff881cc8b5a400.checksum_type=adler [crc32c]
osc.test-OST0001-osc-ffff881cc8b5a400.checksum_type=adler [crc32c]

These two OSTs are on another Intel Nehalem server:

osc.test-OST0002-osc-ffff881cc8b5a400.checksum_type=[adler]
osc.test-OST0003-osc-ffff881cc8b5a400.checksum_type=[adler]

If I mount all 4 OSTs on the first server, I see this:

osc.test-OST0000-osc-ffff880fd1842800.checksum_type=adler [crc32c]
osc.test-OST0001-osc-ffff880fd1842800.checksum_type=adler [crc32c]
osc.test-OST0002-osc-ffff880fd1842800.checksum_type=adler [crc32c]
osc.test-OST0003-osc-ffff880fd1842800.checksum_type=adler [crc32c]

Oops...

Comment by Alexander Boyko [ 22/Oct/12 ]

My previous comment shows the speed of each algorithm: adler and hardware crc32c have about the same speed. Now, about your OSTs that offer only adler.
The Lustre server tells clients which algorithms it supports. The base algorithm is adler; any other algorithm is supported too if its speed is better than half of adler's speed. So it looks like your server had only crc32c-table when libcfs was loaded, and the server's reply doesn't include crc32c because crc32c-table runs at about 400 MB/s while adler runs at about 2000 MB/s.
Try restarting Lustre (unloading and reloading the modules) on that server after running modprobe crc32c; I think this will resolve the issue.
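The advertisement rule described above can be written down in a few lines. This is a sketch of the rule as stated in this comment, not Lustre's actual implementation; the function name supported_algos is hypothetical and speeds are integer MB/s values.

```shell
#!/bin/sh
# supported_algos ADLER_SPEED NAME SPEED [NAME SPEED ...]
# Prints the algorithms a server would advertise under the stated rule:
# adler always, plus any algorithm faster than half of adler's speed.
# "speed > adler / 2" is tested as "2 * speed > adler" to stay in integers.
supported_algos() {
    adler=$1; shift
    result=adler
    while [ $# -ge 2 ]; do
        if [ $(( $2 * 2 )) -gt "$adler" ]; then
            result="$result $1"
        fi
        shift 2
    done
    echo "$result"
}
```

With the figures from this ticket, supported_algos 2000 crc32c 400 (crc32c-table) yields only adler, while supported_algos 2117 crc32 2104 crc32c 2352 (hardware crc32c loaded first) yields adler, crc32 and crc32c, which matches the two groups of OSTs seen above.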

Comment by Andreas Dilger [ 22/Oct/12 ]

Alexander,
Regarding the checksum_type parameter, it would be great to get the checksum performance on both the client and the server. However, the format you propose is complex to parse and use. It would be better to have a separate checksum_stats file that uses YAML format (see http://www.yaml.org/ and http://online-yaml-parser.appspot.org/ for examples), as all new complex /proc files do, for easy parsing by both humans and tools. Something like:

checksum_stats:
- crc32: { speed: 3568 MB/s, active: no }
- adler32: { speed: 4356 MB/s, active: yes }
- crc32c: { speed: 3668 MB/s, active: no }

This allows scripts/tools to parse the data in a structured way, but it is still human readable.
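To compare the two proposals, the YAML layout can be produced mechanically from the one-line format suggested earlier. A minimal sketch, assuming input made of "name speedMB/s" pairs with the active entry bracketed, as in "crc32 2104MB/s adler 2117MB/s [crc32c 2352MB/s]". Both formats are proposals in this ticket, not a confirmed Lustre interface, and the function name to_checksum_stats is hypothetical.

```shell
#!/bin/sh
# to_checksum_stats LINE: convert the proposed one-line speed format into the
# proposed checksum_stats YAML layout, one list entry per algorithm.
to_checksum_stats() {
    echo "checksum_stats:"
    set -f                      # no globbing while word-splitting the input
    set -- $1                   # split into name/speed tokens
    set +f
    while [ $# -ge 2 ]; do
        name=$1 speed=$2 active=no
        case $name in
            "["*) active=yes; name=${name#"["} ;;   # bracket marks the active algo
        esac
        speed=${speed%"]"}      # drop a closing bracket, if any
        speed=${speed%MB/s}     # keep only the number
        echo "- $name: { speed: $speed MB/s, active: $active }"
        shift 2
    done
}
```

The structured form keeps speed and active state as separate fields, so a tool no longer has to infer which algorithm is active from bracket placement.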

Comment by Supporto Lustre Jnet2000 (Inactive) [ 22/Oct/12 ]

Okay, the correct procedure to enable crc32c in my lab is:
# modprobe crc32c
# modprobe libcrc32c
# modprobe lustre

and finally:

[root@virgo1 mdtest-1.8.3]# lctl get_param osc.*.checksum_type
osc.test-OST0000-osc-ffff882052d89400.checksum_type=adler [crc32c]
osc.test-OST0001-osc-ffff882052d89400.checksum_type=adler [crc32c]
osc.test-OST0002-osc-ffff882052d89400.checksum_type=adler [crc32c]
osc.test-OST0003-osc-ffff882052d89400.checksum_type=adler [crc32c]
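The workaround can be kept as a small boot-time script. A minimal sketch, assuming the Lustre modules are not loaded yet (otherwise unload them first, as suggested above) and dropping libcrc32c, which turns out not to be needed. The MODPROBE override and the function name are hypothetical, added only so the commands can be printed instead of run.

```shell
#!/bin/sh
# Load crc32c before the Lustre modules so that the libcfs self-test
# benchmarks the hardware crc32c. Set MODPROBE=echo for a dry run.
MODPROBE=${MODPROBE:-modprobe}

load_lustre_with_crc32c() {
    # crc32c is optional: a missing module should not stop Lustre loading.
    $MODPROBE crc32c 2>/dev/null || echo "warning: crc32c module not loaded" >&2
    $MODPROBE lustre
}
```

Run load_lustre_with_crc32c as root, then confirm with lctl get_param osc.*.checksum_type that every OSC lists crc32c.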

Comment by Andreas Dilger [ 22/Oct/12 ]

I've uploaded a simple patch which will hopefully resolve this problem: http://review.whamcloud.com/4371

It isn't necessarily the most elegant solution, but it would be great if you could give it a try. Please note that the patch is completely untested (not even compiled yet), since I'm just getting onto a plane.

Comment by Alexander Boyko [ 22/Oct/12 ]

Patches:
b2_3 http://review.whamcloud.com/4373
master http://review.whamcloud.com/4372

Comment by Alexander Boyko [ 22/Oct/12 ]

There is no need to load libcrc32c.

Comment by Supporto Lustre Jnet2000 (Inactive) [ 24/Oct/12 ]

Okay, thank you for the support. Please close the ticket.

Comment by Peter Jones [ 24/Oct/12 ]

JNET2000, are you confirming that the proposed fix works in your tests, or did you just use the workaround?

Comment by Supporto Lustre Jnet2000 (Inactive) [ 24/Oct/12 ]

I used the workaround; I'm not able to test the patch right now.

Comment by Peter Jones [ 24/Oct/12 ]

ok well then I suggest that we keep this ticket open until the proposed patch has been validated and landed to master. It's quite alright if you are too busy to spend any further time on this but keeping the ticket open helps us for tracking purposes.

Comment by James A Simmons [ 10/Jun/13 ]

I did test this patch. Recently I have been working on adding checksumming support to the Gemini LND driver. On Cray compute nodes, the modules are loaded only once and then removed, so that more memory is available for applications. As a result, modules cannot be loaded later. The crc32c module does not load automatically, so we had to use a painful workaround, but this patch solves the problem by loading crc32c automatically at boot time, saving us a lot of pain.

Comment by James A Simmons [ 17/Jun/13 ]

This patch has landed on 2.4 and master, so I think we can close this ticket now.

Comment by Peter Jones [ 18/Jun/13 ]

Thanks James!

Generated at Sat Feb 10 01:23:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.