[LU-11011] checksum type can not be selected permanently Created: 10/May/18  Updated: 31/Aug/21  Resolved: 21/Aug/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: Improvement Priority: Major
Reporter: Li Xi (Inactive) Assignee: Li Xi
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-14912 client picking other checksum type ov... Resolved
is related to LU-10906 checksums parameter not persistent af... Resolved
Rank (Obsolete): 9223372036854775807

 Description   

Some checksum types might not work correctly even though they are available
options and show the best speed during testing. In these circumstances, users
might want to use a specific checksum type that is known to be functional.
However, "lctl conf_param XXX-YYY.osc.checksum_type=ZZZ" does not enforce
a specific checksum type, because the checksum type is selected again
during OSC connection, and that selection overwrites the LLOG parameter.

The design for solving the problem is as follows:

Whenever a valid checksum type is set by "lctl conf_param" or "lctl
set_param", it is remembered as the preferred checksum type for the OSC.
During the connection process, if that checksum type is available, it is
selected as the RPC checksum type regardless of its speed.

The semantics of the interface /proc/fs/lustre/osc/*/checksum_type change
slightly. If an invalid checksum name is written to this entry, -EINVAL is
returned as before. If the written string is a valid checksum name but the
checksum type is not supported by this OSC/OST pair, the checksum type is
still remembered as the preferred checksum type, and the return value is
-ENOTSUPP. Whenever a connect or reconnect happens, if the preferred checksum
type is available, it is used as the RPC checksum.
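
The behaviour described above can be sketched as a small model. This is only
an illustration of the design, not the Lustre implementation: the class name
OscChecksumState, its methods, and the speed table are hypothetical, the
numeric errno values are assumptions taken from the Linux kernel headers, and
keeping the current active type when -ENOTSUPP is returned is also an
assumption that the description does not spell out.

```python
# Hypothetical model of the preferred-checksum design; not Lustre source code.
EINVAL = 22      # standard Linux errno value
ENOTSUPP = 524   # Linux kernel-internal errno value (assumed for illustration)

# Checksum names taken from the examples in this ticket.
KNOWN_TYPES = ("crc32", "adler", "crc32c")


class OscChecksumState:
    """Tracks the preferred and active checksum type for one OSC/OST pair."""

    def __init__(self, supported, speeds):
        # supported: non-empty set of names available for this OSC/OST pair
        # speeds: dict mapping name -> measured speed (higher is faster)
        self.supported = set(supported)
        self.speeds = dict(speeds)
        self.preferred = None
        self.active = self._select()

    def _select(self):
        # The preferred type wins whenever it is available, regardless of speed.
        if self.preferred in self.supported:
            return self.preferred
        # Otherwise fall back to the fastest supported type.
        return max(self.supported, key=lambda name: self.speeds.get(name, 0))

    def write_checksum_type(self, name):
        # Models a write to the checksum_type entry.
        if name not in KNOWN_TYPES:
            return -EINVAL
        # A valid name is always remembered, even if unsupported right now.
        self.preferred = name
        if name not in self.supported:
            return -ENOTSUPP  # assumption: the active type stays unchanged
        self.active = name
        return 0

    def reconnect(self, supported):
        # Models connect/reconnect: the selection is redone from scratch.
        self.supported = set(supported)
        self.active = self._select()
```

In this sketch, writing "adler" on a pair that supports it returns 0 and the
choice survives reconnect(); writing it on a pair that lacks it returns
-ENOTSUPP, and the preference takes effect on a later reconnect once "adler"
becomes available.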



 Comments   
Comment by Gerrit Updater [ 10/May/18 ]

Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/32349
Subject: LU-11011 osc: add preferred checksum type support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 96d577f0b5f782ee2faf6932f07c8785661de2ed

Comment by Li Xi (Inactive) [ 10/May/18 ]

Example before applying patch:

[root@server17-el7-vm2 ~]# lctl get_param osc.*.checksum_type
osc.969362ae-OST0000-osc-ffff88007a226800.checksum_type=crc32 adler [crc32c]
[root@server17-el7-vm1 ~]# lctl conf_param 969362ae-OST0000.osc.checksum_type=adler
[root@server17-el7-vm2 ~]# lctl get_param osc.*.checksum_type
osc.969362ae-OST0000-osc-ffff88007a226800.checksum_type=crc32 [adler] crc32c
[root@server17-el7-vm2 ~]# umount /mnt/lustre/
[root@server17-el7-vm2 ~]# mount -t lustre 10.0.1.148@tcp:/969362ae /mnt/lustre/
[root@server17-el7-vm2 ~]# lctl get_param osc.*.checksum_type
osc.969362ae-OST0000-osc-ffff880070120000.checksum_type=crc32 adler [crc32c]

Note: after the remount, the checksum type changes back to crc32c even though "lctl conf_param" set it to adler.

 

After patch:

[root@server17-el7-vm2 ~]# lctl get_param osc.*.checksum_type
osc.969362ae-OST0000-osc-ffff88007abd1000.checksum_type=crc32 [adler] crc32c
[root@server17-el7-vm1 ~]# lctl conf_param 969362ae-OST0000.osc.checksum_type=crc32
[root@server17-el7-vm2 ~]# lctl get_param osc.*.checksum_type
osc.969362ae-OST0000-osc-ffff88007abd1000.checksum_type=[crc32] adler crc32c
[root@server17-el7-vm2 ~]# umount /mnt/lustre/
[root@server17-el7-vm2 ~]# mount -t lustre 10.0.1.148@tcp:/969362ae /mnt/lustre/
[root@server17-el7-vm2 ~]# lctl get_param osc.*.checksum_type
osc.969362ae-OST0000-osc-ffff88007a808800.checksum_type=[crc32] adler crc32c
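
In the outputs above, the bracketed name marks the checksum type currently in use. When checking persistence across remounts in a script, a helper along the following lines could extract it from one line of "lctl get_param" output. The function name active_checksum is hypothetical, and the parsing assumes the output format shown in this ticket.

```python
import re


def active_checksum(line):
    """Return the bracketed checksum name from one output line, or None."""
    # Only look at the value after the first "=", so brackets elsewhere in
    # the parameter name (if any) are ignored.
    _, _, value = line.partition("=")
    match = re.search(r"\[([^\]\s]+)\]", value)
    return match.group(1) if match else None
```

For example, applied to the last line above it yields "crc32", which can be compared before and after a remount.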

 

Comment by Andreas Dilger [ 13/Jun/18 ]

> Some checksum types might not work correctly even though they are available options and have the best speeds during test.

Could you please explain this a bit further? Checksums should not be offered by a server or client if they are not working. If that is the case, it would be better to fix the code not to offer those checksums, rather than forcing users to specify a working checksum manually.

Comment by Li Xi [ 20/Aug/18 ]

> Some checksum types might not work correctly even though they are available options and have the best speeds during test.

This was not caused by a Lustre problem.

A user found a problem with the default checksum type, which has the best performance. I don't remember the details, but I remember that the checksum was sometimes calculated incorrectly. The root cause might be a bug in the CPU or the kernel. Thus, the user wants to change the checksum type to another one that doesn't have the best performance. A persistent configuration would be better than changing it every time the services are restarted.

Comment by Gerrit Updater [ 21/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32349/
Subject: LU-11011 osc: add preferred checksum type support
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9b6b5e4798281eceb45699431bc871eda6d968c4

Comment by Peter Jones [ 21/Aug/19 ]

Landed for 2.13

Generated at Sat Feb 10 02:40:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.