[LU-1256] Question about checksums Created: 23/Mar/12  Updated: 17/Apr/12  Resolved: 17/Apr/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Roger Spellman (Inactive) Assignee: Cliff White (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 10091

 Description   

I have a few questions about Checksums.

Section 19.5.1 of the manual says:

To check the status of a wire checksum, run:

lctl get_param osc.*.checksums

Does that mean that clearing this parameter only clears the wire checksum?

Does writing each file named /proc/fs/lustre/osc/*/checksums do the same thing?

Is there a way to permanently disable checksums?

Thanks.



 Comments   
Comment by Peter Jones [ 23/Mar/12 ]

Cliff

Could you please help out with this one?

Thanks

Peter

Comment by Cliff White (Inactive) [ 23/Mar/12 ]

Are you using the current Lustre manual? I am checking section 19.5.1 of the 2.0 manual.
It does seem a bit unclear and perhaps should be rewritten.

First, in general, the old way to access parameters is to read or write the /proc files directly; the newer
way is to use the lctl get_param/set_param commands. If the path is the same, both should (at the moment) do the same thing,
but the get/set method is future-compatible. You should be able to set a persistent value with lctl conf_param.

Both types of checksums are controlled by "lctl set_param llite.*.checksum_pages", which is a client-only parameter.

Network (wire) checksums can be set/unset with the osc.*.checksums parameter. This is done per client and per OST connection.
Again, you can set/get with lctl or by direct file access.
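
As a rough illustration of the two access routes described above, the sketch below reads and writes the per-OSC checksums files directly and notes the lctl equivalent in comments. The helper names show_osc_checksums and set_osc_checksums and the optional root argument are hypothetical, added only so the loop can be tried against a scratch directory. The default path is the /proc location mentioned in the question.

```shell
# Hypothetical helpers, not part of Lustre. The root argument defaults to
# the /proc location from the question; pass another directory to dry-run.

# Print "<file>=<value>" for every per-OSC checksums file.
# lctl equivalent: lctl get_param osc.*.checksums
show_osc_checksums() {
    _root="${1:-/proc/fs/lustre}"
    for _f in "$_root"/osc/*/checksums; do
        [ -r "$_f" ] || continue          # skips an unmatched glob too
        printf '%s=%s\n' "$_f" "$(cat "$_f")"
    done
}

# Write 0 (off) or 1 (on) to every per-OSC checksums file.
# lctl equivalent: lctl set_param osc.*.checksums=<value>
set_osc_checksums() {
    _value="$1"
    _root="${2:-/proc/fs/lustre}"
    case "$_value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    for _f in "$_root"/osc/*/checksums; do
        [ -w "$_f" ] || continue
        echo "$_value" > "$_f"
    done
}
```

Either route changes only the running client, so a value set this way would not be expected to survive a remount.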

So, you can have network and memory checksums, or memory only, or none.

Comment by Roger Spellman (Inactive) [ 26/Mar/12 ]

Thanks, Cliff.

Is there a way to permanently disable checksums?

There is CRC on the wire, and ECC on the RAM.
Exactly what errors are the checksums supposed to catch?

Comment by Cliff White (Inactive) [ 03/Apr/12 ]

As I said, you should be able to turn them off persistently with lctl conf_param; there is no other switch. CRC and ECC are hardware checksums. The Lustre checksums cover the Lustre stack itself and are supposed to catch any issues in that area.
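
A minimal sketch of what the persistent setting might look like follows. It only prints candidate commands and does not run them. The helper name print_checksum_conf_params is hypothetical, and the parameter form "<fsname>-OSTxxxx.osc.checksums" is an assumption modelled on the general conf_param syntax. Whether a given release accepts it should be checked against the manual for that version before use.

```shell
# Hypothetical helper: print (do not run) one conf_param command per OST.
# Assumptions: the "<fsname>-OSTxxxx.osc.checksums=<0|1>" parameter form,
# and that the printed commands would be run on the MGS node.
print_checksum_conf_params() {
    _fsname="$1"        # file system name, e.g. testfs (placeholder)
    _ost_count="$2"     # number of OSTs to cover
    _value="${3:-0}"    # 0 disables checksums, 1 enables them
    _i=0
    while [ "$_i" -lt "$_ost_count" ]; do
        # OST indices appear in target names as four hex digits.
        printf 'lctl conf_param %s-OST%04x.osc.checksums=%s\n' \
            "$_fsname" "$_i" "$_value"
        _i=$((_i + 1))
    done
}
```

For example, print_checksum_conf_params testfs 2 prints one line for OST0000 and one for OST0001, which can be reviewed before anything is run.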

Comment by Peter Jones [ 17/Apr/12 ]

As per Peter Piela, this ticket can be closed.
