[LU-8376] Enhance debugging infos available for Lustre checksum errors Created: 07/Jul/16  Updated: 13/Oct/23  Resolved: 09/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: Bruno Faccini (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: cea

Issue Links:
Related
is related to LU-10316 Interop 2.7.x <->2.10.2 sanity test_7... Resolved
is related to LU-17195 Add option to dump log on checksum error Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The proposal is to dump all of the data from the bad RPCs on both the client and the server. This would be enabled by a /proc control (off by default), such as /proc/fs/lustre/osc/<target>/checksum_dump and /proc/fs/lustre/ost/<target>/checksum_dump, so that we get both sides of the transfer to compare.

When a bad checksum is hit, it would (in a manner similar to how we dump logs on LBUG) write a file like /tmp/[fid]:[offset-range]-clientcksum-servercksum on both server and client, but only if this file does not yet exist (so there will only be one file per node no matter how many retransmits there were).
The file will contain the page content from the RPC, so we can then compare the RPC data on server and client and see what changed between them, to perhaps gain better insight into what is going on.
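
As an illustration only, the "write once, then compare both sides" idea can be sketched in user-space Python. This is not the code from the patch: the helper names (dump_name, dump_once, first_differing_page), the exact formatting of the fid, offset range and checksums in the file name, and the 4096-byte page size are all assumptions made for the example.

```python
import os

PAGE_SIZE = 4096  # assumed page size, for illustration only


def dump_name(fid, start, end, client_cksum, server_cksum, dump_dir="/tmp"):
    """Build a name following the proposed pattern
    [fid]:[offset-range]-clientcksum-servercksum.
    The exact field formatting here is an assumption."""
    name = f"{fid}:[{start}-{end}]-{client_cksum:08x}-{server_cksum:08x}"
    return os.path.join(dump_dir, name)


def dump_once(path, data):
    """Write data to path only if the file does not exist yet.
    Returns True if the dump was written, False if it was already there,
    so retransmits of the same RPC do not produce extra files."""
    try:
        # O_EXCL makes the create fail if the file already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True


def first_differing_page(client_data, server_data, page_size=PAGE_SIZE):
    """Return the index of the first page whose content differs between
    the client dump and the server dump, or None if they are identical."""
    length = max(len(client_data), len(server_data))
    for off in range(0, length, page_size):
        if client_data[off:off + page_size] != server_data[off:off + page_size]:
            return off // page_size
    return None
```

With the two dump files copied to one node, reading both and calling first_differing_page(client_bytes, server_bytes) would point at the first page that changed between the client and the server.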

Per-page intermediate/partial checksums will also be printed during the error breakdown, on both sides, to help determine where the drift starts.
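
A minimal sketch of how per-page intermediate checksums can locate the start of the drift, assuming a running CRC32 from Python's zlib as a stand-in for whichever checksum algorithm is actually in use. The function names and the 4096-byte page size are hypothetical and chosen for this example only.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def running_checksums(data, page_size=PAGE_SIZE):
    """Return the cumulative checksum after each page of data.
    The last entry equals the checksum of the whole buffer."""
    sums = []
    cksum = 0
    for off in range(0, len(data), page_size):
        # zlib.crc32 accepts the previous value to continue a running CRC.
        cksum = zlib.crc32(data[off:off + page_size], cksum)
        sums.append(cksum)
    return sums


def first_drift(client_sums, server_sums):
    """Return the index of the first page at which the two lists of
    intermediate checksums disagree, or None if they match throughout."""
    for i, (c, s) in enumerate(zip(client_sums, server_sums)):
        if c != s:
            return i
    if len(client_sums) != len(server_sums):
        return min(len(client_sums), len(server_sums))
    return None
```

Comparing the two lists printed on the client and on the server narrows the corruption down to a single page without needing the full data dump.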



 Comments   
Comment by Bruno Faccini (Inactive) [ 11/Jul/16 ]

I am almost done with the patch, and should start to test it soon.

Comment by Gerrit Updater [ 25/Nov/16 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/23960
Subject: LU-8376 ost: enhance end to end bulk cksum error report
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ff5cd2a39e2a2206785194d0d218d99037d4cc19

Comment by Gerrit Updater [ 09/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23960/
Subject: LU-8376 ost: enhance end to end bulk cksum error report
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 672986cbae63e90262d55bf277643ea046bfa8b2

Comment by Peter Jones [ 09/May/17 ]

Landed for 2.10
