[LU-17195] Add option to dump log on checksum error Created: 13/Oct/23  Updated: 16/Oct/23  Resolved: 16/Oct/23

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-8376 Enhance debugging infos available for... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Add an option to dump the debug log on checksum failure, similar to dump_on_eviction/dump_on_timeout.
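The requested behaviour can be sketched in Python. A tunable flag decides whether a checksum mismatch on a bulk transfer triggers a debug-log dump, in the same spirit as dump_on_eviction/dump_on_timeout. This is a minimal, hypothetical sketch, not Lustre code. Lustre implements this in C in the kernel. `DebugLog`, `verify_bulk`, and the `dump_on_checksum` argument are illustrative names. CRC32 stands in for whichever checksum type is actually negotiated.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class DebugLog:
    """Hypothetical in-memory stand-in for the Lustre debug buffer."""
    entries: list = field(default_factory=list)
    dumps: list = field(default_factory=list)

    def log(self, msg: str) -> None:
        self.entries.append(msg)

    def dump(self, reason: str) -> str:
        # Lustre writes the buffer to a file; here we only record a snapshot
        # under an illustrative name.
        name = f"debug-log.{reason}.{len(self.dumps)}"
        self.dumps.append((name, list(self.entries)))
        return name


def verify_bulk(data: bytes, client_cksum: int, log: DebugLog,
                dump_on_checksum: bool) -> bool:
    """Check received bulk data; optionally dump the log on mismatch."""
    server_cksum = zlib.crc32(data)
    if server_cksum == client_cksum:
        return True
    log.log(f"checksum mismatch: client={client_cksum:#010x} "
            f"server={server_cksum:#010x}")
    if dump_on_checksum:  # hypothetical tunable gating the dump
        log.dump("checksum")
    return False
```

Gating the dump behind a flag mirrors the existing dump_on_* options. Dumping is costly, so it stays off unless an administrator is actively chasing the problem.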



 Comments   
Comment by Gerrit Updater [ 13/Oct/23 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52691
Subject: LU-17195 obd: Add dump_on_checksum parameter
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e59ff6b4fb98e3e5b3f673bf09c5598b02ef343b

Comment by Andreas Dilger [ 13/Oct/23 ]

Chris, how does this differ from the checksum_dump functionality added by patch https://review.whamcloud.com/23960 "LU-8376 ost: enhance end to end bulk cksum error report"? That will already dump the checksum data to a file when there is an error detected.

Comment by Chris Horn [ 13/Oct/23 ]

AFAICT, the LU-8376 patch dumps the page content from the bulk xfer that failed the checksum. It does this on both the client and the server so that the content can be compared after the fact to see what changed. My patch just dumps the Lustre debug log to /tmp in the same manner as dump_on_eviction and dump_on_timeout. So I think these two patches are complementary.
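The after-the-fact comparison of client-side and server-side page dumps comes down to finding the byte ranges that differ. The following is a minimal Python sketch. It assumes both dumps are raw page content of equal length and that pages are 4096 bytes. The dump file names and format written by checksum_dump are not specified here, so `diff_ranges`, `affected_pages`, and `PAGE_SIZE` are hypothetical helpers.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def diff_ranges(client: bytes, server: bytes) -> list[tuple[int, int]]:
    """Return (offset, length) for each run of bytes that differs."""
    if len(client) != len(server):
        raise ValueError("dumps must be the same length")
    ranges: list[tuple[int, int]] = []
    start = None
    for i, (a, b) in enumerate(zip(client, server)):
        if a != b:
            if start is None:
                start = i  # a differing run begins
        elif start is not None:
            ranges.append((start, i - start))  # the run just ended
            start = None
    if start is not None:  # the run extends to the end of the buffer
        ranges.append((start, len(client) - start))
    return ranges


def affected_pages(ranges: list[tuple[int, int]]) -> list[int]:
    """Page indices touched by any differing run (runs may span pages)."""
    pages = set()
    for off, length in ranges:
        pages.update(range(off // PAGE_SIZE, (off + length - 1) // PAGE_SIZE + 1))
    return sorted(pages)
```

With two dump files on hand, one would call e.g. `diff_ranges(Path(client_dump).read_bytes(), Path(server_dump).read_bytes())`. Here `client_dump` and `server_dump` are placeholder paths. Reporting contiguous runs rather than single bytes makes patterns such as a corrupted page or a shifted block easy to spot.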

Comment by Chris Horn [ 16/Oct/23 ]

I was mistaken. The existing checksum_dump feature already dumps the debug log, so this patch is redundant.

Generated at Sat Feb 10 03:33:25 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.