[LU-14895] dump T10 guard tags on checksum error and flush pages - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
None


Description

When pages in the server-side read cache become corrupted, particularly with T10-PI enabled, it appears that we do not handle this case properly. The client detects the corruption through the RPC checksum mismatch and resends the RPC. However, every resend is served from the same cached pages, so the server returns the same data each time. If the server generates the RPC checksum from the (incorrect) GRD tags attached to the cached pages, the RPC checksum will be wrong on every attempt:

nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80
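
The minimal Python sketch below illustrates why resending cannot clear this error. It assumes 512-byte protection intervals and computes the GRD tag as CRC-16/T10-DIF (polynomial 0x8BB7). This ticket does not describe how Lustre folds the per-sector tags into the RPC checksum, so `rpc_checksum()` uses `zlib.crc32` over the packed tags as a hypothetical stand-in. `guard_tags()` and `serve_read()` are also illustrative names, not Lustre functions.

```python
import struct
import zlib

SECTOR_SIZE = 512  # assumed T10-PI protection interval (4096 is also used)


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF (poly 0x8BB7, init 0, no reflection): the T10-PI GRD tag."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def guard_tags(data: bytes, sector_size: int = SECTOR_SIZE) -> list[int]:
    """Compute one GRD tag per sector of data."""
    return [crc16_t10dif(data[i:i + sector_size])
            for i in range(0, len(data), sector_size)]


def rpc_checksum(tags: list[int]) -> int:
    """Hypothetical stand-in: fold per-sector GRD tags into a 32-bit RPC checksum."""
    return zlib.crc32(struct.pack(f">{len(tags)}H", *tags))


def serve_read(cached_data: bytes, cached_tags: list[int]) -> tuple[int, int]:
    """Return (server checksum, client checksum) for one OST_READ attempt.

    The server reuses the GRD tags attached to its cached pages, while the
    client recomputes tags from the bytes it actually received.
    """
    server = rpc_checksum(cached_tags)
    client = rpc_checksum(guard_tags(cached_data))
    return server, client


if __name__ == "__main__":
    good = bytes(range(256)) * 2 * 4       # four 512-byte sectors
    tags = guard_tags(good)                # tags generated while the data was good
    cached = bytearray(good)
    cached[700] ^= 0xFF                    # one byte corrupted in the server cache
    for attempt in range(3):               # each resend is served from the same cache
        server, client = serve_read(bytes(cached), tags)
        print(f"attempt {attempt}: server {server:#x}, client {client:#x}")
```

The server's checksum comes from the cached tags and the client's comes from the cached data. Both inputs are identical on every resend, so the same (server, client) pair repeats exactly, as in the log lines above.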

What should happen instead is this. If the client sets the OBD_FL_RECOV_RESEND flag in the OST_READ RPC, the server should discard any cached pages in that range and re-read the pages/sectors from the underlying storage without using the cache. It should then verify the GRD tag of each sector locally, calculating the tag in osd-ldiskfs and comparing it to the GRD tag returned by the kernel. Finally, it should immediately print an error identifying which sector(s) do not match, instead of depending on the client to detect the mismatch again.
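
The proposed resend handling can be sketched with a toy in-memory model. This is a hypothetical Python illustration, not osd-ldiskfs code. `ToyOst`, `read()`, the `resend` argument (standing in for OBD_FL_RECOV_RESEND) and the 512-byte sector size are all assumptions. The tag stored next to each sector on "disk" stands in for the GRD tag returned by the kernel. The check recomputes CRC-16/T10-DIF (polynomial 0x8BB7), the algorithm T10-PI uses for the guard tag.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("toy-ost")

Sector = tuple[bytes, int]  # (sector data, GRD tag)


def crc16_t10dif(data: bytes) -> int:
    """CRC-16/T10-DIF (poly 0x8BB7, init 0, no reflection): the T10-PI GRD tag."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


@dataclass
class ToyOst:
    """Hypothetical model, not Lustre code: sector number -> (data, GRD tag)."""
    disk: dict[int, Sector]
    cache: dict[int, Sector] = field(default_factory=dict)

    def read(self, first: int, count: int,
             resend: bool = False) -> tuple[list[Sector], list[int]]:
        """Return the sectors read and, on a resend, the sectors whose GRD tag is bad."""
        sectors = range(first, first + count)
        if not resend:
            # Normal path: serve from cache, filling it from disk on a miss.
            for s in sectors:
                self.cache.setdefault(s, self.disk[s])
            return [self.cache[s] for s in sectors], []
        # Resend path: drop cached copies, read from disk without repopulating
        # the cache, and verify every GRD tag on the server side.
        out, bad = [], []
        for s in sectors:
            self.cache.pop(s, None)
            data, tag = self.disk[s]
            computed = crc16_t10dif(data)
            if computed != tag:
                bad.append(s)
                log.error("GRD tag mismatch at sector %d: stored %#06x, computed %#06x",
                          s, tag, computed)
            out.append((data, tag))
        return out, bad


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    blk = bytes(range(256)) * 2
    ost = ToyOst(disk={s: (blk, crc16_t10dif(blk)) for s in range(4)})
    ost.disk[2] = (blk, ost.disk[2][1] ^ 1)  # simulate a bad GRD tag on sector 2
    _, bad = ost.read(0, 4, resend=True)
    print("mismatched sectors:", bad)
```

Bypassing the cache on the resend path avoids re-serving the same corrupted pages. Verifying on the server also pins the error to individual sectors rather than to a whole extent.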

It would also be useful to have some way to send a block command (FUA?) that flushes the SFA cache in this case. That would need some help from the SFA team, and it would still depend on Lustre handling this correctly.

Issue Links

is related to:
LU-14912 client picking other checksum type over T10PI (Resolved)
LU-14924 LustreError: 133-1: nbp17-OST0064-osc-ffff9bf24bc99800: BAD READ CHECKSUM (Resolved)

People

Assignee: Dongyang Li

Reporter: Dongyang Li

Votes: 0

Watchers: 2

Dates

Created: 30/Jul/21 6:14 AM

Updated: 14/Jan/22 7:11 AM