  Lustre / LU-14895

dump T10 guard tags on checksum error and flush pages


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

      When there is server-side corruption of pages in the read cache, particularly with T10-PI, it appears that we do not handle this case properly. The client will detect the corruption due to the RPC checksum mismatch and will resend the RPC, but the server will re-read the same data from the cache each time. If the server is using the (incorrect) GRD tags on the cached pages to generate the RPC checksum, the RPC checksum will consistently be incorrect:

      nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
      nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
      nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
      nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
      nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
      nbp17-OST0065: BAD READ CHECKSUM: from [10.151.27.142@o2ib] inode [0x20000948a:0x3:0x0] object 0x0:141666 extent [503316480-1509953535], client 73006b, server 10500b1, cksum_type 80 
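
      To make the failure mode concrete, below is a minimal userspace sketch (not Lustre code) of a bulk read checksum derived from per-sector T10-PI guard tags. The 512-byte sector size, the fold over the tags, and all function names here are illustrative assumptions; the kernel uses crc_t10dif() and the ptlrpc/osd checksum helpers for the real thing. The point is that whichever side of the cache is bad (the page data or the stored GRD tags), a server checksum built from the cached tags will disagree with the checksum the client recomputes from the bytes it actually received, on every resend:

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define SECTOR_SIZE 512

      /* T10-DIF guard tag: CRC16, polynomial 0x8BB7, initial value 0. */
      static uint16_t crc_t10dif_sw(const uint8_t *buf, size_t len)
      {
              uint16_t crc = 0;

              for (size_t i = 0; i < len; i++) {
                      crc ^= (uint16_t)buf[i] << 8;
                      for (int bit = 0; bit < 8; bit++)
                              crc = (crc & 0x8000) ? (crc << 1) ^ 0x8BB7 : crc << 1;
              }
              return crc;
      }

      /* Bulk RPC checksum folded over the guard tags only (illustrative fold). */
      static uint32_t bulk_cksum_from_tags(const uint16_t *tags, size_t nr)
      {
              uint32_t sum = 0;

              for (size_t i = 0; i < nr; i++)
                      sum = sum * 31 + tags[i];
              return sum;
      }

      int main(void)
      {
              uint8_t data[4 * SECTOR_SIZE];
              uint16_t client_tags[4], cached_tags[4];

              memset(data, 0xa5, sizeof(data));
              for (int i = 0; i < 4; i++)
                      client_tags[i] = cached_tags[i] =
                              crc_t10dif_sw(data + i * SECTOR_SIZE, SECTOR_SIZE);

              /* One corrupt cached tag: every RPC built from the cache now fails. */
              cached_tags[2] ^= 0xffff;

              printf("client cksum %08x, server (cached) cksum %08x\n",
                     (unsigned)bulk_cksum_from_tags(client_tags, 4),
                     (unsigned)bulk_cksum_from_tags(cached_tags, 4));
              return 0;
      }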
      

      What should happen in this case: if the client sends the OBD_FL_RECOV_RESEND flag in the OST_READ RPC, the server should discard any cached pages in that range, re-read the pages/sectors from the underlying storage (bypassing the cache), and then verify the GRD tags for each sector locally, calculating them in osd-ldiskfs and comparing against the GRD tags returned by the kernel. The server should immediately print an error identifying which sector(s) do not match, instead of depending on the client to detect the mismatch again.
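
      As a sketch of that per-sector check (illustrative userspace code under assumed names such as verify_guard_tags(), not the existing osd-ldiskfs API), the server would recompute the CRC16 guard tag of every freshly re-read 512-byte sector, compare it with the GRD tag returned by the kernel, and log exactly which sectors disagree:

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define SECTOR_SIZE 512

      /* Same CRC16/T10-DIF as above: polynomial 0x8BB7, initial value 0. */
      static uint16_t crc_t10dif_sw(const uint8_t *buf, size_t len)
      {
              uint16_t crc = 0;

              for (size_t i = 0; i < len; i++) {
                      crc ^= (uint16_t)buf[i] << 8;
                      for (int bit = 0; bit < 8; bit++)
                              crc = (crc & 0x8000) ? (crc << 1) ^ 0x8BB7 : crc << 1;
              }
              return crc;
      }

      /* Report every sector whose recomputed tag differs from the stored one. */
      static int verify_guard_tags(const uint8_t *data, const uint16_t *dev_tags,
                                   size_t nr_sectors, uint64_t start_sector)
      {
              int bad = 0;

              for (size_t i = 0; i < nr_sectors; i++) {
                      uint16_t calc = crc_t10dif_sw(data + i * SECTOR_SIZE,
                                                    SECTOR_SIZE);
                      if (calc != dev_tags[i]) {
                              fprintf(stderr, "sector %llu: computed GRD %#06x != stored GRD %#06x\n",
                                      (unsigned long long)(start_sector + i),
                                      calc, dev_tags[i]);
                              bad++;
                      }
              }
              return bad;
      }

      int main(void)
      {
              uint8_t data[4 * SECTOR_SIZE];
              uint16_t tags[4];

              memset(data, 0x5a, sizeof(data));
              for (int i = 0; i < 4; i++)
                      tags[i] = crc_t10dif_sw(data + i * SECTOR_SIZE, SECTOR_SIZE);

              data[2 * SECTOR_SIZE + 17] ^= 0x01;     /* simulate on-media corruption */

              printf("%d bad sector(s)\n", verify_guard_tags(data, tags, 4, 983040));
              return 0;
      }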

      It would also be useful to (somehow) send a block command (FUA?) to flush the SFA cache in this case, but that would need some help from the SFA team, and it still depends on Lustre handling this correctly first.


            People

              Assignee: Dongyang Li
              Reporter: Dongyang Li
