Enhance debugging information available for Lustre checksum errors

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.10.0

    Description

      Dump all of the data from the bad RPCs on both the client and the server. This would be enabled by a /proc control (off by default), such as /proc/fs/lustre/osc/<target>/checksum_dump and /proc/fs/lustre/ost/<target>/checksum_dump, so that we get both sides of the transfer to compare.

      When a bad checksum is hit, the node would (in a manner similar to how we dump logs on LBUG) write a file like /tmp/[fid]:[offset-range]-clientcksum-servercksum on both server and client, but only if this file does not yet exist, so there will be only one file per node no matter how many retransmits there were.
      The file will contain the page content from the RPC. We can then compare the RPC data on server and client, see what changed between them, and perhaps gain better insight into what is going on.
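
      As a rough user-space illustration of the "write once per node" behaviour described above (not the actual kernel code from the patch), the sketch below builds a dump path and creates the file only if it does not already exist. The helper names `dump_name` and `dump_once`, the hex formatting of the checksums, and the exact separator layout are assumptions made for the example.

```python
import os


def dump_name(fid, start, end, client_cksum, server_cksum, directory="/tmp"):
    """Build a dump path loosely following the proposed
    [fid]:[offset-range]-clientcksum-servercksum pattern.
    The exact formatting here is an assumption, not the patch's format."""
    name = f"{fid}:{start}-{end}-{client_cksum:08x}-{server_cksum:08x}"
    return os.path.join(directory, name)


def dump_once(path, pages):
    """Write the RPC page contents to `path` unless the file already exists.

    O_CREAT | O_EXCL makes creation fail if the file is present, so a
    retransmit of the same bad RPC does not produce a second dump.
    Returns True if the dump was written, False if it already existed.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        for page in pages:
            f.write(page)
    return True
```

      Calling `dump_once` a second time with the same path returns False and leaves the first dump untouched, which mirrors the "one file per node regardless of retransmits" requirement.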

      Per-page intermediate/partial checksums will also be printed on both sides during the error breakdown, to help determine where the drift starts.
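
      To illustrate how per-page intermediate checksums can locate the start of the drift, here is a minimal sketch that computes a running checksum after each page and reports the first page where client and server disagree. It assumes 4096-byte pages and uses Adler-32 from Python's `zlib` purely as a stand-in algorithm; the helper names `per_page_checksums` and `first_drift` are hypothetical and do not come from the patch.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for the example


def per_page_checksums(data, page_size=PAGE_SIZE):
    """Return the running (cumulative) Adler-32 value after each page."""
    cksums = []
    running = 1  # Adler-32 starting value
    for off in range(0, len(data), page_size):
        running = zlib.adler32(data[off:off + page_size], running)
        cksums.append(running)
    return cksums


def first_drift(client_cksums, server_cksums):
    """Return the index of the first page whose running checksum differs,
    or None if every compared page matches."""
    for i, (c, s) in enumerate(zip(client_cksums, server_cksums)):
        if c != s:
            return i
    return None
```

      Because the checksum is cumulative, the last entry equals the checksum of the whole buffer, and the first mismatching index points at the first page whose content differs between the two sides.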

          Activity

            [LU-8376] Enhance debugging information available for Lustre checksum errors
            pjones Peter Jones added a comment -

            Landed for 2.10


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23960/
            Subject: LU-8376 ost: enhance end to end bulk cksum error report
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 672986cbae63e90262d55bf277643ea046bfa8b2

            gerrit Gerrit Updater added a comment -

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/23960
            Subject: LU-8376 ost: enhance end to end bulk cksum error report
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ff5cd2a39e2a2206785194d0d218d99037d4cc19

            bfaccini Bruno Faccini (Inactive) added a comment -

            I am almost done with the patch and should start testing it soon.

            People

              bfaccini Bruno Faccini (Inactive)
              bfaccini Bruno Faccini (Inactive)