BAD WRITE CHECKSUM with t10ip4K and t10ip512 checksums

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.12.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      I'm running racer and get the following messages quite often. I think Oleg can confirm this.
      If I set checksum_type to adler, racer runs fine without a single bad checksum.

      LustreError: 168-f: lustre-OST0000: BAD WRITE CHECKSUM: from 12345-0@lo inode [0x200000402:0x3b89:0x0] object 0x0:1921 extent [326-4194303]: client csum 53f1f04e, server csum 9bf3f057
      LustreError: 132-0: lustre-OST0000-osc-ffff8801d25e8800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 0@lo inode [0x200000402:0x3b89:0x0] object 0x0:1921 extent [326-4194303], original client csum 53f1f04e (type 20), server csum 9bf3f057 (type 20), client csum now 53f1f04e

          Activity

            [LU-11697] BAD WRITE CHECKSUM with t10ip4K and t10ip512 checksums
            lixi_wc Li Xi added a comment -

            Please check whether patch 33752 on top of 33727 works well. I am setting up an environment to test it, and also an environment with T10PI hardware to check whether everything works.

            gerrit Gerrit Updater added a comment -

            Li Xi (lixi@ddn.com) uploaded a new patch: https://review.whamcloud.com/33752
            Subject: LU-11697 obdclass: generate T10PI correctly for unaligned buffer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 267bc620de222b23ad250f300b8ac0ad796e52f2

            lixi_wc Li Xi added a comment -

            obd_page_dif_generate_buffer() does not handle unaligned buffers well. I am checking how T10PI handles unaligned buffers.
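
To make the unaligned-buffer concern concrete, here is a minimal Python sketch of per-sector guard-tag generation in which the last sector is clamped to the bytes that are actually valid. It is not the Lustre implementation: `ip_checksum` and `generate_guards` are hypothetical names, the guard is assumed to be a 16-bit Internet checksum (RFC 1071 style), and the 2 KiB payload and fill bytes are illustrative only.

```python
from typing import List


def ip_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def generate_guards(buf: bytes, offset: int, length: int,
                    sector_size: int) -> List[int]:
    """One guard per sector of buf[offset:offset+length].

    Sectors are counted from the start of the data, and the final
    sector is clamped so that bytes past the valid data are never read.
    """
    guards = []
    pos = offset
    end = offset + length
    while pos < end:
        chunk = min(sector_size, end - pos)
        guards.append(ip_checksum(buf[pos:pos + chunk]))
        pos += chunk
    return guards


payload = bytes(range(256)) * 8        # 2 KiB of valid data (assumed size)
page_a = payload + b"\xaa" * 2048      # payload at offset 0, stale tail 0xaa
page_b = payload + b"\x55" * 2048      # payload at offset 0, stale tail 0x55
page_c = b"\x11" * 2048 + payload      # payload at offset 2048

guards_a = generate_guards(page_a, 0, 2048, 4096)
guards_b = generate_guards(page_b, 0, 2048, 4096)
guards_c = generate_guards(page_c, 2048, 2048, 4096)
```

With the clamp in place, the three guard lists are identical, because only the 2 KiB of payload is ever checksummed, regardless of where it sits in the page or what surrounds it.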

            bzzz Alex Zhuravlev added a comment -

            lixi_wc, can you please explain why it fails depending on the page offset? If we put the data (2K, iirc) at a 2K offset, the checksums match, but if we put the data at offset 0, they don't. I have probably missed something, of course.

            lixi_wc Li Xi added a comment -

            The sanity:810 test that https://review.whamcloud.com/#/c/33726/ adds is still 512-byte aligned. I guess that is why T10PI512 passes the test, but T10PI4096 doesn't. If the size were not 512-byte aligned, I guess T10PI512 would fail too.
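
The alignment argument can be checked with a few lines of arithmetic. The sketch below uses a hypothetical helper, `sector_lengths`, and assumed sizes (2048 bytes, following the "2K, iirc" figure mentioned elsewhere in this thread, and 2000 bytes as an arbitrary unaligned size) to show how a buffer splits into sectors.

```python
from typing import List


def sector_lengths(length: int, sector_size: int) -> List[int]:
    """Lengths of the sectors needed to cover `length` bytes.

    Every sector is full except possibly the last one.
    """
    full, rest = divmod(length, sector_size)
    return [sector_size] * full + ([rest] if rest else [])


aligned_512 = sector_lengths(2048, 512)      # four full 512-byte sectors
partial_4096 = sector_lengths(2048, 4096)    # one partial 4096-byte sector
unaligned_512 = sector_lengths(2000, 512)    # last 512-byte sector is partial
```

A 2048-byte buffer consists only of full sectors at a 512-byte sector size but is a single partial sector at 4096 bytes, so any code path that mishandles partial sectors would only be exercised in the 4096-byte case. A 2000-byte buffer ends in a partial sector at 512 bytes as well.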

            bzzz Alex Zhuravlev added a comment -

            adilger, yes, it still fails with the 4K version, but passes with 512.

            adilger Andreas Dilger added a comment -

            Alex, is that still failing with the 33727 patch applied on top of your 33726 patch?

            bzzz Alex Zhuravlev added a comment -

            So that test from LU-11663 passes with t10ip512, but not with 4K. How is the 4K version supposed to work with partial pages?

            bzzz Alex Zhuravlev added a comment - - edited

            lixi_wc, then please try to run the test mentioned above. Basically, the client sends a partial, non-aligned page, but the server places the data at page offset 0.

            lixi_wc Li Xi added a comment -

            Hi Alex, the page offset is not used in the T10 algorithm.

            bzzz Alex Zhuravlev added a comment - - edited

            lixi_wc, is the page offset used in the T10 algorithm?
            It looks like the checksum differs if the page offsets on the server and the client don't match. Other algorithms (e.g. adler) seem to be insensitive to this.
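
The observation above can be illustrated with a small Python sketch. It contrasts Adler-32 over just the payload bytes with a hypothetical naive guard, `full_sector_guard`, that always checksums a full sector starting at the data (truncated at the end of the page). This is only one assumed way in which a sector-based checksum could become sensitive to data placement; it is not taken from the Lustre code, and the page layouts and fill bytes are invented for the example.

```python
import zlib


def ip_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def full_sector_guard(page: bytes, offset: int, sector_size: int) -> int:
    """Hypothetical naive guard: checksums sector_size bytes from offset
    (truncated at the end of the page), ignoring how many are valid."""
    return ip_checksum(page[offset:offset + sector_size])


payload = bytes(range(256)) * 8            # 2 KiB of valid data (assumed)
client_page = b"\xaa" * 2048 + payload     # payload at page offset 2048
server_page = payload + b"\x55" * 2048     # same payload at page offset 0

# Adler-32 is computed over the payload bytes only.
adler_client = zlib.adler32(client_page[2048:4096])
adler_server = zlib.adler32(server_page[0:2048])

# The naive guard also covers whatever follows the payload in the page.
naive_client = full_sector_guard(client_page, 2048, 4096)
naive_server = full_sector_guard(server_page, 0, 4096)
```

The two Adler-32 values agree because they see identical bytes, while the naive guards differ because the server-side window also covers the stale bytes after the payload.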

            People

              Assignee: lixi_wc Li Xi
              Reporter: bzzz Alex Zhuravlev