
BAD WRITE CHECKSUM with t10ip4K and t10ip512 checksums

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version: Lustre 2.12.0
    • Affects Version: Lustre 2.12.0
    • Severity: 3

    Description

      I'm running racer and quite often get the following messages. I think Oleg can confirm this.
      If I set checksum_type to adler, racer runs fine with not a single bad checksum.

      LustreError: 168-f: lustre-OST0000: BAD WRITE CHECKSUM: from 12345-0@lo inode [0x200000402:0x3b89:0x0] object 0x0:1921 extent [326-4194303]: client csum 53f1f04e, server csum 9bf3f057
      LustreError: 132-0: lustre-OST0000-osc-ffff8801d25e8800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 0@lo inode [0x200000402:0x3b89:0x0] object 0x0:1921 extent [326-4194303], original client csum 53f1f04e (type 20), server csum 9bf3f057 (type 20), client csum now 53f1f04e

      Attachments

      Issue Links

      Activity


            bzzz Alex Zhuravlev added a comment -

            So there should be no issue whether the server puts the data (and writes it from) offset=1K or offset=0K, right?
            lixi_wc Li Xi added a comment -

            Hi Alex, good question. I did assume that page sizes are the same on clients and servers. However, even if the page sizes differ, the situation is the same.

            In tgt_brw_write(), the server asks for the data to be transferred into the page at lnb_page_offset. Thus, the server should use this page offset when accessing the data, both for writing to disk and for calculating the RPC checksum.
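As a rough illustration of that point, here is a minimal Python sketch (the name checksum_niobuf is made up for this example; the real code is C in the Lustre target and OSC layers): if the server checksums the bulk data starting from a different page offset than the one the client used, the RPC checksums diverge even though the payload bytes are identical.

```python
import zlib

PAGE_SIZE = 4096

def checksum_niobuf(page, offset, length):
    """Checksum only the valid bytes of a page, starting at the page
    offset where the bulk transfer placed the data (analogous to
    lnb_page_offset on the server, 'struct brw_page'->off on the client)."""
    return zlib.crc32(bytes(page[offset:offset + length]))

page = bytearray(PAGE_SIZE)
data = b"lustre write payload"
off = 326                       # unaligned start, as in the racer extent [326-...]
page[off:off + len(data)] = data

client = checksum_niobuf(page, off, len(data))      # client checksums from off
server_ok = checksum_niobuf(page, off, len(data))   # server uses the same offset
server_bad = checksum_niobuf(page, 0, len(data))    # server wrongly assumes offset 0

assert client == server_ok
assert client != server_bad     # -> BAD WRITE CHECKSUM
```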


            bzzz Alex Zhuravlev added a comment -

            lixi_wc, how does the current approach (which requires matching offsets) work on setups where the page size differs between client and server?
            lixi_wc Li Xi added a comment -

            Hongchao and Alex, I don't understand why LU-10683 should be reverted. Reverting LU-10683 would cause an inconsistent in-page offset between OSD and OSC, i.e. lnb_page_offset != 'struct brw_page'->off. That would certainly mean checksum errors on non-page-aligned data for all checksum types, including crc32, adler and crc32c, because the checksums would be calculated from different start points on the OST and the OSC. Both osc_checksum_bulk() and tgt_checksum_niobuf() assume the page offset is properly initialized and equal on both sides.

            That said, I will try to reproduce the problem without reverting LU-10683. Reverting LU-10683 and then reproducing this problem doesn't seem like a valid test.

            hongchao.zhang Hongchao Zhang added a comment - - edited

            This issue can be reproduced with the checksum types "t10ip512", "t10ip4K", "t10crc512" and "t10crc4K", but not with "crc32", "adler" or "crc32c". The reason is that the data offset in pages (in this case, the offset of the first page) is taken into account when calculating the "t10*" series checksums.

            The patch from LU-10683 fixed the issue by aligning the data with the client, but it introduced another problem when committing the data to the ZFS backend.
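Hongchao's explanation can be sketched with a toy model (t10_style_guards below is a hypothetical simplification for illustration, not the real Lustre T10-PI code): a crc32/adler-style checksum covers only the data bytes and is insensitive to where those bytes sit in the page, whereas a t10*-style checksum is computed per fixed-size sector relative to the page, so the same bytes at a different page offset yield different per-sector guards.

```python
import zlib

def t10_style_guards(page, offset, length, sector=512):
    """Toy model of a t10* checksum: the page is cut into fixed
    page-relative sectors and each sector's guard covers the data
    bytes falling inside it, so the data's offset within the page
    shifts which bytes land in which sector."""
    guards = []
    end = offset + length
    for s in range(offset // sector, (end - 1) // sector + 1):
        lo = max(offset, s * sector)
        hi = min(end, (s + 1) * sector)
        guards.append(zlib.crc32(bytes(page[lo:hi])))
    return tuple(guards)

data = bytes(range(256)) * 4                 # 1 KiB of payload
page_a = bytearray(4096)
page_a[0:len(data)] = data                   # client placed data at offset 0
page_b = bytearray(4096)
page_b[256:256 + len(data)] = data           # server placed the same data at offset 256

# crc32/adler-style: checksums the data bytes only -> offset-insensitive
assert zlib.crc32(bytes(page_a[0:len(data)])) == \
       zlib.crc32(bytes(page_b[256:256 + len(data)]))

# t10-style: sector boundaries are page-relative -> offset-sensitive
assert t10_style_guards(page_a, 0, len(data)) != \
       t10_style_guards(page_b, 256, len(data))
```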

            lixi_wc Li Xi added a comment -

            I can't reproduce the problem using the sanity:810 test, even without any patch from this ticket. So, what is the exact procedure for reproducing the problem?

            lixi_wc Li Xi added a comment -

            I think multiple problems are mixed here: LU-10683 and LU-11663. I am confused about how to reproduce the problem.

            lixi_wc Li Xi added a comment -

            Hi Alex, I am confused. Why does reverting LU-10683 help in reproducing the problem? I thought the test script of LU-10683 triggers the checksum error, right? The RPC checksum should not be related to the OSD-level bug.


            bzzz Alex Zhuravlev added a comment -

            lixi_wc, revert 83cb17031913ba2f33a5b67219a03c5605f48f27 (LU-10683) and try that test, please.
            lixi_wc Li Xi added a comment -

            @Alex, I cannot reproduce the problem even with only patch 33727. Would you please let me know the exact steps to reproduce?

            lixi_wc Li Xi added a comment - - edited

            With patch 33752 on top of 33727, I cannot reproduce the problem. I am not sure whether I missed anything, since I didn't test the original Lustre without these patches.

            # df | grep mnt
            /dev/sdb1                102275928     62196   96970852   1% /mnt/mgs
            /dev/sdb2                120662808     75284  110101768   1% /mnt/mdt
            /dev/sdb3                170773236     61960  162004256   1% /mnt/ost
            10.0.0.37@tcp:/server17  170773236     61960  162004256   1% /mnt/lustre
            # lfs getstripe file
            file
            lmm_stripe_count:  1
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 0
                    obdidx           objid           objid           group
                         0               2            0x2                0
            
            # cat /proc/fs/lustre/osc/server17-OST0000-osc-ffff8805c03ff000/checksum_type 
            crc32 adler crc32c t10ip512 [t10ip4K] t10crc512 t10crc4K 
            # lctl set_param fail_loc=0x411
            fail_loc=0x411
            # dd if=/dev/urandom of=file bs=10240 count=2
            2+0 records in
            2+0 records out
            20480 bytes (20 kB) copied, 0.0139395 s, 1.5 MB/s
            # md5sum file 
            5f67fa4f3b81beaab254a36276f659ce  file
            # lctl set_param ldlm.namespaces.*osc*.lru_size=clear
            ldlm.namespaces.server17-OST0000-osc-MDT0000.lru_size=clear
            ldlm.namespaces.server17-OST0000-osc-ffff8805c03ff000.lru_size=clear
            # md5sum file 
            5f67fa4f3b81beaab254a36276f659ce  file
            

            People

              lixi_wc Li Xi
              bzzz Alex Zhuravlev