ARM: sanity test_810: BAD WRITE CHECKSUM with adler

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a298010c-f721-11e8-86c0-52540065bddc

      The test_810 added for LU-11663 fails on ARM on ldiskfs with the following test error:

      osc.lustre-OST0006-osc-ffff800039d7f800.checksum_type=adler
      fail_loc=0x411
      dd: error writing '/mnt/lustre/f810.sanity': Input/output error
      6bf5f3489c417a2e6f9e223278d93278  /mnt/lustre/f810.sanity != d375c4c8a12ae6de34e09e696c3725b1  /mnt/lustre/f810.sanity
      

      The client console logs show the checksums do not match between the client and server so there is still some kind of alignment problem:

      LustreError: 132-0: lustre-OST0000-osc-ffff800039d7f800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.9.3.137@tcp inode [0x200006991:0xc:0x0] object 0x0:27458 extent [10240-20479], original client csum 6a00237 (type 20), server csum ab0036d (type 20), client csum now 6a00237
      LustreError: 22024:0:(osc_request.c:1923:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@ffff80003069f300 x1618848577499536/t42949672971(42949672971) o4->lustre-OST0000-osc-ffff800039d7f800@10.9.3.137@tcp:6/4 lens 488/416 e 0 to 0 dl 1543857802 ref 2 fl Interpret:RM/0/0 rc 0/0
      LustreError: 22024:0:(osc_request.c:2048:brw_interpret()) lustre-OST0000-osc-ffff800039d7f800: too many resent retries for object: 0:27458, rc = -11.
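The extent reported in the BAD WRITE CHECKSUM message can be checked for alignment by hand. The following is a minimal sketch, not Lustre code: it assumes a 4096-byte sector (suggested by the "4K" checksum type named later in this ticket), and the helper name `split_on_sector_boundaries` is made up for illustration. It only shows where sector boundaries fall inside the extent from the log.

```python
SECTOR_SIZE = 4096  # assumption: 4 KiB sector, not confirmed by the log itself


def split_on_sector_boundaries(start, end, sector=SECTOR_SIZE):
    """Split the inclusive byte range [start, end] at multiples of `sector`."""
    chunks = []
    pos = start
    while pos <= end:
        # Last byte of the sector containing `pos`, capped at the extent end.
        chunk_end = min((pos // sector + 1) * sector - 1, end)
        chunks.append((pos, chunk_end))
        pos = chunk_end + 1
    return chunks


extent = (10240, 20479)  # extent [10240-20479] from the console log
print(extent[0] % SECTOR_SIZE)              # offset of the extent start within its sector
print(split_on_sector_boundaries(*extent))  # pieces delimited by sector boundaries
```

Under that assumption the extent starts 2048 bytes into a sector, so the first piece is a partial sector. If the client and server split the data at different boundaries, the per-piece results would differ.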
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_810 - 6bf5f3489c417a2e6f9e223278d93278 /mnt/lustre/f810.sanity != d375c4c8a12ae6de34e09e696c3725b1 /mnt/lustre/f810.sanity

          Activity

            [LU-11729] ARM: sanity test_810: BAD WRITE CHECKSUM with adler

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36205/
            Subject: LU-11729 obdclass: align to T10 sector size when generating guard
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 38e83f124a41b633da02073b76cf20495bef3919


            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36205
            Subject: LU-11729 obdclass: align to T10 sector size when generating guard
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: c78fa4e0f024d4823c6d867d04c108293f6d0859

            pjones Peter Jones added a comment -

            Landed for 2.13


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34043/
            Subject: LU-11729 obdclass: align to T10 sector size when generating guard
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 98ceaf854bb4738305769c5cd1df556ee99aa859


            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34043
            Subject: LU-11729 tests: verify checksum types in sanity test_810
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 6cbddb97cf05cdd2c9d229cf218912e6881cc64a

            pjones Peter Jones added a comment -

            Dongyang

            Could you please investigate?

            Thanks

            Peter


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33864/
            Subject: LU-11729 tests: skip sanity test 810 for ARM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a6239c48da38ff0da4564da496766deebc88923f


            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33864
            Subject: LU-11729 tests: skip sanity test 810 for ARM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e5498fc90e2b7809a00220b0cf18c1ac9a730a86

            adilger Andreas Dilger added a comment - - edited

            You are right, it may be a duplicate of the T10-PI ticket then. It may be that "lctl set_param" is trying to set the checksum type to adler, but this type is not available? It should always be one of the supported checksum types, but possibly this has been lost from the code.

            Before this bug is closed again it would make sense to improve test_810 to test all of the available checksum types listed from "lctl get_param osc.*OST0000*.checksum_type" to ensure they are all working for this test case.
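The suggestion above, iterating over every available checksum type, can be sketched as follows. This is an illustration rather than the actual test_810 change. It assumes the `checksum_type` value is a space-separated list with the active type wrapped in square brackets; the sample string and the helper name `parse_checksum_types` are hypothetical.

```python
def parse_checksum_types(value):
    """Return (all_types, active_type) from a checksum_type value string.

    Assumes a space-separated list in which the active type is bracketed.
    """
    types = []
    active = None
    for token in value.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        types.append(token)
    return types, active


# Hypothetical sample value; real output depends on the client and server.
sample = "crc32 adler [crc32c]"
types, active = parse_checksum_types(sample)

# Emit the command a test loop would run for each type, then restore the original.
for csum in types:
    print(f"lctl set_param osc.*OST0000*.checksum_type={csum}")
print(f"lctl set_param osc.*OST0000*.checksum_type={active}")
```

A test built this way would repeat the write-and-compare step once per listed type, so a failure points at the specific checksum type involved.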

            lixi_wc Li Xi added a comment - - edited

            > This is definitely not a duplicate of the T10 checksum bug. The sanity test_810 explicitly uses adler as the checksum type.

            Oh, it is strange then, because according to the following log, the checksum type used is T10PI4K (type 0x20), not adler (type 0x2).

            I guess the problem might be caused by the delay/failure of setting checksum type?

            LustreError: 132-0: lustre-OST0000-osc-ffff800039d7f800: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.9.3.137@tcp inode [0x200006991:0xc:0x0] object 0x0:27458 extent [10240-20479], original client csum 6a00237 (type 20), server csum ab0036d (type 20), client csum now 6a00237
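To make the reading of the log line above explicit, here is a small sketch that extracts the checksum values and type from a BAD WRITE CHECKSUM message. It assumes the type field is printed in hexadecimal, as the comment above reads it, and it only maps the two type values named in this ticket (0x2 and 0x20); the function name `decode_bad_checksum` is illustrative.

```python
import re

# Only the two values named in this ticket; anything else is shown as raw hex.
CSUM_TYPE_NAMES = {0x2: "adler", 0x20: "T10PI4K"}

PATTERN = re.compile(
    r"original client csum (?P<client>[0-9a-f]+) \(type (?P<ctype>[0-9a-f]+)\), "
    r"server csum (?P<server>[0-9a-f]+) \(type (?P<stype>[0-9a-f]+)\)"
)


def decode_bad_checksum(line):
    """Return the checksum fields of a BAD WRITE CHECKSUM line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ctype = int(m.group("ctype"), 16)  # assumption: type is logged in hex
    stype = int(m.group("stype"), 16)
    client = int(m.group("client"), 16)
    server = int(m.group("server"), 16)
    return {
        "client_csum": client,
        "server_csum": server,
        "client_type": CSUM_TYPE_NAMES.get(ctype, hex(ctype)),
        "server_type": CSUM_TYPE_NAMES.get(stype, hex(stype)),
        "csums_match": client == server,
    }


line = ("BAD WRITE CHECKSUM: changed in transit before arrival at OST: "
        "original client csum 6a00237 (type 20), server csum ab0036d (type 20), "
        "client csum now 6a00237")
print(decode_bad_checksum(line))
```

For the line quoted in this ticket, both sides report type 0x20 rather than 0x2, which is the observation the comment is based on.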
            

            People

              dongyang Dongyang Li
              maloo Maloo