Lustre / LU-3438

replay-ost-single test_5 failed with error in check_write_rcs(): "Unexpected # bytes transferred"

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version: None
    • Affects Version: Lustre 2.4.0
    • Components: None
    • Environment: Lustre master branch
    • Severity: 3
    • Rank: 8566

    Description

      Our testing system shows that the test replay-ost-single test_5 fails:

      Lustre: DEBUG MARKER: == replay-ost-single test 5: Fail OST during iozone == 21:21:13 (1369851673)
      Lustre: Failing over lustre-OST0000
      LustreError: 11-0: an error occurred while communicating with 0@lo. The ost_write operation failed with -19
      LustreError: Skipped 1 previous similar message
      Lustre: lustre-OST0000-osc-ffff8800514d3400: Connection to lustre-OST0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: Skipped 1 previous similar message
      Lustre: lustre-OST0000: shutting down for failover; client state will be preserved.
      Lustre: OST lustre-OST0000 has stopped.
      Lustre: server umount lustre-OST0000 complete
      LustreError: 137-5: UUID 'lustre-OST0000_UUID' is not available for connect (no target)
      LustreError: Skipped 1 previous similar message
      LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts: 
      LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts: 
      Lustre: 16962:0:(ldlm_lib.c:2195:target_recovery_init()) RECOVERY: service lustre-OST0000, 2 recoverable clients, last_transno 1322
      Lustre: lustre-OST0000: Now serving lustre-OST0000 on /dev/loop1 with recovery enabled
      Lustre: 2398:0:(ldlm_lib.c:1021:target_handle_connect()) lustre-OST0000: connection from lustre-MDT0000-mdtlov_UUID@0@lo recovering/t0 exp ffff88005ca19c00 cur 1369851700 last 1369851697
      Lustre: 2398:0:(ldlm_lib.c:1021:target_handle_connect()) Skipped 3 previous similar messages
      Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
      Lustre: lustre-OST0000: Recovery over after 0:01, of 2 clients 2 recovered and 0 were evicted.
      Lustre: lustre-OST0000-osc-MDT0000: Connection restored to lustre-OST0000 (at 0@lo)
      Lustre: Skipped 1 previous similar message
      LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 65536 (requested 32768)
      LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 2097152 (requested 1048576)
      Lustre: lustre-OST0000: received MDS connection from 0@lo
      Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
      Lustre: DEBUG MARKER: iozone rc=1
      Lustre: DEBUG MARKER: replay-ost-single test_5: @@@@@@ FAIL: iozone failed
      

      These messages look related to the 4MB IO patch; note that in both cases the transferred byte count is exactly twice the requested count:

      LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 65536 (requested 32768)
      LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 2097152 (requested 1048576)
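
      For context, this message comes from a client-side sanity check on bulk write replies: the byte count the server reports as transferred must exactly equal what the client requested, otherwise the write is failed with a protocol error. Below is a minimal sketch of that check, with hypothetical simplified types and signature (the real check_write_rcs() in osc_request.c reads req->rq_bulk->bd_nob_transferred and also validates the per-page return codes in the reply):

          #include <stdio.h>
          #include <errno.h>

          /* Hypothetical simplified stand-in for the ptlrpc bulk descriptor. */
          struct bulk_desc {
                  int nob_transferred;    /* bytes the server reports moved */
          };

          /* Illustrative sketch, not the actual Lustre source: a reply
           * claiming more (or fewer) bytes than were requested is rejected
           * as a protocol error instead of being trusted. */
          static int check_write_rcs(struct bulk_desc *bd, int requested_nob)
          {
                  if (bd->nob_transferred != requested_nob) {
                          fprintf(stderr,
                                  "Unexpected # bytes transferred: %d (requested %d)\n",
                                  bd->nob_transferred, requested_nob);
                          return -EPROTO;
                  }
                  return 0;
          }

          /* Feeding in the values from the log above reproduces the message. */
          int main(void)
          {
                  struct bulk_desc bd = { .nob_transferred = 65536 };
                  return check_write_rcs(&bd, 32768) ? 1 : 0;
          }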
      

      I believe this test also fails on the master branch, but it is skipped as a SLOW test during regular testing (it only runs when the test framework is invoked with SLOW=yes):
      https://maloo.whamcloud.com/test_sets/dd033a98-7264-11e2-aad1-52540035b04c

      test_5	SKIP	0	0	skipping SLOW test 5
      


          Activity


            Andreas Dilger added a comment:

            Artem, in the future, if the problem being reported does not actually match the release version as tagged in git, please at a minimum include a description of the version (e.g. "git describe" output for your tree, including the change numbers of any relevant patches already applied) in the Environment section of the bug, and ideally a pointer to a git repo with the actual tree being tested.

            Filing a bug marked "2.4.0" and described as "master branch" in June, after the 2.4.0 release, but missing a patch that actually landed to master in March, before the 2.3.62 tag, makes it difficult for us to determine what the actual problem is. I'm glad that Keith could isolate this problem so quickly with the Maloo test logs, and that the patch fixed the problem for you, but other bugs may not be resolved so easily.

            Artem Blagodarenko (Inactive) added a comment:

            Andreas, after we applied http://review.whamcloud.com/#change,5532 this problem is gone. Thanks!
            I think we can close this issue.

            Andreas Dilger added a comment:

            Artem, are you actually hitting this with the 2.4.0 release code, or is this a 2.1 branch with the 4MB patch applied?

            Keith Mannthey (Inactive) added a comment:

            Earlier encounter of this issue.

            Keith Mannthey (Inactive) added a comment:

            This test does run, just not with every patch review. You can do a subtest search in Maloo (it is a little slow, but it works): https://maloo.whamcloud.com/sub_tests/query

            This is a 2.4-RC1 run that passed:
            https://maloo.whamcloud.com/test_sets/92b8f0d2-cdf3-11e2-ba28-52540035b04c

            There was some trouble with this test a while ago; please see LU-2817. Our testing has not failed since http://review.whamcloud.com/#change,5532 landed.

            Artem Blagodarenko (Inactive) added a comment:

            This issue is related to LU-1431.

            Artem Blagodarenko (Inactive) added a comment:

            Could you please run this test (it is marked as SLOW) and check whether it fails?

            People

              Assignee: Keith Mannthey (Inactive)
              Reporter: Artem Blagodarenko (Inactive)
              Votes: 0
              Watchers: 5
