Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.4.0
-
None
-
Lustre master branch
-
3
-
8566
Description
Our testing system shows a failure in replay-ost-single test_5:
Lustre: DEBUG MARKER: == replay-ost-single test 5: Fail OST during iozone == 21:21:13 (1369851673)
Lustre: Failing over lustre-OST0000
LustreError: 11-0: an error occurred while communicating with 0@lo. The ost_write operation failed with -19
LustreError: Skipped 1 previous similar message
Lustre: lustre-OST0000-osc-ffff8800514d3400: Connection to lustre-OST0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
Lustre: Skipped 1 previous similar message
Lustre: lustre-OST0000: shutting down for failover; client state will be preserved.
Lustre: OST lustre-OST0000 has stopped.
Lustre: server umount lustre-OST0000 complete
LustreError: 137-5: UUID 'lustre-OST0000_UUID' is not available for connect (no target)
LustreError: Skipped 1 previous similar message
LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts:
LDISKFS-fs (loop1): mounted filesystem with ordered data mode. Opts:
Lustre: 16962:0:(ldlm_lib.c:2195:target_recovery_init()) RECOVERY: service lustre-OST0000, 2 recoverable clients, last_transno 1322
Lustre: lustre-OST0000: Now serving lustre-OST0000 on /dev/loop1 with recovery enabled
Lustre: 2398:0:(ldlm_lib.c:1021:target_handle_connect()) lustre-OST0000: connection from lustre-MDT0000-mdtlov_UUID@0@lo recovering/t0 exp ffff88005ca19c00 cur 1369851700 last 1369851697
Lustre: 2398:0:(ldlm_lib.c:1021:target_handle_connect()) Skipped 3 previous similar messages
Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 2 clients reconnect
Lustre: lustre-OST0000: Recovery over after 0:01, of 2 clients 2 recovered and 0 were evicted.
Lustre: lustre-OST0000-osc-MDT0000: Connection restored to lustre-OST0000 (at 0@lo)
Lustre: Skipped 1 previous similar message
LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 65536 (requested 32768)
LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 2097152 (requested 1048576)
Lustre: lustre-OST0000: received MDS connection from 0@lo
Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0000_UUID now active, resetting orphans
Lustre: DEBUG MARKER: iozone rc=1
Lustre: DEBUG MARKER: replay-ost-single test_5: @@@@@@ FAIL: iozone failed
These messages look related to the 4MB I/O patch:
LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 65536 (requested 32768)
LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) Unexpected # bytes transferred: 2097152 (requested 1048576)
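As a quick check on the two check_write_rcs() messages, here is a minimal Python sketch that extracts the transferred and requested byte counts and prints their ratio. It is not part of the Lustre tree; the function name transfer_mismatches and the regular expression are assumptions written only to match the message text quoted in this report. In both messages the transferred count is exactly twice the requested count.

```python
import re

# Matches the check_write_rcs() message text as quoted in this report.
# The pattern is an illustration, not taken from any Lustre tooling.
PATTERN = re.compile(
    r"Unexpected # bytes transferred: (\d+) \(requested (\d+)\)"
)


def transfer_mismatches(log_text):
    """Return a list of (transferred, requested, ratio) tuples,
    one per mismatch message found in log_text."""
    results = []
    for match in PATTERN.finditer(log_text):
        transferred = int(match.group(1))
        requested = int(match.group(2))
        results.append((transferred, requested, transferred / requested))
    return results


# The two messages from the console log above.
LOG = (
    "LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) "
    "Unexpected # bytes transferred: 65536 (requested 32768) "
    "LustreError: 1716:0:(osc_request.c:1232:check_write_rcs()) "
    "Unexpected # bytes transferred: 2097152 (requested 1048576)"
)

for transferred, requested, ratio in transfer_mismatches(LOG):
    print(f"transferred={transferred} requested={requested} ratio={ratio:g}")
```

The same function can be run over a full console log to see whether every mismatch shows the same factor.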
I believe this test also fails on the master branch, but it is skipped as SLOW during testing:
https://maloo.whamcloud.com/test_sets/dd033a98-7264-11e2-aad1-52540035b04c
test_5 SKIP 0 0 skipping SLOW test 5
Artem, in the future, if the problem being reported does not actually match the release version as tagged in git, please at a minimum include a description of the version (e.g. "git describe" in your tree, including the change numbers of relevant patches already applied) in the Environment section of the bug, and ideally a pointer to a git repo with the actual tree being tested.
Filing a bug marked "2.4.0" and described as "master branch" in June, after the 2.4.0 release, but missing a patch that actually landed on master in March before the 2.3.62 tag, makes it difficult for us to determine what the actual problem is. I'm glad that Keith could isolate this problem so quickly with the Maloo test logs, and that the patch fixed the problem for you, but for other bugs this may not be so easily done.