[LU-4037] Failure on test suite sanity test_78: rdwr failed Created: 01/Oct/13 Updated: 10/Oct/21 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | zfs | ||
| Environment: |
server and client: lustre-master build #1687 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 10846 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/32766b8c-26c7-11e3-83d1-52540035b04c. The sub-test test_78 failed with the following error:
test log:
== sanity test 78: handle large O_DIRECT writes correctly ============== 20:40:26 (1380166826)
MemFree: 1247, Max file size: 1400000
MemTotal: 1877
Mem to use for directio: 810
Smallest OST: 169728
File size: 512
directIO rdwr round 1 of 5
directio on /mnt/lustre/f.sanity.78 for 102x1048576 bytes
PASS
directIO rdwr round 2 of 5
directio on /mnt/lustre/f.sanity.78 for 128x1048576 bytes
Write error Success (rc = 114294784, len = 134217728)
sanity test_78: @@@@@@ FAIL: rdwr failed |
| Comments |
| Comment by Andreas Dilger [ 02/Oct/13 ] |
|
So the system call returned 109 * 1048576 = 114294784 instead of the expected 128 * 1048576 = 134217728. This test has failed a few times in the past month, but is typically skipped because it is marked SLOW. |
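The arithmetic can be double-checked with a few lines of Python, used here only for illustration; the two byte counts are copied from the test log and the variable names are arbitrary. It also makes explicit that the write stopped exactly on a 1 MiB boundary, 19 MiB short of the request.

```python
# Worked check of the numbers in the comment: express the write() return
# value (rc) and the requested length in 1 MiB units.
MIB = 1048576  # 1 MiB = 2**20 bytes; the test sizes its writes in 1 MiB units

rc = 114294784         # bytes write() reported (from the test log)
requested = 134217728  # bytes the test asked for: 128 x 1 MiB

written_mib, remainder = divmod(rc, MIB)
shortfall = requested - rc

print(f"written:   {written_mib} MiB (+{remainder} bytes)")
print(f"requested: {requested // MIB} MiB")
print(f"shortfall: {shortfall} bytes = {shortfall / MIB:g} MiB")
```

This prints 109 MiB written with a remainder of 0 bytes, 128 MiB requested, and a shortfall of 19922944 bytes (19 MiB).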
| Comment by Andreas Dilger [ 02/Oct/13 ] |
|
My first guess would be that the size of the O_DIRECT call is being limited for some reason, so that a short write is returned to the caller. The returned value is the same in all four test failures that I was able to check, but there are older failures dating back to 2013-04-10 (https://maloo.whamcloud.com/sub_tests/c637000e-a204-11e2-bdac-52540035b04c) that do not have logs. |
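One detail in the log fits the short-write theory: the harness printed "Write error Success", which suggests it reported strerror(errno) while errno was still 0, i.e. write() returned a positive count smaller than len rather than -1. The Python sketch below illustrates that distinction with os.write(), which follows write(2) return semantics. classify_write() is a hypothetical helper, and the non-blocking pipe is only a convenient way to force a short write on Linux; it is not how the Lustre O_DIRECT path behaves.

```python
import os
import tempfile


def classify_write(fd, buf):
    """Issue a single write and classify what came back (hypothetical helper).

    Mirrors write(2): a short count is a successful return, so no exception
    is raised and no errno is set; only a failed call (-1 in C, OSError in
    Python) carries an errno.
    """
    try:
        n = os.write(fd, buf)
    except OSError as e:
        return ("error", e.errno)
    if n == len(buf):
        return ("complete", n)
    return ("short", n)


buf = b"\0" * (1024 * 1024)  # 1 MiB, larger than a default Linux pipe buffer

# A regular file normally accepts the whole buffer in one call.
with tempfile.TemporaryFile() as f:
    file_result = classify_write(f.fileno(), buf)

# A non-blocking pipe is used only to force a short write: with O_NONBLOCK,
# write() transfers what fits into the pipe and returns that count.
r, w = os.pipe()
os.set_blocking(w, False)
pipe_result = classify_write(w, buf)
os.close(r)
os.close(w)

print("regular file:", file_result)
print("non-blocking pipe:", pipe_result)
```

A retry loop would hide exactly the condition test_78 is checking for; since the test treats the short return as a failure, the sketch classifies the result instead of retrying.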
| Comment by Mark Mansk [ 04/Jun/14 ] |
|
We're starting to see this fail at Cray with 2.5.1:
MemFree: 30795, Max file size: 400000
From the console logs:
2014-06-04T02:48:50.946766-05:00 c0-0c0s3n2 LNet: 6646:0:(gnilnd_cb.c:867:kgnilnd_verify_rdma_cksum()) $$ no RDMA payload checksum when enabled from 14@gni4 msg@0xffff8807aa5cd118 m/v/ty/ck/pck/pl b00fbabe/8/16/fae0/0/0 x948646:GNILND_MSG_GET_DONE_REV
2014-06-04T02:48:50.946814-05:00 c0-0c0s3n2 LNet: 6646:0:(gnilnd_cb.c:867:kgnilnd_verify_rdma_cksum()) Skipped 1645 previous similar messages
2014-06-04T02:49:21.361769-05:00 c0-0c0s3n2 LustreError: 11251:0:(ofd_grant.c:255:ofd_grant_space_left()) dal-OST0000: cli 86a844b0-7844-5210-510c-2101e0354cd0/ffff880871927c00 left 51163136 < tot_grant 52605696 unstable 0 pending 0
2014-06-04T02:49:21.361821-05:00 c0-0c0s3n2 LustreError: 11251:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 5 previous similar messages
2014-06-04T02:49:21.884851-05:00 c0-0c0s2n2 Lustre: DEBUG MARKER: sanity test_78: @@@@@@ FAIL: rdwr failed
This test hasn't failed before. |