[LU-11597] sanityn test 16a failed with direct I/O Created: 01/Nov/18 Updated: 16/Dec/22 Resolved: 18/Jan/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.2, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | arm, ppc | ||
| Environment: |
Lustre Build: https://build.whamcloud.com/job/lustre-master/3811 |
||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
sanity test 241b failed with direct I/O:

== sanity test 241b: dio vs dio ====================================================================== 08:45:10 (1540543510)
1+0 records in
1+0 records out
40960 bytes (41 kB) copied, 0.00274397 s, 14.9 MB/s
-rw-r--r-- 1 root root 40960 Oct 26 08:45 /mnt/lustre/f241b.sanity
sanity test_241b: @@@@@@ FAIL: test_241b failed with 1

Maloo report: https://testing.whamcloud.com/test_sets/88bbf5c2-d9d0-11e8-b46b-52540065bddc

sanity tests 270a, 315, and sanityn test 16a also failed with the same issue: |
| Comments |
| Comment by Jian Yu [ 06/Nov/18 ] |
|
Page size of ARM processor is 64kB:

# uname -a
Linux trevis-79vm36.trevis.whamcloud.com 4.14.0-49.13.1.el7a.aarch64 #1 SMP Thu Sep 27 14:45:52 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
# getconf PAGE_SIZE
65536

while it's 4kB on x86_64 processor:

# uname -a
Linux trevis-58vm4.trevis.whamcloud.com 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Fri Oct 12 14:51:33 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# getconf PAGE_SIZE
4096 |
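The page-size difference above can be turned into a size calculation. The following is a minimal sketch, not taken from the Lustre test scripts: `round_up_to_page` is a hypothetical helper name, and its optional second argument (an explicit page size) exists only so the arithmetic can be checked on any machine.

```shell
# Hypothetical helper (not part of the Lustre test framework):
# round a byte count up to the next multiple of the page size.
# $1 = byte count
# $2 = page size in bytes (optional; defaults to `getconf PAGE_SIZE`)
round_up_to_page() {
    _bytes=$1
    _page=${2:-$(getconf PAGE_SIZE)}
    # Integer division truncates, so adding (page - 1) first rounds up.
    echo $(( (_bytes + _page - 1) / _page * _page ))
}
```

For example, the 40960-byte write shown in the description is already a multiple of 4096, so `round_up_to_page 40960 4096` leaves it unchanged, whereas with a 64kB page `round_up_to_page 40960 65536` yields 65536.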
| Comment by Jian Yu [ 06/Nov/18 ] |
|
On an ARM client with x86_64 servers, writing a file less than 65536 bytes with direct I/O mode hit "-EINVAL" failure:

# yes | dd of=/mnt/lustre/f1 bs=65535 count=1 oflag=direct
dd: error writing ‘/mnt/lustre/f1’: Invalid argument
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00356834 s, 0.0 kB/s

Debug log on client showed that:

00000020:00000001:0.0:1541493387.509116:1776:18933:0:(cl_io.c:558:cl_io_start()) Process entered
00000080:00000001:0.0:1541493387.509118:1936:18933:0:(vvp_io.c:1036:vvp_io_write_start()) Process entered
00000080:00200000:0.0:1541493387.509121:1936:18933:0:(vvp_io.c:1061:vvp_io_write_start()) f1: write [0, 65535)
00000020:00000001:0.0:1541493387.509123:1984:18933:0:(cl_object.c:413:cl_object_maxbytes()) Process entered
00020000:00000002:0.0:1541493387.509124:2096:18933:0:(lov_object.c:1075:lov_conf_freeze()) To take share lov(ffff80002f4e0480) owner (null)/ffff8000300f4400
00020000:00000002:0.0:1541493387.509127:2096:18933:0:(lov_object.c:2092:lov_lsm_addref()) lsm ffff8000329f9a80 addref 2/0 by ffff8000300f4400.
00020000:00000002:0.0:1541493387.509129:2096:18933:0:(lov_object.c:1083:lov_conf_thaw()) To release share lov(ffff80002f4e0480) owner (null)/ffff8000300f4400
00000020:00000001:0.0:1541493387.509131:2016:18933:0:(cl_object.c:421:cl_object_maxbytes()) Process leaving (rc=17592186040320 : 17592186040320 : ffffffff000)
00000080:00000001:0.0:1541493387.509137:1968:18933:0:(vvp_io.c:1133:vvp_io_write_start()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
00000020:00000001:0.0:1541493387.509139:1808:18933:0:(cl_io.c:570:cl_io_start()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)

Looking into vvp_io_write_start()->__generic_file_write_iter()... |
| Comment by Jian Yu [ 07/Nov/18 ] |
|
On an x86_64 client with 4kB page size, writing a file less than 4096 bytes with direct I/O mode also hit "-EINVAL" failure:

# yes | dd of=/mnt/lustre/f2 bs=4095 count=1 oflag=direct
dd: error writing ‘/mnt/lustre/f2’: Invalid argument
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0014026 s, 0.0 kB/s

So, in direct I/O mode, each write must be at least one page in size on the client. I'm creating a patch to update the test scripts. |
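One way a test script could avoid hard-coding a block size is to derive it from the client's page size. This is only an illustrative sketch and not the content of the patch mentioned above: `dio_dd_cmd` is a hypothetical helper, `/dev/zero` as the input source is an assumption, and the function only prints the `dd` command line so it can be inspected before being run against a Lustre mount.

```shell
# Hypothetical helper (illustration only, not from the landed patch):
# print a dd command that writes whole pages with direct I/O.
# $1 = target file
# $2 = number of pages to write (optional; defaults to 1)
# $3 = page size in bytes (optional; defaults to `getconf PAGE_SIZE`)
dio_dd_cmd() {
    _file=$1
    _pages=${2:-1}
    _page=${3:-$(getconf PAGE_SIZE)}
    # bs equals one page, so every write is at least one page in size.
    echo "dd if=/dev/zero of=$_file bs=$_page count=$_pages oflag=direct"
}
```

Calling `dio_dd_cmd /mnt/lustre/f1` on the ARM client above would print a command with `bs=65536`, and on the x86_64 client one with `bs=4096`.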
| Comment by Gerrit Updater [ 09/Nov/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33636 |
| Comment by Gerrit Updater [ 13/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33636/ |
| Comment by Peter Jones [ 13/Nov/18 ] |
|
Landed for 2.12 |
| Comment by James Nunez (Inactive) [ 15/Dec/18 ] |
|
I'm reopening this ticket because sanityn test 16a, as mentioned in the description, is still failing for ARM. Logs for recent test failures are at |
| Comment by James Nunez (Inactive) [ 12/Feb/20 ] |
|
Also, PPC client testing fails this test 100% of the time; https://testing.whamcloud.com/test_sets/f25e7616-4a6e-11ea-b69a-52540065bddc. |
| Comment by Gerrit Updater [ 13/Feb/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37561 |
| Comment by Gerrit Updater [ 15/Feb/20 ] |
|
Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37589 |
| Comment by Gerrit Updater [ 20/Feb/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37561/ |
| Comment by James Nunez (Inactive) [ 20/Feb/20 ] |
|
The patch that landed, https://review.whamcloud.com/37561/, puts sanityn tests 16a and 71a on the ALWAYS_EXCEPT list for PPC client testing. This ticket should remain open until those tests are fixed and the tests are taken off the list. |
| Comment by Gerrit Updater [ 17/Nov/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40660 |
| Comment by Gerrit Updater [ 04/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40660/ |
| Comment by Xinliang Liu [ 29/Oct/21 ] |
|
Only sanityn test 16a fails now, so I changed the title. |
| Comment by Gerrit Updater [ 18/Jan/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/37589/ |