[LU-11597] sanityn test 16a failed with direct I/O

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.2, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5
    • Environment: Lustre Build: https://build.whamcloud.com/job/lustre-master/3811
      Distro/Arch: RHEL7.5/aarch64 (client), RHEL7.5/x86_64 (server)
    • Severity: 3

    Description

      sanity test 241b failed with direct I/O:

      == sanity test 241b: dio vs dio ====================================================================== 08:45:10 (1540543510)
      1+0 records in
      1+0 records out
      40960 bytes (41 kB) copied, 0.00274397 s, 14.9 MB/s
      -rw-r--r-- 1 root root 40960 Oct 26 08:45 /mnt/lustre/f241b.sanity
       sanity test_241b: @@@@@@ FAIL: test_241b failed with 1 
      

      Maloo report: https://testing.whamcloud.com/test_sets/88bbf5c2-d9d0-11e8-b46b-52540065bddc

      sanity tests 270a, 315, and sanityn test 16a also failed with the same issue:
      https://testing.whamcloud.com/test_sets/88bbf5c2-d9d0-11e8-b46b-52540065bddc
      https://testing.whamcloud.com/test_sets/8b2f4282-d9d0-11e8-b46b-52540065bddc

          Activity

            Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37589
            Subject: LU-11597 test: fix sanityn 16a to align page size
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c23ab3e322cfc6b585b39c97d6bbec77127e6f2b

            James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37561
            Subject: LU-11597 tests: skip sanityn tests for PPC
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bfe402a758e752124a0d081f5bb5bde4b95566bb

            jamesanunez James Nunez (Inactive) added a comment -

            Also, PPC client testing fails this test 100% of the time: https://testing.whamcloud.com/test_sets/f25e7616-4a6e-11ea-b69a-52540065bddc

            jamesanunez James Nunez (Inactive) added a comment -

            I'm reopening this ticket because sanityn test 16a, as mentioned in the description, is still failing for ARM.

            Logs for recent test failures are at
            https://testing.whamcloud.com/test_sets/f1121dd4-fdef-11e8-b837-52540065bddc
            https://testing.whamcloud.com/test_sets/1de368ba-fa38-11e8-bb6b-52540065bddc

            pjones Peter Jones added a comment -

            Landed for 2.12

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33636/
            Subject: LU-11597 tests: fix O_DIRECT test usage for ARM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f602b5ec7f45713122abd615a97a13d7c97d460e

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33636
            Subject: LU-11597 tests: fix O_DIRECT test usage for ARM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e26c7a9c6e747de442d5f37cf14b352ca3e4b365

            yujian Jian Yu added a comment -

            On an x86_64 client with a 4 kB page size, a direct I/O write of fewer than 4096 bytes also fails with "-EINVAL":

            # yes | dd of=/mnt/lustre/f2 bs=4095 count=1 oflag=direct
            dd: error writing ‘/mnt/lustre/f2’: Invalid argument
            1+0 records in
            0+0 records out
            0 bytes (0 B) copied, 0.0014026 s, 0.0 kB/s
            

            So, in direct I/O mode, each write must be at least one page in size. I'm creating a patch to update the test scripts.
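
            The sizing rule above can be sketched in shell. This is only an illustration, not the patch that landed: page_align is a hypothetical helper (it is not part of the Lustre test framework), and it rounds the request up to a whole number of pages, which satisfies the "at least one page" rule and also keeps the size page-aligned. The 40960-byte value is the write size seen in the sanity test 241b output; the dd target path is illustrative.

```shell
#!/bin/bash
# Hypothetical helper: round a byte count up to a multiple of the page size.
# The second argument is optional and defaults to the local client page size.
page_align() {
    local bytes=$1
    local page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( (bytes + page_size - 1) / page_size * page_size ))
}

# Size a direct I/O write for the local client, starting from the
# 40960-byte write used by sanity test 241b.
bs=$(page_align 40960)
echo "page-aligned block size: $bs"

# The write would then be issued as (path is illustrative):
# dd if=/dev/zero of=/mnt/lustre/f1 bs=$bs count=1 oflag=direct
```

            With a 4096-byte page the helper leaves 40960 unchanged (it is already ten full pages); with a 65536-byte page it returns 65536, which is consistent with the test passing on x86_64 and failing on the ARM client.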

            yujian Jian Yu added a comment -

            On an ARM client with x86_64 servers, a direct I/O write of fewer than 65536 bytes fails with "-EINVAL":

            # yes | dd of=/mnt/lustre/f1 bs=65535 count=1 oflag=direct
            dd: error writing ‘/mnt/lustre/f1’: Invalid argument
            1+0 records in
            0+0 records out
            0 bytes (0 B) copied, 0.00356834 s, 0.0 kB/s
            

            The debug log on the client showed:

            00000020:00000001:0.0:1541493387.509116:1776:18933:0:(cl_io.c:558:cl_io_start()) Process entered
            00000080:00000001:0.0:1541493387.509118:1936:18933:0:(vvp_io.c:1036:vvp_io_write_start()) Process entered
            00000080:00200000:0.0:1541493387.509121:1936:18933:0:(vvp_io.c:1061:vvp_io_write_start()) f1: write [0, 65535)
            00000020:00000001:0.0:1541493387.509123:1984:18933:0:(cl_object.c:413:cl_object_maxbytes()) Process entered
            00020000:00000002:0.0:1541493387.509124:2096:18933:0:(lov_object.c:1075:lov_conf_freeze()) To take share lov(ffff80002f4e0480) owner           (null)/ffff8000300f4400
            00020000:00000002:0.0:1541493387.509127:2096:18933:0:(lov_object.c:2092:lov_lsm_addref()) lsm ffff8000329f9a80 addref 2/0 by ffff8000300f4400.
            00020000:00000002:0.0:1541493387.509129:2096:18933:0:(lov_object.c:1083:lov_conf_thaw()) To release share lov(ffff80002f4e0480) owner           (null)/ffff8000300f4400
            00000020:00000001:0.0:1541493387.509131:2016:18933:0:(cl_object.c:421:cl_object_maxbytes()) Process leaving (rc=17592186040320 : 17592186040320 : ffffffff000)
            00000080:00000001:0.0:1541493387.509137:1968:18933:0:(vvp_io.c:1133:vvp_io_write_start()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
            00000020:00000001:0.0:1541493387.509139:1808:18933:0:(cl_io.c:570:cl_io_start()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
            

            Looking into vvp_io_write_start()->__generic_file_write_iter()...

            yujian Jian Yu added a comment -

            The page size on the ARM client is 64 kB:

            # uname -a
            Linux trevis-79vm36.trevis.whamcloud.com 4.14.0-49.13.1.el7a.aarch64 #1 SMP Thu Sep 27 14:45:52 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
            # getconf PAGE_SIZE
            65536
            

            while it is 4 kB on the x86_64 system:

            # uname -a
            Linux trevis-58vm4.trevis.whamcloud.com 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Fri Oct 12 14:51:33 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
            # getconf PAGE_SIZE
            4096
            

            People

              yujian Jian Yu
              yujian Jian Yu