Lustre / LU-4198

Improve IO performance when using DIRECT IO using libaio

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.14.0
    • Affects Version: Lustre 2.4.1
    • Environment: Seen in two environments: AWS cloud (Robert R.) and a dual-OSS setup (3 SSDs per OST) over 2x10 GbE.
    • Severity: 3
    • 11385

    Description

      Attached to this Jira are some numbers from the direct I/O tests (write operations only).

      It was noticed that setting RPCs in flight to 256 in these tests gave poorer performance; max RPCs in flight here is set to 32.
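
      For reference, a minimal sketch of how the RPCs-in-flight setting is typically inspected and changed on a client. This assumes the standard osc.*.max_rpcs_in_flight tunable; the exact OSC device names depend on the filesystem layout and are not taken from the original runs:

        # Check the current number of RPCs allowed in flight per OSC
        lctl get_param osc.*.max_rpcs_in_flight

        # Set it (32 was used for the numbers above; 256 performed worse)
        lctl set_param osc.*.max_rpcs_in_flight=32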

      • A sample FIO output (a libaio-based counterpart is sketched after this listing):
        fio.4k.write.1.23499: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
        fio-2.1.2
        Starting 1 process
        fio.4k.write.1.23499: Laying out IO file(s) (1 file(s) / 10MB)
        
        fio.4k.write.1.23499: (groupid=0, jobs=1): err= 0: pid=10709: Fri Nov  1 11:47:29 2013
          write: io=10240KB, bw=2619.7KB/s, iops=654, runt=  3909msec
            clat (usec): min=579, max=5283, avg=1520.43, stdev=1216.20
             lat (usec): min=580, max=5299, avg=1521.37, stdev=1216.22
            clat percentiles (usec):
             |  1.00th=[  604],  5.00th=[  652], 10.00th=[  668], 20.00th=[  708],
             | 30.00th=[  732], 40.00th=[  756], 50.00th=[  796], 60.00th=[  844],
             | 70.00th=[ 1320], 80.00th=[ 3440], 90.00th=[ 3568], 95.00th=[ 3632],
             | 99.00th=[ 3824], 99.50th=[ 5024], 99.90th=[ 5216], 99.95th=[ 5280],
             | 99.99th=[ 5280]
            bw (KB  /s): min= 1224, max= 4366, per=97.64%, avg=2557.14, stdev=1375.64
            lat (usec) : 750=37.50%, 1000=30.12%
            lat (msec) : 2=5.00%, 4=26.76%, 10=0.62%
          cpu          : usr=0.92%, sys=8.70%, ctx=2562, majf=0, minf=25
          IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
             submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
             complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
             issued    : total=r=0/w=2560/d=0, short=r=0/w=0/d=0
        
        Run status group 0 (all jobs):
          WRITE: io=10240KB, aggrb=2619KB/s, minb=2619KB/s, maxb=2619KB/s, mint=3909msec, maxt=3909msec
        
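      • For comparison with the sync/iodepth=1 sample above, a hedged sketch of the kind of libaio-based direct-I/O job this ticket targets. The mount point, file size, and iodepth value are illustrative assumptions, not settings taken from the original runs:

        # 4k sequential writes, O_DIRECT, submitted through libaio with a
        # deeper queue; /mnt/lustre and the 10M size are placeholders.
        fio --name=fio.4k.write.libaio --directory=/mnt/lustre \
            --rw=write --bs=4k --size=10M \
            --ioengine=libaio --iodepth=32 --direct=1
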

      Attachments

        1. fio.direct.xls
          13 kB
        2. JinshanPatchesTesting.xlsx
          141 kB
        3. LU-4198.png
          129 kB
        4. vvp_io.c.dio_i_size.patch
          2 kB

        Issue Links

          Activity

            paf Patrick Farrell (Inactive) added a comment - Patch is still in flight. (Hope this is OK.)
            rread Robert Read added a comment -

            jay Jinshan Xiong (Inactive) added a comment - Let's reopen this ticket after we have a more convincing solution for this issue.
            adilger Andreas Dilger added a comment - Patches in Gerrit for this issue: http://review.whamcloud.com/8201 http://review.whamcloud.com/8612

            rhenwood Richard Henwood (Inactive) added a comment - This ticket isn't directly related to CLIO Simplification work. The ticket relationships on Jira have been updated to reflect this.

            rhenwood Richard Henwood (Inactive) added a comment - Jinshan, please update this ticket description to include the reason that this ticket is a dependency for LU-3259.

            brett Brett Lee (Inactive) added a comment - Jinshan - an OST failed on me (each OST is one SATA-II or -III disk) and I have no other suitable disks. I have ordered a pair of WD 10K RPM VelociRaptors (200 MB/s) that support queue depths up to 32 (NCQ). On hold till then.

            brett Brett Lee (Inactive) added a comment - Better? I thought those results were pretty good already. I will give it a try.

            jay Jinshan Xiong (Inactive) added a comment - Will you please increase iodepth to at least 32 and see if we can get any better results?
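
            A hedged sketch of how the effect of a larger iodepth could be checked on the client side, assuming the standard per-OSC rpc_stats counters (paths and clearing behaviour vary slightly by Lustre version):

              # Clear and then inspect the per-OSC RPC histograms around a run;
              # writing to rpc_stats clears the counters on typical versions.
              # The "rpcs in flight" and "pages per rpc" columns show whether a
              # deeper queue actually produces more concurrent, larger RPCs.
              lctl set_param osc.*.rpc_stats=0
              lctl get_param osc.*.rpc_stats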

            brett Brett Lee (Inactive) added a comment - Data in the attached spreadsheet seems to make a good case for including the performance improvements. Also, I’ve not seen any further stability issues since the beginning of the test period.

            brett Brett Lee (Inactive) added a comment -

            Update: Continuing to run benchmarks against this build.

            No further hung-system issues. Oddly, the hang occurred on the initial IO and has not been seen since.

            The "corrupted" event is reproducible, though I would no longer call it corruption. Rather, it has to do with stalled fio kernel threads. After killing off the fio user processes, two kernel threads remained; after rebooting to end those threads, the 61% was cleared.

            Note that all fio writes using block size 64M are not completing (though they do complete on the 2.5 release, as well as on the root ext4 file system).

            All other reads/writes (sequential and random) are completing successfully and without incident. Performance data comparisons are upcoming.
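
            For completeness, a minimal sketch of how stalled fio kernel threads of the kind described above are usually spotted. These are generic Linux commands, not steps taken from this reproduction, and the sysrq dump requires root with kernel.sysrq enabled:

              # List threads stuck in uninterruptible sleep (D state)
              ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

              # Dump stack traces of all blocked tasks to the kernel log
              echo w > /proc/sysrq-trigger
              dmesg | tail -n 100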

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: brett Brett Lee (Inactive)
              Votes: 0
              Watchers: 25

              Dates

                Created:
                Updated:
                Resolved: