Lustre / LU-1056

Single-client, single-thread and single-file is limited to 1.5GB/s


Details

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 1.8.6, Lustre 1.8.x (1.8.0 - 1.8.5)
    • 8742

    Description

      A few savvy Lustre veterans from various organizations, including NRL, reported that they all observed a 1.5GB/s cap on a single client running a single thread against a single file.

      "... curious about what might be limiting a Lustre client's (on QDR IB) single-file to single process performance to only be 1.4 to 1.5 GB/s even when a full QDR IB fabric is in play!"

      Since the Lustre architecture does not impose such a limit, it is worthwhile to investigate the root cause of the 1.5GB/s cap. Understanding and improving high-rate single-threaded sequential IO would help Lustre compete with QFS, CXFS, and StorNext.

      Initial Analysis:

      The limitation could come from either of the following:

      • Single-thread IO does not push/pull enough throughput to/from Lustre
      • The Lustre client does not handle single-thread IO efficiently enough

      We can use a simple experiment to check whether single-thread IO can push/pull enough throughput to/from Lustre. Since IO on Lustre first passes through the VFS layer, just as it does for any other file system, the single-thread IO limit without Lustre involved can be roughly characterized by writing to and reading from a RAM-backed file system.

      [root@client-31 ~]# mkdir /mnt/ramdisk
      [root@client-31 ~]# mount -t ramfs none -o rw,size=10240M,mode=755 /mnt/ramdisk

      Read
      [root@client-31 ~]# dd of=/dev/zero if=/mnt/ramdisk/bigfile bs=1M count=10240
      10240+0 records in
      10240+0 records out
      10737418240 bytes (11 GB) copied, 2.47548 s, 4.3 GB/s

      Write
      [root@client-31 ~]# dd if=/dev/zero of=/mnt/ramdisk/bigfile bs=1M count=10240
      10240+0 records in
      10240+0 records out
      10737418240 bytes (11 GB) copied, 5.45022 s, 2.0 GB/s

      This experiment shows that single-thread IO can read and write data beyond 1.5GB/s when Lustre is not involved.
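
      For comparison, the same single-thread dd can be run against a file on the Lustre client mount itself. The mount point /mnt/lustre and the stripe settings below are placeholders only and would need to match the test filesystem (the stripe options use the 1.8 lfs syntax).

      [root@client-31 ~]# lfs setstripe -c -1 -s 1M /mnt/lustre/bigfile   # stripe over all OSTs, 1MB stripe size
      [root@client-31 ~]# dd if=/dev/zero of=/mnt/lustre/bigfile bs=1M count=10240
      [root@client-31 ~]# echo 3 > /proc/sys/vm/drop_caches               # drop the client page cache before reading back
      [root@client-31 ~]# dd if=/mnt/lustre/bigfile of=/dev/null bs=1M count=10240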

      After some discussion with Nasf: for asynchronous IO, the ack is sent back to the client process before the RPCs reach the OSTs, so the limitation is more likely hiding in the code path that copies striped data into the OSC caches.
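
      One way to check that writes really are leaving the client asynchronously is to watch the OSC RPC statistics while the dd runs. The parameter paths below are from the 1.8 client and are only a sketch; the exact OSC device names will differ per filesystem.

      [root@client-31 ~]# lctl get_param osc.*.rpc_stats        # pages-per-RPC and RPCs-in-flight histograms
      [root@client-31 ~]# lctl get_param osc.*.cur_dirty_bytes  # dirty data currently cached per OSC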

      Please note that:

      1. The 1.5GB/s limit appears to apply to both read and write:

      "Even if data is in the OSS memory (but not the client) I only see more consistent throughput but not higher. So it seems like a client limit from the implementation (somewhere in the code path). If data is in the client's cache then we can see 3-5 GB/s but that's just reading pages from memory and ll_readpage is never called. Because ll_file_read->ll_file_aio_read->generic_file_aio_read->do_generic_file_read never calls the readpage function for the given address_space if the call to find_get_page found it in cache (the radix tree)."

      2. All of our higher IO rates make use of read-ahead or write-behind, so they are all asynchronous.
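
      Since the higher rates rely on read-ahead and write-behind, it is also worth confirming that the client caching tunables are not what caps throughput. The parameters and values below are illustrative examples only, not tuning recommendations.

      [root@client-31 ~]# lctl get_param llite.*.max_read_ahead_mb llite.*.max_read_ahead_per_file_mb
      [root@client-31 ~]# lctl get_param osc.*.max_dirty_mb osc.*.max_rpcs_in_flight
      [root@client-31 ~]# lctl set_param llite.*.max_read_ahead_mb=256    # illustrative value
      [root@client-31 ~]# lctl set_param osc.*.max_rpcs_in_flight=32      # illustrative value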

      Please also refer to http://groups.google.com/group/lustre-discuss-list/msg/30ed1fde6ab6e62d


    People

      Assignee: Jinshan Xiong (Inactive)
      Reporter: Zhiqi Tao (Inactive)
