LU-9574: Large file read performance degradation from multiple OSTs


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Affects Version/s: Lustre 2.9.0
    • Labels: None
    • Environment: RHEL 7 servers, RHEL 6 and 7 clients
    • Severity: 3
    • 9223372036854775807

    Description

      We recently noticed that large-file read performance on our 2.9 LFS is dramatically worse than it used to be. The attached plot is the result of a test script that uses dd to write a large (50 GB) file to disk, read that file back, and then copy it to a second file, measuring write, read, and read/write speeds for various stripe sizes and counts (a rough sketch of this test loop is included at the end of this description). The two sets of data in the plot were collected on the same server and client hardware. The LFS was originally built and formatted with 2.8.0, but we eventually upgraded the servers and clients to 2.9.0.

      The behavior we are used to seeing is increasing performance as the stripe count increases, with a peak around 4 or 6 OSTs and a gradual fall-off as more OSTs are used. This is what we saw under 2.8 (red lines in the plots). With 2.9 we still get very good write performance (almost line rate on our 10 GbE clients), but for reads we see extremely good performance with a single OST and significantly degraded performance with multiple OSTs (black lines in the plots). Using a git bisect to compile and test different clients, we were able to isolate the regression to this commit:

      commit d8467ab8a2ca15fbbd5be3429c9cf9ceb0fa78b8
      LU-7990 clio: revise readahead to support 16MB IO
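
      For reference, a rough sketch of how a bisect like this can be driven against the Lustre client tree is below. The repository URL is the public lustre-release tree; the tag names and the rebuild/remount steps are illustrative assumptions rather than our exact procedure.

          git clone git://git.whamcloud.com/fs/lustre-release.git
          cd lustre-release
          git bisect start
          git bisect bad 2.9.0       # client version showing the read regression
          git bisect good 2.8.0      # last known-good client version
          # At each step git checks out a candidate commit: rebuild and install
          # the client, remount, rerun the dd read test, then mark the result:
          git bisect good            # or: git bisect bad
          # Repeat until git reports the first bad commit.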

      There is slightly more info here:

      http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-May/014509.html

      Please let me know if you need any other data or info.   
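
      For completeness, below is a minimal sketch of the kind of timing loop the test script runs; it is not the actual script. The mount point, file size, stripe size, and stripe counts are illustrative assumptions, and the real script presumably records the throughput that dd reports for each case.

          #!/bin/bash
          # Sketch: time large-file write, read, and copy for several stripe counts.
          DIR=/mnt/lustre/ddtest        # assumed Lustre mount point
          SIZE_MB=51200                 # ~50 GB
          mkdir -p "$DIR"
          for COUNT in 1 2 4 6 8 16; do
              F="$DIR/testfile.c$COUNT"
              # Pre-create both files with the desired stripe count (1 MB stripe size assumed).
              lfs setstripe -c "$COUNT" -S 1M "$F"
              lfs setstripe -c "$COUNT" -S 1M "$F.copy"
              # Write test.
              dd if=/dev/zero of="$F" bs=1M count="$SIZE_MB" conv=fsync
              # Drop the client page cache so the read comes from the OSTs (needs root).
              echo 3 > /proc/sys/vm/drop_caches
              # Read test.
              dd if="$F" of=/dev/null bs=1M
              # Read/write (copy) test.
              dd if="$F" of="$F.copy" bs=1M conv=fsync
              rm -f "$F" "$F.copy"
          done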

      Attachments

        Activity

          People

            Assignee: Jinshan Xiong (Inactive)
            Reporter: Darby Vicker
            Votes: 0
            Watchers: 11
