  1. Lustre
  2. LU-5995

Apparent scale issue with 2.5.2 clients to 2.5.3 servers

Details

    • Question/Request
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.4.2, Lustre 2.5.3
    • None

    Description

      I am running into a significant performance issue when running IOR, primarily against the file system mentioned in the environment, but it is being observed on other global Lustre file systems as well.

      On the smaller of the two systems, which has a single router, I am getting near wire speed running IOR on 32 nodes with 8 threads. On the larger system I am getting only approximately 10% of what I expect to see going through the routers, and where I am expecting to see ~10GB/sec to the file system I am seeing only roughly 6% of that.

      I have run both netperf and lnet_selftest from the routers to the servers. With netperf I am seeing basically wire speed. lnet_selftest shows approximately the same result with a concurrency of 8, but with a concurrency of 1 I am seeing about half.

      Loads on the servers and routers are insignificant. I have increased the credits on both the servers and gateways with no observable impact (although the changes to the routers have been implemented on only 2 gateways due to it being a production environment). Credit changes have not been made on the client side (again because this is a production system).

      I am unsure what to try next, and I do not know whether there is a known compatibility issue between the client and server versions.

      Any help would be greatly appreciated.

          People

            green Oleg Drokin
            jamervi Joe Mervini