Details
-
Question/Request
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.4.2, Lustre 2.5.3
-
None
-
Compute Clusters - Sun X6275 blades, QDR IB torus
Cluster 1: Toss-2.2-6, Lustre 2.4.2-17, 2854 nodes, 12 IB/10gigE routers
Cluster 2: Toss-2.1.1.4, Lustre 2.4.0-21, 90 nodes, 1 IB/10gigE router
Storage Cluster - Dell 720 servers, NetApp 5524/60, dual homed IB/10gigE, Toss2.2.1.1, Lustre 2.5.3-2 - mixed mode ldiskfs MDT/zfs OSTs
-
16720
Description
I am running into a significant performance issue when running IOR, primarily against the file system described in the environment, but the issue is also being observed on other global Lustre file systems.
On the smaller of the two systems, which has a single router, I am getting near wire speed when running IOR on 32 nodes with 8 threads. On the larger system I am getting only approximately 10% of what I expect to see going through the routers; where I am expecting ~10GB/sec to the file system, I am seeing only roughly 6% of that.
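To make the gap concrete, here is a rough back-of-the-envelope sketch in Python. It assumes one 10 Gbit/s Ethernet link per router (the link count per router is not stated in this ticket) and ignores protocol overhead; the function names are illustrative only.

```python
# Back-of-the-envelope bandwidth figures for the two clusters.
# ASSUMPTION: each router has a single 10 Gbit/s Ethernet link; the real
# link count per router is not given in this ticket.

def wire_rate_gbytes(routers, links_per_router=1, gbit_per_link=10.0):
    """Raw aggregate Ethernet rate in GB/s (8 bits per byte, no overhead)."""
    return routers * links_per_router * gbit_per_link / 8.0

def observed_gbytes(expected_gbytes, fraction):
    """Throughput actually seen, given the expected rate and the fraction achieved."""
    return expected_gbytes * fraction

small_wire = wire_rate_gbytes(1)     # cluster 2: 1 router
large_wire = wire_rate_gbytes(12)    # cluster 1: 12 routers
seen = observed_gbytes(10.0, 0.06)   # ~10 GB/s expected, roughly 6% achieved

print(f"cluster 2 raw router rate: {small_wire:.2f} GB/s")
print(f"cluster 1 raw router rate: {large_wire:.2f} GB/s")
print(f"cluster 1 observed IOR rate: about {seen:.1f} GB/s")
```

Under that assumption the 12 routers offer about 15 GB/s of raw capacity, so an expectation of ~10GB/sec is not limited by the router links themselves, while roughly 6% of it works out to about 0.6 GB/s.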
I have run both netperf and lnet_selftest from the routers to the servers. With netperf I am seeing basically wire speed. lnet_selftest shows approximately the same result with a concurrency of 8, but with a concurrency of 1 I am seeing about half.
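For anyone wanting to repeat the lnet_selftest comparison, the following Python sketch only prints the kind of `lst` command sequence involved, once for each concurrency. It is not the exact script that was run: the NIDs, group names, session name, batch name, and the 1M transfer size are all placeholders, and `lst_commands` is a hypothetical helper.

```python
# Builds the lst command lines for one bulk-write run at a given concurrency.
# ASSUMPTION: the NIDs, the names "rtr_to_srv", "routers", "servers", "bulk",
# and the 1M size are placeholders, not the values used on the real systems.

def lst_commands(router_nids, server_nids, concurrency, size="1M"):
    """Return the lst commands for a routers-to-servers bulk write test."""
    return [
        "lst new_session rtr_to_srv",
        f"lst add_group routers {' '.join(router_nids)}",
        f"lst add_group servers {' '.join(server_nids)}",
        "lst add_batch bulk",
        f"lst add_test --batch bulk --concurrency {concurrency} "
        f"--from routers --to servers brw write size={size}",
        "lst run bulk",
        "lst stat servers",
        "lst end_session",
    ]

# Print one command sequence per concurrency level being compared.
for conc in (1, 8):
    print(f"# concurrency {conc}")
    print("\n".join(lst_commands(["10.0.0.1@tcp"], ["10.0.1.1@tcp"], conc)))
```

The printed commands would need the lnet_selftest module loaded on all nodes and `LST_SESSION` exported in the shell that runs them; `lst stat` keeps reporting until it is interrupted.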
Loads on the servers and routers are insignificant. I have increased the credits on both the servers and gateways with no observable impact (although the router changes have been implemented on only 2 gateways, due to it being a production environment). Credit changes have not been made on the client side (again, because this is a production system).
I am unsure what to try next, and I do not know whether there is a known compatibility issue between the client and server versions.
Any help would be greatly appreciated.