[LU-4720] Help on performance scalability Created: 06/Mar/14  Updated: 12/Mar/14

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Chakravarthy Nagarajan (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

MDS Servers(2nos) - 12cores, 64GB Memory, QDR Single port
OSS Servers (2nos) - 16cores, 64GB Memory, FDR multi rail(2ports from each OSS)
Clients(8nos) - 12cores, 64GB memory, QDR single port
Lustre Version - 2.5
CentOS version - 6.4
No. of OSTs - 16(Configured in RAID-6(8+2)) and load balanced between OSS servers with 8 OSTs on each


Rank (Obsolete): 12974

 Description   

Hi,

With the above environment, I need suggestions on client performance, since I'm stuck. I appreciate your help.

I'm getting block device write performance of 9.6GB/s across 16 LUNs, measured with XDD. I ran obdfilter-survey on the OSS machines and got around 8.4GB/s write performance. I've also measured LNET performance at 9.6GB/s between the OSS machines and the 8 clients. But when I run IOR on the clients, I'm getting only around 2.6GB/s write performance with one client. When I run it across two nodes, I get 4.4GB/s write throughput. But when I scale beyond 2 nodes, I still get only about 4GB/s. Could you please help find the root cause of this scalability issue?
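For reference, the numbers above can be summarized as a scaling-efficiency calculation; this is a quick sketch using the figures reported in this ticket (`scaling_efficiency` is a hypothetical helper, not a Lustre tool):

```python
# All throughput figures in GB/s, taken from the description above.

def scaling_efficiency(aggregate_gbps, n_clients, single_client_gbps):
    """Fraction of ideal linear scaling actually achieved."""
    return aggregate_gbps / (n_clients * single_client_gbps)

single = 2.6   # IOR write, 1 client
two    = 4.4   # IOR write, 2 clients
four   = 4.0   # IOR write, 4+ clients (plateau)

print(round(scaling_efficiency(two, 2, single), 2))   # ~0.85: near-linear
print(round(scaling_efficiency(four, 4, single), 2))  # ~0.38: hard plateau
```

The drop from ~85% to ~38% efficiency between 2 and 4 clients is what points at a shared bottleneck rather than a per-client limit.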



 Comments   
Comment by Keith Mannthey (Inactive) [ 06/Mar/14 ]

How many threads are you using per client?

Comment by Chakravarthy Nagarajan (Inactive) [ 06/Mar/14 ]

I've set the number of OSS threads to 256, which means 32 threads allocated per client. Please advise.

Comment by Keith Mannthey (Inactive) [ 06/Mar/14 ]

Can you try 6 threads per client?

Comment by Chakravarthy Nagarajan (Inactive) [ 06/Mar/14 ]

That reduced the 2-client performance by 50%, and the scalability issue still remains when I run with 4 clients. I had initially set the number of threads according to the total number of spindles, but still no luck. Do you think metadata may be an issue, since I have the 2 MDTs configured in RAID-1 only instead of RAID-1+0?

Comment by Keith Mannthey (Inactive) [ 06/Mar/14 ]

What is your single-thread performance when run on a single client? If you have 6 threads and 1GB/s, that seems a little odd. Are you running IOR with one file per process or with a single shared file?
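For context, the two IOR modes differ in the `-F` (file-per-process) flag. A sketch of both invocations, where the mount point, transfer/block sizes, and rank count are illustrative assumptions rather than values from this ticket:

```shell
# Single shared file (N-1): all ranks write into one file
mpirun -np 32 ior -w -t 1m -b 4g -o /mnt/lustre/ior_shared

# File per process (N-N): each rank writes its own file via -F
mpirun -np 32 ior -w -t 1m -b 4g -F -o /mnt/lustre/ior_fpp
```

N-1 workloads can be limited by lock contention on the shared file, which is why the distinction matters when diagnosing scaling.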

Are you using DNE to have 2 metadata targets? IOR is not metadata intensive so it should not be a serious factor.

Comment by Chakravarthy Nagarajan (Inactive) [ 06/Mar/14 ]

On a single client I'm getting 2.2GB/s. Yes, I'm running IOR in N-N (file-per-process) mode only, and I'm also using DNE with 2 MDTs.

Comment by Andreas Dilger [ 12/Mar/14 ]

The MDT doesn't have anything to do with IO performance. The metadata on the MDS has nothing to do with block allocation.

Comment by Chakravarthy Nagarajan (Inactive) [ 12/Mar/14 ]

Thanks, and I've realized the same by monitoring the MDS utilization. I've tried the following, but no luck. Please advise.
1. Disabled checksums at the clients
2. Increased RPCs in flight to 32 at the clients
3. Disabled LRU resizing at the clients
4. Set readahead_max_file_size to 1M at the OSS machines
5. Tested with multiple thread counts up to 512 at the OSS machines.

The issue is that obdfilter-survey run at the OSS machines yields 8.4GB/s, but the clients are unable to achieve the same.
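For reference, the client-side tunables in steps 1–3 are typically adjusted with `lctl set_param`; a hedged sketch, where the wildcard patterns and the example lock count are assumptions to adapt to the actual filesystem name:

```shell
# Run on each client as root. Parameter names are standard Lustre
# client tunables; wildcards and values here are illustrative.

# 1. Disable wire checksums on all OSC devices
lctl set_param osc.*.checksums=0

# 2. Allow up to 32 concurrent RPCs per OSC
lctl set_param osc.*.max_rpcs_in_flight=32

# 3. Pin a fixed LDLM lock count, which disables dynamic LRU resizing
lctl set_param ldlm.namespaces.*.lru_size=2000
```

Note these settings do not persist across remounts unless set via `lctl conf_param` (or `set_param -P` on newer releases) on the MGS.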

Generated at Sat Feb 10 01:45:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.