[LU-7756] oss_num_threads max value is sometimes too low to feed disk controllers Created: 08/Feb/16  Updated: 09/Feb/17  Resolved: 14/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.3
Fix Version/s: Lustre 2.9.0

Type: Bug
Priority: Minor
Reporter: Gregoire Pichon
Assignee: Bob Glossman (Inactive)
Resolution: Fixed
Votes: 0
Labels: patch, performance

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When submitting read I/Os from Lustre clients to an OSS that serves 6 OSTs, we see in iostat that the number of I/O requests in progress on each lun does not go beyond 86.

The result is that throughput per lun is limited to ~600 MB/s.

To get the most out of the lun, we need at least 100 requests in flight at once.

This is due to the max number of oss_num_threads being limited by OSS_NTHRS_MAX (512 / 6 = ~86).

Raising the limit to 1024 by patching the code allowed us to keep enough I/Os in flight on each lun and reach up to 900 MB/s per OST.
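
The arithmetic behind these numbers can be sketched as follows. This is only a rough model, assuming that service threads are spread evenly across the OSTs and that each thread keeps one I/O in flight at a time; the function names are hypothetical and are not part of Lustre.

```python
# Rough model of the per-lun queue depth (an assumption, not Lustre code):
# threads are shared evenly by the OSTs, one in-flight I/O per thread.

def threads_per_ost(max_threads: int, num_osts: int) -> float:
    """Average number of service threads available to each OST."""
    return max_threads / num_osts


def min_threads_for_depth(target_depth: int, num_osts: int) -> int:
    """Total threads needed so every OST can reach target_depth I/Os in flight."""
    return target_depth * num_osts


if __name__ == "__main__":
    num_osts = 6
    for limit in (512, 1024):
        share = threads_per_ost(limit, num_osts)
        print(f"limit {limit}: about {share:.1f} threads per OST")
    needed = min_threads_for_depth(100, num_osts)
    print(f"threads needed for 100 requests per lun: {needed}")
```

Under these assumptions, a 512-thread limit leaves each of the 6 OSTs below the 100 requests we need, while 600 threads or more would be enough, which is consistent with the improvement seen at 1024.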



 Comments   
Comment by Gerrit Updater [ 08/Feb/16 ]

Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/18350
Subject: LU-7756 oss: allow larger number of OSS service threads
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 208ba9b329834bd09ed9f33cdbaf0220c15fe887

Comment by Andreas Dilger [ 08/Feb/16 ]

Have you tried increasing the RPC size to 4MB so that fewer I/Os are needed to keep the backend busy?

Comment by Joseph Gmitter (Inactive) [ 08/Feb/16 ]

Hi Bob,
can you have a look at the patch?
Thanks.
Joe

Comment by Bob Glossman (Inactive) [ 08/Feb/16 ]

I see that the patch does what it says: it adds a module param in place of a hard-coded limit. However, I can't speak to whether this is a good change or not. I believe in the past having too many service threads actually reduced performance in some cases.

Comment by Andreas Dilger [ 08/Feb/16 ]

I would tend to agree, and I wouldn't want to allow the number of threads to increase by default, but since this is a module parameter that explicitly needs to be set by the admin I think it is fairly safe.

Comment by Gerrit Updater [ 14/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18350/
Subject: LU-7756 oss: allow larger number of OSS service threads
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: aa84d188641fa95b0e66ada438c2ba79f164c0d0

Comment by Andreas Dilger [ 14/Mar/16 ]

Gregoire, your patch has landed for 2.9.0. Does it need to be backported to an EE release, or are you applying this locally?

Comment by Gregoire Pichon [ 14/Mar/16 ]

Yes, we would need to have the patch backported to b2_7_fe and IEEL 3.0 if possible.
thanks.

Comment by Minh Diep [ 24/Oct/16 ]

b2_7_fe port: http://review.whamcloud.com/#/c/22391/
b2_8_fe port: http://review.whamcloud.com/#/c/22882/
