[LUDOC-40] Create Documentation for multi-threaded ptlrpcd Created: 19/Jan/12  Updated: 05/Apr/12  Due: 30/Mar/12  Resolved: 05/Apr/12

Status: Closed
Project: Lustre Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Bryon Neitzel (Inactive) Assignee: Cliff White (Inactive)
Resolution: Fixed Votes: 0
Labels: releases

Attachments: Microsoft Word Multi-threaded_ptlrpcd.docx    
Rank (Obsolete): 7172

 Description   

Please write the documentation necessary for multithreaded ptlrpcd to be understood and used in Lustre 2.2. Please include any tunables that affect this feature. Only the raw content needs to be written - any grammar checking, spell checking, formatting, etc. will be done by a doc writer. The content can be appended to this ticket.



 Comments   
Comment by Andreas Dilger [ 19/Jan/12 ]

This should indicate that multiple OI files is a new feature in Lustre 2.2 and that filesystems formatted with Lustre 2.2 cannot be downgraded to an earlier version of Lustre (this is always true, but it doesn't hurt to state it again).

Comment by nasf (Inactive) [ 01/Feb/12 ]

Draft for multi-threaded ptlrpcd. It should be part of the Lustre manual.

Comment by nasf (Inactive) [ 05/Feb/12 ]

Document is done.

Comment by Peter Jones [ 05/Feb/12 ]

Thanks for getting this content created, FanYong, but we should keep the ticket open until it has actually landed in the manual. We'll get this reassigned now.

Comment by Peter Jones [ 14/Feb/12 ]

Hi Cliff

Please can you integrate this material into the manual

Thanks

Peter

Comment by Cliff White (Inactive) [ 15/Feb/12 ]

Asked these in email, putting in bug for record, or in case anybody else wants to answer them here.
--------

For the max_ptlrpcds parameter:

  • The absolute minimum is 2 per node, regardless of number of cores?
  • The default is one thread per core, including hyperthreading?
  • Is there any limit or maximum in the code?
  • You mention large directory traversal and statahead as operations that are async RPC-intensive,
    are there any other situations a user may need to be aware of?

Somewhat outside question:

  • Is there any tuning of RPC behavior in this area, in other words for a specific type
    of RPC or action, can a user force async or sync behavior?

Comment by Cliff White (Inactive) [ 21/Feb/12 ]

I would appreciate a response at your earliest convenience.
--------
New question - what is the parameter name for the ptlrpcd_load_policy? This
is not in the document.
Is it "ptlrpcd_load_policy=XX"?
---------


cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com

Comment by Peter Jones [ 21/Feb/12 ]

Added Fanyong as a watcher so he sees Cliff's question

Comment by nasf (Inactive) [ 21/Feb/12 ]

Q: The absolute minimum is 2 per node, regardless of number of cores?
A: Yes

Q: The default is one thread per core, including hyperthreading?
A: Yes, the default is one thread per core, counting each hyper-thread as a core.

Q: Is there any limit or maximum in the code?
A: Currently there is no maximum limit, but I think we should set the maximum to the core count (including hyper-threading) on the node.

Q: You mention large directory traversal and statahead as operations that are async RPC-intensive,
are there any other situations a user may need to be aware of?
A: Statahead is part of large directory traversal, and so is async glimpse lock (AGL). Both are usually triggered by "ls -l", "du", "find", and similar system commands.
Another common async RPC case is I/O: in Lustre, most I/O is done in async mode.

Q: Is there any tuning of RPC behavior in this area, in other words for a specific type
of RPC or action, can a user force async or sync behavior?
A: Some existing proc interfaces, such as "max_rpcs_in_flight", may affect the efficiency of async RPC processing. But whether an RPC is sync or async depends on the Lustre internal implementation. A developer can specify that inside the Lustre code, but there is no tunable interface that lets users choose sync or async from outside the Lustre code.

Q: what is the parameter name for the ptlrpcd_load_policy? is it "ptlrpcd_load_policy=XX" ??
A: ptlrpcd_load_policy is not the name of a parameter. It is the name of a set of parameters used inside the Lustre code to specify how an async RPC is pushed onto a ptlrpcd queue. Only Lustre developers can use these parameters; they are not visible outside the Lustre code.
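
For the record, the max_ptlrpcds answers above can be sketched roughly as below. This is only an illustration of the rules as stated in this ticket (default of one thread per core counting hyper-threads, minimum of 2, no maximum), not the Lustre code itself. The function name `ptlrpcd_thread_count` is hypothetical, and raising a too-small requested value to 2 is an assumption about how the minimum is enforced.

```python
import os

# Absolute minimum per node, regardless of core count (per the answer above).
MIN_PTLRPCD_THREADS = 2


def ptlrpcd_thread_count(max_ptlrpcds=None, logical_cpus=None):
    """Hypothetical helper: thread count implied by the answers in this ticket."""
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs, so hyper-threads are counted.
        logical_cpus = os.cpu_count() or 1
    # Default: one thread per logical CPU when nothing is requested.
    requested = logical_cpus if max_ptlrpcds is None else max_ptlrpcds
    # Assumption: values below the minimum are raised to 2.
    # No upper clamp, since the answer above says there is currently no maximum.
    return max(MIN_PTLRPCD_THREADS, requested)


if __name__ == "__main__":
    print(ptlrpcd_thread_count())
```

On a single-core node without hyper-threading this sketch still yields 2 threads, which matches the "minimum is 2 per node" answer.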

Comment by Cliff White (Inactive) [ 22/Feb/12 ]

Would the recommended maximum then be one thread per core (including hyper-threading)?
Is there a point where performance decreases if the number of threads per core is >1? >5? etc.?

I am a little confused by the last response. In section 2.1 you list the PDB_POLICY options, and they are set by
"insmod ptlrpcd.ko ptlrpcd_bind_policy=xxx", which would imply the system admin tunes these.

In section 2.2 you list the PDL_POLICY options. From your response above, these are internal-only and never
touched by anyone other than Lustre developers? Just to confirm, they cannot be tuned by a system admin?

If true, then I think section 2.2 should not go in the general manual, but rather in some developer-focused or Lustre Internals documentation.

Comment by nasf (Inactive) [ 22/Feb/12 ]

The recommended mode is the default mode: one thread per core (including hyper-threading). The thread count that gives the best performance has not been verified.

The administrator can tune ptlrpcd_bind_policy when running insmod on ptlrpcd.ko. But ptlrpcd_load_policy is used only inside the Lustre code and is not tunable by the administrator. So the former should be part of the Lustre user manual; the latter is for developers and should be part of the Lustre internals documentation.
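
As a rough illustration of the administrator-side tuning, the sketch below only assembles the insmod command line quoted earlier in this ticket. The helper name `insmod_command` is hypothetical, and "xxx" is kept as the placeholder used in the ticket; the actual policy values are in the attached draft and are not assumed here.

```python
import shlex


def insmod_command(bind_policy, module="ptlrpcd.ko"):
    """Hypothetical helper: build the insmod argv for a ptlrpcd_bind_policy value."""
    # The module file name and parameter name are taken from the command
    # quoted in this ticket; nothing else is passed to insmod.
    return ["insmod", module, f"ptlrpcd_bind_policy={bind_policy}"]


if __name__ == "__main__":
    # "xxx" is the ticket's placeholder, not a real policy value.
    print(shlex.join(insmod_command("xxx")))
```

The command is built as an argument list so it could be handed to a process launcher without shell quoting issues; nothing here is run against the kernel.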

Comment by Cliff White (Inactive) [ 27/Mar/12 ]

I am still unclear on the real differences between PDB_POLICY_FULL and PDB_POLICY_NEIGHBOR. Could you give an example of when either one would be useful?

Comment by Cliff White (Inactive) [ 30/Mar/12 ]

Content now up for review http://review.whamcloud.com/2425

Comment by Richard Henwood (Inactive) [ 05/Apr/12 ]

Merged.

Generated at Sat Feb 10 03:39:46 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.