[LU-1019] general ptlrpcd pool support to process kinds of async RPC efficiently on SMP client Created: 20/Jan/12  Updated: 02/May/12  Resolved: 20/Jan/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0
Fix Version/s: Lustre 2.2.0

Type: New Feature Priority: Minor
Reporter: Andreas Dilger Assignee: nasf (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Attachments: PDF File Multi-threaded_ptlrpcd.pdf    
Issue Links:
Duplicate
Rank (Obsolete): 4705

 Description   

We want to implement a general ptlrpcd pool. The ptlrpcd threads in the pool are shared by all async RPCs on the client, such as BRW requests, data checksums, async glimpse locks, statahead, and so on.

The current idea is to start one ptlrpcd thread per CPU core (or CPU partition) and bind each ptlrpcd thread to its CPU core.
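A minimal userspace sketch of "one thread per CPU core, bound to that core", assuming Linux with glibc. The real ptlrpcd threads are kernel threads, so the pthread calls here (`pthread_attr_setaffinity_np`, `sched_getaffinity`, `sched_getcpu`) are only an analogy, and `struct worker`, `start_bound_workers()` and `join_workers()` are hypothetical names. The sketch starts one worker per CPU the process is allowed to use and pins each worker before it runs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical userspace stand-in for one bound ptlrpcd thread. */
struct worker {
	pthread_t tid;
	int cpu;     /* CPU core this worker is bound to */
	int ran_on;  /* CPU reported by sched_getcpu() inside the thread */
	int ran;     /* set to 1 once the thread body has executed */
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;

	/* A real ptlrpcd thread would loop here, sending and completing
	 * the async RPCs queued to it. */
	w->ran_on = sched_getcpu();
	w->ran = 1;
	return NULL;
}

/* Wait for the first n workers to exit. */
void join_workers(struct worker *w, int n)
{
	for (int i = 0; i < n; i++)
		pthread_join(w[i].tid, NULL);
}

/* Start one worker per CPU this process may run on (at most max) and bind
 * each one to its own CPU. Returns the number started, or -1 on error. */
int start_bound_workers(struct worker *w, int max)
{
	cpu_set_t allowed;
	int n = 0;

	assert(w != NULL && max > 0);
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	for (int cpu = 0; cpu < CPU_SETSIZE && n < max; cpu++) {
		pthread_attr_t attr;
		cpu_set_t one;
		int rc;

		if (!CPU_ISSET(cpu, &allowed))
			continue;
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		pthread_attr_init(&attr);
		/* Put the affinity in the attributes so the thread is bound
		 * from its very first instruction. */
		rc = pthread_attr_setaffinity_np(&attr, sizeof(one), &one);
		w[n].cpu = cpu;
		w[n].ran_on = -1;
		w[n].ran = 0;
		if (rc == 0)
			rc = pthread_create(&w[n].tid, &attr, worker_main, &w[n]);
		pthread_attr_destroy(&attr);
		if (rc != 0) {
			join_workers(w, n);  /* clean up workers already started */
			return -1;
		}
		n++;
	}
	return n;
}
```

A caller would run `n = start_bound_workers(w, 64);` and later `join_workers(w, n);`. Setting the affinity through the thread attributes, rather than from inside the thread, means a worker never runs on a different core, even briefly.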



 Comments   
Comment by Andreas Dilger [ 20/Jan/12 ]

From ORNL-22:

We want multiple CPU cores to share the async RPC load, so we
start many ptlrpcd threads. We also want to reduce the ptlrpcd
overhead caused by transferring data across CPU cores, so we bind
each ptlrpcd thread to a specific CPU core. But binding all
ptlrpcd threads may delay responses when some CPU core(s) are
busy with other loads.

For example, with "ls -l", some async RPCs for statahead are
assigned to ptlrpcd_0, which is bound to CPU_0. But CPU_0 may be
quite busy with non-ptlrpcd work, such as the "ls -l" process
itself (we want the "ls -l" thread, the statahead thread, and the
ptlrpcd thread to run in parallel). In that case the statahead
async RPCs cannot be processed in time, which is undesirable. It
would be better if ptlrpcd_0 could be rescheduled on another CPU
core, but that breaks the data transfer policy described above.

So we should not avoid cross-CPU data transfer blindly. Instead
we make a compromise: divide the ptlrpcd thread pool into two
parts. One part is for bound mode: each ptlrpcd thread in this
part is bound to a CPU core. The other part is for free mode: all
ptlrpcd threads in this part can be scheduled on any CPU core. We
define a partnership between bound-mode ptlrpcd thread(s) and
free-mode ptlrpcd thread(s), and partners share the async RPC
load among themselves.

This partly avoids cross-CPU data transfer (as long as the
bound-mode ptlrpcd thread can be scheduled in time), while trying
to guarantee that async RPCs are processed ASAP (because the
free-mode ptlrpcd thread can be scheduled on any CPU core).
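One possible way for partners to share the load is work stealing: a thread serves its own queue first and, when idle, takes an RPC from the partner with the longest backlog. The single-threaded C model below illustrates only that idea. `struct pd_thread`, `pd_next_rpc()` and the queue limits are hypothetical, and real threads would need a lock or atomic operations around each queue. Stealing lets the free-mode partner pick up an RPC queued to a bound thread whose CPU is busy, which is the "ls -l" case described above.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 64          /* hypothetical per-thread queue capacity */
#define MAX_PARTNERS 4   /* hypothetical limit on partners per thread */

/* Hypothetical model of one ptlrpcd thread and its pending async RPCs. */
struct pd_thread {
	int bound;                 /* 1 = bound mode, 0 = free mode */
	int queue[QCAP];           /* pending RPC ids, FIFO ring buffer */
	size_t head, count;
	struct pd_thread *partners[MAX_PARTNERS];
	size_t npartners;
};

/* Queue an RPC to thread t. */
void pd_enqueue(struct pd_thread *t, int rpc)
{
	assert(t->count < QCAP);
	t->queue[(t->head + t->count) % QCAP] = rpc;
	t->count++;
}

/* Pop the oldest RPC from t; returns 0 if t has nothing queued. */
static int pd_dequeue(struct pd_thread *t, int *rpc)
{
	if (t->count == 0)
		return 0;
	*rpc = t->queue[t->head];
	t->head = (t->head + 1) % QCAP;
	t->count--;
	return 1;
}

/* Make a and b partners of each other. */
void pd_add_partner(struct pd_thread *a, struct pd_thread *b)
{
	assert(a->npartners < MAX_PARTNERS && b->npartners < MAX_PARTNERS);
	a->partners[a->npartners++] = b;
	b->partners[b->npartners++] = a;
}

/* Pick the next RPC for thread t: its own queue first; if that is empty,
 * steal one from the partner with the longest backlog. */
int pd_next_rpc(struct pd_thread *t, int *rpc)
{
	struct pd_thread *busiest = NULL;

	if (pd_dequeue(t, rpc))
		return 1;
	for (size_t i = 0; i < t->npartners; i++) {
		struct pd_thread *p = t->partners[i];

		if (p->count > 0 && (busiest == NULL || p->count > busiest->count))
			busiest = p;
	}
	return busiest != NULL && pd_dequeue(busiest, rpc);
}
```

With a bound thread holding RPCs 1, 2 and 3 and an idle free-mode partner, the partner steals RPC 1 while the bound thread processes RPC 2, so neither thread waits on the other's CPU.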

As for how to define the partnership between bound-mode and
free-mode ptlrpcd threads, the simplest way is a <free bound>
pair. In the future we can define more complex partnerships based
on the CPU partition patches, but until those patches are
available we prefer the simplest one.
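A sketch of the <free bound> pair layout, assuming a pool with an even number of threads in which even-numbered threads are bound and each odd-numbered thread is the free-mode partner of the thread before it. `struct pair_slot` and `pair_layout()` are hypothetical names, not the Lustre API, and the even/odd numbering is an illustrative choice.

```c
#include <assert.h>

/* Hypothetical description of one ptlrpcd thread in a <free bound> pair. */
struct pair_slot {
	int bound;    /* 1 if this thread is pinned to a CPU core */
	int cpu;      /* core for bound threads, -1 for free threads */
	int partner;  /* index of the other thread in the pair */
};

/* Fill s[0..nthreads-1] for a pool of nthreads threads on ncpus cores. */
void pair_layout(struct pair_slot *s, int nthreads, int ncpus)
{
	assert(nthreads > 0 && nthreads % 2 == 0 && ncpus > 0);
	for (int i = 0; i < nthreads; i++) {
		s[i].bound = (i % 2 == 0);
		/* Spread bound threads over the cores round-robin. */
		s[i].cpu = s[i].bound ? (i / 2) % ncpus : -1;
		/* Flip the low bit: 0 <-> 1, 2 <-> 3, ... */
		s[i].partner = i ^ 1;
	}
}
```

With 4 threads on 2 CPUs this gives thread 0 bound to CPU 0 with free partner 1, and thread 2 bound to CPU 1 with free partner 3.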

Comment by Andreas Dilger [ 20/Jan/12 ]

Test Results for the patch (http://review.whamcloud.com/#change,1184) from Cliff White:

IOR results, 2 clients in every run (bandwidth in MiB/s; "Mean (s)" is the mean time in seconds):

Run     Threads/client  Op     Max (MiB/s)  Min (MiB/s)  Mean (MiB/s)  Std Dev  Mean (s)
Before  8               write      1064.69      1045.74       1054.41     7.82   3.88486
Before  8               read       1114.51      1050.80       1088.79    27.42   3.76440
Before  1               write       433.76       425.31        428.35     3.84   1.19537
Before  1               read        507.27       501.03        503.50     2.71   1.01691
After   8               write      1110.32      1087.60       1096.13    10.10   3.73709
After   8               read       1170.60      1115.15       1137.76    23.77   3.60162
After   1               write       591.19       571.50        579.51     8.45   0.88370
After   1               read        594.26       584.95        588.27     4.24   0.87039

All runs: POSIX API, 3 repetitions, file-per-process, 1 segment, blksiz = 268435456 (256 MiB), xsize = 1048576 (1 MiB). The 8 threads/client runs used 16 tasks (aggsize 4294967296 bytes = 4 GiB); the 1 thread/client runs used 2 tasks (aggsize 536870912 bytes = 512 MiB). The raw IOR OPs columns (omitted) equal the MiB/s columns because each transfer is 1 MiB; the remaining raw fields were reord = 1, reordoff = 1, reordrand = 0, seed = 0 in every run.
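To put the results in relative terms, the change in mean bandwidth is (after - before) / before x 100. The small helper below (hypothetical names; means copied from the results) gives about +3.96% for 8-thread write, +4.50% for 8-thread read, +35.29% for 1-thread write and +16.84% for 1-thread read. The note about test conditions that follows applies to these numbers as well.

```c
#include <assert.h>

/* Mean bandwidth in MiB/s before and after the patch. */
struct ior_mean {
	const char *run;
	double before;
	double after;
};

const struct ior_mean ior_means[] = {
	{ "8 threads/client, write", 1054.41, 1096.13 },
	{ "8 threads/client, read",  1088.79, 1137.76 },
	{ "1 thread/client, write",   428.35,  579.51 },
	{ "1 thread/client, read",    503.50,  588.27 },
};

/* Relative change from before to after, in percent. */
double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

Looping over `ior_means` and calling `pct_change(m->before, m->after)` for each entry reproduces the four percentages above.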

Test summary for 8 threads (IOR summary output):

api = POSIX
test filename = /p/l_wham/white215/hyperion.18942/ior/iorData
access = file-per-process
pattern = segmented (1 segment)
ordering in a file = sequential offsets
ordering inter file = constant task offsets = 1
clients = 16 (8 per node)
repetitions = 3
xfersize = 1 MiB
blocksize = 256 MiB
aggregate filesize = 4 GiB

One note: the "before" test was run on an idle cluster; the "after" test was run on an idle cluster immediately after a client reboot.

Comment by Build Master (Inactive) [ 01/Mar/12 ]

Integrated in lustre-master » #495 on the following builds, each with Result = SUCCESS:
x86_64,server,el5,ofa; x86_64,server,el5,inkernel; x86_64,client,ubuntu1004,inkernel; x86_64,client,el5,ofa; x86_64,client,sles11,inkernel; i686,client,el5,ofa; i686,server,el5,ofa; x86_64,client,el5,inkernel; x86_64,client,el6,ofa; x86_64,server,el6,ofa; i686,client,el5,inkernel; i686,server,el5,inkernel; x86_64,client,el6,inkernel; x86_64,server,el6,inkernel; i686,client,el6,inkernel; i686,server,el6,inkernel; i686,client,el6,ofa; i686,server,el6,ofa
LU-1019 ptlrpc: fix ptlrpcd transfer message (Revision 3e39a4218f3f69e67f5c8ef85a92f55a9e351fd0)

Oleg Drokin : 3e39a4218f3f69e67f5c8ef85a92f55a9e351fd0
Files :

  • lustre/ptlrpc/ptlrpcd.c
Comment by Build Master (Inactive) [ 02/May/12 ]

Integrated in lustre-dev » #340 on the following builds, each with Result = SUCCESS:
x86_64,client,el5,inkernel; i686,client,el6,inkernel; i686,server,el5,inkernel; x86_64,server,el6,inkernel; i686,client,el5,inkernel; x86_64,server,el5,inkernel; x86_64,client,el6,inkernel
LU-1019 ptlrpc: fix ptlrpcd transfer message (Revision 3e39a4218f3f69e67f5c8ef85a92f55a9e351fd0)

Oleg Drokin : 3e39a4218f3f69e67f5c8ef85a92f55a9e351fd0
Files :

  • lustre/ptlrpc/ptlrpcd.c
Generated at Sat Feb 10 01:12:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.