[LU-9841] 2.10 don't use 4-16MB rpc at all Created: 07/Aug/17  Updated: 14/Sep/17  Resolved: 28/Aug/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Critical
Reporter: Alexey Lyashkov Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Lustre 2.10 on the server, single OST, brw_size=16 (i.e. 16 MB bulk RPCs)
Lustre 2.10 clients, max_rpcs_in_flight=64, max_dirty_mb=256, max_pages_per_rpc=4096
40 IOR jobs, File-per-process, transfer size 64m, file size 256g, stonewall 120 sec
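
For reference, a minimal sketch of how this setup would be applied; the obdfilter device name, mount point, and exact IOR invocation below are illustrative, not taken from the actual test harness:

# On the OSS: allow 16 MB bulk RPCs on the single OST
lctl set_param obdfilter.testfs-OST0000.brw_size=16

# On each client (clients may need to reconnect/remount before
# max_pages_per_rpc can be raised to the new server limit):
lctl set_param osc.*.max_rpcs_in_flight=64
lctl set_param osc.*.max_dirty_mb=256
lctl set_param osc.*.max_pages_per_rpc=4096

# 40 IOR tasks, file-per-process, 64m transfers, 256g files, 120 s stonewall:
mpirun -np 40 ior -a POSIX -F -t 64m -b 256g -D 120 -o /mnt/lustre/ior.out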

============================
lctl set_param llite.*.pio=0
============================

DIO write then read
-------------------

                           read      |     write
pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
256:                343872 100 100   | 220928 100 100

                           read      |     write
discontiguous pages    rpcs  % cum % |  rpcs        % cum %
0:                  343872 100 100   | 220928 100 100

--------------
POSIX buffered
--------------

                           read      |     write
pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
256:                     0   0   0   |   44   0   0
512:                     0   0   0   |   43   0   0
1K:                      0   0   0   |   63   0   0
2K:                      0   0   0   |  107   0   0
4K:                  40960 100 100   | 68017  99 100

                           read      |     write
discontiguous pages    rpcs  % cum % |  rpcs        % cum %
0:                   40960 100 100   | 65412  95  95
1:                       0   0 100   | 2862   4 100
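
For reference, the histograms above come from the client-side OSC RPC statistics; a typical collection sequence (clear the counters before each run, dump them afterwards) would be:

# clear per-OSC RPC stats, run the workload, then dump the histograms
lctl set_param osc.*.rpc_stats=clear
lctl get_param osc.*.rpc_stats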

============================
lctl set_param llite.*.pio=1
============================

DIO write then read
-------------------

                           read      |     write
pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
256:                352337  79  79   | 259706  80  80
512:                 69745  15  94   | 49782  15  95
1K:                  20035   4  99   | 12087   3  99
2K:                   2357   0 100   | 1283   0 100

                           read      |     write
discontiguous pages    rpcs  % cum % |  rpcs        % cum %
0:                  387211  87  87   | 286802  88  88
1:                   50221  11  98   | 32161   9  98
2:                    6541   1  99   | 3670   1  99
3:                     485   0  99   |  222   0  99
4:                      16   0 100   |    3   0 100

--------------
POSIX buffered
--------------

                           read      |     write
pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
1:                  109334  19  19   |    0   0   0
2:                    9844   1  20   |    1   0   0
4:                    2482   0  21   |    0   0   0
8:                     375   0  21   |    0   0   0
16:                      0   0  21   |    1   0   0
32:                      0   0  21   |    1   0   0
64:                      0   0  21   |    2   0   0
128:                     3   0  21   |   10   0   0
256:                398187  70  91   |  153   0   0
512:                 34233   6  97   |  244   0   0
1K:                   8819   1  99   |  377   0   1
2K:                   2322   0  99   | 1004   1   2
4K:                   2136   0 100   | 65289  97 100

                           read      |     write
discontiguous pages    rpcs  % cum % |  rpcs        % cum %
0:                  516948  91  91   | 56943  84  84
1:                   42268   7  98   | 2582   3  88
2:                    6384   1  99   | 2100   3  91
3:                    1523   0  99   | 1288   1  93
4:                     365   0  99   |  988   1  95
5:                      97   0  99   |  774   1  96

Same test setup on the server, IEEL clients:

DIO write then read
-------------------

                           read      |     write
pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
4K:                  40960 100 100   | 46224 100 100

                           read      |     write
discontiguous pages    rpcs  % cum % |  rpcs        % cum %
0:                   40960 100 100   | 46224 100 100

--------------
POSIX buffered
--------------

                           read      |     write
pages per bulk r/w     rpcs  % cum % |  rpcs        % cum %
4K:                  40960 100 100   | 66976 100 100

                           read      |     write
discontiguous pages    rpcs  % cum % |  rpcs        % cum %
0:                   40960 100 100   | 66976 100 100



 Comments   
Comment by Peter Jones [ 07/Aug/17 ]

Jinshan

Could you please advise on this one?

Thanks

Peter

Comment by Jinshan Xiong (Inactive) [ 09/Aug/17 ]

It looks like this bug was introduced by LU-8964, where even I/O to a single-striped file is split at the stripe size, which is 1 MB by default. A patch will be pushed soon.
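
A sketch of how the fix can be verified once the patch lands; the expectation, per the histograms above, is that buffered I/O to a single-striped file again reaches the full 4K-pages-per-RPC bucket with pio enabled:

lctl set_param llite.*.pio=1
lctl set_param osc.*.rpc_stats=clear
# re-run the IOR workload, then check that "pages per bulk r/w"
# is dominated by the 4K-page (16 MB) bucket rather than 256 (1 MB)
lctl get_param osc.*.rpc_stats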

Comment by Gerrit Updater [ 09/Aug/17 ]

Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/28451
Subject: LU-9841 lov: do not split IO for single striped file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f35de0b73cd42ead2b3eb609ffeb167c7b8504b5

Comment by Gerrit Updater [ 28/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28451/
Subject: LU-9841 lov: do not split IO for single striped file
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 078a099d26ef7f5d26131c0e18615855a39f341d

Comment by Peter Jones [ 28/Aug/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 28/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28760
Subject: LU-9841 lov: do not split IO for single striped file
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 8a7a89b7032d35246a234642fb8f7c3f14f8ad24

Comment by Gerrit Updater [ 14/Sep/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28760/
Subject: LU-9841 lov: do not split IO for single striped file
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: e2e67f2c31c5539c852f60124a2ae4b397bf8004
