[LU-14424] write performance regression in Lustre-2.14.0-RC1 Created: 11/Feb/21 Updated: 22/Feb/21 Resolved: 17/Feb/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Shuichi Ihara | Assignee: | Shuichi Ihara |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
While running the performance regression tests, I found a write performance regression in Lustre-2.14.0-RC1 with a 4K transfer size:
# mpirun -np $NP -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 --allow-run-as-root /work/tools/bin/ior -w -r -t 4k -b $((256/NP)) -e -v -F -o /ai400x/ior.out/file
|
| Comments |
| Comment by Shuichi Ihara [ 11/Feb/21 ] |
|
I will narrow down and find which patch causes the problem. |
| Comment by Shuichi Ihara [ 12/Feb/21 ] |
|
Here is the patch where we started to see this performance regression on the master branch:
commit 80e21cce3dd6748fd760786cafe9c26d502fd74f
Author: NeilBrown <neilb@suse.com>
Date: Thu Dec 13 11:32:56 2018 +1100
LU-9679 osc: simplify osc_extent_find()
osc_extent_find() contains some code with the same functionality as
osc_extent_merge(). So replace that code with a call to
osc_extent_merge().
This requires that we set cur->oe_grants earlier, as
osc_extent_merge() needs that.
Also:
- fix a pre-existing bug - osc_extent_merge() should never try to
merge two extends with different ->oe_mppr as later alignment
checks can get confused.
- Remove a redundant list_del_init() which is already included in
__osc_extent_remove().
Linux-Commit: 85ebb57ddc5b ("lustre: osc: simplify osc_extent_find()")
Linux-Commit: 85ebb57ddc5b ("lustre: osc: simplify osc_extent_find()")
Here are the test results before/after commit 80e21cce3d:
80e21cce3d LU-9679 osc: simplify osc_extent_find()
Max Write: 540.67 MiB/sec (566.93 MB/sec)
9d914f9cc7 LU-13711 build: fix typo on SSL dependency for Ubuntu
Max Write: 797.24 MiB/sec (835.96 MB/sec) |
| Comment by Gerrit Updater [ 12/Feb/21 ] |
|
Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41498 |
| Comment by Andreas Dilger [ 12/Feb/21 ] |
|
Some added notes here from my investigation:
In terms of debugging the source of the slowdown, there are a couple of things to check:
|
| Comment by Gerrit Updater [ 13/Feb/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41498/ |
| Comment by James A Simmons [ 13/Feb/21 ] |
|
While a revert works around this issue, as Andreas pointed out this is hot code, so more improvements could be made here. |
| Comment by Peter Jones [ 17/Feb/21 ] |
|
Landed for 2.14 RC3 |
| Comment by Neil Brown [ 19/Feb/21 ] |
|
I believe the problem was that cur->oe_state wasn't being initialised properly. I've uploaded a revised version of the reverted patch at https://review.whamcloud.com/41691
|
| Comment by Shuichi Ihara [ 22/Feb/21 ] |
|
Lustre-2.14.0-RC2 [root@ec01 ~]# lctl get_param llite.*.stats osc.*OST0001*.stats
llite.ai400x-ffff9f61e8aac000.stats=
snapshot_time 1613969894.227596637 secs.nsecs
read_bytes 67108864 samples [bytes] 4096 4096 274877906944 1125899906842624
write_bytes 67108864 samples [bytes] 4096 4096 274877906944 1125899906842624
read 67108864 samples [usec] 0 19225 6728336 507600924
write 67108864 samples [usec] 3 36876 447558029 554690588441
open 2 samples [usec] 2 152 154 23108
close 2 samples [usec] 157 179 336 56690
seek 134217728 samples [usec] 0 178 95834 1085920
fsync 1 samples [usec] 6210 6210 6210 38564100
getattr 3 samples [usec] 18 385 568 175774
unlink 2 samples [usec] 301 310 611 186701
mknod 1 samples [usec] 309 309 309 95481
statfs 1 samples [usec] 111 111 111 12321
inode_permission 26 samples [usec] 0 257 278 66138
osc.ai400x-OST0001-osc-ffff9f61e8aac000.stats=
snapshot_time 1613969894.227650896 secs.nsecs
req_waittime 73654 samples [usec] 127 50969 207491668 745114450474
req_active 73654 samples [reqs] 1 17 89410 125650
ldlm_extent_enqueue 1 samples [reqs] 1 1 1 1
read_bytes 16385 samples [bytes] 4096 16777216 274877906944 4611685881021988864
write_bytes 57262 samples [bytes] 4096 16777216 274877906944 1330662844261203968
ost_read 16385 samples [usec] 218 50969 83226452 426080555072
ost_write 57262 samples [usec] 176 35894 124263923 319033596483
ost_connect 2 samples [usec] 130 211 341 61421
ost_disconnect 2 samples [usec] 136 137 273 37265
ost_sync 1 samples [usec] 402 402 402 161604
ldlm_cancel 1 samples [usec] 150 150 150 22500
[root@es400nvx1-vm1 ~]# lctl get_param osd-ldiskfs.*OST0001*.brw_stats
osd-ldiskfs.ai400x-OST0001.brw_stats=
snapshot_time: 1613969983.077008582 (secs.nsecs)
read | write
pages per bulk r/w rpcs % cum % | rpcs % cum %
1: 1 0 0 | 41 0 0
2: 0 0 0 | 51 0 0
4: 0 0 0 | 24 0 0
8: 0 0 0 | 8 0 0
16: 0 0 0 | 7 0 0
32: 0 0 0 | 4 0 0
64: 0 0 0 | 8 0 0
128: 0 0 0 | 5 0 0
256: 0 0 0 | 2 0 0
512: 0 0 0 | 6 0 0
1K: 0 0 0 | 7 0 0
2K: 0 0 0 | 57025 99 99
4K: 16384 99 100 | 75 0 100
read | write
discontiguous pages rpcs % cum % | rpcs % cum %
0: 16385 100 100 | 57255 99 99
1: 0 0 100 | 8 0 100
read | write
discontiguous blocks rpcs % cum % | rpcs % cum %
0: 16371 99 99 | 57225 99 99
1: 14 0 100 | 38 0 100
read | write
disk fragmented I/Os ios % cum % | ios % cum %
1: 1 0 0 | 150 0 0
2: 0 0 0 | 6 0 0
3: 0 0 0 | 6 0 0
4: 0 0 0 | 1 0 0
5: 0 0 0 | 57006 99 99
6: 0 0 0 | 16 0 99
7: 0 0 0 | 0 0 99
8: 0 0 0 | 2 0 99
9: 0 0 0 | 3 0 99
10: 0 0 0 | 16 0 99
11: 0 0 0 | 2 0 99
12: 0 0 0 | 2 0 99
13: 0 0 0 | 3 0 99
14: 0 0 0 | 3 0 99
15: 0 0 0 | 2 0 99
16: 16384 99 100 | 44 0 99
17: 0 0 100 | 1 0 100
read | write
disk I/Os in flight ios % cum % | ios % cum %
1: 16336 6 6 | 57148 19 19
2: 16336 6 12 | 57173 19 39
3: 16336 6 18 | 57131 19 59
4: 16336 6 24 | 57103 19 79
5: 16336 6 31 | 57096 19 99
6: 16339 6 37 | 91 0 99
7: 16338 6 43 | 75 0 99
8: 16346 6 49 | 75 0 99
9: 16355 6 56 | 73 0 99
10: 16361 6 62 | 72 0 99
11: 16368 6 68 | 56 0 99
12: 16364 6 74 | 56 0 99
13: 16373 6 81 | 54 0 99
14: 16370 6 87 | 51 0 99
15: 16371 6 93 | 48 0 99
16: 16367 6 99 | 46 0 99
17: 54 0 99 | 6 0 99
18: 54 0 99 | 5 0 99
19: 50 0 99 | 5 0 99
20: 43 0 99 | 4 0 99
21: 39 0 99 | 4 0 99
22: 42 0 99 | 3 0 99
23: 38 0 99 | 3 0 99
24: 30 0 99 | 3 0 99
25: 19 0 99 | 2 0 99
26: 16 0 99 | 2 0 99
27: 16 0 99 | 1 0 99
28: 16 0 99 | 1 0 99
29: 16 0 99 | 1 0 99
30: 16 0 99 | 1 0 99
31: 64 0 100 | 2 0 100
read | write
I/O time (1/1000s) ios % cum % | ios % cum %
1: 96 0 0 | 57179 99 99
2: 16255 99 99 | 63 0 99
4: 25 0 99 | 11 0 99
8: 9 0 100 | 6 0 99
16: 0 0 100 | 4 0 100
read | write
disk I/O size ios % cum % | ios % cum %
4K: 1 0 0 | 41 0 0
8K: 0 0 0 | 51 0 0
16K: 0 0 0 | 26 0 0
32K: 0 0 0 | 10 0 0
64K: 0 0 0 | 10 0 0
128K: 0 0 0 | 7 0 0
256K: 0 0 0 | 30 0 0
512K: 0 0 0 | 34 0 0
1M: 262144 99 100 | 286182 99 100
Lustre-2.14.0-RC3 (after reverted patch) [root@ec01 llite]# lctl get_param llite.*.stats osc.*OST0001*.stats
llite.ai400x-ffff9f67d7c2b800.stats=
snapshot_time 1613971351.373233917 secs.nsecs
read_bytes 67108864 samples [bytes] 4096 4096 274877906944 1125899906842624
write_bytes 67108864 samples [bytes] 4096 4096 274877906944 1125899906842624
read 67108864 samples [usec] 0 17888 7516054 452029440
write 67108864 samples [usec] 3 22195 291816283 14370801533
ioctl 1 samples [reqs]
open 4 samples [usec] 2 182 440 66138
close 4 samples [usec] 78 260 650 122374
seek 134217728 samples [usec] 0 53 130227 1512001
fsync 1 samples [usec] 17794 17794 17794 316626436
readdir 8 samples [usec] 0 293 307 85925
getattr 30 samples [usec] 0 1565 2334 2653018
unlink 1 samples [usec] 366 366 366 133956
mknod 1 samples [usec] 363 363 363 131769
statfs 1 samples [usec] 106 106 106 11236
inode_permission 42 samples [usec] 0 28 72 938
osc.ai400x-OST0001-osc-ffff9f67d7c2b800.stats=
snapshot_time 1613971351.373308591 secs.nsecs
req_waittime 33094 samples [usec] 98 61857 168002250 975823530146
req_active 33094 samples [reqs] 1 18 46781 85103
ldlm_extent_enqueue 1 samples [reqs] 1 1 1 1
read_bytes 16385 samples [bytes] 4096 16777216 274877906944 4611685881021988864
write_bytes 16704 samples [bytes] 4096 16777216 274877906944 4607087317889318912
ost_read 16385 samples [usec] 157 61857 82488488 433515710132
ost_write 16704 samples [usec] 188 60092 85512821 542307585031
ost_connect 1 samples [usec] 115 115 115 13225
ost_disconnect 1 samples [usec] 148 148 148 21904
ost_sync 1 samples [usec] 395 395 395 156025
ldlm_cancel 1 samples [usec] 185 185 185 34225
[root@es400nvx1-vm1 ~]# lctl get_param osd-ldiskfs.*OST0001*.brw_stats
osd-ldiskfs.ai400x-OST0001.brw_stats=
snapshot_time: 1613971438.900406702 (secs.nsecs)
read | write
pages per bulk r/w rpcs % cum % | rpcs % cum %
1: 1 0 0 | 76 0 0
2: 0 0 0 | 80 0 0
4: 0 0 0 | 22 0 1
8: 0 0 0 | 4 0 1
16: 0 0 0 | 22 0 1
32: 0 0 0 | 32 0 1
64: 0 0 0 | 8 0 1
128: 0 0 0 | 6 0 1
256: 0 0 0 | 21 0 1
512: 0 0 0 | 33 0 1
1K: 0 0 0 | 6 0 1
2K: 0 0 0 | 8 0 1
4K: 16384 99 100 | 16390 98 100
read | write
discontiguous pages rpcs % cum % | rpcs % cum %
0: 16385 100 100 | 16680 99 99
1: 0 0 100 | 28 0 100
read | write
discontiguous blocks rpcs % cum % | rpcs % cum %
0: 16333 99 99 | 16637 99 99
1: 52 0 100 | 57 0 99
2: 0 0 100 | 14 0 100
read | write
disk fragmented I/Os ios % cum % | ios % cum %
1: 1 0 0 | 271 1 1
2: 0 0 0 | 33 0 1
3: 0 0 0 | 2 0 1
4: 0 0 0 | 4 0 1
5: 0 0 0 | 1 0 1
6: 0 0 0 | 2 0 1
7: 0 0 0 | 1 0 1
8: 0 0 0 | 3 0 1
9: 0 0 0 | 5 0 1
10: 0 0 0 | 8 0 1
11: 0 0 0 | 9 0 2
12: 0 0 0 | 4 0 2
13: 0 0 0 | 7 0 2
14: 0 0 0 | 9 0 2
15: 0 0 0 | 13 0 2
16: 16384 99 100 | 16336 97 100
read | write
disk I/Os in flight ios % cum % | ios % cum %
1: 16353 6 6 | 16506 6 6
2: 16352 6 12 | 16478 6 12
3: 16357 6 18 | 16439 6 18
4: 16357 6 24 | 16401 6 25
5: 16359 6 31 | 16383 6 31
6: 16357 6 37 | 16382 6 37
7: 16361 6 43 | 16381 6 43
8: 16364 6 49 | 16381 6 50
9: 16363 6 56 | 16384 6 56
10: 16369 6 62 | 16382 6 62
11: 16371 6 68 | 16377 6 68
12: 16368 6 74 | 16369 6 75
13: 16370 6 81 | 16370 6 81
14: 16368 6 87 | 16367 6 87
15: 16368 6 93 | 16357 6 93
16: 16371 6 99 | 16349 6 99
17: 28 0 99 | 21 0 99
18: 32 0 99 | 20 0 99
19: 31 0 99 | 19 0 99
20: 30 0 99 | 15 0 99
21: 32 0 99 | 15 0 99
22: 27 0 99 | 14 0 99
23: 26 0 99 | 13 0 99
24: 22 0 99 | 10 0 99
25: 20 0 99 | 8 0 99
26: 10 0 99 | 6 0 99
27: 10 0 99 | 6 0 99
28: 9 0 99 | 6 0 99
29: 12 0 99 | 4 0 99
30: 8 0 99 | 2 0 99
31: 40 0 100 | 2 0 100
read | write
I/O time (1/1000s) ios % cum % | ios % cum %
1: 32 0 0 | 323 1 1
2: 16325 99 99 | 16278 97 99
4: 21 0 99 | 95 0 99
8: 7 0 100 | 10 0 99
16: 0 0 100 | 2 0 100
read | write
disk I/O size ios % cum % | ios % cum %
4K: 1 0 0 | 77 0 0
8K: 0 0 0 | 80 0 0
16K: 0 0 0 | 22 0 0
32K: 0 0 0 | 4 0 0
64K: 0 0 0 | 28 0 0
128K: 0 0 0 | 43 0 0
256K: 0 0 0 | 29 0 0
512K: 0 0 0 | 49 0 0
1M: 262144 99 100 | 262135 99 100
The average write latency at the llite layer in Lustre-2.14.0-RC2 was significantly higher than after the reverted patch (Lustre-2.14.0-RC3). Let me test Neil's new patch and update the results here. |
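A rough quantification of that difference, using the llite `write` histogram sums and the osc `write_bytes` sample counts copied from the two stats dumps above (in the llite stats, the third value column on the `write` line is the sum of per-call latencies in usec):

```python
SAMPLES = 67108864  # 64M writes of 4 KiB (256 GiB total) in both runs

# llite write latency: sum of per-call usec from the 'write' stats lines.
rc2_sum_usec = 447558029   # Lustre-2.14.0-RC2
rc3_sum_usec = 291816283   # Lustre-2.14.0-RC3 (patch reverted)

print(f"RC2 avg write latency: {rc2_sum_usec / SAMPLES:.2f} usec")  # ~6.67 usec
print(f"RC3 avg write latency: {rc3_sum_usec / SAMPLES:.2f} usec")  # ~4.35 usec

# osc write_bytes sample counts = number of write RPCs sent to OST0001.
rc2_write_rpcs = 57262
rc3_write_rpcs = 16704
print(f"write RPC ratio (RC2/RC3): {rc2_write_rpcs / rc3_write_rpcs:.1f}x")  # ~3.4x
```

This matches the brw_stats "pages per bulk r/w" histograms: the RC2 run sent most write RPCs as 2K-page (8 MiB) bulks, while the reverted RC3 build sent mostly full 4K-page (16 MiB) bulks, i.e. roughly 3.4x fewer write RPCs for the same 256 GiB of data.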