[LU-13905] A single stream performance regression in client 4.18.0-193.14.2.el8_2 kernel Created: 12/Aug/20 Updated: 12/Aug/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Shuichi Ihara | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL8.2 (kernel 4.18.0-193.14.2.el8_2) |
||
| Issue Links: |
|
||||
| Severity: | 2 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
There is a single-stream read performance regression with the 4.18.0-193.14.2.el8_2 kernel. Here are the test environment and a reproducer.

Test environment:
1 x client (1 x Gold 5218 CPU @ 2.30GHz, 96GB RAM, 1 x IB-HDR100)
CentOS8.2 (Tested kernel versions: 4.18.0-147.el8.x86_64 and 4.18.0-193.14.2.el8_2.x86_64)
OFED-5.0-2.1.8.0

Reproducer:
[root@ec01 ~]# lctl set_param osc.*.max_pages_per_rpc=16M osc.*.max_rpcs_in_flight=16 llite.*.max_read_ahead_mb=2048 llite.*.max_read_ahead_per_file_mb=N
[root@ec01 ~]# clush -w es400nvx1-vm[1-4],7990e3-vm[1-2],ec01 "echo 3 > /proc/sys/vm/drop_caches"
[root@ec01 ~]# /work/tools/bin/ior -r -t 1m -b 192g -e -o /es400nv/s/file -k

At least with max_read_ahead_per_file_mb=64 (default), the behavior differs between the two kernel versions 4.18.0-147.el8.x86_64 and 4.18.0-193.14.2.el8_2.x86_64.
Performance on 4.18.0-193.14.2.el8_2.x86_64 was 30% slower with max_read_ahead_per_file_mb=64, but when it was increased to 128, the two kernels performed similarly.

There is another set of results, tested on HDD-based OSTs.
In this case, there was still a ~16% performance regression in 4.18.0-193.14.2.el8_2.x86_64 regardless of whether max_read_ahead_per_file_mb was 64 or 128. |
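To compare the two kernels across several read-ahead settings, the reproducer can be wrapped in a small loop. The following is only a sketch: the function names `run` and `sweep_readahead` and the `DRY_RUN` variable are hypothetical helpers, while the commands, host list, and paths are copied from the reproducer in the description.

```shell
#!/bin/bash
# Sketch only: rerun the IOR read test once per read-ahead value.
# "run", "sweep_readahead" and DRY_RUN are hypothetical names, not Lustre tools.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sweep_readahead VALUE...: VALUE is max_read_ahead_per_file_mb in MB.
sweep_readahead() {
    local mb
    # Tunables that stay fixed for the whole sweep (values from the report).
    run lctl set_param 'osc.*.max_pages_per_rpc=16M' \
        'osc.*.max_rpcs_in_flight=16' 'llite.*.max_read_ahead_mb=2048'
    for mb in "$@"; do
        run lctl set_param "llite.*.max_read_ahead_per_file_mb=${mb}"
        # Drop caches on servers and client so every run reads cold data.
        run clush -w 'es400nvx1-vm[1-4],7990e3-vm[1-2],ec01' \
            'echo 3 > /proc/sys/vm/drop_caches'
        run /work/tools/bin/ior -r -t 1m -b 192g -e -o /es400nv/s/file -k
    done
}
```

For example, `DRY_RUN=1 sweep_readahead 64 128` only prints the commands, and `sweep_readahead 64 128` executes them. Running the same sweep after booting each kernel gives directly comparable IOR results.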
| Comments |
| Comment by Wang Shilong (Inactive) [ 12/Aug/20 ] |
|
The problem seems to be that 4.18.0-147.el8.x86_64 somehow schedules kworker more often than 4.18.0-147.el8.x86_64. We might need to investigate what changes were applied to the kernel workqueue between these minor version updates. |
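One rough way to quantify how often kworker threads get scheduled is to sum their context-switch counters from /proc before and after the IOR run on each kernel. This is a sketch under that assumption, not the method used in this ticket; `kworker_ctxt_switches` is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different proc root.

```shell
#!/bin/bash
# Sketch only: sum voluntary + nonvoluntary context switches of all kworker threads.
# kworker_ctxt_switches is a hypothetical helper; PROC_ROOT defaults to /proc.
kworker_ctxt_switches() {
    local proc_root="${1:-/proc}" total=0 dir n
    for dir in "$proc_root"/[0-9]*; do
        # Skip entries that vanished or are unreadable.
        if [ ! -r "$dir/comm" ] || [ ! -r "$dir/status" ]; then
            continue
        fi
        # Only count threads whose name starts with "kworker".
        case "$(cat "$dir/comm" 2>/dev/null)" in
            kworker*) ;;
            *) continue ;;
        esac
        n=$(awk '/^(voluntary|nonvoluntary)_ctxt_switches:/ { s += $2 }
                 END { print s + 0 }' "$dir/status" 2>/dev/null)
        total=$((total + ${n:-0}))
    done
    echo "$total"
}
```

Calling `kworker_ctxt_switches` once before and once after the read test and subtracting the two values gives a per-run count that can be compared between 4.18.0-147.el8.x86_64 and 4.18.0-193.14.2.el8_2.x86_64.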
| Comment by Wang Shilong (Inactive) [ 12/Aug/20 ] |
|
I am not aware of any specific workqueue changes, but could this be related to CPU frequency scaling changes? Could you run cpupower frequency-info on both kernels to check whether both reach performance mode? |
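As a quick complement to cpupower frequency-info, the active cpufreq governor of every CPU can be read from sysfs and summarized. This is a minimal sketch; `governor_summary` is a hypothetical helper name, and it assumes the kernel exposes the standard cpufreq `scaling_governor` files.

```shell
#!/bin/bash
# Sketch only: count CPUs per cpufreq scaling governor.
# governor_summary is a hypothetical helper; CPU_ROOT defaults to the sysfs CPU directory.
governor_summary() {
    local cpu_root="${1:-/sys/devices/system/cpu}" f
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # The glob stays literal when cpufreq is not available; skip it then.
        if [ -r "$f" ]; then
            cat "$f"
        fi
    done | sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Running `governor_summary` on the client under each kernel prints one line per governor with the number of CPUs using it. If the two kernels report different governors, that would support the frequency-scaling theory.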