[LU-12748] parallel readahead needs to be optimized at high number of process Created: 11/Sep/19  Updated: 17/Feb/21  Resolved: 17/Feb/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Shuichi Ihara Assignee: Wang Shilong (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

master


Issue Links:
Related
is related to LU-13138 sanity test 101d fails with 'readahea... Resolved
is related to LU-13258 Bind linux workqueues to specific core Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Parallel readahead is enabled by default in master and contributes significantly to sequential read performance.
However, as the number of IO threads increases (e.g. NP=NCPU), read performance drops and ends up lower than with parallel readahead disabled. It needs tuning and optimization.
Here are the test configuration and results.

Client
2 x Platinum 8160 CPU @ 2.10GHz, 192GB memory, 2 x IB-EDR(multi-rail)
CentOS7.6(3.10.0-957.27.2.el7.x86_64)
OFED-4.5
for i in 6 12 24 48; do
        size=$((768/i))
        /work/tools/mpi/gcc/openmpi/2.1.1/bin/mpirun --allow-run-as-root -np $i \
                /work/tools/bin/ior -w -r -t 1m -b ${size}g -e -F -vv -o /scratch0/file | tee ior-1n${i}p-${VER}.log
done

Summary of Read Performance (MB/sec)

branch           thr=6   thr=12  thr=24  thr=48
b2_12            9,965   14,551  17,177  18,152
master           15,252  16,026  17,842  16,991
master(pRA=off)  10,253  14,489  17,839  18,658

pRA=off - parallel readahead disabled by setting llite.*.read_ahead_async_file_threshold_mb=0 (e.g. via lctl set_param)



 Comments   
Comment by Wang Shilong (Inactive) [ 11/Sep/19 ]

The problem could be that we submit too many async RA workers; even limiting the number of active workers for the workqueue did not help.

I think we could fix this with an idea similar to the one we used to limit RA memory:
introduce another atomic counter that tracks the in-flight active async RA requests.

We would then limit the in-flight async requests to the number of active CPU cores, which balances
the single-thread improvement against the contention from multiple threads.
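A minimal sketch of such a throttle, using standard kernel primitives; the function and variable names here are illustrative assumptions, not the code from the eventual patch (https://review.whamcloud.com/37927):

#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/types.h>
#include <linux/workqueue.h>

/* Illustrative sketch: count in-flight async readahead workers. */
static atomic_t ra_async_inflight = ATOMIC_INIT(0);

/* Queue async readahead only while the number of in-flight workers is
 * below the number of online CPUs; otherwise the caller falls back to
 * ordinary in-process readahead. */
static bool ll_ra_try_async(struct workqueue_struct *wq,
			    struct work_struct *work)
{
	if (atomic_inc_return(&ra_async_inflight) > num_online_cpus()) {
		atomic_dec(&ra_async_inflight);
		return false;	/* do synchronous readahead instead */
	}
	queue_work(wq, work);
	return true;
}

/* Each worker releases its slot when the readahead completes. */
static void ll_ra_async_done(void)
{
	atomic_dec(&ra_async_inflight);
}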

Comment by Andreas Dilger [ 11/Sep/19 ]

It looks like the crossover is at about NCPU/2 where the performance of parallel readahead and in-process readahead is the same. If we stop using async readahead at that point it should give the best of both worlds.
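In terms of the sketch above, that crossover would mean capping the in-flight async workers at half the online CPUs rather than all of them. A variant of the cap along those lines (a guess at the policy, not necessarily what the merged patch implements):

/* Crossover cap suggested by the measurements: stop queueing async
 * readahead once NCPU/2 workers are already in flight. */
static inline unsigned int ra_async_max_active(void)
{
	unsigned int max_active = num_online_cpus() / 2;

	return max_active ? max_active : 1;	/* keep at least one worker */
}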

Comment by Gerrit Updater [ 15/Mar/20 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: https://review.whamcloud.com/37927
Subject: LU-12748 readahead: limit async ra requests
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 447b93d563e555f3255995234b35c4546960768e

Comment by James A Simmons [ 15/Mar/20 ]

I was just discussing this issue with Wang in the context of the LU-13258 work. Thanks for figuring out the crossover. I will update my patch with this new limit.

Comment by Gerrit Updater [ 24/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37927/
Subject: LU-12748 readahead: limit async ra requests
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1427a72002e6b57017f1c66eb95f9bebff9ac37f

Comment by Andreas Dilger [ 05/May/20 ]

Li Xi, note that the ability to change labels on the ticket is one of the reasons that we only mark tickets "Resolved" instead of "Closed". Otherwise, it is necessary to re-open and close the ticket to change it again.

Comment by Li Xi [ 06/May/20 ]

That totally makes sense. Thanks for the explanation, Andreas!
