[LU-11189] OSC flow control Created: 28/Jul/18  Updated: 26/Jan/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Jinshan Xiong Assignee: Zhenyu Xu
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12864 sanity-benchmark test_iozone crashes ... Open
is related to LU-11092 NMI watchdog: BUG: soft lockup - CPU#... Open
Rank (Obsolete): 9223372036854775807

 Description   

We're running into a problem where OST devices are sometimes very slow to serve write requests. However, clients don't know about the situation, so they just keep sending requests, which eventually causes RPC timeouts.

This work tries to relieve the problem by introducing flow control on the OSC: it holds off on sending more requests as soon as it receives an early reply.
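The idea can be illustrated with a minimal sketch in C. This is not the code from the uploaded patch: the struct and function names (flow_ctl, fc_try_send, fc_on_early_reply, fc_on_reply) are hypothetical, and the release policy (resume sending once the next final reply arrives) is an assumption, since the description does not say when sending should resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-target flow-control state; not taken from the Lustre tree. */
struct flow_ctl {
    int  in_flight;      /* requests sent but not yet completed */
    int  max_in_flight;  /* normal cap on concurrent requests */
    bool held;           /* set once an early reply signals a slow server */
};

static void fc_init(struct flow_ctl *fc, int max_in_flight)
{
    fc->in_flight = 0;
    fc->max_in_flight = max_in_flight;
    fc->held = false;
}

/* Send a new request only if we are not held and are below the cap. */
static bool fc_try_send(struct flow_ctl *fc)
{
    if (fc->held || fc->in_flight >= fc->max_in_flight)
        return false;
    fc->in_flight++;
    return true;
}

/* Early reply: the server needs more time, so stop adding load. */
static void fc_on_early_reply(struct flow_ctl *fc)
{
    fc->held = true;
}

/*
 * Final reply: one request has completed. This sketch assumes that a
 * completed request means the server is making progress, so the hold
 * is released here.
 */
static void fc_on_reply(struct flow_ctl *fc)
{
    assert(fc->in_flight > 0);
    fc->in_flight--;
    fc->held = false;
}
```

A caller would check fc_try_send() before queueing each write RPC and leave the request in its local queue when the call returns false, rather than letting it time out on the server.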



 Comments   
Comment by Gerrit Updater [ 28/Jul/18 ]

Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/32895
Subject: LU-11189 osc: flow control for OSC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: da7e7ec610d1023906c4354605c1542071195735

Comment by Andreas Dilger [ 28/Jul/18 ]

On a related note, if heavy random read traffic on a large input file is causing high IOPS and slowdown, you could consider using "lfs ladvise -a willread /path/to/file" to prefetch the file into RAM on the OST(s) it is located on. This requires that the input file be striped across enough OSTs to fit into their RAM.

Comment by Andreas Dilger [ 23/Nov/20 ]

Bobijam, could you please take a look at this patch and refresh it.

I think we are still seeing similar problems under heavy load.

Generated at Sat Feb 10 02:41:43 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.