[LU-1030] new IO engine Created: 25/Jan/12  Updated: 04/Apr/13  Resolved: 19/Jun/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0

Type: New Feature Priority: Minor
Reporter: Jinshan Xiong (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Microsoft Word new-perf-results.xlsx     PDF File new_io_engine.pdf     Microsoft Word perf-results.xlsx    
Rank (Obsolete): 4585

 Description   

Target of new IO engine:
1. support new grant parameter
2. generate better IO RPC
3. if a page has been set for writeback and the covering lock is being canceled, raise the priority of the writeback page
4. ...



 Comments   
Comment by Jinshan Xiong (Inactive) [ 08/Mar/12 ]

current status of new io engine.

Comment by Jinshan Xiong (Inactive) [ 25/Apr/12 ]

write performance benchmark result.

Comment by Jinshan Xiong (Inactive) [ 17/May/12 ]

patch list: http://review.whamcloud.com/{2009,2460,2270}

and this one is proven to reduce memory usage a lot: http://review.whamcloud.com/2514

Comment by Andreas Dilger [ 31/May/12 ]

It appears this may be causing replay-dual.sh test failures that are NOT LU-482 related:

https://maloo.whamcloud.com/test_sets/9a9540c6-a964-11e1-ab65-52540035b04c

Waiting for orphan cleanup...
Waiting 0 secs for  mds-ost sync done.
Waiting 2 secs for  mds-ost sync done.
Waiting for destroy to be done...
before 717896, after 717896

However, it isn't 100% clear whether this patch is causing the problem, or whether it is only coincidental that Chris was repeatedly testing for LU-482 using this patch. In any case, I don't see any similar problems on master, though ORI-396 at least looks similar.

Comment by Jinshan Xiong (Inactive) [ 31/May/12 ]

Hi Andreas, the patch you're referring to is Chris' testing patch. The original patch is at 2009 and it doesn't change any functionality; it only moves some functions into osc_cache.c. I don't think Chris changed the code other than adding some test stuff.

Comment by Jinshan Xiong (Inactive) [ 19/Jun/12 ]

all patches have been landed.

Comment by Bob Glossman (Inactive) [ 08/Sep/12 ]

Testing for b2_3, results file attached. The new curve from 2.3 clients/servers looks similar to but slightly better than the b2_2+rpc curve in the old results, especially at the mid-range of thread counts (4-8). Please note that due to lack of resources I had to slightly reduce the number of OSTs in the tests. With no nodes having more than 16GB, trying to put two 8G ramdisk OSTs on a node led to lots of OOMs. When consulted, Jinshan said that numbers from only 15 OSTs were probably good enough for comparison.

Comment by Cory Spitz [ 04/Apr/13 ]

Bob, unfortunately, it doesn't look like we have apples to apples here, but the > 15 thread I/O seems a lot worse with b2_3 (around 400 MB/s). Was there an explanation? Granted, we should focus on b2_4 now.

Generated at Sat Feb 10 01:12:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.