Data-on-MDT phase II (LU-10176)

[LU-10777] DoM performance is bad with FIO write
Created: 06/Mar/18  Updated: 08/Jun/19  Resolved: 30/Apr/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Technical task
Priority: Major
Reporter: Mikhail Pershin
Assignee: Mikhail Pershin
Resolution: Fixed
Votes: 0
Labels: DoM2, ORNL

Rank (Obsolete): 9223372036854775807

 Description   

The FIO test issues far more RPCs to the MDT with DoM files than with OST files:

DOM files:
WRITE: BW 5900KiB/sec, IOPS 737, lat (7/83624/83631)usec
----- MDC RPCs: 118696
----- OSC RPCs: 252

OST files:
WRITE: BW 9738KiB/sec, IOPS 1217, lat (5/47728/47734)usec
----- MDC RPCs: 49668
----- OSC RPCs: 12302

So in the same test there are about 12K I/O-related RPCs to the OSTs versus about 70K extra RPCs to the MDT. It looks like OST writes are cached much better under the FIO write pattern than writes to the MDT.
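The "about 12K" and "about 70K" figures can be re-derived from the counters listed above. The snippet below is only a quick arithmetic check; the dictionary layout and variable names are illustrative, and treating the 252 OSC RPCs in the DoM run as a non-I/O baseline is an assumption, not something stated in the ticket.

```python
# RPC counters copied from the two FIO runs in the description.
dom_run = {"mdc_rpcs": 118696, "osc_rpcs": 252}
ost_run = {"mdc_rpcs": 49668, "osc_rpcs": 12302}

# Extra MDT traffic in the DoM run relative to the OST run.
extra_mdc_rpcs = dom_run["mdc_rpcs"] - ost_run["mdc_rpcs"]

# OST traffic attributable to file I/O, assuming (hypothetically) that the
# 252 OSC RPCs seen in the DoM run are a non-I/O baseline.
ost_io_rpcs = ost_run["osc_rpcs"] - dom_run["osc_rpcs"]

# How many more RPCs the same workload costs when the data lives on the MDT.
rpc_ratio = extra_mdc_rpcs / ost_io_rpcs

print(f"extra MDC RPCs with DoM: {extra_mdc_rpcs}")
print(f"I/O-related OSC RPCs:    {ost_io_rpcs}")
print(f"ratio:                   {rpc_ratio:.2f}")
```

Under that assumption the same write workload costs several times more RPCs on the MDT than on the OSTs, which is consistent with the lower bandwidth and IOPS reported for DoM files.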



 Comments   
Comment by Mikhail Pershin [ 27/Mar/19 ]

The pages-per-RPC value is smaller with DoM files: the write RPC is composed earlier, with fewer pages than with an OST. The problem is most likely related to grants.

Comment by Mikhail Pershin [ 22/Apr/19 ]

This issue doesn't look like a bug; it seems related to the smaller grant space on a single MDT compared with several OSTs. As the MDT size increases, the pages-per-RPC value grows and performance improves as well.

Meanwhile, there is another problem with FIO READ, related to the read-on-open feature and possible resends when the reply is bigger than the buffer the client allocated. In that case the client reallocates the buffer and resends the request with a larger buffer. For some reason this technique performs poorly, even worse than separate OPEN and READ RPCs. I am going to disable that option for now.

Comment by Jinshan Xiong [ 24/Apr/19 ]

Is this happening in the intent-open case?

Can we just do a quick fix to fall back to a normal open if the client doesn't allocate a large enough buffer?

Comment by Mikhail Pershin [ 24/Apr/19 ]

That is already the case: a read-on-open failure doesn't fail the open itself; it just doesn't fill the reply buffer with data and finishes the open intent as usual.

Comment by Mikhail Pershin [ 24/Apr/19 ]

Patch in Gerrit: https://review.whamcloud.com/#/c/34700/

Comment by Jinshan Xiong [ 24/Apr/19 ]

If that's the case, why would it resend the RPC when the client provides a small reply buffer, instead of just doing a regular open? Or have I missed something here?

Comment by Jinshan Xiong [ 24/Apr/19 ]

Ah, I see, your patch does what I said. Never mind.

Comment by Mikhail Pershin [ 24/Apr/19 ]

That is handled at the ptlrpc level; check reply_in_callback() and after_reply() for rq_reply_truncate usage. If the reply is bigger than the buffer allocated on the client, the client reallocates the buffer and resends with the adjusted size, and no errors are returned to the upper level, e.g. MDC. That is used intentionally in some cases, e.g. when big attributes are returned from the server and the client has not yet updated 'max_md_size'. These cases are usually rare because the client eventually updates its own maximum.
The idea behind this is that, technically, an immediate resend is faster than a new RPC, or at least no slower. I was trying to use that for read-on-open to return as much data as the maximum reply buffer allows, thinking that a resend costs at most the same as a new RPC. After the test results, though, I tend to think that an OPEN resend/reconstruct with a new read-on-open is simply more expensive than an ordinary READ RPC, so it makes no sense to do so.
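The two strategies discussed in this comment can be compared with a toy model. This is a minimal sketch, not Lustre code: `server_open`, `client_open_and_read`, the `Reply` class, and the 512-byte `open_reply_size` are all hypothetical names and values chosen for illustration. It only counts RPCs and says nothing about their relative cost, which is what the measurements showed to be the real difference.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    needed: int      # bytes the server wanted to send back
    truncated: bool  # True if the reply did not fit the client's buffer
    data_len: int    # bytes of file data actually carried in the reply


def server_open(file_size, open_reply_size, buf_size, resend_on_truncate):
    """Toy MDT: try to piggyback the file data on the OPEN reply."""
    needed = open_reply_size + file_size
    if needed <= buf_size:
        return Reply(needed, False, file_size)
    if resend_on_truncate:
        # Old behaviour in this model: report truncation, client must resend.
        return Reply(needed, True, 0)
    # Behaviour after disabling resend: plain OPEN reply, no file data.
    return Reply(open_reply_size, False, 0)


def client_open_and_read(file_size, buf_size, resend_on_truncate,
                         open_reply_size=512):
    """Return the list of RPCs needed to open the file and get its data."""
    rpcs = ["OPEN"]
    reply = server_open(file_size, open_reply_size, buf_size,
                        resend_on_truncate)
    if reply.truncated:
        # Reallocate the reply buffer to the size the server asked for.
        buf_size = reply.needed
        reply = server_open(file_size, open_reply_size, buf_size,
                            resend_on_truncate)
        rpcs.append("OPEN(resend)")
    if reply.data_len < file_size:
        # Data did not come with the open, so fetch it separately.
        rpcs.append("READ")
    return rpcs


with_resend = client_open_and_read(4096, 1024, resend_on_truncate=True)
without_resend = client_open_and_read(4096, 1024, resend_on_truncate=False)
fits_in_buffer = client_open_and_read(256, 1024, resend_on_truncate=True)
print(with_resend, without_resend, fits_in_buffer)
```

In this model both strategies need two RPCs when the data does not fit, which matches the original expectation that a resend is "at least the same" as a new RPC. The test results in the ticket suggest the OPEN resend is the more expensive of the two in practice.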

Comment by Mikhail Pershin [ 24/Apr/19 ]

Oh, I wrote so many words; let them be, maybe someone will find them useful.

Comment by Gerrit Updater [ 30/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34700/
Subject: LU-10777 dom: disable read-on-open with resend
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e0adb618a4b0d0182419a5731fe046e9157b9f51

Comment by Peter Jones [ 30/Apr/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 21/May/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34912
Subject: LU-10777 dom: disable read-on-open with resend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 7f304a9c1328518be315a96a2837e3452e6221d3

Comment by Gerrit Updater [ 08/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34912/
Subject: LU-10777 dom: disable read-on-open with resend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: bfe06da80e8cb75667d0a58a8e81043a09868112
