Data-on-MDT phase II
(LU-10176)
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.3 |
| Type: | Technical task | Priority: | Major |
| Reporter: | Mikhail Pershin | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | DoM2, ORNL |
| Description |
|
An FIO test issues far more RPCs to the MDT with DOM files than with OST files:

DOM files: WRITE: BW 5900 KiB/s, IOPS 737, lat (7/83624/83631) usec; MDC RPCs: 118696; OSC RPCs: 252
OST files: WRITE: BW 9738 KiB/s, IOPS 1217, lat (5/47728/47734) usec; MDC RPCs: 49668; OSC RPCs: 12302

So the OST case sends about 12K IO-related RPCs to the OSTs, while the DOM case sends about 70K extra RPCs to the MDT (118696 - 49668 = 69028) in the same test. It looks as though OST writes are cached much better under this FIO write pattern than writes to the MDT. |
| Comments |
| Comment by Mikhail Pershin [ 27/Mar/19 ] |
|
The pages-per-RPC value is smaller with DOM files: the write RPC is composed earlier, with fewer pages than with an OST. The problem is most likely related to grants. |
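As a hedged illustration of the grant effect described above (a minimal sketch with hypothetical names, not actual Lustre code), the client can only put as many pages into a write RPC as its remaining grant covers, so a small grant pool forces small RPCs:

{code:c}
/*
 * Minimal sketch, hypothetical names: a client assembling a bulk write
 * RPC stops adding dirty pages once its unused grant (server-promised
 * space it may write back without blocking) runs out.  A small grant
 * pool, as on a single MDT, therefore yields a low pages-per-RPC value.
 */
#include <stddef.h>

#define PAGE_SIZE          4096
#define MAX_PAGES_PER_RPC   256   /* 1 MiB RPCs with 4 KiB pages */

struct client_grant {
	size_t avail;             /* bytes of grant still unused */
};

/* Return how many dirty pages fit into the next write RPC. */
static size_t pages_for_next_rpc(struct client_grant *cg, size_t dirty)
{
	size_t by_grant = cg->avail / PAGE_SIZE;
	size_t n = dirty < MAX_PAGES_PER_RPC ? dirty : MAX_PAGES_PER_RPC;

	if (n > by_grant)         /* grant-starved: send the RPC early */
		n = by_grant;

	cg->avail -= n * PAGE_SIZE;
	return n;
}
{code}

With plenty of grant (many OSTs) the cap is MAX_PAGES_PER_RPC; with the smaller MDT pool it is by_grant, matching the lower pages-per-RPC seen for DOM files.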
| Comment by Mikhail Pershin [ 22/Apr/19 ] |
|
This issue doesn't look like a bug; it comes from the MDT having less grant space than several OSTs combined. As the MDT size increases, the pages-per-RPC value grows, and performance along with it. Meanwhile, there is another problem with FIO READ, related to the read-on-open feature: if the reply is bigger than the buffer the client allocated, the client re-allocates the buffer and resends the request with a larger one. For some reason this technique performs poorly, even worse than separate OPEN and READ RPCs. I am going to disable that option for now. |
| Comment by Jinshan Xiong [ 24/Apr/19 ] |
|
Is this happening in the intent-open case? Can we just do a quick fix to fall back to a normal open if the client doesn't allocate a large enough buffer? |
| Comment by Mikhail Pershin [ 24/Apr/19 ] |
|
That is already the case: a read-on-open failure doesn't fail the open itself; it just doesn't fill the reply buffer with data and finishes the open intent as usual. |
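To make that fallback concrete, here is a hedged sketch of the server-side behaviour described above (hypothetical names and structures, not the actual MDT code): if the file data does not fit into the open reply, the server skips it and completes the open intent normally.

{code:c}
/*
 * Hedged sketch, hypothetical names: server-side read-on-open.
 * Failing to inline file data into the open reply is non-fatal;
 * the open intent still succeeds and the client reads later.
 */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct open_reply {
	char   *buf;
	size_t  buf_len;
	size_t  data_len;         /* 0 means no inlined file data */
};

static bool pack_read_on_open(struct open_reply *rep,
			      const char *data, size_t len)
{
	if (len > rep->buf_len)
		return false;     /* does not fit: skip, do not fail */
	memcpy(rep->buf, data, len);
	rep->data_len = len;
	return true;
}

static int handle_open_intent(struct open_reply *rep,
			      const char *data, size_t len)
{
	if (!pack_read_on_open(rep, data, len))
		rep->data_len = 0;        /* plain open, no data */
	return 0;                         /* open itself never fails here */
}
{code}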
| Comment by Mikhail Pershin [ 24/Apr/19 ] |
|
Patch in Gerrit: https://review.whamcloud.com/#/c/34700/ |
| Comment by Jinshan Xiong [ 24/Apr/19 ] |
|
If that's the case, why would it resend the RPC when the client provides a small reply buffer, instead of just doing a regular open? Or have I missed something here? |
| Comment by Jinshan Xiong [ 24/Apr/19 ] |
|
Ah I see, your patch does what I said. Never mind. |
| Comment by Mikhail Pershin [ 24/Apr/19 ] |
|
That is handled at the ptlrpc level; check reply_in_callback() and after_reply() for rq_reply_truncate usage. If the reply is bigger than the buffer allocated on the client, the client re-allocates it and resends the request with the adjusted buffer, with no errors returned to the upper level, e.g. MDC. This is used intentionally in some cases, e.g. when big attributes are returned from the server and the client has not yet updated its 'max_md_size'. Usually these cases are rare because the client eventually updates its own maximum. |
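For readers unfamiliar with that path, here is a heavily simplified, hedged sketch of the behaviour just described (hypothetical names; the real logic lives in reply_in_callback() and after_reply()): a truncated reply is detected, the buffer is grown, and the request is resent without surfacing an error to MDC.

{code:c}
/*
 * Hedged sketch, hypothetical names: when the incoming reply is larger
 * than the preallocated reply buffer, mark the request truncated, grow
 * the buffer, and resend -- transparently to upper layers such as MDC.
 */
#include <stdbool.h>
#include <stdlib.h>

struct request {
	char   *reply_buf;
	size_t  reply_buf_len;
	size_t  reply_actual_len; /* size the server tried to send */
	bool    reply_truncated;  /* cf. rq_reply_truncate above */
};

/* Analogue of reply_in_callback(): detect a too-small reply buffer. */
static void reply_in(struct request *req, size_t incoming_len)
{
	req->reply_actual_len = incoming_len;
	req->reply_truncated = incoming_len > req->reply_buf_len;
}

/* Analogue of after_reply(): grow the buffer and resend, no error up. */
static int after_reply(struct request *req)
{
	if (!req->reply_truncated)
		return 0;                 /* reply fits, process it */

	free(req->reply_buf);
	req->reply_buf = malloc(req->reply_actual_len);
	if (req->reply_buf == NULL)
		return -1;                /* only allocation can fail here */
	req->reply_buf_len = req->reply_actual_len;
	req->reply_truncated = false;
	/* ... re-queue the request for resend with the larger buffer ... */
	return 0;
}
{code}

Each truncated reply therefore costs a full extra round trip, which is why having the server fall back to a plain open, as the patch does, is cheaper for read-on-open.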
| Comment by Mikhail Pershin [ 24/Apr/19 ] |
|
Oh, I wrote so many words; let them be, maybe someone will find them useful. |
| Comment by Gerrit Updater [ 30/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34700/ |
| Comment by Peter Jones [ 30/Apr/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 21/May/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34912 |
| Comment by Gerrit Updater [ 08/Jun/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34912/ |