[LU-10191] FLR2: Server Local Client (SLC) - Whamcloud Community JIRA

Details

Type: New Feature
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- FLR2
- medium

Description

When mounting a client locally on the OSS or MDS, it would be desirable to have a local IO path for bulk writes from the OSC to obdfilter rather than sending the data via ptlrpc->lnet->ptlrpc, since this would speed up IO and significantly reduce local CPU usage. It makes sense to implement this initially only for bulk IO (if that is easier), since bulk IO typically has the highest memory copy overhead, and to leave locking/metadata on the normal RPC paths so that they are treated consistently with other clients (avoiding potential hard-to-find bugs).
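The split described above (bulk IO on a local path, everything else on the normal RPC path) could be modelled roughly as follows. This is only an illustrative sketch in Python, not Lustre code: `ReqKind`, `choose_path`, and the `"local"`/`"rpc"` labels are hypothetical names introduced here.

```python
from enum import Enum


class ReqKind(Enum):
    """Hypothetical request classes, for illustration only."""
    BULK_READ = "bulk_read"
    BULK_WRITE = "bulk_write"
    LOCK = "lock"
    METADATA = "metadata"


# Only bulk IO is a candidate for the local path in this sketch.
BULK_KINDS = {ReqKind.BULK_READ, ReqKind.BULK_WRITE}


def choose_path(kind, target_is_local):
    """Return "local" for bulk IO to a target on this node, else "rpc".

    Locking and metadata always use the normal RPC path, so they are
    handled the same way as requests from any other client.
    """
    if target_is_local and kind in BULK_KINDS:
        return "local"
    return "rpc"


print(choose_path(ReqKind.BULK_WRITE, target_is_local=True))
print(choose_path(ReqKind.LOCK, target_is_local=True))
```

Keeping the decision in one place means a remote target, or any non-bulk request, falls through to the existing RPC path unchanged.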

Any modifying RPCs to the local OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule lfs mirror resync in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out.
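One way to picture the resync scheduling preference is a helper that picks a node which holds an in-sync mirror locally and whose stale mirror lives on a different OSS. This is a rough sketch under simplifying assumptions: each mirror is treated as living on a single OSS, and `Mirror` and `pick_resync_node` are hypothetical names, not part of `lfs` or Lustre.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    """Hypothetical, simplified view of one mirror of a file."""
    mirror_id: int
    oss: str      # hostname of the OSS assumed to hold this mirror
    stale: bool   # True if the mirror needs to be resynced


def pick_resync_node(mirrors):
    """Return (node, source_id, dest_id) so that the resync reads from the
    node's local OSS and writes to a remote OSS, or None if no such pair
    exists."""
    sources = [m for m in mirrors if not m.stale]
    dests = [m for m in mirrors if m.stale]
    for dst in dests:
        for src in sources:
            if src.oss != dst.oss:
                # Run on the source's OSS: local read, remote write.
                return src.oss, src.mirror_id, dst.mirror_id
    return None


# Made-up example layout.
mirrors = [Mirror(1, "oss1", stale=False), Mirror(2, "oss2", stale=True)]
print(pick_resync_node(mirrors))
```

Running the resync on the source side keeps the synchronous local writes out of the picture: the only writes go to the remote OSS over the normal RPC path.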

It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets.
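The NID-based detection could look something like the sketch below: a target is treated as local if it is reachable at one of the client's own NIDs. This is an assumption about how the matching might work, written in Python for illustration. `local_targets` is a hypothetical helper, and the NIDs and target names are made-up example data.

```python
def local_targets(client_nids, target_nids):
    """Return the sorted names of targets sharing at least one NID with
    the client, i.e. targets served from the same node."""
    client = set(client_nids)
    return sorted(
        name for name, nids in target_nids.items() if client & set(nids)
    )


# Made-up example: the client runs on the node that serves OST0000.
client_nids = ["10.0.0.5@tcp"]
target_nids = {
    "lustre-OST0000": ["10.0.0.5@tcp"],
    "lustre-OST0001": ["10.0.0.6@tcp"],
    "lustre-MDT0000": ["10.0.0.1@tcp"],
}
print(local_targets(client_nids, target_nids))
```

Because the check is made per target, only OST0000 would use the local mode here, while the remote OST and MDT keep the normal behaviour. That is the case a single "local mount" option could not express.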

A further optimization would be to avoid caching read data in the llite layer, so that the same data is not cached twice on the node, since the OSS would also cache it. The OSS cache has the advantage that it could also be shared with other clients, though it would have a higher access overhead than the VFS page cache (depending on the IO size).

Attachments

[#LDEV-605] Server Local Clients(SLC) implementation.pdf
70 kB
12/Dec/17 6:06 PM

Issue Links

is related to

LU-12649 Tracker for ongoing FLR improvements

Open

LU-11022 FLR1.5: "lfs mirror" usability for Burst Buffer

Resolved

is related to

LU-9771 FLR1: Landing tickets for File Level Redundancy Phase 1

Resolved

LU-10916 Improve lfs mirror resync performance

Resolved

LU-12722 exclude local client mounted on MDS/OSS from recovery

Resolved

LU-17779 switch lnet loopback to use copy_page()

Resolved

Activity

Andreas Dilger made changes - 22/Sep/24 12:00 AM

Description

Original: When mounting a client locally on the OSS or MDS it would be desirable to have a local IO path for the bulk writes from the OSC to obdfilter rather than sending the data via ptlrpc->lnet->ptlrpc.

Any modifying RPCs to the *local* OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule {{lfs mirror resync}} in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out.

It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets.

A further optimization would avoid read caching data in the llite layer to avoid double cache of the same data, since the OSS would also cache the same data, and the OSS cache has the advantage that it could be shared with other clients.

As a final stage, having a local {{llite<->obdfilter}} IO path that avoids data copies and LNet would potentially speed up IO performance and reduce local IO CPU usage significantly. It might be possible to implement this initially only for bulk IO, since that would typically have the highest memory copy overhead, and leave the locking/metadata to use the normal RPC paths, so that they are treated consistently (possibly avoiding hard-to-find bugs).

New: When mounting a client locally on the OSS or MDS it would be desirable to have a local IO path for the bulk writes from the OSC to obdfilter rather than sending the data via ptlrpc->lnet->ptlrpc since this would speed up IO performance and reduce local IO CPU usage significantly. It makes sense to implement this initially only for bulk IO (if that is easier), since that would typically have the highest memory copy overhead, and leave the locking/metadata to use the normal RPC paths, so that they are treated consistently with other clients (avoiding potential hard-to-find bugs).

Any modifying RPCs to the *local* OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule {{lfs mirror resync}} in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out.

It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets.

A further optimization would avoid read caching data in the llite layer to avoid double cache of the same data on the node, since the OSS would also cache the same data, and the OSS cache has the advantage that it could also be shared with other clients, though it would have a higher overhead to access than the VFS page cache (depending on the IO size).

Andreas Dilger made changes - 21/Sep/24 11:55 PM

Link

New: This issue is related to LU-17779 [ LU-17779 ]

Andreas Dilger made changes - 21/Sep/24 11:55 PM

Description

Original: In order to mount a client locally on the OSS or MDS without affecting the recovery of local targets, we need the ability to mount without inserting the client into the {{last_rcvd}} file. That avoids the problem when a client+server crashes and the local client UUID is no longer available for the recovery, causing recovery to always take the maximum time.

Any modifying RPCs to the *local* OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule {{lfs mirror resync}} in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out.

It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets.

A further optimization would avoid read caching data in the llite layer to avoid double cache of the same data, since the OSS would also cache the same data, and the OSS cache has the advantage that it could be shared with other clients.

As a final stage, having a local {{llite<->obdfilter}} IO path that avoids data copies and LNet would potentially speed up IO performance and reduce local IO CPU usage significantly. It might be possible to implement this initially only for bulk IO, since that would typically have the highest memory copy overhead, and leave the locking/metadata to use the normal RPC paths, so that they are treated consistently (possibly avoiding hard-to-find bugs).

New: When mounting a client locally on the OSS or MDS it would be desirable to have a local IO path for the bulk writes from the OSC to obdfilter rather than sending the data via ptlrpc->lnet->ptlrpc.

Any modifying RPCs to the *local* OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule {{lfs mirror resync}} in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out.

It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets.

A further optimization would avoid read caching data in the llite layer to avoid double cache of the same data, since the OSS would also cache the same data, and the OSS cache has the advantage that it could be shared with other clients.

As a final stage, having a local {{llite<->obdfilter}} IO path that avoids data copies and LNet would potentially speed up IO performance and reduce local IO CPU usage significantly. It might be possible to implement this initially only for bulk IO, since that would typically have the highest memory copy overhead, and leave the locking/metadata to use the normal RPC paths, so that they are treated consistently (possibly avoiding hard-to-find bugs).

Andreas Dilger made changes - 21/Sep/24 11:49 PM

Labels

Original: FLR2

New: FLR2 medium

John Hammond made changes - 20/Oct/20 3:16 PM

Link

Original: This issue is related to EX-391 [ EX-391 ]

Andreas Dilger made changes - 16/Apr/20 8:01 PM

Assignee	Original: Alex Zhuravlev [ bzzz ]	New: WC Triage [ wc-triage ]
Resolution	Original: Fixed [ 1 ]
Status	Original: Closed [ 6 ]	New: Reopened [ 4 ]

Andreas Dilger added a comment - 16/Apr/20 8:01 PM

I don't think this is fixed by LU-12722. That just allows local mounting of the client.

This ticket is more about having a direct transfer of data from the local client mount to the local storage (probably OSC->OFD?) rather than doing memcpy() of the bulk data in the 0@lo interface.

Alex Zhuravlev made changes - 16/Apr/20 8:09 AM

Resolution		New: Fixed [ 1 ]
Status	Original: Open [ 1 ]	New: Closed [ 6 ]

Alex Zhuravlev added a comment - 16/Apr/20 8:09 AM

implemented in LU-12722

Patrick Farrell (Inactive) made changes - 09/Sep/19 3:50 PM

Assignee

Original: Patrick Farrell [ pfarrell ]

New: Alex Zhuravlev [ bzzz ]

People

Assignee: WC Triage

Reporter: Andreas Dilger

Votes: 0

Watchers: 18

Dates

Created: 02/Nov/17 1:37 AM

Updated: 22/Sep/24 12:00 AM