Description
When a client is mounted locally on the OSS or MDS, it would be desirable to have a local IO path for bulk writes from the OSC to obdfilter, rather than sending the data via ptlrpc->lnet->ptlrpc, since this would speed up IO and significantly reduce the CPU usage of local IO. It makes sense to implement this initially only for bulk IO (if that is easier), since bulk IO typically has the highest memory-copy overhead, and to leave locking and metadata on the normal RPC paths so that they are handled consistently with other clients (avoiding potential hard-to-find bugs).
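A minimal sketch of the routing rule described above, assuming a hypothetical per-request classification (`enum req_kind`) and a per-target "is local" flag; none of these names come from the Lustre source. Only bulk transfers to a target on the same node take the direct path, while lock and metadata requests always use normal RPCs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request classes (illustrative, not Lustre's own). */
enum req_kind {
	REQ_BULK_READ,
	REQ_BULK_WRITE,
	REQ_LOCK,
	REQ_METADATA,
};

/* Hypothetical transport choices for a single request. */
enum io_path {
	PATH_RPC,          /* normal ptlrpc -> LNet -> ptlrpc path */
	PATH_LOCAL_DIRECT, /* hand bulk pages straight to the local obdfilter */
};

static bool req_is_bulk(enum req_kind kind)
{
	switch (kind) {
	case REQ_BULK_READ:
	case REQ_BULK_WRITE:
		return true;
	case REQ_LOCK:
	case REQ_METADATA:
		return false;
	}
	assert(!"unknown request kind");
	return false;
}

/*
 * Only bulk data to a target on this node bypasses the network stack;
 * lock and metadata requests always use the regular RPC path so the
 * server handles them exactly as it would for a remote client.
 */
enum io_path choose_io_path(enum req_kind kind, bool target_is_local)
{
	if (req_is_bulk(kind) && target_is_local)
		return PATH_LOCAL_DIRECT;
	return PATH_RPC;
}
```

Keeping the decision in one place makes it easy to start with bulk writes only and widen the set of request kinds later.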
Any modifying RPCs to the local OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is preferable to schedule lfs mirror resync so that it reads from the local OSS and writes to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. a local client mount in test scripts), or when local performance is more important than waiting for recovery to time out.
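The commit policy and the resync placement preference could be sketched as follows. The `local_sync_enabled` tunable, `choose_commit_mode()` and `resync_node_score()` are hypothetical, node IDs are plain integers for illustration, the score weights are an assumption, and commit-on-share is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical commit modes for a modifying request to an OST. */
enum commit_mode {
	COMMIT_ASYNC, /* reply before commit; client keeps the request for replay */
	COMMIT_SYNC,  /* reply only after the change is committed to storage */
};

/*
 * Hypothetical tunable, enabled by default.  Turning it off (e.g. for a
 * local client mount in test scripts) restores normal asynchronous commits.
 */
bool local_sync_enabled = true;

/* Modifying RPCs from the local client to a local target commit
 * synchronously, so nothing has to be replayed if the server restarts. */
enum commit_mode choose_commit_mode(bool target_is_local)
{
	if (target_is_local && local_sync_enabled)
		return COMMIT_SYNC;
	return COMMIT_ASYNC;
}

/*
 * Score a node for running "lfs mirror resync" on one file, given the
 * IDs of the OSS holding the in-sync source mirror and the OSS holding
 * the stale destination mirror.  Writing to a remote OSS avoids slow
 * synchronous local writes, so this sketch weights it highest; reading
 * from the local OSS is a smaller bonus.  Higher is better.
 */
int resync_node_score(int node, int src_oss, int dst_oss)
{
	int score = 0;

	assert(node >= 0 && src_oss >= 0 && dst_oss >= 0);
	if (node != dst_oss)
		score += 2; /* writes go to a remote OSS */
	if (node == src_oss)
		score += 1; /* reads come from the local OSS */
	return score;
}
```

With the source mirror on node 1 and the stale mirror on node 2, node 1 scores 3 (local read, remote write) and node 2 scores 0 (remote read, synchronous local write).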
It should be possible to enable this mode automatically at mount time based on the client NID, rather than forcing a "local mount" with e.g. a mount option, since the mode would only apply to targets on the same OSS/MDS and not to remote targets.
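A sketch of NID-based detection at mount time, assuming NIDs are compared as strings (real Lustre code uses its own binary NID types) and a hypothetical `struct target` that records the NID of the server for each target. A target is flagged local only if its server NID is one of this node's own NIDs, so no mount option is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-target state kept by the client after mount. */
struct target {
	const char *name;       /* e.g. "testfs-OST0000" */
	const char *server_nid; /* NID of the server currently serving it */
	bool local;             /* use the local IO path for this target */
};

/* Return true if @nid is one of this node's own NIDs. */
static bool nid_is_ours(const char *nid, const char *const *our_nids,
			size_t n_ours)
{
	for (size_t i = 0; i < n_ours; i++)
		if (strcmp(nid, our_nids[i]) == 0)
			return true;
	return false;
}

/*
 * Flag each target as local only if it is served from this node; remote
 * targets keep using the normal network path.  Returns the number of
 * local targets found.
 */
size_t mark_local_targets(struct target *tgts, size_t n_tgts,
			  const char *const *our_nids, size_t n_ours)
{
	size_t n_local = 0;

	assert(tgts != NULL || n_tgts == 0);
	for (size_t i = 0; i < n_tgts; i++) {
		tgts[i].local = nid_is_ours(tgts[i].server_nid, our_nids, n_ours);
		if (tgts[i].local)
			n_local++;
	}
	return n_local;
}
```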
A further optimization would be to avoid caching read data in the llite layer, so that the same data is not cached twice on the node: the OSS already caches it, and the OSS cache has the advantage that it can also be shared with other clients, though it has a higher access overhead than the VFS page cache (depending on the IO size).
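One possible read-caching policy, assuming a hypothetical `llite_should_cache_read()` hook and an illustrative 64 KiB cutoff below which a VFS page-cache hit is assumed to be cheaper than going to the OSS cache. The cutoff is an assumption for the sketch, not a measured value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative cutoff (an assumption, not a measured value): local reads
 * smaller than this are still kept in the client page cache. */
#define LOCAL_READ_CACHE_MAX (64 * 1024)

bool llite_should_cache_read(bool target_is_local, size_t io_size)
{
	if (!target_is_local)
		return true; /* remote data: normal client-side caching */
	/* Local data is already in the OSS cache, which other clients can
	 * share; only small reads, where a page-cache hit is cheaper, are
	 * cached a second time in llite. */
	return io_size < LOCAL_READ_CACHE_MAX;
}
```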
Issue Links
- is related to:
  - LU-12649 Tracker for ongoing FLR improvements (Open)
  - LU-11022 FLR1.5: "lfs mirror" usability for Burst Buffer (Resolved)
  - LU-9771 FLR1: Landing tickets for File Level Redundancy Phase 1 (Resolved)
  - LU-10916 Improve lfs mirror resync performance (Resolved)
  - LU-12722 exclude local client mounted on MDS/OSS from recovery (Resolved)
  - LU-17779 switch lnet loopback to use copy_page() (Resolved)
Activity
Description |
Original:
When mounting a client locally on the OSS or MDS it would be desirable to have a local IO path for the bulk writes from the OSC to obdfilter rather than sending the data via ptlrpc->lnet->ptlrpc.
Any modifying RPCs to the *local* OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule {{lfs mirror resync}} in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out. It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets. A further optimization would avoid read caching data in the llite layer to avoid double cache of the same data, since the OSS would also cache the same data, and the OSS cache has the advantage that it could be shared with other clients. As a final stage, having a local {{llite<->obdfilter}} IO path that avoids data copies and LNet would potentially speed up IO performance and reduce local IO CPU usage significantly. It might be possible to implement this initially only for bulk IO, since that would typically have the highest memory copy overhead, and leave the locking/metadata to use the normal RPC paths, so that they are treated consistently (possibly avoiding hard-to-find bugs). |
New: (identical to the current Description above) |
Description |
Original:
In order to mount a client locally on the OSS or MDS without affecting the recovery of local targets, we need the ability to mount without inserting the client into the {{last_rcvd}} file. That avoids the problem when a client+server crashes and the local client UUID is no longer available for the recovery, causing recovery to always take the maximum time. Any modifying RPCs to the *local* OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule {{lfs mirror resync}} in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out. It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets. A further optimization would avoid read caching data in the llite layer to avoid double cache of the same data, since the OSS would also cache the same data, and the OSS cache has the advantage that it could be shared with other clients. As a final stage, having a local {{llite<->obdfilter}} IO path that avoids data copies and LNet would potentially speed up IO performance and reduce local IO CPU usage significantly. It might be possible to implement this initially only for bulk IO, since that would typically have the highest memory copy overhead, and leave the locking/metadata to use the normal RPC paths, so that they are treated consistently (possibly avoiding hard-to-find bugs). |
New: (identical to the "Original" text of the Description change listed above) |
Labels | Original: FLR2 | New: FLR2 medium |
Link | Original: This issue is related to EX-391 [ EX-391 ] |
Assignee | Original: Alex Zhuravlev [ bzzz ] | New: WC Triage [ wc-triage ] |
Resolution | Original: Fixed [ 1 ] | |
Status | Original: Closed [ 6 ] | New: Reopened [ 4 ] |
Resolution | New: Fixed [ 1 ] | |
Status | Original: Open [ 1 ] | New: Closed [ 6 ] |
Assignee | Original: Patrick Farrell [ pfarrell ] | New: Alex Zhuravlev [ bzzz ] |
I don't think this is fixed by LU-12722. That just allows local mounting of the client. This ticket is more about having a direct transfer of data from the local client mount to the local storage (probably OSC->OFD?) rather than doing memcpy() of the bulk data in the 0@lo interface.
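To illustrate the difference this comment describes, the sketch below contrasts a loopback-style transfer that copies every bulk page with a direct hand-off of page references. `struct bulk_desc`, `bulk_xfer_copy()`, `bulk_xfer_direct()` and the 4096-byte page size are hypothetical, and the sketch ignores locking, page pinning and error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical descriptor for the pages of one bulk transfer. */
struct bulk_desc {
	void  **pages; /* array of page-sized buffers */
	size_t  npages;
};

/*
 * Loopback-style transfer: the server side receives into its own
 * buffers, so every page of bulk data is copied once.
 * Returns the number of bytes copied.
 */
size_t bulk_xfer_copy(const struct bulk_desc *src, struct bulk_desc *dst)
{
	size_t copied = 0;

	assert(src->npages == dst->npages);
	for (size_t i = 0; i < src->npages; i++) {
		memcpy(dst->pages[i], src->pages[i], SKETCH_PAGE_SIZE);
		copied += SKETCH_PAGE_SIZE;
	}
	return copied;
}

/*
 * Direct transfer: the local server side is handed references to the
 * client's pages and uses them in place, so no data is copied.  The
 * caller would have to keep the pages pinned until the server is done.
 */
size_t bulk_xfer_direct(const struct bulk_desc *src, struct bulk_desc *dst)
{
	dst->pages = src->pages;
	dst->npages = src->npages;
	return 0;
}
```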