Description
When a client is mounted locally on an OSS or MDS, it would be desirable to have a local IO path for bulk writes from the OSC to obdfilter, rather than sending the data via ptlrpc->LNet->ptlrpc. This would significantly improve IO performance and reduce CPU usage for local IO. It makes sense to implement this initially only for bulk IO (if that is easier), since bulk IO typically has the highest memory-copy overhead, and to leave locking and metadata on the normal RPC paths so that the local client is treated consistently with other clients (avoiding potential hard-to-find bugs).
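As a rough illustration of the split described above, the following sketch routes only bulk data through a local shortcut and keeps locks and metadata on the RPC path. It assumes that both bulk reads and bulk writes count as "bulk IO"; the names `enum io_op`, `enum io_path` and `choose_io_path()` are hypothetical and do not correspond to existing Lustre code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request classes; not actual Lustre types. */
enum io_op {
	IO_OP_BULK_READ,
	IO_OP_BULK_WRITE,
	IO_OP_LOCK,     /* e.g. lock enqueue/cancel */
	IO_OP_METADATA, /* e.g. getattr/setattr/punch */
	IO_OP_COUNT
};

enum io_path {
	IO_PATH_RPC,   /* ptlrpc -> LNet -> ptlrpc, as for any other client */
	IO_PATH_LOCAL  /* hypothetical direct OSC -> obdfilter bulk path */
};

/*
 * Only bulk data takes the local shortcut, and only when the target is
 * served by this node. Locking and metadata always use the normal RPC
 * path, so the local client behaves like every other client.
 */
static enum io_path choose_io_path(enum io_op op, bool local_target)
{
	assert(op < IO_OP_COUNT);

	if (!local_target)
		return IO_PATH_RPC;

	switch (op) {
	case IO_OP_BULK_READ:
	case IO_OP_BULK_WRITE:
		return IO_PATH_LOCAL;
	default:
		return IO_PATH_RPC;
	}
}
```

Keeping the decision in one place means a remote target, or any non-bulk request, falls through to the existing RPC path unchanged.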
Any modifying RPCs to the local OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts (a local client goes down together with the server, so it could not replay them). This implies that it is preferable to schedule lfs mirror resync so that it reads from the local OSS and writes to a remote OSS. It may also be desirable to allow this behaviour to be disabled for testing purposes (e.g. a local client mount in test scripts), or when local performance matters more than waiting for recovery to time out.
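The two policies above can be sketched as follows. This is a simplified model, not Lustre code: `write_must_be_sync()`, `struct resync_job`, `resync_node_rank()` and `pick_resync_node()` are hypothetical, and each mirror is assumed to live on a single OSS identified by an integer, whereas a real mirror may span several OSSes. The ranking prefers a node that reads the source mirror locally and writes the stale mirror remotely, and avoids nodes where the write would be local (and therefore synchronous).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical policy: a modifying request to a target on this node is
 * forced synchronous (so it never needs replay), unless that behaviour was
 * explicitly disabled, e.g. for a local client mount in test scripts.
 */
static bool write_must_be_sync(bool local_target, bool force_sync_disabled)
{
	return local_target && !force_sync_disabled;
}

/* Simplified resync job: one OSS holds the source mirror, one the stale one. */
struct resync_job {
	unsigned int src_oss; /* node holding the in-sync (source) mirror */
	unsigned int dst_oss; /* node holding the stale (destination) mirror */
};

/* Rank a candidate node for running the resync; lower is better. */
static int resync_node_rank(unsigned int node, const struct resync_job *job)
{
	bool src_local = (node == job->src_oss);
	bool dst_local = (node == job->dst_oss);

	if (src_local && !dst_local)
		return 0; /* local read, remote write: preferred */
	if (!dst_local)
		return 1; /* both remote: ordinary client behaviour */
	return 2;         /* local write: would have to be synchronous */
}

/* Return the index of the best-ranked candidate node (first one on ties). */
static size_t pick_resync_node(const unsigned int *nodes, size_t n,
			       const struct resync_job *job)
{
	size_t best = 0;

	assert(nodes != NULL && n > 0 && job != NULL);
	for (size_t i = 1; i < n; i++)
		if (resync_node_rank(nodes[i], job) <
		    resync_node_rank(nodes[best], job))
			best = i;
	return best;
}
```

For example, with candidate nodes `{3, 7, 5}`, a job reading from node 5 and writing to node 7 would be scheduled on node 5 (index 2).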
It should be possible to enable this mode automatically at mount time based on the client NID, rather than requiring e.g. a mount option to force a "local mount", since it would apply only to targets on the same OSS/MDS and not to remote targets.
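One way to make this per-target decision at mount time is to check whether any NID the target is served on is also one of this node's own NIDs. The sketch below assumes both lists are available as plain "address@net" strings and compares them exactly; the function name `target_is_local()` and the example NIDs are hypothetical, and real NID handling (parsing, aliases, multiple networks) would be more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if any NID of the target is also one of this node's NIDs,
 * i.e. the target is served by the local OSS/MDS. NIDs are compared as
 * exact strings, which is a simplification.
 */
static bool target_is_local(const char *const *local_nids, size_t n_local,
			    const char *const *target_nids, size_t n_target)
{
	assert((local_nids != NULL || n_local == 0) &&
	       (target_nids != NULL || n_target == 0));

	for (size_t i = 0; i < n_target; i++)
		for (size_t j = 0; j < n_local; j++)
			if (strcmp(target_nids[i], local_nids[j]) == 0)
				return true;
	return false;
}
```

Because the check is done per target, a single client mount could use the local path for its co-located OSTs while talking to all remote OSTs over the normal RPC path, with no mount option involved.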
A further optimization would be to avoid caching read data in the llite layer, so that the same data is not cached twice on the node: the OSS already caches it, and the OSS cache has the advantage that it can be shared with other clients, though it has a higher access overhead than the VFS page cache (depending on the IO size).
Issue Links
- is related to
  - LU-12649 Tracker for ongoing FLR improvements (Open)
  - LU-11022 FLR1.5: "lfs mirror" usability for Burst Buffer (Resolved)
- is related to
  - LU-9771 FLR1: Landing tickets for File Level Redundancy Phase 1 (Resolved)
  - LU-10916 Improve lfs mirror resync performance (Resolved)
  - LU-12722 exclude local client mounted on MDS/OSS from recovery (Resolved)
  - LU-17779 switch lnet loopback to use copy_page() (Resolved)