Details

    • New Feature
    • Resolution: Unresolved
    • Minor

    Description

      When a client is mounted locally on an OSS or MDS, it would be desirable to have a local IO path for bulk writes from the OSC to obdfilter, rather than sending the data via ptlrpc->lnet->ptlrpc. This would speed up IO and significantly reduce the CPU usage of local IO. It makes sense to implement this initially only for bulk IO (if that is easier), since bulk IO typically has the highest memory copy overhead. Locking and metadata operations would continue to use the normal RPC paths so that they are handled consistently with other clients, avoiding potential hard-to-find bugs.
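
      The split described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions, not Lustre code: the names OpKind, IoPath, and choose_io_path are hypothetical, and treating bulk reads like bulk writes is an assumption based on the phrase "bulk IO".

```python
from enum import Enum


class OpKind(Enum):
    """Kinds of client operations (hypothetical classification)."""
    BULK_READ = "bulk_read"
    BULK_WRITE = "bulk_write"
    LOCK = "lock"
    METADATA = "metadata"


class IoPath(Enum):
    """Paths an operation could take (names are illustrative only)."""
    LOCAL_DIRECT = "local_direct"  # hypothetical OSC -> obdfilter shortcut
    RPC = "rpc"                    # normal ptlrpc -> lnet -> ptlrpc path


def choose_io_path(op: OpKind, target_is_local: bool) -> IoPath:
    """Use the local shortcut only for bulk IO to a target on this node.

    Locking and metadata always take the RPC path, so they are handled
    the same way as requests from any other client.
    """
    if target_is_local and op in (OpKind.BULK_READ, OpKind.BULK_WRITE):
        return IoPath.LOCAL_DIRECT
    return IoPath.RPC
```

      With this sketch, choose_io_path(OpKind.BULK_WRITE, True) selects the local path, while a lock request to the same target still goes through RPC.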

      Any modifying RPCs to the local OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that lfs mirror resync is best scheduled so that it reads from the local OSS and writes to a remote OSS. It might be desirable to allow this behavior to be disabled for testing purposes (e.g. a local client mount in test scripts), or when local performance is more important than waiting for recovery to time out.
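
      The resync scheduling preference above could look roughly like the following. This is a hedged sketch of the selection logic only; it does not reflect how lfs mirror resync is implemented, and the Mirror record, its fields, and pick_resync_pair are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Mirror:
    """Hypothetical description of one file mirror."""
    mirror_id: int
    oss_node: str  # node that serves this mirror's objects
    stale: bool    # True if the mirror needs to be resynced


def pick_resync_pair(
    mirrors: Iterable[Mirror], local_node: str
) -> Optional[Tuple[Mirror, Mirror]]:
    """Return (source, destination) for a resync run on local_node.

    The source must be an in-sync mirror on the local node (local reads),
    and the destination a stale mirror on a remote node, so the modifying
    RPCs go to a remote OSS. Returns None if no such pair exists.
    """
    mirrors = list(mirrors)
    sources = [m for m in mirrors if not m.stale and m.oss_node == local_node]
    targets = [m for m in mirrors if m.stale and m.oss_node != local_node]
    if not sources or not targets:
        return None
    return sources[0], targets[0]
```

      A scheduler built on this idea would run the resync on the node for which the function returns a pair, and skip nodes for which it returns None.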

      It should be possible to enable this mode automatically at mount time based on the client NID, rather than having, for example, a mount option that forces a "local mount". The mode would apply only to targets on the same OSS/MDS as the client, not to remote targets.
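
      One way to picture the NID-based detection is to compare the client's own NIDs with the NIDs of each target, and mark a target as local only when they overlap. The sketch below assumes NIDs are plain "address@network" strings; the function name local_targets, its parameters, and the example target names are hypothetical.

```python
from typing import Dict, Iterable, List


def local_targets(
    client_nids: Iterable[str], target_nids: Dict[str, Iterable[str]]
) -> List[str]:
    """Return the names of targets that share at least one NID with the client.

    Only these targets would use the local mode; all other targets are
    treated as remote, even though the client runs on a server node.
    """
    own = set(client_nids)
    return sorted(
        name for name, nids in target_nids.items() if own & set(nids)
    )


# Example with made-up NIDs and target names.
targets = {
    "testfs-OST0000": ["10.0.0.5@tcp"],
    "testfs-OST0001": ["10.0.0.6@tcp"],
}
print(local_targets(["10.0.0.5@tcp"], targets))  # ['testfs-OST0000']
```

      Because the decision is made per target, a single mount can mix local and remote targets without a mount-wide option.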

      A further optimization would be to skip read caching in the llite layer, so that the same data is not cached twice on the node, since the OSS would also cache it. The OSS cache has the advantage that it can be shared with other clients, though it has a higher access overhead than the VFS page cache (depending on the IO size).

          Activity

            [LU-10191] FLR2: Server Local Client (SLC)
            adilger Andreas Dilger made changes -
            Description Original: When mounting a client locally on the OSS or MDS it would be desirable to have a local IO path for the bulk writes from the OSC to obdfilter rather than sending the data via ptlrpc->lnet->ptlrpc.

            Any modifying RPCs to the *local* OST should be synchronous by default, or possibly use commit-on-share, so that they do not need to be replayed if the server restarts. This implies that it is more desirable to schedule {{lfs mirror resync}} in such a way that it is reading from the local OSS and writing to a remote OSS. It might be desirable to allow this functionality to be disabled for testing purposes (e.g. local client mount in test scripts), or if local performance is more important than waiting for recovery to time out.

            It should be possible to enable this mode automatically at mount time based on the client NID, rather than having e.g. a mount option force a "local mount", since it would only apply to targets that are on the same OSS/MDS and not remote targets.

            A further optimization would avoid read caching data in the llite layer to avoid double cache of the same data, since the OSS would also cache the same data, and the OSS cache has the advantage that it could be shared with other clients.

            As a final stage, having a local {{llite<->obdfilter}} IO path that avoids data copies and LNet would potentially speed up IO performance and reduce local IO CPU usage significantly. It might be possible to implement this initially only for bulk IO, since that would typically have the highest memory copy overhead, and leave the locking/metadata to use the normal RPC paths, so that they are treated consistently (possibly avoiding hard-to-find bugs).
            New: the current Description, shown in full at the top of this page.

            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-17779 [ LU-17779 ]
            adilger Andreas Dilger made changes -
            Description Original: In order to mount a client locally on the OSS or MDS without affecting the recovery of local targets, we need the ability to mount without inserting the client into the {{last_rcvd}} file. That avoids the problem when a client+server crashes and the local client UUID is no longer available for the recovery, causing recovery to always take the maximum time.

            (The remaining four paragraphs of the description were not changed by this edit.)
            New: When mounting a client locally on the OSS or MDS it would be desirable to have a local IO path for the bulk writes from the OSC to obdfilter rather than sending the data via ptlrpc->lnet->ptlrpc.
            adilger Andreas Dilger made changes -
            Labels Original: FLR2 New: FLR2 medium
            jhammond John Hammond made changes -
            Link Original: This issue is related to EX-391 [ EX-391 ]
            adilger Andreas Dilger made changes -
            Assignee Original: Alex Zhuravlev [ bzzz ] New: WC Triage [ wc-triage ]
            Resolution Original: Fixed [ 1 ]
            Status Original: Closed [ 6 ] New: Reopened [ 4 ]

            adilger Andreas Dilger added a comment - I don't think this is fixed by LU-12722. That just allows local mounting of the client. This ticket is more about transferring data directly from the local client mount to the local storage (probably OSC->OFD?) rather than doing a memcpy() of the bulk data in the 0@lo interface.
            bzzz Alex Zhuravlev made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Closed [ 6 ]

            bzzz Alex Zhuravlev added a comment - Implemented in LU-12722.
            pfarrell Patrick Farrell (Inactive) made changes -
            Assignee Original: Patrick Farrell [ pfarrell ] New: Alex Zhuravlev [ bzzz ]

            People

              wc-triage WC Triage
              adilger Andreas Dilger