  1. Lustre
  2. LU-19199

Ensure all fast recovery features are enabled and working


    • Type: Task
    • Resolution: Unresolved
    • Priority: Medium

      We should verify that all known fast-recovery features are enabled and working correctly. Several existing Lustre features could be used, and further improved, to reduce recovery time.

      Depending on the I/O model of the applications running on the cluster (e.g. shared-file writers in a monolithic MPI application vs. independent "ensemble" processes working on their own files and directories), it should be possible to tune recovery to be more responsive, and potentially to avoid waiting for unresponsive clients if they are not using directories or files of interest to the recovered clients. The complex part is automatically determining whether clients have overlapping domains of interest.
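
      As a rough illustration of the "overlapping domains of interest" question, the sketch below assumes each client's domain can be summarized as a set of opaque identifiers (for example, directory paths). The function name, the client names, and the set-based representation are all hypothetical and are not part of Lustre; how such sets would actually be collected is exactly the open problem described above.

```python
from typing import Dict, Iterable, List, Set


def clients_to_wait_for(
    interest: Dict[str, Set[str]],
    recovered: Iterable[str],
    unresponsive: Iterable[str],
) -> List[str]:
    """Return the unresponsive clients whose domain of interest overlaps
    with the domain of at least one recovered client.

    `interest` maps a (hypothetical) client name to the set of files or
    directories it is assumed to be using.
    """
    # Union of everything the already-recovered clients care about.
    recovered_domain: Set[str] = set()
    for client in recovered:
        recovered_domain |= interest.get(client, set())

    # Only an unresponsive client that shares at least one entry with that
    # union could affect the recovered clients in this simplified model.
    return sorted(
        client
        for client in unresponsive
        if interest.get(client, set()) & recovered_domain
    )


if __name__ == "__main__":
    interest = {
        "client-a": {"/proj/run1"},
        "client-b": {"/proj/run2"},
        "client-c": {"/proj/run1", "/proj/shared"},
    }
    print(clients_to_wait_for(interest, ["client-a"], ["client-b", "client-c"]))
```

      In this simplified model, an "ensemble" workload with disjoint per-process directories yields an empty list, so recovery would not need to wait, while a shared-file workload makes every unresponsive client relevant.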

            cfaber Colin Faber - TLC