Lustre / LU-19199

Ensure all fast recovery features are enabled and working

Details

    • Task
    • Resolution: Unresolved
    • Medium

    Description

      We should verify that all known fast-recovery features are enabled and working correctly. Several existing Lustre features could be used, and further improved, to reduce recovery time.

      Depending on the IO model of the applications running on the cluster (e.g. shared-file writers in a monolithic MPI application vs. independent "ensemble" processes working on their own files and directories), it should be possible to tune recovery to be more responsive. In particular, recovery could potentially avoid waiting for unresponsive clients if they are not using directories or files of interest to the recovered clients. The complex part is automatically determining whether clients have overlapping domains of interest.
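
      As a rough illustration of the "overlapping domains of interest" question, the sketch below groups clients that share any file or directory (directly or through a chain of other clients) and reports which unresponsive clients belong to groups containing no responsive client. This is not Lustre code: the function names, the per-client interest map, and the path strings are all hypothetical, and it assumes such a map could somehow be collected, which is exactly the hard part described above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_by_overlap(interest: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group clients that share at least one object, directly or transitively.

    `interest` is a hypothetical map from client name to the set of
    files/directories that client was using before the server failed.
    """
    parent = {client: client for client in interest}

    def find(client: str) -> str:
        # Union-find lookup with path halving.
        while parent[client] != client:
            parent[client] = parent[parent[client]]
            client = parent[client]
        return client

    first_user: Dict[str, str] = {}
    for client, objects in interest.items():
        for obj in objects:
            if obj in first_user:
                # Two clients touch the same object: merge their groups.
                parent[find(client)] = find(first_user[obj])
            else:
                first_user[obj] = client

    groups = defaultdict(set)
    for client in interest:
        groups[find(client)].add(client)
    return list(groups.values())


def clients_safe_to_skip(interest: Dict[str, Set[str]],
                         unresponsive: Set[str]) -> Set[str]:
    """Return unresponsive clients whose whole group is unresponsive.

    No responsive client shares state with these clients, so (under the
    assumptions of this sketch) recovery would not need to wait for them.
    """
    skippable: Set[str] = set()
    for group in group_by_overlap(interest):
        if group <= unresponsive:
            skippable |= group
    return skippable


# Hypothetical example: c1, c2 and c4 share a project tree; c3 works alone.
example_interest = {
    "c1": {"/proj/a"},
    "c2": {"/proj/a", "/proj/b"},
    "c3": {"/scratch/x"},
    "c4": {"/proj/b"},
}
example_skippable = clients_safe_to_skip(example_interest, {"c3", "c4"})
```

      In this example, c4 is unresponsive but shares /proj/b with the responsive client c2, so it cannot be skipped, while c3 touches nothing that any responsive client uses. The transitive grouping is a deliberately conservative choice: a client is skipped only if nothing links it, even indirectly, to a client that is still recovering.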

            People

              cfaber Colin Faber - TLC
              Votes: 0
              Watchers: 6