  1. Lustre
  2. LU-20287

FLR-ECRO: change owner, group, project on degraded EC file layout


Details

    • Improvement
    • Resolution: Unresolved
    • Medium
    • None
    • Lustre 2.18.0
    • None
    • 3

    Description

      When chown(), chgrp(), or llapi_projid_fset() (LU-15723, TBD) is called to modify an FLR-ECRO file with a missing OST stripe, the RPC to the missing OST will retry indefinitely while waiting for recovery.

      It should first be determined whether this blocks user-visible operations such as chown, chgrp, or lfs projid. If so, we need to consider timing out those OST RPCs quickly when an OST is degraded, so that user operations are not impacted.

      If the OST RPCs are only retrying in the background, we should still consider timing them out eventually for ECRO layouts, so that the client does not OOM or flood the network with millions of RPCs retrying in a loop. It still makes sense to retry them for some time (minutes at least, maybe up to some maximum number of RPCs) so that little cleanup work is needed afterward. If they are dropped too early, the quota usage will be inconsistent for those files until they are manually repaired.
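
      As a rough illustration of the bounded-retry idea above, a decision helper might look like the sketch below. It is not existing Lustre code: the struct, the function name, and both limits (a 300 second window and a cap of 10000 pending RPCs) are hypothetical placeholders for "minutes at least" and "some maximum number of RPCs".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, chosen only for illustration. */
#define SETATTR_RETRY_WINDOW_SEC   300    /* "minutes at least" */
#define SETATTR_RETRY_MAX_PENDING  10000  /* "some maximum number of RPCs" */

/* Hypothetical per-RPC bookkeeping; not a real Lustre structure. */
struct setattr_retry_state {
	uint64_t first_attempt_sec; /* time the RPC was first sent */
	uint64_t pending_rpcs;      /* setattr RPCs queued for this OST */
	bool     layout_is_ecro;    /* only ECRO layouts may give up */
};

/*
 * Return true if the setattr RPC should keep retrying, or false if it
 * should be dropped and the OST object left for a later repair pass.
 */
static bool setattr_should_retry(const struct setattr_retry_state *st,
				 uint64_t now_sec)
{
	/* Non-ECRO layouts keep the current behaviour: retry forever. */
	if (!st->layout_is_ecro)
		return true;

	/* Too many RPCs queued: stop to protect memory and the network. */
	if (st->pending_rpcs > SETATTR_RETRY_MAX_PENDING)
		return false;

	/* Retry window expired (guarding against a clock that went back). */
	if (now_sec >= st->first_attempt_sec &&
	    now_sec - st->first_attempt_sec >= SETATTR_RETRY_WINDOW_SEC)
		return false;

	return true;
}
```

      Checking the queue depth as well as the elapsed time covers the case where many files are changed at once against the same degraded OST, well before any single RPC reaches its time limit.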

      The OST objects' UID, GID, PROJID for timed-out RPCs can be repaired when "lfs mirror resync" is called, in the unlikely case they do not match.
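
      The mismatch check that such a repair pass would need can be sketched as follows. This is only an assumption about how the comparison could be structured, not the actual "lfs mirror resync" implementation; struct obj_ids, the IDS_FIX_* flags, and ids_needing_repair() are hypothetical names. The MDT object's IDs are treated as authoritative.

```c
#include <stdint.h>

/* Hypothetical container for the three IDs discussed in this ticket. */
struct obj_ids {
	uint32_t uid;
	uint32_t gid;
	uint32_t projid;
};

/* Hypothetical flags naming which OST object IDs must be rewritten. */
#define IDS_FIX_UID    0x1u
#define IDS_FIX_GID    0x2u
#define IDS_FIX_PROJID 0x4u

/*
 * Compare the IDs stored on an OST object with those on the MDT object
 * and return a bitmask of the fields that differ. A result of 0 means
 * the OST object already matches and needs no repair.
 */
static unsigned int ids_needing_repair(const struct obj_ids *mdt,
				       const struct obj_ids *ost)
{
	unsigned int mask = 0;

	if (ost->uid != mdt->uid)
		mask |= IDS_FIX_UID;
	if (ost->gid != mdt->gid)
		mask |= IDS_FIX_GID;
	if (ost->projid != mdt->projid)
		mask |= IDS_FIX_PROJID;

	return mask;
}
```

      Returning a mask rather than a boolean would let the repair send a single setattr carrying only the fields that changed.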

      The "lfs migrate" command will already create new OST objects for the file using the IDs from the MDT object, so that should be unaffected.

            People

              kxu Keguang Xu
              adilger Andreas Dilger