[LU-17371] OFD ALR records should not be generated for maintenance operations

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.14.0, Lustre 2.16.0
    • None

    Description

      The OFD access log reader generates records for files that were just created and are being written for the first time.

      This is confusing for background filesystem maintenance operations such as "lfs mirror *", "lfs migrate", and "lfs hsm *", because the resulting records incorrectly give the impression that the file is under active application use when it is not.

      There should be a mechanism by which these background maintenance operations do not inflate the heat of files.

        Activity

          bzzz Alex Zhuravlev added a comment - IIRC, there was code on lamigo's side to ignore ALR records when lamigo itself initiated the replication, but that doesn't work for externally initiated replication.

          adilger Andreas Dilger added a comment - One relatively simple solution would be to have a "quiet time" for newly created objects, so that writes during this initial period do not generate ALR records and do not count as "heat" for the file. Something like "obdfilter.*.access_log_quiet=300" set on the OSTs would skip ALR generation for the first 300s after the object is created. This could be used to differentiate

          This would cover the write cases for "lfs mirror extend", "lfs migrate", and "lfs hsm restore", where new OST objects are allocated for the file, as well as other tools that create and write files directly in a "cold" OST pool.

          It would not cover "lfs mirror resync", where writes are done to an existing object, nor the read traffic generated by "lfs mirror extend", "lfs mirror resync", "lfs migrate", or "lfs hsm archive". To handle these cases, there should be an interface (maybe llapi_ladvise(LU_LADVISE_NOALR)) that sets a flag on the file descriptor, so that all OBD_BRW RPCs sent through it are flagged not to generate ALR records. This overlaps somewhat with the access_log_quiet functionality, but access_log_quiet has the benefit that it also applies to old clients/utilities that are not able to use LU_LADVISE_NOALR.
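
The two suppression mechanisms proposed in this comment can be combined into a single per-RPC decision. The following is a minimal Python model of that decision, not Lustre code: `BrwRequest`, `RPC_FLAG_NOALR`, and `should_log_alr` are hypothetical names invented for illustration, and `quiet_secs` stands in for the proposed (not existing) `access_log_quiet` tunable. It also assumes that a quiet time of 0 disables the feature and that an object exactly `quiet_secs` old is logged again, neither of which is specified in the ticket.

```python
from dataclasses import dataclass

# Hypothetical flag bit: stands in for whatever the proposed
# LU_LADVISE_NOALR advice would set on each bulk read/write RPC.
RPC_FLAG_NOALR = 0x1


@dataclass
class BrwRequest:
    """Simplified model of one bulk read/write RPC arriving at an OST."""
    object_create_time: float  # seconds; when the OST object was created
    arrival_time: float        # seconds; when this RPC reached the OST
    flags: int = 0             # RPC flag bits set by the client


def should_log_alr(req: BrwRequest, quiet_secs: float) -> bool:
    """Return True if this RPC should generate an access log record."""
    # Client-side opt-out: a maintenance tool marked its file descriptor,
    # so every RPC sent through it carries the flag.
    if req.flags & RPC_FLAG_NOALR:
        return False

    # Server-side quiet time: skip I/O to objects younger than quiet_secs.
    # This needs no client support, so it also covers old utilities.
    # Assumption: quiet_secs == 0 means the quiet time is disabled.
    age = req.arrival_time - req.object_create_time
    if quiet_secs > 0 and age < quiet_secs:
        return False

    return True
```

Under this model, a write from "lfs migrate" to an object created 10 seconds earlier is suppressed by the quiet time alone. A write from "lfs mirror resync" to an hours-old object is suppressed only if the client set the flag, which matches the coverage gap described above.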

          People

            wc-triage WC Triage
            adilger Andreas Dilger
            Votes: 0
            Watchers: 3
