cleanup OST objects that have been leaked during interrupted/failed runs of obdfilter-survey

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      If interrupted or upon failure, obdfilter-survey can leave OST objects allocated, unconnected and consuming space.

      A first, simple fix is to add an exit trap to the script so that the objects created during the current run are always deleted.

      In addition, a post-failure cleanup tool is required to allow later, asynchronous deletion of this kind of orphan object.
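
      The exit trap proposed above could look roughly like the following sketch. It is not taken from obdfilter-survey: the names created_objects, destroy_objects and cleanup are hypothetical, and destroy_objects is only a placeholder for whatever destroy command the real script would issue.

```shell
#!/bin/bash
# Hypothetical sketch; none of these names come from obdfilter-survey itself.

created_objects=()   # one entry per batch of objects created during this run
cleanup_done=0

# Placeholder for the real destroy step (assumption: the actual script
# would issue its own destroy command here).
destroy_objects() {
    echo "destroy $1"
}

cleanup() {
    # Run only once, even if a signal trap and the EXIT trap both fire.
    if [ "$cleanup_done" -eq 1 ]; then
        return
    fi
    cleanup_done=1
    local entry
    for entry in "${created_objects[@]}"; do
        destroy_objects "$entry"
    done
}

# The EXIT trap covers normal termination and calls to exit;
# INT and TERM are turned into an exit so the EXIT trap runs for them too.
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM
```

      In this sketch the script would append to created_objects right after each successful create, so that an interrupted run destroys exactly what it created.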

        Activity

          [LU-9730] cleanup OST objects that have been leaked during interrupted/failed runs of obdfilter-survey

          pfarrell Patrick Farrell (Inactive) added a comment - nangelinas, I know you were working on this recently. Did you end up finishing something?

          bfaccini Bruno Faccini (Inactive) added a comment -
          Regarding the asynchronous post-failure cleanup: the orphan objects end up as unattached inodes with i_nlink == 1. For the ldiskfs back end, an e2fsck run, at least with the -n option, would therefore be very helpful for identifying all the inodes concerned (along with some others, if the Lustre file system and the OST are currently in use/mounted).
          Their LMA xattr content then needs to be read, in order to verify that they belong to the FID_SEQ_ECHO sequence and, if so, to use their object ID to request their destruction.
          Alternatively, some of the OI code could perhaps be used or modified to provide an inode-to-object-ID mapping, or a method could be implemented to parse the OI and retrieve every known/registered FID_SEQ_ECHO object ID from it, so that all of them can be destroyed.

          For the ZFS back end, it looks like ZAP features will be required.
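
          As an illustration of the LMA check described in this comment, here is a minimal sketch that decodes a hex dump of the LMA xattr (for example as printed by getfattr -e hex) and reports the object ID when the FID is in the echo sequence. The assumptions are mine, not the commenter's, and should be checked against the Lustre headers in use: the xattr is laid out as lma_compat (4 bytes), lma_incompat (4 bytes), then a FID made of f_seq (8 bytes), f_oid (4 bytes) and f_ver (4 bytes), all little-endian; FID_SEQ_ECHO is 2; and the object ID is carried in f_oid. The function names are hypothetical.

```shell
#!/bin/bash
# Assumed value of FID_SEQ_ECHO; verify against the Lustre headers in use.
FID_SEQ_ECHO=2

# Convert a little-endian hex string to a decimal number by reversing
# its bytes and letting bash evaluate the result as base 16.
le_hex_to_dec() {
    local hex=$1 out="" i
    for ((i = ${#hex} - 2; i >= 0; i -= 2)); do
        out+=${hex:i:2}
    done
    echo $((16#$out))
}

# Print the object ID if the LMA hex dump carries a FID in the echo
# sequence; return non-zero otherwise.
lma_echo_objid() {
    local hex=${1#0x}
    # Need 24 bytes: compat(4) + incompat(4) + seq(8) + oid(4) + ver(4).
    [ "${#hex}" -ge 48 ] || return 1
    local seq oid
    seq=$(le_hex_to_dec "${hex:16:16}")
    oid=$(le_hex_to_dec "${hex:32:8}")
    [ "$seq" -eq "$FID_SEQ_ECHO" ] || return 1
    echo "$oid"
}
```

          A cleanup tool along these lines would feed the function the xattr of each candidate inode reported by e2fsck and collect the printed object IDs for destruction.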

          gerrit Gerrit Updater added a comment -
          Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/28113
          Subject: LU-9730 tests: obdfilter-survey cleanup upon exit/signal
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 0b44735657983451407f9bbd90891b6c054bde6d

          People

            bfaccini Bruno Faccini (Inactive)
            bfaccini Bruno Faccini (Inactive)
            Votes: 0
            Watchers: 6