Lustre / LU-7012

files not being deleted from OST after being re-activated

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Lustre 2.8.0
    • Affects Version: Lustre 2.5.4
    • Labels: None
    • Environment: RHEL-6.6, lustre-2.5.4
    • Severity: 2

    Description

      We had 4 OSTs that we deactivated because an imbalance in utilization was causing ENOSPC errors for our users. We identified a file that was consuming a significant amount of space and deleted it while the OSTs were deactivated. The file is no longer visible in the directory structure (the MDS processed the request), but the objects on the OSTs were not marked as free. After re-activating the OSTs, it does not appear that the llog was flushed, which should have freed those objects.

      At this time, some users are not able to run jobs because they cannot allocate any space.

      We understand how this is supposed to work, but as the user in LU-4295 pointed out, it is not working that way.

      Please advise.

      Attachments

      Issue Links

      Activity


            tappro Mikhail Pershin added a comment -

            I can reproduce this with 2.5. Since master is working fine, I am going to identify the relevant changes there and port them to 2.5.

            tappro Mikhail Pershin added a comment -

            Yu Jian, you are right; I used the master branch instead of b2_5, my mistake. I am repeating the local tests with 2.5 now.
            yujian Jian Yu added a comment -

            Hi Mike,

            What Lustre version did you test on? Is it Lustre 2.5.x or master branch?

            tappro Mikhail Pershin added a comment - edited

            Well, I tried to reproduce this locally: objects are not deleted while the OSP is deactivated, but they are deleted immediately when I re-activate the OSP. I used the 'lctl --device <osp device> deactivate' command to deactivate the OSP, then destroyed a big file that had previously been created on that OST. 'df' showed that the space on the related OST was not freed; after I re-activated the OSP, 'df' showed the space returned. Any thoughts on what else may affect this?
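            The reproduction steps above can be sketched as a shell sequence run on the MDS and a client. The device index, filesystem name, and paths are illustrative, not taken from this ticket:

            ```shell
            # On the MDS: find the OSP device index for the target OST
            lctl dl | grep osp
            # e.g. "11 UP osp lustre-OST0002-osc-MDT0000 ..."

            # Deactivate the OSP so the MDS stops sending requests to that OST
            lctl --device 11 deactivate

            # On a client: remove a large file with objects on that OST
            rm /mnt/lustre/bigfile

            # While the OSP is inactive, the OST space is not freed
            lfs df /mnt/lustre

            # Re-activate the OSP; the destroy llog should then be processed
            lctl --device 11 activate
            lfs df /mnt/lustre
            ```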

            tappro Mikhail Pershin added a comment -

            This seems to be an OSP problem: the OSP does not restart llog processing from the point where the OST was deactivated. I am testing it locally now.

            adilger Andreas Dilger added a comment -

            Mike, there are two separate problems:
            1) the current method for doing OST space balancing is to deactivate the OSP and then migrate files (or let users do this gradually), so the deactivated OST will not be used for new objects. However, deactivating the OSP also prevents the MDS from destroying the objects of unlinked files (since 2.4), so space is never released on the OST, which confuses users. This issue will be addressed by LU-4825 by adding a new method for disabling object allocation on an OST without fully deactivating the OSP, so that the MDS can still process object destroys.

            2) when the deactivated OSP is reactivated, even after restarting the OST, it does not process the unlink llogs (and presumably the setattr llogs, but that is harder to check) until the MDS is stopped and restarted. The MDS should begin processing the recovery llogs after the OSP has been reactivated. That is what this bug is for.

            Even though LU-4825 will reduce the times when an OSP needs to be deactivated (i.e. not for space balancing anymore), there are other times when this still needs to be done (e.g. OST offline for maintenance or similar), so recovery llog processing still needs to work.
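            The allocation-only disable described for LU-4825 can be sketched with the osp.*.max_create_count tunable (the mechanism the later Lustre manual documents for draining an OST). The filesystem name and OST index here are illustrative:

            ```shell
            # Stop new object allocation on OST0002 while keeping the OSP active,
            # so unlink/destroy llog records are still processed and space is freed
            lctl set_param osp.lustre-OST0002-osc-MDT0000.max_create_count=0

            # Later, restore object allocation (20000 is the usual default)
            lctl set_param osp.lustre-OST0002-osc-MDT0000.max_create_count=20000
            ```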

            tappro Mikhail Pershin added a comment -

            Andreas, what is the difference between the two cases in your comment? As far as I can see, LU-4825 is about orphans as well. If a file is deleted while the OST is deactivated, then its objects on the OST are orphans and are never deleted. That is what LU-4825 is going to solve, isn't it?
            ezell Matt Ezell added a comment -

            We chose the "safer" route and unmounted the OST before mounting as ldiskfs. We removed the files and usage went back down.


            adilger Andreas Dilger added a comment -

            While this is related to LU-4825, I think that there are two separate issues here:

            • files are not deleted while the import is deactivated. That issue should be handled by LU-4825.
            • orphans are not cleaned up when the import is reactivated. That issue should be handled by this ticket.

            I'm not sure why the OSP doesn't restart orphan cleanup when it is reactivated, but currently this requires an MDS restart. That should be fixed so that orphan cleanup resumes once the import is reactivated.
            green Oleg Drokin added a comment -

            Removing the objects is not going to be a problem later.
            In fact, I imagine you can even mount the OST in parallel as ldiskfs and remove the objects in the object directory (just make sure not to delete anything that is actually referenced).
            The kernel will moderate access, so the Lustre and parallel ldiskfs mounts can coexist (just make sure to mount it on the same node).

            Though it is still strange that the objects are not deleted by llog replay.
            An interesting experiment would be an MDS restart/failover, though I guess you would rather not try it.

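            The manual cleanup route discussed here (the "safer", unmount-first variant Matt chose) might look like the following on the OSS. The device, mount point, and object ID are hypothetical, and removing an object that is still referenced would corrupt the file, so this is only a sketch:

            ```shell
            # Take the OST out of service and mount its backing device as ldiskfs
            umount /mnt/ost2
            mount -t ldiskfs /dev/sdb /mnt/ost2

            # On ldiskfs OSTs, object 123456 lives under O/0/d<objid % 32>/<objid>
            rm /mnt/ost2/O/0/d$((123456 % 32))/123456

            # Return the device to Lustre service
            umount /mnt/ost2
            mount -t lustre /dev/sdb /mnt/ost2
            ```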
            ezell Matt Ezell added a comment -

            Oleg-

            We have some OST object IDs of large files that should be deleted. I just checked with debugfs, and the objects are still there. If we unmount, mount as ldiskfs, remove the objects, unmount, and remount as lustre, will this cause a problem later (if the MDS delete request ever makes it through)? We'd also prefer a solution that doesn't require taking OSTs offline, but we'll do what we have to. And we have an unknown number of other orphan objects out there.

            We also dumped the llog on the MDS, and the latest entry was from October 2013.

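            The debugfs check Matt describes can be done read-only against the OST's backing device; the device path and object ID below are hypothetical:

            ```shell
            # -R runs a single command; if the object has already been destroyed,
            # debugfs reports "File not found" for the path
            debugfs -c -R "stat O/0/d$((123456 % 32))/123456" /dev/sdb
            ```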

            People

              Assignee: tappro Mikhail Pershin
              Reporter: dustb100 Dustin Leverman
              Votes: 0
              Watchers: 15

              Dates

                Created:
                Updated:
                Resolved: