  Lustre / LU-7012

files not being deleted from OST after being re-activated

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.5.4
    • Labels: None
    • Environment: RHEL-6.6, lustre-2.5.4
    • Severity: 2
    • 9223372036854775807

    Description

      We had 4 OSTs that we deactivated because of an imbalance in utilization that was causing ENOSPC messages to our users. We identified a file that was consuming a significant amount of space that we deleted while the OSTs were deactivated. The file is no longer seen in the directory structure (the MDS processed the request), but the objects on the OSTs were not marked as free. After re-activating the OSTs, it doesn't appear that the llog was flushed, which should free up those objects.

      At this time, some users are not able to run jobs because they cannot allocate any space.

      We understand how this is supposed to work, but as the user in LU-4295 pointed out, it is not working that way.

      Please advise.
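
      For reference, roughly the kind of commands involved (the mount point and device names below are illustrative, not copied verbatim from our system):

      # On a client: check per-OST usage to find the imbalanced OSTs
      lfs df -h /lustre/atlas1
      # On the MDS: list the OSP devices; the device name is what gets passed
      # to 'lctl --device <osp device> deactivate'
      lctl dl | grep osp
      # Deactivate one OST's OSP on the MDS (this is what we did for the 4 OSTs)
      lctl --device atlas1-OST039b-osc-MDT0000 deactivate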

      Attachments

        Issue Links

          Activity

            [LU-7012] files not being deleted from OST after being re-activated
            tappro Mikhail Pershin added a comment - edited

            Well, I was trying to reproduce that locally: objects are not deleted while the OSP is deactivated, but they are deleted immediately when I re-activate the OSP. I used the 'lctl --device <osp device> deactivate' command to deactivate an OSP, then destroyed a big file that was previously created on that OST. 'df' shows that the space on the related OST is not freed; after I re-activated the OSP, 'df' shows the space is returned. Any thoughts on what else may affect that?
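
            For the record, the sequence I used was roughly the following (device, mount point and file names are from my local test setup and are only an example):

            # on the MDS: deactivate the OSP for the target OST
            lctl --device lustre-OST0001-osc-MDT0000 deactivate
            # on a client: delete a large file striped on that OST, then check per-OST space
            rm /mnt/lustre/bigfile
            lfs df /mnt/lustre      # space on OST0001 is not freed yet
            # on the MDS: re-activate the OSP
            lctl --device lustre-OST0001-osc-MDT0000 activate
            lfs df /mnt/lustre      # in my test the space on OST0001 comes back here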


            tappro Mikhail Pershin added a comment -

            This seems to be an OSP problem: it does not restart llog processing from the point where the OST was deactivated. I am testing it locally now.

            adilger Andreas Dilger added a comment -

            Mike, there are two separate problems:
            1) the current method for doing OST space balancing is to deactivate the OSP and then migrate files (or let users do this gradually), so the deactivated OST will not be used for new objects. However, deactivating the OSP also prevents the MDS from destroying the objects of unlinked files (since 2.4), so space is never released on the OST, which confuses users. This issue will be addressed by LU-4825 by adding a new method for disabling object allocation on an OST without fully deactivating the OSP, so that the MDS can still process object destroys.

            2) when the deactivated OSP is reactivated again, even after restarting the OST, it does not process the unlink llogs (and presumably the setattr llogs, but that is harder to check) until the MDS is stopped and restarted. The MDS should begin processing the recovery llogs after the OSP has been reactivated. That is what this bug is for.

            Even though LU-4825 will reduce the times when an OSP needs to be deactivated (i.e. not for space balancing anymore), there are other times when this still needs to be done (e.g. OST offline for maintenance or similar), so recovery llog processing still needs to work.
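
            To make the distinction concrete, the two approaches on the MDS would look roughly like this (the device name is an example, and the max_create_count parameter shown for the second case is the LU-4825-style mechanism, so treat it as an assumption until that work lands):

            # current method: fully deactivate the OSP -- stops new object allocation,
            # but also stops the MDS from destroying objects of unlinked files on that OST
            lctl --device lustre-OST0001-osc-MDT0000 deactivate

            # LU-4825 direction: stop only new object allocation, so object destroys
            # (and llog processing) keep working
            lctl set_param osp.lustre-OST0001-osc-MDT0000.max_create_count=0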


            tappro Mikhail Pershin added a comment -

            Andreas, what is the difference between the two cases in your comment? As I can see, LU-4825 is about orphans as well. If a file was deleted while the OST is deactivated, then its objects on that OST are orphans and are never deleted. This is what LU-4825 is going to solve, isn't it?

            ezell Matt Ezell added a comment -

            We chose the "safer" route and unmounted the OST before mounting as ldiskfs. We removed the files and usage went back down.


            adilger Andreas Dilger added a comment -

            While this is related to LU-4825, I think that there are two separate issues here:

            • files are not deleted while the import is deactivated. I think that issue should be handled by LU-4825.
            • orphans are not cleaned up when the import is reactivated. I think that issue should be handled by this ticket.

            I'm not sure why the OSP doesn't restart orphan cleanup when it is reactivated, but currently this needs an MDS restart. That issue should be fixed to allow orphan cleanup to resume once the import is reactivated.
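
            Until that is fixed, the only reliable way to get the recovery llogs processed after reactivation is an MDS restart/failover, e.g. something like (MDT device and mount point are illustrative):

            # on the MDS node
            umount /mnt/lustre-mdt
            mount -t lustre /dev/mdt0 /mnt/lustre-mdt
            # after the OSTs reconnect, the MDS should resume processing the unlink llogs
            # and the freed space should eventually show up in 'lfs df'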

            green Oleg Drokin added a comment -

            Removing the objects is not going to be a problem later.
            In fact, I imagine you can even mount the OST in parallel as ldiskfs and remove the objects in the object dir (just make sure not to delete anything that is actually referenced).
            The kernel will moderate access, so the Lustre and parallel ldiskfs mounts can coexist (just make sure to mount it on the same node).

            Though it's still strange that the objects are not deleted by log replay.
            An interesting experiment would be an MDS restart/failover, though I guess you would prefer not to try it.
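
            Roughly like this (device, mount point and object ID below are only examples -- double-check every object against the file layout from 'lfs getstripe' before removing anything):

            # on the OSS, mount the OST in parallel as ldiskfs (same node as the Lustre mount)
            mkdir -p /mnt/ost-ldiskfs
            mount -t ldiskfs /dev/ost_dev /mnt/ost-ldiskfs
            # an object with ID <objid> in sequence 0 lives under O/0/d(<objid> mod 32)/<objid>
            OBJID=123456                     # example ID only
            rm /mnt/ost-ldiskfs/O/0/d$((OBJID % 32))/$OBJID
            umount /mnt/ost-ldiskfs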

            ezell Matt Ezell added a comment -

            Oleg-

            We have some OST object IDs of large files that should be deleted. I just checked with debugfs, and the objects are still there. If we unmount, mount as ldiskfs, remove the objects, unmount, and remount as lustre, will this cause a problem later (if the MDS delete request ever makes it through)? We'd also prefer a solution that doesn't require taking OSTs offline, but we'll do what we have to. And we have an unknown number of other orphan objects out there.

            We also dumped the llog on the MDS, and the latest entry was from October 2013.
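
            For reference, the debugfs check mentioned above looked roughly like this (object ID and device name are examples, not the real ones):

            # object IDs for a file come from 'lfs getstripe <file>' on a client
            OBJID=123456                     # example object ID
            # read-only inspection from the OSS without unmounting the OST
            debugfs -c -R "stat O/0/d$((OBJID % 32))/$OBJID" /dev/ost_dev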


            dustb100 Dustin Leverman added a comment -

            Oleg,
            Below are the log messages for the reactivation of atlas1-OST039b, atlas1-OST02c1, atlas1-OST02fb, and atlas1-OST02ce:

            Aug 17 08:11:19 atlas-mds1.ccs.ornl.gov kernel: [2973887.632969] Lustre: setting import atlas1-OST02c1_UUID INACTIVE by administrator request
            Aug 17 08:11:25 atlas-mds1.ccs.ornl.gov kernel: [2973893.078469] Lustre: setting import atlas1-OST02fb_UUID INACTIVE by administrator request
            Aug 17 08:11:30 atlas-mds1.ccs.ornl.gov kernel: [2973898.379605] Lustre: setting import atlas1-OST039b_UUID INACTIVE by administrator request
            Aug 17 08:42:11 atlas-mds1.ccs.ornl.gov kernel: [2975741.381423] Lustre: atlas1-OST039b-osc-MDT0000: Connection to atlas1-OST039b (at 10.36.225.89@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Aug 17 08:42:11 atlas-mds1.ccs.ornl.gov kernel: [2975741.400737] LustreError: 167-0: atlas1-OST039b-osc-MDT0000: This client was evicted by atlas1-OST039b; in progress operations using this service will fail.
            Aug 17 08:42:11 atlas-mds1.ccs.ornl.gov kernel: [2975741.416837] Lustre: atlas1-OST039b-osc-MDT0000: Connection restored to atlas1-OST039b (at 10.36.225.89@o2ib)
            Aug 17 08:42:18 atlas-mds1.ccs.ornl.gov kernel: [2975747.822971] Lustre: atlas1-OST02fb-osc-MDT0000: Connection to atlas1-OST02fb (at 10.36.225.73@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Aug 17 08:42:18 atlas-mds1.ccs.ornl.gov kernel: [2975747.842235] LustreError: 167-0: atlas1-OST02fb-osc-MDT0000: This client was evicted by atlas1-OST02fb; in progress operations using this service will fail.
            Aug 17 08:42:18 atlas-mds1.ccs.ornl.gov kernel: [2975747.858294] Lustre: atlas1-OST02fb-osc-MDT0000: Connection restored to atlas1-OST02fb (at 10.36.225.73@o2ib)
            Aug 17 08:42:26 atlas-mds1.ccs.ornl.gov kernel: [2975756.287935] Lustre: atlas1-OST02c1-osc-MDT0000: Connection to atlas1-OST02c1 (at 10.36.225.159@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Aug 17 08:42:26 atlas-mds1.ccs.ornl.gov kernel: [2975756.307394] LustreError: 167-0: atlas1-OST02c1-osc-MDT0000: This client was evicted by atlas1-OST02c1; in progress operations using this service will fail.
            Aug 17 08:42:26 atlas-mds1.ccs.ornl.gov kernel: [2975756.323480] Lustre: atlas1-OST02c1-osc-MDT0000: Connection restored to atlas1-OST02c1 (at 10.36.225.159@o2ib)
            Aug 17 11:53:44 atlas-mds1.ccs.ornl.gov kernel: [2987244.922580] Lustre: setting import atlas1-OST02c7_UUID INACTIVE by administrator request
            Aug 17 11:53:47 atlas-mds1.ccs.ornl.gov kernel: [2987248.220947] Lustre: atlas1-OST02c7-osc-MDT0000: Connection to atlas1-OST02c7 (at 10.36.225.165@o2ib) was lost; in progress operations using this service will wait for recovery to complete
            Aug 17 11:53:47 atlas-oss2h8.ccs.ornl.gov kernel: [7165636.459725] Lustre: atlas1-OST02c7: Client atlas1-MDT0000-mdtlov_UUID (at 10.36.226.72@o2ib) reconnecting
            Aug 17 11:53:47 atlas-mds1.ccs.ornl.gov kernel: [2987248.265826] LustreError: 167-0: atlas1-OST02c7-osc-MDT0000: This client was evicted by atlas1-OST02c7; in progress operations using this service will fail.
            Aug 17 11:53:47 atlas-mds1.ccs.ornl.gov kernel: [2987248.281892] Lustre: atlas1-OST02c7-osc-MDT0000: Connection restored to atlas1-OST02c7 (at 10.36.225.165@o2ib)
            Aug 17 11:53:47 atlas-oss2h8.ccs.ornl.gov kernel: [7165636.501432] Lustre: atlas1-OST02c7: deleting orphan objects from 0x0:11321511 to 0x0:11321537
            
            green Oleg Drokin added a comment -

            Essentially the footprint I am looking for (on the MDS) would be:

            [ 7384.128329] Lustre: setting import lustre-OST0001_UUID INACTIVE by administrator request
            [ 7403.759510] Lustre: lustre-OST0001-osc-ffff8800b96a1800: Connection to lustre-OST0001 (at 192.168.10.227@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [ 7403.764253] LustreError: 167-0: lustre-OST0001-osc-ffff8800b96a1800: This client was evicted by lustre-OST0001; in progress operations using this service will fail.
            [ 7403.765235] Lustre: lustre-OST0001-osc-ffff8800b96a1800: Connection restored to lustre-OST0001 (at 192.168.10.227@tcp)
            

            Here the first INACTIVE message would come from 'lctl deactivate' and the 'Connection restored' message would come from 'lctl activate'.
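
            A quick way to check for that footprint (and whether any destroy/orphan processing followed) would be something like:

            # on the MDS: look for the deactivate/reactivate footprint
            dmesg | grep -E 'INACTIVE by administrator|Connection restored to'
            # on the OSS: check whether orphan cleanup ran after reactivation
            dmesg | grep 'deleting orphan objects'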

            green Oleg Drokin added a comment -

            "Permanently reactivating" is just a message from mgs.
            How about the MDS logs showing the reconnect to the OST, and the OST logs showing that the MDT connected to it?


            People

              Assignee: tappro Mikhail Pershin
              Reporter: dustb100 Dustin Leverman
              Votes: 0
              Watchers: 15

              Dates

                Created:
                Updated:
                Resolved: