Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.3.0, Lustre 2.4.0
    • Lustre 2.3.0, Lustre 2.1.3, Lustre 2.1.6
    • b2_1 g636ddbf
    • 3
    • 4236

    Description

      I have a smallish filesystem to which I only allocated a 5GB MDT since the overall dataset was always intended to be very small. This filesystem is simply being used to add and remove files in a loop with something along the lines of:

      while true; do
          cp -a /lib /mnt/lustre/foo
          rm -rf /mnt/lustre/foo
      done
      

      It seems in doing this I have filled up my MDT with an "oi.16" file that is now 94% of the space of the MDT:

      # stat /mnt/lustre/mdt/oi.16 
        File: `/mnt/lustre/mdt/oi.16'
        Size: 4733702144	Blocks: 9254568    IO Block: 4096   regular file
      Device: fd05h/64773d	Inode: 13          Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2012-05-27 11:55:00.175323551 +0000
      Modify: 2012-05-27 11:55:00.175323551 +0000
      Change: 2012-05-27 11:55:00.175323551 +0000
      
      # df -k /mnt/lustre/mdt/
      Filesystem           1K-blocks      Used Available Use% Mounted on
      /dev/mapper/LustreVG-mdt0
                             5240128   5240128         0 100% /mnt/lustre/mdt
      
      # ls -ls /mnt/lustre/mdt/oi.16 
      4627284 -rw-r--r-- 1 root root 4733702144 May 27 11:55 /mnt/lustre/mdt/oi.16
      

      It seems the OI is leaking and not being reaped when files are removed.

        Activity

          [LU-1512] OI leaks

          adilger Andreas Dilger added a comment - I'm not sure why this bug was closed. The patch for b2_1 was still not landed, and the work to rebuild the OI files in LFSCK Phase 4 is not completed.

          bogl Bob Glossman (Inactive) added a comment - Don't know why it stalled. Suspect it may have been kept out near the 2.1.4 release as not being important enough for the risk. Not sure why it didn't go in later.

          brian Brian Murrell (Inactive) added a comment - back port to b2_1: http://review.whamcloud.com/#change,4516 This seems to have stalled back in Dec., strangely enough, with +3. Any reason it did not progress to landing?
          bogl Bob Glossman (Inactive) added a comment - back port to b2_1: http://review.whamcloud.com/#change,4516
          pjones Peter Jones added a comment - Yes we will fix this for 2.1.4

          brian Brian Murrell (Inactive) added a comment - I notice this is fixed for 2.3 and 2.4. Will anything be done for 2.1.x?
          pjones Peter Jones added a comment - Landed for 2.3 and 2.4

          adilger Andreas Dilger added a comment - I think the important thing is that it improves the update performance, and only hurts lookup performance for lookup by FID for objects that are not in cache. This should be only a very small fraction of operations.

          yong.fan nasf (Inactive) added a comment - OK, that means lookup-by-FID will check the new OI file first and, on a miss, fall back to the old OI file. It improves update performance at the cost of some lookup performance.

          Oleg, what's your suggestion? If you have no objection, I will start the backport.

          adilger Andreas Dilger added a comment - For the backup OI, I think it makes more sense to do the opposite: update only the new OI, and leave the old OI as only the backup. For creates, only add newly created FIDs into the new OI. For normal lookup by name, the existing OI rebuild will add the FID into the new OI already. Only in the case of a by-FID lookup that misses in the new OI do we need to do a lookup in the backup OI. For unlinked files, only delete the FID from the new OI.

          If there is an old (invalid) lookup by FID for a deleted file that is missed in the new OI, but found in the old OI, there will still be a chance to return an error if the inode is not found.

          I think this will reduce the amount of updates to disk, since changes are made only to the new OI file and the old OI file is not modified.

          As for when to do this, I think the OI rebuild should be ported to b2_1 first (subject to pre-approval from Oleg), and the "backup OI" handling can be done in Phase IV, since this is largely only a performance/usability improvement after the base OI scrub is available.
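The scheme proposed in this comment can be illustrated with a small toy model. This is only a sketch for readers of the ticket, not Lustre code: the class name `BackupOIModel`, its methods, and the use of dicts and a set to stand in for the OI files and the inode table are all assumptions made for illustration. It also ignores inode number reuse, which a real implementation would have to handle.

```python
class BackupOIModel:
    """Toy model: new OI takes all updates, old OI is a read-only backup."""

    def __init__(self, old_oi, inodes):
        self.old_oi = dict(old_oi)  # backup OI: FID -> inode number, never modified
        self.new_oi = {}            # new OI: the only one that receives updates
        self.inodes = set(inodes)   # inode numbers that currently exist

    def create(self, fid, ino):
        # Newly created FIDs are added to the new OI only.
        self.inodes.add(ino)
        self.new_oi[fid] = ino

    def unlink(self, fid, ino):
        # Delete the FID from the new OI only; the old OI keeps its stale entry.
        self.new_oi.pop(fid, None)
        self.inodes.discard(ino)

    def lookup_by_fid(self, fid):
        # Check the new OI first; consult the backup OI only on a miss.
        ino = self.new_oi.get(fid)
        if ino is None:
            ino = self.old_oi.get(fid)
        # A stale backup entry is caught when its inode no longer exists.
        if ino is None or ino not in self.inodes:
            raise FileNotFoundError(fid)
        return ino
```

In this model an unlink never writes to the old OI, so a later by-FID lookup for the deleted file still finds the stale backup entry and fails only at the inode check, which is the error path the comment describes.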

          yong.fan nasf (Inactive) added a comment - The current idea for "backup mode" OI scrub is as follows:

          For create: insert the OI mapping into the old OI file first. If the target ino is ahead of the OI scrub's current position, OI scrub can add the mapping to the new OI file as well; otherwise the creator should insert the OI mapping into the new OI file.

          For unlink: delete the OI mapping from the new OI file first (if it is there).

          For lookup: check the old OI file only. If there is no related OI mapping, return -ENOENT; if a related OI mapping is found but the related inode fails to load, return -EIO; if a related OI mapping is found but the loaded inode is not the expected one, return -ENOENT.

          When should we do that? Now, or in LFSCK Phase IV?
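The lookup and create rules in this comment can be written out as a short sketch. Nothing here is taken from the Lustre source: the function names, the `(rc, inode)` return shape, the dict-based OI, and the `load_inode` callback are hypothetical, and the create rule assumes the scrub walks inode numbers in ascending order, which the comment does not state.

```python
import errno


def backup_mode_lookup(old_oi, load_inode, fid):
    """Return (rc, inode) using the old OI only, per the rules in the comment.

    old_oi maps FID -> inode number; load_inode(ino) returns a dict with a
    "fid" key, or None if the inode cannot be loaded. Both are stand-ins.
    """
    ino = old_oi.get(fid)
    if ino is None:
        return -errno.ENOENT, None   # no related OI mapping
    inode = load_inode(ino)
    if inode is None:
        return -errno.EIO, None      # mapping found, inode failed to load
    if inode["fid"] != fid:
        return -errno.ENOENT, None   # loaded inode is not the expected one
    return 0, inode


def creator_inserts_into_new_oi(ino, scrub_pos):
    """True if the creator, not the scrub, must add the new-OI mapping.

    Assumes an ascending scan: inodes beyond scrub_pos are still ahead of
    the scrub and will be picked up when it reaches them.
    """
    return ino <= scrub_pos
```

For example, with `old_oi = {"fid-a": 10}` and an inode table where inode 10 carries `"fid-a"`, `backup_mode_lookup(old_oi, table.get, "fid-a")` yields a zero return code and the inode, while an unknown FID yields `-errno.ENOENT`.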

          People

            yong.fan nasf (Inactive)
            brian Brian Murrell (Inactive)