Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.3.0, Lustre 2.4.0
    • Lustre 2.3.0, Lustre 2.1.3, Lustre 2.1.6
    • b2_1 g636ddbf
    • 3
    • 4236

    Description

      I have a smallish filesystem to which I only allocated a 5GB MDT since the overall dataset was always intended to be very small. This filesystem is simply being used to add and remove files in a loop with something along the lines of:

      while true; do
          cp -a /lib /mnt/lustre/foo
          rm -rf /mnt/lustre/foo
      done
      

      It seems in doing this I have filled up my MDT with an "oi.16" file that is now 94% of the space of the MDT:

      # stat /mnt/lustre/mdt/oi.16 
        File: `/mnt/lustre/mdt/oi.16'
        Size: 4733702144	Blocks: 9254568    IO Block: 4096   regular file
      Device: fd05h/64773d	Inode: 13          Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2012-05-27 11:55:00.175323551 +0000
      Modify: 2012-05-27 11:55:00.175323551 +0000
      Change: 2012-05-27 11:55:00.175323551 +0000
      
      # df -k /mnt/lustre/mdt/
      Filesystem           1K-blocks      Used Available Use% Mounted on
      /dev/mapper/LustreVG-mdt0
                             5240128   5240128         0 100% /mnt/lustre/mdt
      
      # ls -ls /mnt/lustre/mdt/oi.16 
      4627284 -rw-r--r-- 1 root root 4733702144 May 27 11:55 /mnt/lustre/mdt/oi.16
      

      It seems the OI is leaking and not being reaped when files are removed.
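
      To watch the growth rather than take a single snapshot, the stat and df readings above can be folded into one figure. The following is a minimal sketch, not part of the original report: the function names pct_of and file_share_pct are hypothetical, and it assumes GNU stat (for the %b and %B format codes) and a df that supports -P.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Integer percentage of total_kb taken up by used_kb (rounded down).
pct_of() {
    used_kb=$1
    total_kb=$2
    echo $(( used_kb * 100 / total_kb ))
}

# Percentage of a filesystem's 1K-blocks allocated to a single file.
#   $1 = file to measure (for example the oi.16 file on a mounted MDT)
#   $2 = mount point of the filesystem holding it
file_share_pct() {
    file=$1
    mnt=$2
    # GNU stat: %b = number of allocated blocks, %B = bytes per such block
    blocks=$(stat -c %b "$file")
    bsize=$(stat -c %B "$file")
    file_kb=$(( blocks * bsize / 1024 ))
    # df -P prints one line per filesystem; column 2 is the total in 1K-blocks
    fs_kb=$(df -Pk "$mnt" | awk 'NR==2 {print $2}')
    pct_of "$file_kb" "$fs_kb"
}
```

      Calling file_share_pct /mnt/lustre/mdt/oi.16 /mnt/lustre/mdt once per pass of the cp/rm loop would show whether the share keeps climbing after files are removed.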

        Activity

          [LU-1512] OI leaks

          yong.fan nasf (Inactive) added a comment - Since we have no plans to backport more patches to the b2_1-based branch, I am closing this.

          adilger Andreas Dilger added a comment - I'm not sure why this bug was closed. The patch for b2_1 has still not landed, and the work to rebuild the OI files in LFSCK Phase 4 is not completed.

          bogl Bob Glossman (Inactive) added a comment - Don't know why it stalled. Suspect it may have been kept out near the 2.1.4 release as not being important enough for the risk. Not sure why it didn't go in later.

          brian Brian Murrell (Inactive) added a comment - back port to b2_1: http://review.whamcloud.com/#change,4516 This seems to have stalled back in Dec., strangely enough, with +3. Any reason it did not progress to landing?
          bogl Bob Glossman (Inactive) added a comment - back port to b2_1: http://review.whamcloud.com/#change,4516
          pjones Peter Jones added a comment - Yes we will fix this for 2.1.4

          brian Brian Murrell (Inactive) added a comment - I notice this is fixed for 2.3 and 2.4. Will anything be done for 2.1.x?
          pjones Peter Jones added a comment - Landed for 2.3 and 2.4

          adilger Andreas Dilger added a comment - I think the important thing is that it improves the update performance, and only hurts lookup performance for lookup by FID for objects that are not in cache. This should be only a very small fraction of operations.

          yong.fan nasf (Inactive) added a comment - OK, that means lookup-by-FID will check the new OI file first and, on a miss, check the old OI file. It improves update performance at the cost of some lookup performance. Oleg, what's your suggestion? If you have no objection, I will start the backport.
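
          The lookup order described in the comment above (new OI file first, old OI file only on a miss) can be sketched in a few lines of shell. This is an illustration under loud assumptions, not Lustre code: two scratch directories stand in for the new and old OI files, each FID is a file whose content is a made-up inode number, and the names oi_insert and oi_lookup are hypothetical.

```shell
#!/bin/sh
# Sketch only: directories stand in for the new and old OI files.
# Nothing here reflects the on-disk OI format.
NEW_OI=${NEW_OI:-$(mktemp -d)}
OLD_OI=${OLD_OI:-$(mktemp -d)}

# Assumption: new mappings are written only to the new table.
#   $1 = FID (used as a file name), $2 = inode number
oi_insert() {
    printf '%s\n' "$2" > "$NEW_OI/$1"
}

# Look up a FID: try the new table first, fall back to the old one.
# Prints the stored inode number, or returns 1 if neither table has it.
oi_lookup() {
    for table in "$NEW_OI" "$OLD_OI"; do
        if [ -f "$table/$1" ]; then
            cat "$table/$1"
            return 0
        fi
    done
    return 1
}
```

          In this sketch a hit in the new table costs one probe, while an entry that exists only in the old table costs two. That extra probe is the lookup penalty the thread refers to, which applies only to by-FID lookups for objects that are not in cache.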

          People

            yong.fan nasf (Inactive)
            brian Brian Murrell (Inactive)
            Votes: 0
            Watchers: 15
