[LU-11306] Moving files from one MDT to another does not free inodes on source MDT

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.4
    • Labels: None
    • Environment: RHEL 7.5, kernel 3.10.0-862.11.6.el7.x86_64
      Seen with 2.10.4 and master (325e23899aa38de32ec00b19ed675bcc64c6e5c8)
      ldiskfs MDTs.
    • Severity: 3

    Description

      When moving files that are on MDT0 into a directory residing on MDT1, the corresponding inodes on MDT0 are not deallocated.

      Here is what I see:

      [root@lustre211cli test]# for file in {1..999}; do echo $file > $file; done
      [root@lustre211cli test]# lfs df -i
      UUID                      Inodes       IUsed       IFree IUse% Mounted on
      test-MDT0000_UUID         419432        1262      418170   1% /test[MDT:0]
      test-MDT0001_UUID         419432         248      419184   1% /test[MDT:1]
      test-OST0000_UUID         737280        1389      735891   0% /test[OST:0]
      
      filesystem_summary:       737401        1510      735891   0% /test
      
      [root@lustre211cli test]# lfs mkdir -i 1 dir2
      [root@lustre211cli test]# mv {1..999} dir2/
      [root@lustre211cli test]# lfs df -i
      UUID                      Inodes       IUsed       IFree IUse% Mounted on
      test-MDT0000_UUID         419432        1265      418167   1% /test[MDT:0]
      test-MDT0001_UUID         419432        1249      418183   1% /test[MDT:1]
      test-OST0000_UUID         737280        1389      735891   0% /test[OST:0]
      
      filesystem_summary:       738405        2514      735891   0% /test
      
      [root@lustre211cli test]# ls
      dir1  dir2
      [root@lustre211cli test]# ls dir1
      [root@lustre211cli test]# sync
      [root@lustre211cli test]# echo 3 > /proc/sys/vm/drop_caches
      [root@lustre211cli test]# lfs df -i
      UUID                      Inodes       IUsed       IFree IUse% Mounted on
      test-MDT0000_UUID         419432        1265      418167   1% /test[MDT:0]
      test-MDT0001_UUID         419432        1249      418183   1% /test[MDT:1]
      test-OST0000_UUID         737280        1389      735891   0% /test[OST:0]
      
      filesystem_summary:       738405        2514      735891   0% /test
      

      The number of inodes used on MDT0 never decreases, even after an umount/mount cycle or after unmounting the MDT on the MDS.
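      The before/after snapshots above can be compared mechanically. The following is a minimal Python sketch, assuming the `lfs df -i` column layout shown in the session (UUID, Inodes, IUsed, IFree, IUse%, Mounted on); `parse_iused` and `iused_delta` are hypothetical helper names, not part of the Lustre tools. Run on the first two snapshots, it reports an IUsed change of +3 on MDT0000, +1001 on MDT0001 and +0 on OST0000: moving the 999 files added roughly one inode per moved file on MDT1 while freeing nothing on MDT0.

```python
import re

# Matches one per-target line of `lfs df -i` output, for example:
#   test-MDT0000_UUID   419432   1262   418170   1% /test[MDT:0]
# The header line ("UUID ...") and "filesystem_summary:" do not match.
TARGET_LINE = re.compile(
    r"^\s*(?P<uuid>\S+_UUID)\s+(?P<inodes>\d+)\s+(?P<iused>\d+)\s+(?P<ifree>\d+)\s+\d+%"
)


def parse_iused(output: str) -> dict[str, int]:
    """Return {target UUID: IUsed} for every target line in `lfs df -i` output."""
    usage = {}
    for line in output.splitlines():
        m = TARGET_LINE.match(line)
        if m:
            usage[m.group("uuid")] = int(m.group("iused"))
    return usage


def iused_delta(before: str, after: str) -> dict[str, int]:
    """Per-target change in IUsed (after minus before); new targets count from 0."""
    old, new = parse_iused(before), parse_iused(after)
    return {uuid: new[uuid] - old.get(uuid, 0) for uuid in new}


# Snapshots copied from the session in the description.
BEFORE = """\
UUID                      Inodes       IUsed       IFree IUse% Mounted on
test-MDT0000_UUID         419432        1262      418170   1% /test[MDT:0]
test-MDT0001_UUID         419432         248      419184   1% /test[MDT:1]
test-OST0000_UUID         737280        1389      735891   0% /test[OST:0]

filesystem_summary:       737401        1510      735891   0% /test
"""

AFTER = """\
UUID                      Inodes       IUsed       IFree IUse% Mounted on
test-MDT0000_UUID         419432        1265      418167   1% /test[MDT:0]
test-MDT0001_UUID         419432        1249      418183   1% /test[MDT:1]
test-OST0000_UUID         737280        1389      735891   0% /test[OST:0]

filesystem_summary:       738405        2514      735891   0% /test
"""

if __name__ == "__main__":
    for uuid, delta in iused_delta(BEFORE, AFTER).items():
        print(f"{uuid}: {delta:+d}")
```

      Only the per-target lines are parsed, so the summary line and header are skipped without special-casing; the same functions can be pointed at any two saved `lfs df -i` outputs.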

      When running e2fsck (1.42.13.wc5) on MDT0, the behaviour differs between 2.10.4 and 2.11.54:

      • With 2.10.4, e2fsck finds as many unattached inodes as there were files moved.
      • With 2.11.54, e2fsck finds nothing.

      I have attached the complete client and server debug logs captured during this test.

          Activity

            Andreas Dilger (adilger) added a comment:

            Sebastien, I think your question was perfectly reasonable, and I wish we had already implemented the automatic inode migration functionality. Until that happens, we need the extra overhead to ensure that the on-disk format remains consistent.

            I don't think cross-MDT rename is a common case for Lustre, so this shouldn't cause too much overhead.

            Sebastien Piechurski (spiechurski) added a comment:

            Hi Andreas,

            OK, it looks like I did not do my homework...

            Next time, I'll read the HLD or the source code before submitting this kind of thing...

            Thanks for the explanation. You can close this ticket.

            Andreas Dilger (adilger) added a comment:

            Note that just renaming the file does not cause the inode to be moved; only the name is moved to the new MDT. In order to keep ext4 consistent (as you can see from e2fsck no longer reporting errors), an "agent" inode needs to be added on the new MDT so that the directory entry has something to point at. If the inode were also moved to the target MDT by a rename, this would cause a number of other problems: it would change the userspace-visible inode number (because a new FID would be assigned to map to the new MDT), and it would break DLM locking (which is also tied to the FID), open file handles (also tied to the FID), and hard links to the file.

            There is an open ticket (LU-7607) for implementing a mechanism to preserve at least the inode number across MDTs, which would allow the common case of closed, nlink = 1 inodes to be transparently moved to another MDT, but this has not been implemented yet.

            So, for the time being, the behaviour you observe is working as intended.
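            The rename behaviour Andreas describes can be illustrated with a toy Python model. It is not Lustre code: `ToyMdt`, `create`, `cross_mdt_rename`, `could_move_transparently`, the `(sequence, object id)` tuple standing in for a FID, and the sequence values are all hypothetical simplifications. Directories, llogs and other internal inodes are ignored, so the model does not reproduce the exact `lfs df -i` numbers in the description; it only shows why a cross-MDT rename adds an inode on the target MDT without freeing one on the source, while the FID (and so the inode number seen by clients) stays the same.

```python
from dataclasses import dataclass, field

Fid = tuple[int, int]  # (sequence, object id): a simplified stand-in for a Lustre FID


@dataclass
class ToyMdt:
    """Toy model of one MDT: a FID sequence, an inode table and one directory."""
    seq: int                                                # hypothetical per-MDT FID sequence
    next_oid: int = 1
    inodes: dict = field(default_factory=dict)              # local inode FID -> description
    dentries: dict[str, Fid] = field(default_factory=dict)  # name -> FID seen by clients

    def alloc(self, what: str) -> Fid:
        fid = (self.seq, self.next_oid)
        self.next_oid += 1
        self.inodes[fid] = what
        return fid

    @property
    def iused(self) -> int:
        return len(self.inodes)


def create(mdt: ToyMdt, name: str) -> Fid:
    """Create a regular file whose inode lives on `mdt`."""
    fid = mdt.alloc("file")
    mdt.dentries[name] = fid
    return fid


def cross_mdt_rename(src: ToyMdt, dst: ToyMdt, name: str) -> None:
    """Move only the name: the inode (and its FID) stay on `src`.

    `dst` allocates an agent inode so that its local directory entry has a
    local inode to point at; the agent just records the remote FID.
    """
    fid = src.dentries.pop(name)
    dst.alloc(f"agent for {fid}")
    dst.dentries[name] = fid  # clients still see the original FID / inode number


def could_move_transparently(nlink: int, open_handles: int) -> bool:
    """Hypothetical check for the 'closed, nlink = 1' case described for LU-7607."""
    return nlink == 1 and open_handles == 0


if __name__ == "__main__":
    mdt0, mdt1 = ToyMdt(seq=0x100), ToyMdt(seq=0x200)  # hypothetical sequences
    for i in range(1, 1000):
        create(mdt0, str(i))
    for i in range(1, 1000):
        cross_mdt_rename(mdt0, mdt1, str(i))
    print(mdt0.iused, mdt1.iused)  # 999 999
    print(mdt1.dentries["1"])      # (256, 1): still a FID from MDT0's sequence
```

            Moving the inode itself would require allocating a new FID from the target MDT's sequence, which is exactly the userspace-visible change the comment warns about. `could_move_transparently` only encodes the "closed, nlink = 1" condition mentioned for LU-7607 and says nothing about how that ticket would implement it.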

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Sebastien Piechurski (spiechurski)