MDS buffer not freed when deleting files

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.4.3
    • Labels: None
    • Environment: CentOS 6.5, Kernel 2.6.32-358.23.2
    • Severity: 3
    • Rank (Obsolete): 16083

    Description

      When deleting large numbers of files, memory usage on the MDS server grows significantly. Attempts to reclaim memory by dropping caches free only some of it. The buffer usage continues to grow until the MDS server eventually starts OOMing.

      The rate at which the buffer usage grows seems to vary but looks like it might be based on the number of clients that are deleting files and the speed at which the files are deleted.
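
One way to quantify the growth described above is to compare the Buffers field of two /proc/meminfo snapshots, such as the attached meminfo.before and meminfo.after. The following is a minimal sketch; the helper name meminfo_kb is hypothetical, and it assumes the usual "Key:   value kB" layout of /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value (in kB) of the meminfo field whose
 * line starts with `key` (for example "Buffers:"), or -1 if it is not found.
 * `text` is the full contents of a /proc/meminfo-style snapshot.
 */
static long meminfo_kb(const char *text, const char *key)
{
    const char *p;
    size_t klen;
    long kb;

    assert(text != NULL && key != NULL);
    klen = strlen(key);
    if (klen == 0)
        return -1;

    p = text;
    while (*p != '\0') {
        /* p always points at the start of a line here. */
        if (strncmp(p, key, klen) == 0 && sscanf(p + klen, "%ld", &kb) == 1)
            return kb;
        p = strchr(p, '\n');
        if (p == NULL)
            break;
        p++;    /* step past the newline to the next line */
    }
    return -1;
}
```

Subtracting the "Buffers:" value of the earlier snapshot from that of the later one gives the growth in kB over the interval.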

      Attachments

        1. lustre-debug-malloc.gz
          0.2 kB
        2. mds-crash-log-20140913
          47 kB
        3. meminfo.after
          1 kB
        4. meminfo.before
          1 kB
        5. slabinfo.after
          26 kB
        6. slabinfo.before
          26 kB

          Activity

            [LU-5726] MDS buffer not freed when deleting files

            Rick, could you verify whether the patch fixes your problem? It works for me: after applying the patch, I no longer see the "growing buffers" problem.

            niu Niu Yawei (Inactive) added a comment
            rmohr Rick Mohr added a comment -

            In response to Andreas' question:

            dumpe2fs 1.42.12.wc1 (15-Sep-2014)
            Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent mmp flex_bg dirdata sparse_super large_file huge_file uninit_bg dir_nlink extra_isize quota
            Journal features: journal_incompat_revoke

            Our file system has 90 OSTs.


            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13452/
            Subject: LU-5726 ldiskfs: missed brelse() in large EA patch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ffd42ff529f5823b5a04529e1db2ea3b32a9f59f

            gerrit Gerrit Updater added a comment
            niu Niu Yawei (Inactive) added a comment - Port to b2_5: http://review.whamcloud.com/13464

            Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/13464
            Subject: LU-5726 ldiskfs: missed brelse() in large EA patch
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: 516a0cf6020fa169b0890ba6a51dc8295c1a44cd

            gerrit Gerrit Updater added a comment

            Andreas, ea_inode/large_xattr isn't enabled in my testing, but I also observed the "growing buffers" problem. I think this bug will be triggered as long as the inode has ea_in_inode.

            int
            ldiskfs_xattr_delete_inode(handle_t *handle, struct inode *inode,
                                    struct ldiskfs_xattr_ino_array **lea_ino_array)
            {
                    struct buffer_head *bh = NULL;
                    struct ldiskfs_xattr_ibody_header *header;
                    struct ldiskfs_inode *raw_inode;
                    struct ldiskfs_iloc iloc;
                    struct ldiskfs_xattr_entry *entry;
                    int error = 0;
            
                    if (!ldiskfs_test_inode_state(inode, LDISKFS_STATE_XATTR))
                            goto delete_external_ea;
            
                    error = ldiskfs_get_inode_loc(inode, &iloc);
            

            As long as LDISKFS_STATE_XATTR is set on the inode, the function gets the bh (through ldiskfs_get_inode_loc()).

            niu Niu Yawei (Inactive) added a comment
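
To illustrate the general class of bug being discussed (a buffer reference that is taken but not released on some path), here is a small user-space model. It is only a sketch: struct mock_bh, mock_get_bh(), mock_brelse(), delete_leaky() and delete_fixed() are hypothetical stand-ins, and the early-exit shape is an assumed example, not the actual ldiskfs code or the merged patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct buffer_head: a refcount only. */
struct mock_bh {
    int b_count;
};

/* Stand-in for a lookup that returns the buffer with an extra reference held. */
static struct mock_bh *mock_get_bh(struct mock_bh *bh)
{
    bh->b_count++;
    return bh;
}

/* Stand-in for brelse(): drop one reference; a NULL pointer is ignored. */
static void mock_brelse(struct mock_bh *bh)
{
    if (bh == NULL)
        return;
    assert(bh->b_count > 0);
    bh->b_count--;
}

/* Leaky shape (assumed for illustration): one exit path skips the release. */
static int delete_leaky(struct mock_bh *disk_bh, int has_external_ea)
{
    struct mock_bh *bh = mock_get_bh(disk_bh);

    if (!has_external_ea)
        return 0;           /* the reference taken above is never dropped */

    /* ... work on the buffer would happen here ... */
    mock_brelse(bh);
    return 0;
}

/* Fixed shape: every path funnels through a single exit that releases. */
static int delete_fixed(struct mock_bh *disk_bh, int has_external_ea)
{
    struct mock_bh *bh = mock_get_bh(disk_bh);
    int error = 0;

    if (!has_external_ea)
        goto out;

    /* ... work on the buffer would happen here ... */
out:
    mock_brelse(bh);
    return error;
}
```

In this model, each call to delete_leaky() with has_external_ea == 0 leaves b_count one higher than before, while delete_fixed() always returns it to its starting value. A reference that is never dropped plausibly keeps the buffer pinned, which would be consistent with dropping caches freeing only part of the memory.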

            We are running 2.4.3 and 2.5.3 with default MDT settings, so ea_inode is not enabled. Here is the output from one of our MDTs:

            [root@puma-mds-10-5 ~]# dumpe2fs -h /dev/md0 | grep features
            dumpe2fs 1.42.7.wc1 (12-Apr-2013)
            Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery mmp flex_bg sparse_super large_file huge_file uninit_bg dir_nlink quota
            Journal features: journal_incompat_revoke

            In addition, all of our filesystems that hit this bug have fewer than 160 OSTs.

            Haisong

            haisong Haisong Cai (Inactive) added a comment

            Niu, Lai, excellent work finding and fixing this bug.

            A question for the users hitting this problem - is the ea_inode (also named large_xattr) feature enabled on the MDT filesystem? Running dumpe2fs -h /dev/{mdtdev} | grep features on the MDT device would list ea_inode in the Filesystem features: output. This feature is needed if there are more than 160 OSTs in the filesystem, or if many and/or large xattrs are being stored (e.g. lots of ACLs, user xattrs, etc).

            While I hope that is the case and we can close this bug, if the ea_inode feature is not enabled on your MDT, then this patch is unlikely to solve your problem.

            adilger Andreas Dilger added a comment
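
The feature check described above can also be done programmatically on the "Filesystem features:" line that dumpe2fs prints. The following is a minimal sketch; has_feature() and has_large_ea() are hypothetical helper names, and the line is assumed to be a whitespace-separated list of feature names as in the outputs quoted in this thread.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `feature` appears as a whole
 * whitespace-separated token in `line`, 0 otherwise. Matching whole tokens
 * avoids false hits such as "journal" inside "has_journal".
 */
static int has_feature(const char *line, const char *feature)
{
    const char *p;
    size_t flen;

    assert(line != NULL && feature != NULL);
    flen = strlen(feature);
    if (flen == 0)
        return 0;

    p = line;
    while ((p = strstr(p, feature)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[flen] == '\0' || p[flen] == ' ' ||
                   p[flen] == '\t' || p[flen] == '\n';

        if (starts && ends)
            return 1;
        p++;    /* keep searching after this partial match */
    }
    return 0;
}

/* The feature is listed under either name, so accept both. */
static int has_large_ea(const char *features_line)
{
    return has_feature(features_line, "ea_inode") ||
           has_feature(features_line, "large_xattr");
}
```

Applied to the features lines quoted in this thread, has_large_ea() returns 0, since neither ea_inode nor large_xattr is listed.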
            niu Niu Yawei (Inactive) added a comment - patch to master: http://review.whamcloud.com/13452

            Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/13452
            Subject: LU-5726 ldiskfs: missed brelse() in large EA patch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1eb46ffbec85016db1054594094abde6d09a3616

            gerrit Gerrit Updater added a comment

            After quite a lot of testing and debugging with Lai, we found that a brelse() is missing in the ldiskfs large EA patch. I'll post a patch soon.

            niu Niu Yawei (Inactive) added a comment

            People

              Assignee: Niu Yawei (Inactive)
              Reporter: Rick Mohr