[LU-1366] getting "dirdata length set incorrectly" running e2fsck

Details

    • Bug
    • Resolution: Won't Fix
    • Minor
    • Lustre 2.1.2
    • Lustre 2.1.1
    • DDN SFA10k - Dell R710 - TOSS2.0 OS release
    • 3
    • 4619

    Description

      After adding a network to the file system and adding the IP for the failover node to the MDS, it wouldn't mount. (I later found, much to my chagrin, that --param failnode= is no longer valid.) I attempted to run fsck against the file system, but it reported that e2fsprogs was out of date for the file system, so I ran fsck.ldiskfs. fsck.ldiskfs found some bad inodes and corrected them, but on a subsequent run with the -n option (done to make sure the file system was clean) I started seeing a flood of "dirdata length set incorrectly" messages. I stopped it and was able to mount the FS, but later the FS spontaneously unmounted.

      What does this mean? Fortunately this file system is in pre-production and can be recreated (which is intended), but I'd like to know whether this was caused by running fsck.ldiskfs, since I did not see these messages on the first pass. The version of e2fsprogs (non-Redhat) is ldiskfsprogs-1.41.90.3chaos.wc3-0.ch5.x86_64. I have downloaded the wc4 version from the WC repo and installed it into a test image, into which I have rebooted the node. I was able to use e2fsck to check the FS with the -fDy options, but the "dirdata length set incorrectly" message continues to stream and has been going for more than an hour.

      Any help would be appreciated.

          Activity

            Sorry, I wasn't really using my terms consistently. The fast symlinks are those stored directly in the inode, while slow symlinks are stored in an external block. These correspond to short and long symlinks (the boundary being at 60 bytes).

            I think the issue may be that if the symlink is stored in the inode (fast symlink) but the EXTENTS flag is set, the symlink text may be incorrectly interpreted as extent data, and e2fsck then considers the inode corrupt.

            To test this theory, create some symlinks on an MDT filesystem with extents enabled, then mount it as ldiskfs and run lsattr on the symlinks to see if the extent flag is set. Alternatively, debugfs "stat" can be used on the inodes to print the flags.
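The fast/slow distinction and the suspected bad combination can be written down as a small check. This is only a sketch: the 60-byte limit comes from the comment above, the flag value is the ext4 EXTENTS_FL inode flag, and the assumption that the trailing NUL counts toward the 60 bytes is mine. The function names are hypothetical.

```python
EXT4_EXTENTS_FL = 0x00080000  # inode flag: i_block holds an extent tree
I_BLOCK_SIZE = 60             # bytes of in-inode space a fast symlink can use


def is_fast_symlink(target: bytes) -> bool:
    """True if the target (plus an assumed trailing NUL) fits inside the inode."""
    return len(target) + 1 <= I_BLOCK_SIZE


def looks_suspicious(target: bytes, inode_flags: int) -> bool:
    """True for the combination suspected here: fast symlink with EXTENTS_FL set."""
    return is_fast_symlink(target) and bool(inode_flags & EXT4_EXTENTS_FL)
```

The target length and flags would come from whatever lsattr or debugfs "stat" reports for each symlink inode; nothing here reads the filesystem itself.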

            adilger Andreas Dilger added the comment above.
            jamervi Joe Mervini added a comment -

            Not to detour from the subject of this ticket, but could you explain the difference between fast, short and long symlinks? I wanted to keep my ignorance on the down-low by checking the web and with several people here, but no one seems to know.


            So that explains why the "extent" option was set for the MDT filesystem. That said, with the patch in http://review.whamcloud.com/2798 it will explicitly unset the extents feature for the MDT filesystem to avoid this problem for new filesystems.

            We still need to understand/address the extents symlink problem. I see commits related to symlinks with extents (below), but it isn't clear whether the problem only applies to short symlinks, or long symlinks as well? Given that there are reports of many symlinks being deleted, I would suspect that the problem is with fast symlinks, and somehow the MDT is setting the "EXTENTS_FL" for symlinks, when it shouldn't be doing that.

            Author: Theodore Ts'o <tytso@mit.edu>
            Date:   Thu Mar 13 23:13:18 2008 -0400
            
                e2fsck: Check for fast symlinks that have EXTENTS_FL set
                
                These shouldn't show up in the wild, but if they do, e2fsck will offer
                to clear them.
                
                Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
            
            commit 7cadc57780f3e3e8e644e8976e11a336902d4a25
            Author: Theodore Ts'o <tytso@mit.edu>
            Date:   Thu Mar 13 23:05:00 2008 -0400
            
                e2fsck: Support long symlinks which use extents
                
                Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
            
            adilger Andreas Dilger added the comment above.

            Ned pointed out to me that we are adding an "/etc/mkfs.ldiskfs.conf" file. Here is an excerpt:

            [fs_types]
                   ext3 = {
                           features = has_journal
                   }
                   ldiskfs = {
                           features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
                           auto_64-bit_support = 1
                           inode_size = 256
                   }
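To see quickly whether a configuration like this enables extents by default, the stanza can be parsed with a few lines of Python. This is a rough sketch, not a full parser for the mke2fs.conf format: it assumes simple one-level stanzas like the excerpt, and `default_features` is a hypothetical helper name.

```python
import re

CONF_EXCERPT = """
[fs_types]
       ext3 = {
               features = has_journal
       }
       ldiskfs = {
               features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
               auto_64-bit_support = 1
               inode_size = 256
       }
"""


def default_features(conf_text: str, fs_type: str) -> list[str]:
    """Return the 'features' list from the named stanza, or [] if it is absent."""
    stanza = re.search(
        r"^\s*" + re.escape(fs_type) + r"\s*=\s*\{(.*?)\}", conf_text, re.S | re.M
    )
    if stanza is None:
        return []
    line = re.search(r"^\s*features\s*=\s*(\S+)", stanza.group(1), re.M)
    return line.group(1).split(",") if line else []


# For the excerpt above, "extent" is among the ldiskfs defaults.
print("extent" in default_features(CONF_EXCERPT, "ldiskfs"))  # True
```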
            
            morrone Christopher Morrone (Inactive) added the comment above.

            Thinking about this further, I think I understand the root cause. The standard mkfs_lustre.c will call "mke2fs {lots of options}", which starts with an ext2 filesystem and enables the individual features needed to make the filesystem ext4. For the MDT filesystem, it does not turn on the "extents" feature, but it does for the OST.

            In the TOSS ldiskfsprogs, I suspect that "mkfs.ldiskfs" starts with an "ext4" filesystem, and (re)sets the same options, but for the MDT it already has "extents" enabled.

            I don't think that we are modifying mkfs.lustre. We just configure lustre "--with-ldiskfsprogs", but that code is entirely in the upstream lustre.

            The ldiskfsprogs's mkfs.ldiskfs does not intentionally change the default filesystem type from ext2 to ext4. The patch that introduces the ldiskfsprogs changes is here:

            http://review.whamcloud.com/2582

            morrone Christopher Morrone (Inactive) added the comment above (edited).

            FEATURE_I8 is "mmp" and FEATURE_I12 is "dir_data". These are not being printed because you are using the stock "debugfs" instead of "debugfs.ldiskfs" (or whatever the equivalent is), which doesn't know what these features are called. That is expected when using a separate ldiskfsprogs and leaving the stock e2fsprogs installed.

            The "fsck.ldiskfs -fDy" problem will still exist, even without the extents option, unless you apply the patch from http://review.whamcloud.com/2661.
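For reading output from the stock tools, the two placeholder names can be mapped back by hand. A minimal sketch, assuming only the two names given in the comment above; `translate_features` is a hypothetical helper, and the sample line is the "Filesystem features" line reported elsewhere in this ticket.

```python
# Names as given in this ticket; stock tools print the raw bit number instead.
LDISKFS_FEATURE_NAMES = {
    "FEATURE_I8": "mmp",
    "FEATURE_I12": "dir_data",
}


def translate_features(features_line: str) -> list[str]:
    """Replace FEATURE_I<n> placeholders with the names listed above."""
    _, _, value = features_line.partition(":")
    tokens = (value if value else features_line).split()
    return [LDISKFS_FEATURE_NAMES.get(tok, tok) for tok in tokens]


line = (
    "Filesystem features: has_journal ext_attr resize_inode dir_index filetype "
    "FEATURE_I8 flex_bg FEATURE_I12 sparse_super large_file huge_file uninit_bg "
    "dir_nlink extra_isize"
)
print(translate_features(line))
```

Any placeholder not in the table is passed through unchanged, so an unexpected FEATURE_ entry stays visible rather than being silently dropped.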

            adilger Andreas Dilger added the comment above.
            jamervi Joe Mervini added a comment -

            To be thorough I created the rest of the file system after reformatting the MDT and reran the symlink test. LLNL's fsck.ldiskfs -fy passed without errors.

            jamervi Joe Mervini added a comment -

            I was able to reformat the MDT with mkfsoptions="-O ^extent" with the TOSS bits. The extent feature no longer shows up in the features reported by dumpe2fs, but there are FEATURE_I8 and FEATURE_I12 entries that I haven't found a reference for:

            Filesystem features: has_journal ext_attr resize_inode dir_index filetype FEATURE_I8 flex_bg FEATURE_I12 sparse_super large_file huge_file uninit_bg dir_nlink extra_isize

            So is it your opinion that we should start from scratch again while we have the chance?


            In the ldiskfs code (but not in upstream ext4) it is possible to mount a filesystem with the "noextents" mount option, so that new files/directories are not created with the extent flag set. This does not affect existing files/directories. There isn't really a mechanism to "migrate" such files to non-extent files without essentially a file-level backup/restore, but that will not currently work for 2.x MDT filesystems due to the Object Index becoming inconsistent. Only block-device level backup/restore is currently functional for 2.x MDT filesystems. The OI Scrub feature is nearing completion and will be landed for the 2.3 release, which will again allow file-level backup/restore for the MDT.

            I think a combination of factors is required here, to avoid this problem for other filesystems:

            • explicitly disable "extents" for MDT filesystems in mkfs_lustre.c (should go into 2.1.x)
            • fix e2fsck so that it does not corrupt extent-mapped symlinks (this may already be fixed in newer e2fsprogs)
            • land the OI Scrub feature for 2.3 (this is likely too much of a "feature" for 2.1.x)
            adilger Andreas Dilger added the comment above.
            jamervi Joe Mervini added a comment -

            All this being said, is there a way to back out the extent feature without reformatting the file system? As before I would prefer to deal with the pain now as opposed to down the road when there's a petabyte of data with no place to move it.


            Thinking about this further, I think I understand the root cause. The standard mkfs_lustre.c will call "mke2fs {lots of options}", which starts with an ext2 filesystem and enables the individual features needed to make the filesystem ext4. For the MDT filesystem, it does not turn on the "extents" feature, but it does for the OST.

            In the TOSS ldiskfsprogs, I suspect that "mkfs.ldiskfs" starts with an "ext4" filesystem, and (re)sets the same options, but for the MDT it already has "extents" enabled.

            It makes sense to explicitly disable the extents feature in mkfs_lustre.c for MDT filesystems, since extents provide absolutely no benefit there, and may instead be hurting performance. That is a simple matter of appending ",^extents" to the list of MDT features.
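The suspected root cause and the proposed fix can be illustrated with a toy model of how a feature list is applied on top of a starting set. This is a sketch under stated assumptions: the base sets and the MDT option string below are hypothetical placeholders, not the lists mkfs_lustre.c or mkfs.ldiskfs actually use. The only real behavior modeled is that a leading "^" clears a feature.

```python
def resolve_features(base: set[str], option_string: str) -> set[str]:
    """Apply a comma-separated feature list; a leading '^' clears the feature."""
    result = set(base)
    for token in option_string.split(","):
        token = token.strip()
        if not token:
            continue
        if token.startswith("^"):
            result.discard(token[1:])
        else:
            result.add(token)
    return result


# Hypothetical starting points and option list, for illustration only.
EXT2_BASE = {"ext_attr", "resize_inode", "dir_index", "filetype", "sparse_super"}
EXT4_BASE = EXT2_BASE | {"has_journal", "extents", "huge_file", "flex_bg",
                         "uninit_bg", "dir_nlink", "extra_isize"}
MDT_OPTIONS = "has_journal,dir_index,uninit_bg"

print("extents" in resolve_features(EXT2_BASE, MDT_OPTIONS))                # False
print("extents" in resolve_features(EXT4_BASE, MDT_OPTIONS))                # True
print("extents" in resolve_features(EXT4_BASE, MDT_OPTIONS + ",^extents"))  # False
```

The same option list gives different results depending on the starting set, which is why an explicit ",^extents" makes the outcome independent of the tool's default filesystem type.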

            adilger Andreas Dilger added the comment above.

            People

              bobijam Zhenyu Xu
              jamervi Joe Mervini