  Lustre / LU-1366

getting "dirdata length set incorrectly" running e2fsck

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Lustre 2.1.2
    • Lustre 2.1.1
    • Environment: DDN SFA10k - Dell R710 - TOSS2.0 OS release
    • 3
    • 4619

    Description

      After adding a network to the file system and adding the IP for the failover node to the MDS, it wouldn't mount. (I later found that --param failnode= is no longer valid, much to my chagrin.) I attempted to run fsck against the file system, but it reported that e2fsprogs was too old for this file system, so I ran fsck.ldiskfs instead. fsck.ldiskfs found some bad inodes and corrected them, but on a subsequent run with the -n option (to make sure the file system was clean) I started seeing a flood of "dirdata length set incorrectly" messages. I stopped it and was able to mount the FS, but later the FS spontaneously unmounted.

      What does this mean? Fortunately this file system is in pre-production and can be recreated (which is intended), but I'd like to know whether this was caused by running fsck.ldiskfs, since I did not see these messages on the first pass. The version of e2fsprogs (non-Red Hat) is ldiskfsprogs-1.41.90.3chaos.wc3-0.ch5.x86_64. I have downloaded the wc4 version from the WC repo, installed it into a test image, and rebooted the node into that image. I was able to run e2fsck against the FS with the -fDy options, but the "dirdata length set incorrectly" messages continue to stream and have been going for more than an hour.

      Any help would be appreciated.


          Activity



            simmonsja James A Simmons added a comment - Old ticket for unsupported version


            adilger Andreas Dilger added a comment - The e2fsck bug that broke dirdata when run with "-fD" is fixed in 1.42.3.wc1. The mkfs_lustre.c code (in b2_1 and master) now also explicitly disables extents, which will avoid this problem for new filesystems in the future.

            What still appears to need fixing is the use of EXT4_EXTENTS_FL on short symlinks in the osd-ldiskfs code. Covering this would need a special conf-sanity.sh test that formats the MDT with extents enabled, since we don't do that by default (specifying '--mkfsoptions="-O extents"' would override the "^extents" option set internally by mkfs_lustre.c).
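            The override mentioned above follows from how feature edit lists behave: "feat" enables a feature, "^feat" disables it, and entries are applied left to right, so a later entry wins. The sketch below is a simplified, hypothetical model. It assumes that mkfs_lustre's internal feature list is applied before the user's --mkfsoptions. The function names are illustrative only and are not part of mkfs_lustre.c or e2fsprogs.

```python
# Hypothetical, simplified model of mke2fs-style feature edit lists.
# "feat" enables a feature, "^feat" disables it; tokens are applied
# left to right, so a later token overrides an earlier one.


def apply_feature_edits(enabled: set[str], edits: str) -> set[str]:
    """Return a new feature set after applying a comma-separated edit list."""
    result = set(enabled)
    for token in filter(None, (t.strip() for t in edits.split(","))):
        if token.startswith("^"):
            result.discard(token[1:])  # "^extents" -> disable extents
        else:
            result.add(token)          # "extents" -> enable extents
    return result


def effective_features(internal_opts: str, user_opts: str) -> set[str]:
    """Apply the tool's internal options first, then the user's options.

    Assumption: user-supplied options are processed after the internal ones.
    """
    return apply_feature_edits(apply_feature_edits(set(), internal_opts), user_opts)


if __name__ == "__main__":
    # Internal default disables extents; no user override.
    print(sorted(effective_features("dirdata,^extents", "")))         # ['dirdata']
    # User passes "-O extents", which re-enables the feature.
    print(sorted(effective_features("dirdata,^extents", "extents")))  # ['dirdata', 'extents']
```

            Under this model, the only way to get an extents-enabled MDT for such a test is the explicit user override, which is why the default format path no longer exercises the short-symlink case.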


            morrone Christopher Morrone (Inactive) added a comment - Ah, I see what happened: the v1.42.3.wc1 tag is actually on a different commit than the one on master-lustre.

                * 9a5ba10 (tag: v1.42.3.wc1) e2fsck: allow checking on mounted root filesystem
                | * f7a92f9 (wc/master-lustre) e2fsck: allow checking on mounted root filesystem
                |/

            You might want to just force-update master-lustre to the commit that v1.42.3.wc1 tags. It looks like the only difference is the gerrit commit ID added to the commit message in the tagged one.

            So where does this leave us? Do we still think that something in osd-ldiskfs or elsewhere in Lustre needs fixing, or do we now believe that e2fsck is entirely to blame?


            morrone Christopher Morrone (Inactive) added a comment - Whoops, I needed an explicit "fetch --tags". Must have that remote configured wrong.


            adilger Andreas Dilger added a comment - The v1.42.3.wc1 tag is on the master-lustre branch.


            morrone Christopher Morrone (Inactive) added a comment - I see the v1.42.3-lustre branch, but not the 1.42.3.wc1 tag.


            adilger Andreas Dilger added a comment - The e2fsck fix for this is included in the rebased e2fsprogs-1.42.3.wc1 build, which is currently undergoing testing.


            adilger Andreas Dilger added a comment - The "Flags: 0x80000" line maps to EXT4_EXTENTS_FL, so in fact it seems this is being set/inherited incorrectly on the MDT fast symlinks. Note "Fast_link_dest: ../bin/passwd" indicates that the symlink is indeed stored inside the inode. My first guess is a defect in the osd-ldiskfs code that is unconditionally setting LDISKFS_EXTENTS_FL on all inodes, when this should only be set on regular files.
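            The diagnosis above can be expressed as a small consistency check. EXT4_EXTENTS_FL is 0x00080000 in the ext4 on-disk format. A fast symlink keeps its target inside the 60-byte i_block area, so an extents flag on such an inode tells readers to parse the link text as an extent tree. The sketch below uses the simplified rule that targets shorter than 60 bytes are stored inline. It also models the suggested policy of setting the flag only on regular files. All function names are hypothetical and are not taken from osd-ldiskfs or e2fsck.

```python
import stat

EXT4_EXTENTS_FL = 0x00080000  # inode flag value from the ext4 on-disk format
I_BLOCK_BYTES = 60            # i_block is 15 x 32-bit words


def is_fast_symlink(mode: int, target: bytes) -> bool:
    """Simplified rule: a symlink whose target fits in i_block is stored inline."""
    return stat.S_ISLNK(mode) and len(target) < I_BLOCK_BYTES


def has_bad_extents_flag(mode: int, flags: int, target: bytes = b"") -> bool:
    """Detect the reported case: EXTENTS_FL set on an inline (fast) symlink."""
    return is_fast_symlink(mode, target) and bool(flags & EXT4_EXTENTS_FL)


def flags_for_new_inode(mode: int, extents_enabled: bool) -> int:
    """Hypothetical policy from the comment: only regular files get EXTENTS_FL."""
    if extents_enabled and stat.S_ISREG(mode):
        return EXT4_EXTENTS_FL
    return 0


if __name__ == "__main__":
    symlink_mode = stat.S_IFLNK | 0o777
    # Mirrors the reported inode: "Flags: 0x80000", "Fast_link_dest: ../bin/passwd"
    print(has_bad_extents_flag(symlink_mode, 0x80000, b"../bin/passwd"))  # True
    print(flags_for_new_inode(symlink_mode, extents_enabled=True))        # 0
    print(hex(flags_for_new_inode(stat.S_IFREG | 0o644, True)))           # 0x80000
```

            Deciding the flag from the file type at creation time, rather than inheriting it unconditionally, is what keeps short symlinks free of the flag under this model.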

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: jamervi Joe Mervini
              Votes: 0
              Watchers: 9
