Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.1.3
    • None
    • 3
    • 6149

    Description

      Doing an ls gives the following error:
      ls: reading directory d4_stats/: Input/output error

      client error:
      [5237686.818045] LustreError: 77522:0:(dir.c:648:ll_readdir()) error reading dir [0x4488b6ced74:0x1edb5:0x0] at 0: rc -5
      [5237686.849844] LustreError: 77522:0:(dir.c:648:ll_readdir()) Skipped 51 previous similar messages

      MDT Error:
      Jan 16 11:18:37 nbp1-mds kernel: Lustre: 15390:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!

      Please advise on debug flags to use to gather logs.

      Attachments

        1. fsck.2.8.2012.nbp1.out.gz
          1.63 MB
        2. mdtsnap.fsck.out.gz
          1.10 MB
        3. nbp1FSCK.out.gz
          4.56 MB

          Activity

            [LU-2627] /bin/ls gets Input/output error

            This problem will persist for large 1.8 directories that are renamed until a version of the LU-2638 patch http://review.whamcloud.com/5179 is applied. For the short term, until this patch is applied, it is possible to disable the dirdata feature on the unmounted MDT filesystem:

            tune2fs -O ^dirdata /dev/mdtdev
            

            though this will have some negative performance impact for all newly-created files when doing name lookups and "ls -l".

            adilger Andreas Dilger added the comment above.
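
Before changing any feature flags, it can help to confirm whether dirdata is actually set on the MDT. The following is a minimal sketch, assuming the feature list is taken from `dumpe2fs -h` output, which prints a "Filesystem features:" line. The function name `has_feature` is hypothetical, and the sample feature line is illustrative only, not taken from this filesystem.

```shell
#!/bin/sh
# has_feature FEATURE: read `dumpe2fs -h` style output on stdin and succeed
# if FEATURE appears as a whole word on the "Filesystem features:" line.
has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # Fields 1 and 2 are the label; features start at field 3.
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Illustrative input only. On a real MDS one would instead pipe in:
#   dumpe2fs -h /dev/mdtdev 2>/dev/null
sample='Filesystem features:      has_journal ext_attr dir_index dirdata filetype'
if printf '%s\n' "$sample" | has_feature dirdata; then
    echo "dirdata is enabled"
else
    echo "dirdata is not enabled"
fi
```

The match is on whole feature names, so asking for `dir` does not match `dir_index` or `dirdata`.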

            We seem to have hit this issue again on the same filesystem.

            pfe1 ~ # ls -l /nobackupp1/xmeng/run_sc_anisopi/run06_dipole_semiimpl_nohyp_taugr_60000ss/SC
            ls: reading directory /nobackupp1/xmeng/run_sc_anisopi/run06_dipole_semiimpl_nohyp_taugr_60000ss/SC: Input/output error
            total 0

            from the mdt
            Feb 8 06:50:58 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 18 for directory #17309149
            Feb 8 06:50:58 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 17309149, running e2fsck is recommended.
            Feb 8 06:51:57 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 8 for directory #17309159
            Feb 8 06:51:57 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 17309159, running e2fsck is recommended.
            Feb 8 08:35:12 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 15 for directory #130557236
            Feb 8 08:35:12 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 130557236, running e2fsck is recommended.
            Feb 8 11:45:38 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 3 for directory #157287952
            Feb 8 11:45:39 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 157287952, running e2fsck is recommended.
            Feb 8 11:46:07 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 4 for directory #157331367
            Feb 8 11:46:07 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 157331367, running e2fsck is recommended.

            mhanafi Mahmoud Hanafi added the comment above.
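
To build a list of the affected directories from messages like these, the inode numbers can be pulled out of the log. This is a rough sketch that assumes the message format shown in this ticket; the function name `corrupt_dir_inodes` is hypothetical.

```shell
#!/bin/sh
# corrupt_dir_inodes: read kernel log lines on stdin and print the unique
# inode numbers named in ldiskfs "Corrupt dir inode" warnings, sorted numerically.
corrupt_dir_inodes() {
    sed -n 's/.*dx_probe: Corrupt dir inode \([0-9][0-9]*\),.*/\1/p' | sort -n -u
}

# Two lines copied from the MDS log in this ticket, used as sample input.
corrupt_dir_inodes <<'EOF'
Feb 8 06:50:58 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 17309149, running e2fsck is recommended.
Feb 8 08:35:12 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 130557236, running e2fsck is recommended.
EOF
```

Each inode number could then be mapped to a path with the debugfs `ncheck` command on the MDT device. The exact device path depends on the site and is not given in this ticket.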

            Is the issue closed, or is there some other help we can give you?

            cliffw Cliff White (Inactive) added the comment above.

            At this point we have been able to run fsck on the mdt and have recovered from the errors.

            mhanafi Mahmoud Hanafi added the comment above.

            What is your current state? What help can we give you?

            cliffw Cliff White (Inactive) added the comment above.

            The "dirdata" option is enabled by default for 2.x filesystems, but I don't think it is necessarily advisable to disable it at this time. It does appear at first glance that running e2fsck after removing the dirdata feature would handle this correctly and clear the extra dirdata flag in each dirent, but we haven't tested this at all, and it would also cause the MDS to become considerably slower.

            So far I don't see any indication besides the mixup with ".." entries that there is anything seriously wrong with these directories. The bytes at the start of the directory are used for ".", "..", and the htree index on directories over 4kB in size, and not user data. e2fsck should regenerate all of the needed information from redundant information elsewhere, except being able to move the entry from lost+found back to the proper place in the tree.

            adilger Andreas Dilger added the comment above.

            It has been a very long time since we last ran e2fsck, and that was under the 1.8.x code. We have never run e2fsck since moving to 2.1.

            Should we remove the dirdata options?

            I will check the date and size of the directories. We may want to just archive these and restore them after the fsck or tar/delete/untar them.

            mhanafi Mahmoud Hanafi added the comment above.

            Looking at the test e2fsck log, one new directory is hitting yet another error, this one related to the "." entry:

            Directory entry for '.' in /ROOT/msekula/fun/camrad (13208388) is big.
            Split? yes
            Missing '..' in directory inode 13208388.
            Fix? yes
            Setting filetype for entry '..' in /ROOT/msekula/fun/camrad (13208388) to 2.
            Entry '..' in /ROOT/msekula/fun/camrad (13208388) is duplicate '..' entry.
            Fix? yes
            

            I suspect that there is some code in e2fsck or in ldiskfs that is not handling the dirdata field correctly. It likely relates to LU-2638. There are several files moved to lost+found as I suspected, but it looks like the majority of symlinks are fine.

            It doesn't seem that a large number of directories will be repaired, so I think it makes sense to go ahead and fix the real MDT at this point. The only other thing you might want to check before the final run is that a second "e2fsck -fy" pass on the snapshot completes cleanly without any repairs. About 30 directories will be moved to lost+found, but they can be moved back to their correct location, and nothing should be lost.

            The next question is to figure out what caused this problem. When did you upgrade to 2.1? Did these directories exist before the upgrade from 1.8, or were they created afterward? How large are the directories (number of entries = "find ${directory} -print | wc -l", size of directory = "ls -ld ${directory}")? Do you know if the directories were renamed after they were created? How long has it been since you last ran e2fsck? Have you run it since the upgrade?

            adilger Andreas Dilger added the comment above.
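
The two measurements requested in that comment can be gathered in one pass per directory. This is a minimal sketch that wraps the quoted `find` and `ls -ld` commands; the function name `dir_stats` and the scratch directory are hypothetical, and the size is read from the fifth column of `ls -ld`, which assumes the usual long-listing layout.

```shell
#!/bin/sh
# dir_stats DIR: print the number of entries under DIR, counted with
# `find DIR -print | wc -l` (recursive, and it includes DIR itself), and the
# size in bytes of the directory itself as shown in the `ls -ld` size column.
dir_stats() {
    dir=$1
    entries=$(find "$dir" -print | wc -l | tr -d ' ')
    size=$(ls -ld "$dir" | awk '{ print $5 }')
    printf '%s entries=%s dirsize=%s\n' "$dir" "$entries" "$size"
}

# Demonstration on a scratch directory (made-up data, not the MDT).
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b" "$tmp/c"
dir_stats "$tmp"
rm -rf "$tmp"
```

Run against each directory that reports the I/O error, this gives one line per directory that can be pasted into the ticket.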

            Uploading the output of the fsck run on the snapshot. Please review it before we run on the real MDT device.

            mhanafi Mahmoud Hanafi added the comment above.

            We did not use the Xyratex upgrade tool, but we did add the dirdata option at some point. Should we remove that option?

            mhanafi Mahmoud Hanafi added the comment above.

            I also see in your MDT feature list that there is the "dirdata" feature enabled, but this is definitely NOT a feature that would have been enabled with a filesystem formatted with 1.8. Also, the ".." corruption is definitely not random.

            Did you perhaps run the Xyratex "upgrade" tool on the MDT filesystem?

            I believe that this would be the root cause of the ".." corruption. My understanding is that it was deleting the ".." entry to add the FID, and then re-inserting it into the directory, but ext4/e2fsck require that the ".." entry immediately follow the "." entry at the start.

            adilger Andreas Dilger added the comment above.

            People

              cliffw Cliff White (Inactive)
              mhanafi Mahmoud Hanafi