Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.1.3
    • None
    • 3
    • 6149

    Description

      Running ls on the directory gives the following error:
      ls: reading directory d4_stats/: Input/output error

      client error:
      [5237686.818045] LustreError: 77522:0:(dir.c:648:ll_readdir()) error reading dir [0x4488b6ced74:0x1edb5:0x0] at 0: rc -5
      [5237686.849844] LustreError: 77522:0:(dir.c:648:ll_readdir()) Skipped 51 previous similar messages

      MDT Error:
      Jan 16 11:18:37 nbp1-mds kernel: Lustre: 15390:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!
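The client and MDT messages above carry the same return code. As a minimal sketch, the snippet below extracts the FID and return code from both lines and maps the negative kernel return code to its errno name with Python's standard `errno` module. The regular expressions, the helper names, and the reading of the `at 0` field as a directory position are assumptions made for illustration; they are not part of Lustre.

```python
import errno
import re

# The two log lines quoted in the description.
CLIENT_LINE = (
    "[5237686.818045] LustreError: 77522:0:(dir.c:648:ll_readdir()) "
    "error reading dir [0x4488b6ced74:0x1edb5:0x0] at 0: rc -5"
)
MDT_LINE = (
    "Jan 16 11:18:37 nbp1-mds kernel: Lustre: 15390:0:"
    "(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!"
)

# Hypothetical patterns written only for these two message formats.
CLIENT_RE = re.compile(
    r"error reading dir \[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]"
    r" at (\d+): rc (-?\d+)"
)
MDT_RE = re.compile(r"build page failed: (-?\d+)")


def errno_name(rc):
    """Map a (negative) kernel return code to its symbolic errno name."""
    return errno.errorcode.get(abs(rc), "UNKNOWN")


def parse_client(line):
    """Return the FID triple, position, and rc from an ll_readdir() error."""
    m = CLIENT_RE.search(line)
    if m is None:
        return None
    seq, oid, ver = (int(x, 16) for x in m.group(1, 2, 3))
    # "pos" is an assumed name for the value printed after "at".
    return {"fid": (seq, oid, ver), "pos": int(m.group(4)), "rc": int(m.group(5))}


def parse_mdt(line):
    """Return the rc from a __mdd_readpage() 'build page failed' message."""
    m = MDT_RE.search(line)
    return int(m.group(1)) if m else None


client = parse_client(CLIENT_LINE)
mdt_rc = parse_mdt(MDT_LINE)
print("client rc:", client["rc"], errno_name(client["rc"]))
print("MDT rc:   ", mdt_rc, errno_name(mdt_rc))
```

On Linux, errno 5 is EIO, which is the "Input/output error" that ls reports.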

      Please advise on debug flags to use to gather logs.

      Attachments

        1. fsck.2.8.2012.nbp1.out.gz
          1.63 MB
        2. mdtsnap.fsck.out.gz
          1.10 MB
        3. nbp1FSCK.out.gz
          4.56 MB

          Activity

            [LU-2627] /bin/ls gets Input/output error
            pjones Peter Jones added a comment -

            As per NASA ok to close ticket


            There is nothing new in the fsck output compared to last time. I think you should go ahead and run fsck.

            johann Johann Lombardi (Inactive) added the comment above.

            Uploading the fsck output for review before we run it for real.

            mhanafi Mahmoud Hanafi added the comment above.

            This problem will persist for large 1.8 directories that are renamed until a version of the LU-2638 patch http://review.whamcloud.com/5179 is applied. For the short term, until this patch is applied, it is possible to disable the dirdata feature on the unmounted MDT filesystem:

            tune2fs -O ^dirdata /dev/mdtdev

            though this will have some negative performance impact for all newly-created files when doing name lookups and "ls -l".

            adilger Andreas Dilger added the comment above.

            We seem to have hit this issue again on the same filesystem.

            pfe1 ~ # ls -l /nobackupp1/xmeng/run_sc_anisopi/run06_dipole_semiimpl_nohyp_taugr_60000ss/SC
            ls: reading directory /nobackupp1/xmeng/run_sc_anisopi/run06_dipole_semiimpl_nohyp_taugr_60000ss/SC: Input/output error
            total 0

            From the MDT:
            Feb 8 06:50:58 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 18 for directory #17309149
            Feb 8 06:50:58 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 17309149, running e2fsck is recommended.
            Feb 8 06:51:57 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 8 for directory #17309159
            Feb 8 06:51:57 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 17309159, running e2fsck is recommended.
            Feb 8 08:35:12 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 15 for directory #130557236
            Feb 8 08:35:12 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 130557236, running e2fsck is recommended.
            Feb 8 11:45:38 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 3 for directory #157287952
            Feb 8 11:45:39 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 157287952, running e2fsck is recommended.
            Feb 8 11:46:07 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 4 for directory #157331367
            Feb 8 11:46:07 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Corrupt dir inode 157331367, running e2fsck is recommended.
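To keep track of which directories are affected before running e2fsck, the dx_probe warnings above can be reduced to a list of unique directory inodes. The following is a minimal sketch: the regular expression and the function name are hypothetical, and the log text is abridged to the "Unrecognised inode hash code" lines quoted above.

```python
import re

# The "Unrecognised inode hash code" lines from the MDT log quoted above.
LOG = """\
Feb 8 06:50:58 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 18 for directory #17309149
Feb 8 06:51:57 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 8 for directory #17309159
Feb 8 08:35:12 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 15 for directory #130557236
Feb 8 11:45:38 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 3 for directory #157287952
Feb 8 11:46:07 nbp1-mds kernel: LDISKFS-fs warning (device dm-4): dx_probe: Unrecognised inode hash code 4 for directory #157331367
"""

# Hypothetical pattern written for this one warning format.
HASH_RE = re.compile(
    r"LDISKFS-fs warning \(device (?P<dev>[^)]+)\): dx_probe: "
    r"Unrecognised inode hash code (?P<code>\d+) for directory #(?P<ino>\d+)"
)


def affected_directories(log_text):
    """Return {(device, inode): hash_code} in log order, first report wins."""
    found = {}
    for line in log_text.splitlines():
        m = HASH_RE.search(line)
        if m:
            key = (m.group("dev"), int(m.group("ino")))
            found.setdefault(key, int(m.group("code")))
    return found


dirs = affected_directories(LOG)
for (dev, ino), code in dirs.items():
    print(f"{dev}: directory inode {ino} reported hash code {code}")
```

The resulting inode numbers can then be compared against the e2fsck output, or mapped back to path names, to decide which directories to archive beforehand.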

            mhanafi Mahmoud Hanafi added the comment above.

            Is the issue closed, or is there some other help we can give you?

            cliffw Cliff White (Inactive) added the comment above.

            At this point we have been able to run fsck on the mdt and have recovered from the errors.

            mhanafi Mahmoud Hanafi added the comment above.

            What is your current state? What help can we give you?

            cliffw Cliff White (Inactive) added the comment above.

            The "dirdata" option is enabled by default for 2.x filesystems, but I don't think it is necessarily advisable to disable it at this time. It does appear at first glance that running e2fsck after removing the dirdata feature would handle this correctly and clear the extra dirdata flag in each dirent, but we haven't tested this at all, and it would also cause the MDS to become considerably slower.

            So far I don't see any indication besides the mixup with ".." entries that there is anything seriously wrong with these directories. The bytes at the start of the directory are used for ".", "..", and the htree index on directories over 4kB in size, and not user data. e2fsck should regenerate all of the needed information from redundant information elsewhere, except being able to move the entry from lost+found back to the proper place in the tree.

            adilger Andreas Dilger added the comment above.

            It has been a very long time since we last ran e2fsck, and that was under the 1.8.x code. We have not run e2fsck since moving to 2.1.

            Should we remove the dirdata option?

            I will check the date and size of the directories. We may want to either archive these and restore them after the fsck, or tar/delete/untar them.

            mhanafi Mahmoud Hanafi added the comment above.

            People

              cliffw Cliff White (Inactive)
              mhanafi Mahmoud Hanafi