Details

    • Bug
    • Resolution: Fixed
    • Critical
    • None
    • Lustre 2.1.3
    • None
    • 3
    • 6149

    Description

      Running ls gives the following error:
      ls: reading directory d4_stats/: Input/output error

      client error:
      [5237686.818045] LustreError: 77522:0:(dir.c:648:ll_readdir()) error reading dir [0x4488b6ced74:0x1edb5:0x0] at 0: rc -5
      [5237686.849844] LustreError: 77522:0:(dir.c:648:ll_readdir()) Skipped 51 previous similar messages

      MDT Error:
      Jan 16 11:18:37 nbp1-mds kernel: Lustre: 15390:0:(mdd_object.c:2412:__mdd_readpage()) build page failed: -5!

      Please advise on debug flags to use to gather logs.

      Attachments

        1. fsck.2.8.2012.nbp1.out.gz
          1.63 MB
        2. mdtsnap.fsck.out.gz
          1.10 MB
        3. nbp1FSCK.out.gz
          4.56 MB

        Issue Links

          Activity

            [LU-2627] /bin/ls gets Input/output error

            We did not use the xyratex upgrade tool. But we added that dirdata option at some point. Should we remove that option?

            mhanafi Mahmoud Hanafi added a comment

            I also see in your MDT feature list that there is the "dirdata" feature enabled, but this is definitely NOT a feature that would have been enabled with a filesystem formatted with 1.8. Also, the ".." corruption is definitely not random.

            Did you perhaps run the Xyratex "upgrade" tool on the MDT filesystem?

            I believe that this would be the root cause of the ".." corruption. My understanding is that it was deleting the ".." entry to add the FID, and then re-inserting it into the directory, but ext4/e2fsck require that the ".." entry immediately follow the "." entry at the start.

            adilger Andreas Dilger added a comment

            Looking at the e2fsck code, it appears that it will correctly remove just the EXTENT_FL flag, rather than clear the whole inode:

                            if (extent_fs && (inode->i_flags & EXT4_EXTENTS_FL) &&
                                LINUX_S_ISLNK(inode->i_mode) &&
                                !ext2fs_inode_has_valid_blocks2(fs, inode) &&
                                fix_problem(ctx, PR_1_FAST_SYMLINK_EXTENT_FL, &pctx)) {
                                    inode->i_flags &= ~EXT4_EXTENTS_FL;
                                    e2fsck_write_inode(ctx, ino, inode, "pass1");
                            }
            

            so the only confusion is that the PR_1_FAST_SYMLINK_EXTENT_FL problem code is asking "Clear", which might be confusing to some (including myself) as asking whether the inode should be cleared instead of the flag being cleared. I will submit a patch to fix this.

            The later errors:

            Symlink /ROOT/pheimbac/ecco/2013-01-seaice-adjoint/MITgcm_latest/mysetups/arctic210x192x50/build_forw/timeave_cumulate.F (inode #68169598) is invalid.
            Clear? no
            Symlink /ROOT/pheimbac/ecco/2013-01-seaice-adjoint/MITgcm_latest/mysetups/arctic210x192x50/build_forw/cal_compdates.F (inode #68169136) is invalid.
            Clear? no
            

            should not be hit if the earlier checks had been allowed to clear the EXT4_EXTENTS_FL flag from the short symlinks.

            There are some further errors, much later in the log. There are ~20 of the following errors in Pass 2:

            Pass 2: Checking directory structure
            Second entry 'IE_t040101_000000.log' (inode=18364943) in directory inode 18373085 should be '..'
            Fix? no
            Entry '..' in /ROOT/xjia/Saturn/run_IdealizedSW_notilt_1e275_newgrid2_highorder/RESULTS/run_all/IE (18373085) is duplicate '..' entry.
            Fix? no
            Entry '..' in /ROOT/xjia/Saturn/run_IdealizedSW_notilt_1e275_newgrid2_highorder/RESULTS/run_all/IE (18373085) is duplicate '..' entry.
            Fix? no
            Entry '..' in /ROOT/xjia/Saturn/run_IdealizedSW_notilt_1e275_newgrid2_highorder/RESULTS/run_all/IE (18373085) is a link to directory /ROOT/xjia/Saturn/run_IdealizedSW_notilt_1e275_newgrid2_highorder/RESULTS/run_all (13221653).
            Clear? no
            

            that appear a bit unusual, but are not fatally broken. There are ~20 matching errors for the unfixed ".." entries later in Pass 3:

            Pass 3: Checking directory connectivity
            '..' in /ROOT/xjia/Saturn/run_IdealizedSW_notilt_1e275_newgrid2_highorder/RESULTS/run_all/IE (18373085) is <The NULL inode> (0), should be /ROOT/xjia/Saturn/run_IdealizedSW_notilt_1e275_newgrid2_highorder/RESULTS/run_all (13221653).
            Fix? no
            

            and a few minor errors in Pass 3A:

            Pass 3A: Optimizing directories
            Duplicate entry 'c_t_f.x' in /ROOT/aiannett/NCC/Testing/Back-Face-Step (77623393) found.  Clear? no
            Entry 'c_t_f.x' in /ROOT/aiannett/NCC/Testing/Back-Face-Step (77623393) has a non-unique filename.
            Rename to c_t_f.~0? no
            Duplicate entry 'b1b2b3.x' in /ROOT/aiannett/NCC/Testing/Back-Face-Step (77623393) found.  Clear? no
            Entry 'b1b2b3.x' in /ROOT/aiannett/NCC/Testing/Back-Face-Step (77623393) has a non-unique filename.
            Rename to b1b2b3~0? no
            

            It appears that the entries that would be "fixed" in Pass 2 will likely appear in lost+found once they are fixed, and if you want to recover those files you could mount the MDT locally with mount -t ldiskfs and rename them from .../lost+found/#inode to the path given for each inode number.

            I think you could go ahead with running e2fsck -fy on the snapshot, mount the snapshot MDT filesystem locally as ldiskfs to verify a handful of the symlinks are still intact, and check lost+found for the ~20 or so inodes that would need to be fixed (you could even write a short script to rename them if downtime is critical). If that works OK, then when you take the real MDT filesystem offline for repair, please make another snapshot at that time, run the e2fsck -fy on the real MDT, mount as ldiskfs and repair the files in lost+found before unmounting and remounting it again as lustre.
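The short rename script mentioned above might look like the following minimal Python sketch. It assumes the MDT is mounted locally as ldiskfs at a hypothetical mount point (for example /mnt/mdt), that e2fsck has left the reconnected entries in lost+found named `#<inode>`, and that the inode-to-path table is built by hand from the e2fsck log. The function names are illustrative and are not part of any Lustre or e2fsprogs tool. It defaults to a dry run that only prints what it would do.

```python
import os


def plan_renames(lost_found, inode_to_path, root):
    """Return (src, dst) pairs for lost+found entries whose inode is in the table.

    lost_found    -- path to the lost+found directory on the mounted ldiskfs
    inode_to_path -- dict {inode number: original path as printed by e2fsck},
                     assembled by hand from the e2fsck log (an assumption here)
    root          -- mount point that the e2fsck paths are relative to
    """
    plan = []
    for name in sorted(os.listdir(lost_found)):
        # Only consider entries of the form "#<digits>".
        if not name.startswith("#") or not name[1:].isdigit():
            continue
        target = inode_to_path.get(int(name[1:]))
        if target is None:
            continue  # not one of the inodes we know how to place
        src = os.path.join(lost_found, name)
        dst = os.path.join(root, target.lstrip("/"))
        plan.append((src, dst))
    return plan


def apply_renames(plan, dry_run=True):
    """Print each planned rename; perform it only when dry_run is False."""
    done = []
    for src, dst in plan:
        if os.path.lexists(dst):
            # Never overwrite something that already exists at the target path.
            print(f"skip (target exists): {dst}")
            continue
        print(f"{'would rename' if dry_run else 'rename'}: {src} -> {dst}")
        if not dry_run:
            os.rename(src, dst)
            done.append((src, dst))
    return done
```

Calling `apply_renames(plan_renames("/mnt/mdt/lost+found", table, "/mnt/mdt"))` only prints the planned renames; passing `dry_run=False` performs them. Trying this on the snapshot first, as suggested above, would show whether the table is right before touching the real MDT.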

            In order to get the number of messages in the e2fsck log to a manageable number, I filtered out all of the duplicate messages:

            egrep -v "^$|^Fast symlink .* EXTENT_FL|^Inode .* missing NUL terminator|^Clear" e2fsck.log > e2fsck-filtered.log
            

            I had also filtered out "^Symlink.*is invalid" messages, but I don't think you should hit them during the repairing e2fsck run.
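As a complement to the egrep filter, a per-message-type count can show at a glance which classes of error dominate the log. The following is a minimal Python sketch, not something that was run for this ticket; the normalization rules (replacing quoted names, paths, and numbers with placeholders) are an assumption about what makes two e2fsck messages "the same".

```python
import re
from collections import Counter


def classify(line):
    """Collapse quoted names, paths, and numbers so similar messages group together."""
    line = re.sub(r"\s+", " ", line.strip())   # normalize runs of whitespace
    line = re.sub(r"'[^']*'", "<name>", line)  # quoted directory entry names
    line = re.sub(r"/\S+", "<path>", line)     # absolute paths
    line = re.sub(r"\d+", "<n>", line)         # inode numbers and other digits
    return line


def tally(lines):
    """Count non-empty log lines by their normalized message text."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[classify(line)] += 1
    return counts
```

For example, `tally(open("e2fsck.log")).most_common(10)` would list the ten most frequent message shapes in the log.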

            adilger Andreas Dilger added a comment

            This was a 1.8.x filesystem that was upgraded. So I think the extent option is leftover from the 1.8.x format.

            mhanafi Mahmoud Hanafi added a comment

            Bobijam, I think that the problem is with e2fsck rejecting short symlinks that have EXT4_EXTENTS_FL set. It appears that the LU-1540 NUL termination problem would be fixed correctly by the current e2fsck. The EXT4_EXTENTS_FL problem appears to be a bug in the osd-ldiskfs code when "extents" is enabled, for which I've filed LU-2634. Since we never format the MDT with "extents", we have never seen such a problem in our testing.

            Inode 9482890 symlink missing NUL terminator.  Fix? no
            Inode 9482897 symlink missing NUL terminator.  Fix? no
            Fast symlink 9482914 has EXTENT_FL set.  Clear? no
            Fast symlink 9482917 has EXTENT_FL set.  Clear? no
            Fast symlink 9482921 has EXTENT_FL set.  Clear? no
            

            It makes sense to change e2fsck to accept such inodes and just clear the EXT4_EXTENTS_FL instead of considering it corrupted. That will allow recovering the filesystem without the need to restore the symlinks (which would just get EXT4_EXTENTS_FL set again, until LU-2634 is fixed).

            adilger Andreas Dilger added a comment

            Filed LU-2634 for tracking issue with EXT4_EXTENTS_FL set on symlinks for MDT with "extents" feature enabled.

            adilger Andreas Dilger added a comment

            file is uploaded

            mhanafi Mahmoud Hanafi added a comment

            bobijam Zhenyu Xu added a comment (edited)

            please compress and upload fck.out.

            I want to check whether the invalid symlinks are the long symlinks that are missing a NUL terminator, something like:

            an example
            Pass 1: Checking inodes, blocks, and sizes
            Inode 121351 symlink missing NUL terminator.  Fix? no
            ...
            ...
            Pass 2: Checking directory structure
            Symlink /path/to/long/symlink/file (inode #121351) is invalid.		
            Clear? no
            ...
            

            If that is the case, the latest e2fsck should be able to fix them (as LU-1540 indicates).
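The check described above (matching each Pass 2 "is invalid" symlink against the Pass 1 messages for the same inode) could be automated roughly as follows. This is a minimal Python sketch under the assumption that the log lines have the forms quoted in this ticket; the function name and the three result categories are illustrative.

```python
import re

# Message formats as they appear in the e2fsck output quoted in this ticket.
NUL_RE = re.compile(r"^Inode (\d+) symlink missing NUL terminator")
EXTFL_RE = re.compile(r"^Fast symlink (\d+) has EXTENT_FL set")
INVALID_RE = re.compile(r"^Symlink (.+) \(inode #(\d+)\) is invalid")


def cross_reference(lines):
    """Group Pass 2 'invalid' symlinks by the Pass 1 message seen for the same inode."""
    nul, extfl, invalid = set(), set(), {}
    for raw in lines:
        line = raw.strip()
        m = NUL_RE.match(line)
        if m:
            nul.add(int(m.group(1)))
            continue
        m = EXTFL_RE.match(line)
        if m:
            extfl.add(int(m.group(1)))
            continue
        m = INVALID_RE.match(line)
        if m:
            invalid[int(m.group(2))] = m.group(1)

    result = {"nul_terminator": [], "extent_fl": [], "unexplained": []}
    for inode, path in invalid.items():
        if inode in nul:
            result["nul_terminator"].append((inode, path))
        elif inode in extfl:
            result["extent_fl"].append((inode, path))
        else:
            # No matching Pass 1 message; these would need a closer look.
            result["unexplained"].append((inode, path))
    return result
```

Running `cross_reference(open("fck.out"))` and comparing the sizes of the three lists would indicate how many of the invalid symlinks fall under each case.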


            We are not certain that the symlinks would be deleted. In a case such as this, it is always desirable to have a backup if possible.

            cliffw Cliff White (Inactive) added a comment

            Here is the summary of the test fsck:

            nbp1-MDT0000: ********** WARNING: Filesystem still has errors **********

            63829418 inodes used (23.78%, out of 268435456)
            3242 non-contiguous files (0.0%)
            35652 non-contiguous directories (0.1%)

            # of inodes with ind/dind/tind blocks: 0/0/0
            Extent depth histogram: 63096846/16544/13
            43499835 blocks used (16.20%, out of 268435456)
            0 bad blocks
            21886 large files

            62047854 regular files
            773172 directories
            0 character device files
            0 block device files
            20 fifos
            6652 links
            612271 symbolic links (349798 fast symbolic links)
            65 sockets
            ------------
            63836054 files

            I can upload the full output.

            [root@pladmin4:~/mhanafi]$ grep invalid fck.out | wc -l
            396020

            mhanafi Mahmoud Hanafi added a comment

            It is not clear to me why it is removing all the symlinks. Is it because of the extent option? How would we restore the symlinks from the dd backup?

            mhanafi Mahmoud Hanafi added a comment

            People

              cliffw Cliff White (Inactive)
              mhanafi Mahmoud Hanafi