Lustre / LU-16610

ldiskfs_find_dest_de bad entry in directory when running io500 test

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.16.0
    • Environment: openEuler 22.03, kernel 5.10.0-60.79.0.103.oe2203.aarch64
    • 3

    Description

      Directory corruption when running io500 test on openEuler 22.03:

      Client-side log:

      [openeuler@oe2203-test io500]$ sudo /io500.sh config-minimal.ini 
      IO500 version io500-sc22_v2 (standard)
      [RESULT]       ior-easy-write        0.105593 GiB/s : time 338.211 seconds
      ERROR: open64("/mnt/lustre/datafiles/2023.02.14-10.12.17/mdtest-easy/test-dir.0-0/mdtest_tree.0.0/file.mdtest.1.85", 66, 0664) failed. Error: Read-only file system, (aiori-POSIX.c:569)
      --------------------------------------------------------------------------
      MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
      with errorcode -1.

        
      Server-side log:

      [ 9962.007724] LDISKFS-fs error (device dm-0): ldiskfs_find_dest_de:2412: inode #5767170: block 3771253: comm mdt00_000: bad entry in directory: rec_len is smaller than minimal - offset=0, inode=0, rec_len=8, name_len=0, size=4096
      [ 9962.051171] Aborting journal on device dm-0-8.
      [ 9962.058456] LDISKFS-fs (dm-0): Remounting filesystem read-only
      [ 9962.059877] LDISKFS-fs error (device dm-0) in iam_txn_add:547: Journal has aborted
      [ 9962.064365] LustreError: 11366:0:(osd_io.c:2222:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
      [ 9962.066805] LustreError: 11366:0:(llog_cat.c:592:llog_cat_add_rec()) llog_write_rec -30: lh=00000000c04e4ff3
      [ 9962.069137] LustreError: 11366:0:(tgt_lastrcvd.c:1326:tgt_add_reply_data()) lustre-MDT0000: can't update reply_data file: rc = -30
      [ 9962.071742] LustreError: 11366:0:(osd_handler.c:2089:osd_trans_stop()) lustre-MDT0000: failed in transaction hook: rc = -30
      [ 9962.074184] LustreError: 11366:0:(osd_handler.c:2099:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -30
      [ 9962.074274] LustreError: 11348:0:(osd_handler.c:1789:osd_trans_commit_cb()) transaction @0x00000000c73ec34c commit error: 2
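The server log shows ldiskfs_find_dest_de() rejecting a directory entry with inode=0, rec_len=8, name_len=0. As a rough illustration of why, here is a simplified C sketch of how a leaf directory block is walked and validated, modeled loosely on ext4's ext4_find_dest_de() and __ext4_check_dir_entry(), which ldiskfs inherits. It is not the kernel code: it ignores the 64 KiB rec_len encoding, checksum tails, casefolding, duplicate-name detection and the inode-number bound. The helper names (read_hdr, put_hdr, check_dir_entry, find_dest) are hypothetical, and only the "rec_len is smaller than minimal" message is taken verbatim from the log; the other messages are paraphrased.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed 8-byte header of an ext4_dir_entry_2; the name follows it. */
struct dirent_hdr {
    uint32_t inode;     /* 0 means the slot is unused */
    uint16_t rec_len;   /* bytes from this entry to the next one */
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Header plus name, rounded up to a multiple of 4 (mirrors EXT4_DIR_REC_LEN). */
#define DIR_REC_LEN(n) ((((unsigned)(n)) + 8u + 3u) & ~3u)

/* Decode a little-endian on-disk header without relying on host endianness. */
static struct dirent_hdr read_hdr(const unsigned char *p)
{
    struct dirent_hdr h;
    h.inode = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
              (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    h.rec_len = (uint16_t)(p[4] | p[5] << 8);
    h.name_len = p[6];
    h.file_type = p[7];
    return h;
}

/* Encode a header in little-endian order; used to build example blocks. */
static void put_hdr(unsigned char *p, uint32_t inode, uint16_t rec_len,
                    uint8_t name_len, uint8_t file_type)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(inode >> (8 * i));
    p[4] = (unsigned char)(rec_len & 0xff);
    p[5] = (unsigned char)(rec_len >> 8);
    p[6] = name_len;
    p[7] = file_type;
}

/* Simplified sanity checks; returns NULL if the entry looks sane. */
static const char *check_dir_entry(const struct dirent_hdr *h,
                                   unsigned offset, unsigned size)
{
    unsigned rlen = h->rec_len;

    if (rlen < DIR_REC_LEN(1))
        return "rec_len is smaller than minimal";   /* text from the log */
    if (rlen % 4 != 0)
        return "rec_len is not a multiple of 4";
    if (rlen < DIR_REC_LEN(h->name_len))
        return "rec_len is too small for name_len";
    if (offset + rlen > size)
        return "entry runs past the end of the block";
    return NULL;
}

/* Walk a leaf block looking for room for a name of new_name_len bytes.
 * Returns the offset of the entry to split, -1 if the block is full,
 * or -2 if a bad entry was found (*err then says why). */
static int find_dest(const unsigned char *blk, unsigned size,
                     unsigned new_name_len, const char **err)
{
    unsigned need = DIR_REC_LEN(new_name_len);
    unsigned off = 0;

    assert(blk != NULL && err != NULL);
    *err = NULL;
    while (off + 8 <= size) {
        struct dirent_hdr h = read_hdr(blk + off);
        const char *msg = check_dir_entry(&h, off, size);
        if (msg != NULL) {
            *err = msg;
            return -2;
        }
        /* A live entry keeps its own record; an unused one is all free space. */
        unsigned used = h.inode ? DIR_REC_LEN(h.name_len) : 0;
        if (h.rec_len - used >= need)
            return (int)off;
        off += h.rec_len;
    }
    return -1;
}
```

Fed a 4096-byte block whose first entry matches the log (inode=0, rec_len=8, name_len=0), find_dest() stops at offset 0 with "rec_len is smaller than minimal": no valid record can be shorter than 12 bytes, the 8-byte header plus a one-byte name padded to a multiple of 4.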

       

          Activity

            Dongyang Li added a comment:

            I was just going to add that, after going through ext4-pdirop.patch again: the rhel9.1 version does have the code block introduced by ext4-pdirop.patch before the block from 3ba733f879c2 ("ext4: avoid cycles in directory h-tree"), and previous comments mentioned that the rhel9.1 kernel doesn't have the problem.
            So the ordering of the code blocks is irrelevant; the real fix is, as Xinliang suggested, the move of "de2 = dx_move_dirents(...)".

            Having said that, could it still be a good idea to check for cycles in tree blocks before we do anything?

            Xinliang Liu added a comment:

            Hi Dongyang and Andreas, this issue should affect oe2203 only.

            When I updated the patch, I found that the root cause is that the call to "de2 = dx_move_dirents(...)" should have been removed here:

            https://review.whamcloud.com/c/fs/lustre-release/+/50192/2/ldiskfs/kernel_patches/patches/oe2203/ext4-pdirop.patch#b757

            Anyway, I made a mistake when resolving the patch conflict.


            Andreas Dilger added a comment:

            Dongyang, definitely yes, this should be fixed on all the major distros (and others if it is easily done). I thought the oe2203 patch was only catching that series up with a fix that had already landed on el9.x.
            Dongyang Li added a comment:

            The rhel9.1 kernel doesn't have 3ba733f879c2 ("ext4: avoid cycles in directory h-tree"), and it looks like patch 50192 only fixes ext4-pdirop.patch for oe2203.

            adilger, do we need the fix for the other ldiskfs series, e.g. rhel9.2/ext4-pdirop.patch and linux-6.0/ext4-pdirop.patch?

            Peter Jones added a comment:

            Landed for 2.16


            Gerrit Updater added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50192/
            Subject: LU-16610 ldiskfs: fix directory corruption on openeuler 22.03
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 85b76aa91a3999a325a9ef970f0cc8b6dd1cdda7

            Gerrit Updater added a comment:

            "xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50192
            Subject: LU-16610 ldiskfs: fix directory corruption on openeuler 22.03
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 9280d8c880e534629df89b7d40e531297e701c99
            Xinliang Liu added a comment:

            Andreas, sure. Working on it.


            Andreas Dilger added a comment:

            Xinliang, since this is a bug in the ldiskfs patch series for that kernel version, can you please submit a patch to update that series with the fix?
            Xinliang Liu added a comment:

            Verified that the rhel9.1 kernel (kernel-5.14.0-162.12.1.el9_1) does not have this issue.


            People

              Assignee: Xinliang Liu
              Reporter: Xinliang Liu