[LU-16610] ldiskfs_find_dest_de bad entry in directory when running io500 test Created: 02/Mar/23  Updated: 22/Mar/23  Resolved: 22/Mar/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Xinliang Liu Assignee: Xinliang Liu
Resolution: Fixed Votes: 0
Labels: None
Environment:

openEuler 22.03 kernel: 5.10.0-60.79.0.103.oe2203.aarch64


Issue Links:
Related
is related to LU-12268 LDISKFS-fs error: ldiskfs_find_dest_d... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Directory corruption when running io500 test on openEuler 22.03:

Client side log 

[openeuler@oe2203-test io500]$ sudo /io500.sh config-minimal.ini 
IO500 version io500-sc22_v2 (standard)
[RESULT]       ior-easy-write        0.105593 GiB/s : time 338.211 seconds
ERROR: open64("/mnt/lustre/datafiles/2023.02.14-10.12.17/mdtest-easy/test-dir.0-0/mdtest_tree.0.0/file.mdtest.1.85", 66, 0664) failed. Error: Read-only file system, (aiori-POSIX.c:569)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode -1.

  
Server side log 

[ 9962.007724] LDISKFS-fs error (device dm-0): ldiskfs_find_dest_de:2412: inode #5767170: block 3771253: comm mdt00_000: bad entry in directory: rec_len is smaller than minimal - offset=0, inode=0, rec_len=8, name_len=0, size=4096
[ 9962.051171] Aborting journal on device dm-0-8.
[ 9962.058456] LDISKFS-fs (dm-0): Remounting filesystem read-only
[ 9962.059877] LDISKFS-fs error (device dm-0) in iam_txn_add:547: Journal has aborted
[ 9962.064365] LustreError: 11366:0:(osd_io.c:2222:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
[ 9962.066805] LustreError: 11366:0:(llog_cat.c:592:llog_cat_add_rec()) llog_write_rec -30: lh=00000000c04e4ff3
[ 9962.069137] LustreError: 11366:0:(tgt_lastrcvd.c:1326:tgt_add_reply_data()) lustre-MDT0000: can't update reply_data file: rc = -30
[ 9962.071742] LustreError: 11366:0:(osd_handler.c:2089:osd_trans_stop()) lustre-MDT0000: failed in transaction hook: rc = -30
[ 9962.074184] LustreError: 11366:0:(osd_handler.c:2099:osd_trans_stop()) lustre-MDT0000: failed to stop transaction: rc = -30
[ 9962.074274] LustreError: 11348:0:(osd_handler.c:1789:osd_trans_commit_cb()) transaction @0x00000000c73ec34c commit error: 2

 



 Comments   
Comment by Xinliang Liu [ 02/Mar/23 ]

A similar issue also occurred when running the io500 test, but with a different kernel version:

LU-12268

Comment by Xinliang Liu [ 02/Mar/23 ]

Bisected to the related commit/patch:

f94c02917f1d ext4: avoid cycles in directory h-tree (included in the openEuler 22.03 LTS kernel kernel-5.10.0-60.58.0.86.oe2203)
ldiskfs/kernel_patches/patches/oe2203/ext4-pdirop.patch (Lustre ldiskfs patch on ext4)

Workaround:

Revert commit “f94c02917f1d ext4: avoid cycles in directory h-tree“ and update ext4-pdirop.patch.

The io500 test suite result below shows it runs fine with this workaround:

[openeuler@oe2203-test io500]$ sudo ./io500 config-minimal.ini
IO500 version io500-sc22_v2 (standard)
[RESULT]       ior-easy-write        0.103132 GiB/s : time 316.294 seconds
[RESULT]    mdtest-easy-write        0.067036 kIOPS : time 301.645 seconds
[      ]            timestamp        0.000000 kIOPS : time 0.000 seconds
[RESULT]       ior-hard-write        0.101985 GiB/s : time 312.619 seconds
[RESULT]    mdtest-hard-write        0.054293 kIOPS : time 301.826 seconds
[RESULT]                 find        3.992785 kIOPS : time 9.124 seconds
[RESULT]        ior-easy-read        0.023636 GiB/s : time 1380.092 seconds
[RESULT]     mdtest-easy-stat        0.107839 kIOPS : time 187.558 seconds
[RESULT]        ior-hard-read        0.022159 GiB/s : time 1438.550 seconds
[RESULT]     mdtest-hard-stat        0.203911 kIOPS : time 81.015 seconds
[RESULT]   mdtest-easy-delete        0.106105 kIOPS : time 190.760 seconds
[RESULT]     mdtest-hard-read        0.065468 kIOPS : time 250.149 seconds
[RESULT]   mdtest-hard-delete        0.103164 kIOPS : time 159.408 seconds
[SCORE ] Bandwidth 0.048447 GiB/s : IOPS 0.147904 kiops : TOTAL 0.084649
The result files are stored in the directory: ./results/2023.02.22-01.44.09
[openeuler@oe2203-test io500]$ uname -r
5.10.0-60.79.0.103.oe2203.aarch64 

 

The commit “f94c02917f1d ext4: avoid cycles in directory h-tree“ itself should be fine; it is probably the ext4-pdirop.patch that needs adjusting.

Comment by Xinliang Liu [ 02/Mar/23 ]

This issue seems related to the following two code parts:

Part 1 (introduced by commit f94c02917f1d “ext4: avoid cycles in directory h-tree”)

block = dx_get_block(at);
for (i = 0; i <= level; i++) {
    if (blocks[i] == block) {
        ext4_warning_inode(dir,
            "dx entry: tree cycle block %u points back to block %u",
            blocks[level], block);
        goto fail;
    }
}

Part 2 (introduced by ext4-pdirop.patch)

if (indirect == level) { /* the last index level */
    struct ext4_dir_lock_data *ld;
    u64 myblock;

    /* By default we only lock DE-block, however, we will
     * also lock the last level DX-block if:
     * a) there is hash collision
     *    we will set DX-lock flag (a few lines below)
     *    and redo to lock DX-block
     *    see detail in dx_probe_hash_collision()
     * b) it's a retry from splitting
     *    we need to lock the last level DX-block so nobody
     *    else can split any leaf blocks under the same
     *    DX-block, see detail in ext4_dx_add_entry()
     */
    if (ext4_htree_dx_locked(lck)) {
        /* DX-block is locked, just lock DE-block
         * and return
         */
        ext4_htree_spin_unlock(lck);
        if (!ext4_htree_safe_locked(lck))
                ext4_htree_de_lock(lck, frame->at);

...

    if (myblock == EXT4_HTREE_NODE_CHANGED) {
        /* someone split this DE-block before
         * I locked it, I need to retry and lock
         * valid DE-block
         */
        ext4_htree_de_unlock(lck);
        continue;
    }
    return frame;
}

After moving part 2 after part 1, the issue was gone.

Comment by Andreas Dilger [ 02/Mar/23 ]

Xinliang, great debugging.

Comment by Xinliang Liu [ 02/Mar/23 ]

Verified that the RHEL 9.1 kernel (kernel-5.14.0-162.12.1.el9_1) does not have this issue.

Comment by Andreas Dilger [ 02/Mar/23 ]

Xinliang, since this is a bug in the ldiskfs patch series for that kernel version, can you please submit a patch to update that series with the fix?

Comment by Xinliang Liu [ 03/Mar/23 ]

Andreas, sure. Working on it.

Comment by Gerrit Updater [ 03/Mar/23 ]

"xinliang <xinliang.liu@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50192
Subject: LU-16610 ldiskfs: fix directory corruption on openeuler 22.03
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9280d8c880e534629df89b7d40e531297e701c99

Comment by Gerrit Updater [ 21/Mar/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50192/
Subject: LU-16610 ldiskfs: fix directory corruption on openeuler 22.03
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 85b76aa91a3999a325a9ef970f0cc8b6dd1cdda7

Comment by Peter Jones [ 22/Mar/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:28:29 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.