Lustre / LU-17397

mdtest failed (Lustre became read-only) under high stress

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.0, Lustre 2.15.3
    • Labels: None
    • Environment: client/server: CentOS-8.5.2111 + Lustre 2.15.3
      Linux 4.18.0-348.2.1.el8_lustre.x86_64 #1 SMP Fri Jun 17 00:10:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

    Description

      We test metadata performance in a simple Lustre environment with two servers (#server01, #server02), both connected to a SAN storage:

      • #server01 mounts the MGT, one MDT, and four OSTs
      • #server02 mounts one MDT and four OSTs

      Here, the MDS and OSS run on the same server, and the filesystem comprises two MDTs and eight OSTs.

      [root@client02 lustre]# lfs df -h
      UUID                       bytes        Used   Available Use% Mounted on
      l_lfs-MDT0000_UUID          1.8T       39.2G        1.6T   3% /lustre[MDT:0] 
      l_lfs-MDT0001_UUID          1.8T       39.4G        1.6T   3% /lustre[MDT:1] 
      l_lfs-OST0000_UUID         11.9T        3.5T        7.8T  31% /lustre[OST:0] 
      l_lfs-OST0001_UUID         11.9T        3.6T        7.7T  32% /lustre[OST:1] 
      l_lfs-OST0002_UUID         11.9T        3.6T        7.7T  32% /lustre[OST:2] 
      l_lfs-OST0003_UUID         11.9T        3.6T        7.7T  32% /lustre[OST:3] 
      l_lfs-OST0004_UUID         11.9T        3.8T        7.5T  34% /lustre[OST:4] 
      l_lfs-OST0005_UUID         11.9T        3.5T        7.8T  32% /lustre[OST:5] 
      l_lfs-OST0006_UUID         11.9T        3.5T        7.8T  31% /lustre[OST:6] 
      l_lfs-OST0007_UUID         11.9T        3.6T        7.7T  32% /lustre[OST:7] 

      filesystem_summary:        95.1T       28.6T       61.8T  32% /lustre

       

      We use mdtest with mpirun across the two clients to test metadata performance under the configuration above; the test command is as follows:

      • $> mpirun --allow-run-as-root --oversubscribe -mca btl ^openib --mca btl_tcp_if_include 40.40.22.0/24 -np 64 -host client01:32,client02:32 --map-by node mdtest -L -z 3 -b 2 -I 160000 -i 1 -d /lustre/mdtest_demo | tee 2client_64np_3z_2b_160000I.log
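As a rough sanity check on the workload size, the flags above imply the following file count (a sketch assuming mdtest's usual semantics: -b is the branching factor, -z the tree depth, -I the items created per directory per rank, and -L restricts file creation to leaf directories):

```python
# Back-of-the-envelope file count for the mdtest command above
# (assumed flag semantics; variable names are ours, not mdtest's).
np_ranks = 64      # -np 64: MPI ranks across the two clients
branch = 2         # -b 2:  directory branching factor
depth = 3          # -z 3:  directory tree depth
items = 160000     # -I 160000: items per directory per rank

leaves = branch ** depth          # leaf directories per rank: 2^3 = 8
total = np_ranks * items * leaves # files created at the leaf level (-L)
print(f"{total:,}")               # 81,920,000 -- roughly 82M files
```

This matches the ~82M-file scale discussed later in the thread.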

       

      After running stably for around 15 minutes, Lustre becomes read-only (blocking the whole test) and generates the following syslog messages:

      [Fri Jan  5 17:29:36 2024] Lustre: l_lfs-OST0001: deleting orphan objects from 0x440000400:26730785 to 0x440000400:26744321
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt19_001: directory leaf block found instead of index block
      [Fri Jan  5 17:43:19 2024] Aborting journal on device ultrapatha-8.
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs (ultrapatha): Remounting filesystem read-only
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): ldiskfs_journal_check_start:61: Detected aborted journal
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt20_004: directory leaf block found instead of index block
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt20_004: directory leaf block found instead of index block
      [Fri Jan  5 17:43:19 2024] LustreError: 61165:0:(osd_handler.c:1790:osd_trans_commit_cb()) transaction @0x0000000082b2d9d3 commit error: 2
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt08_003: directory leaf block found instead of index block
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt21_000: directory leaf block found instead of index block
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt20_004: directory leaf block found instead of index block
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt08_003: directory leaf block found instead of index block
      [Fri Jan  5 17:43:19 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt05_001: directory leaf block found instead of index block
      [Fri Jan  5 17:43:20 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt21_000: directory leaf block found instead of index block
      [Fri Jan  5 17:43:20 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt18_002: directory leaf block found instead of index block
      [Fri Jan  5 17:43:24 2024] LDISKFS-fs error: 355 callbacks suppressed
      [Fri Jan  5 17:43:24 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt20_000: directory leaf block found instead of index block
      [Fri Jan  5 17:43:24 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt21_000: directory leaf block found instead of index block
      [Fri Jan  5 17:43:24 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt07_004: directory leaf block found instead of index block
      [Fri Jan  5 17:43:24 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt20_000: directory leaf block found instead of index block
      [Fri Jan  5 17:43:24 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt07_004: directory leaf block found instead of index block
      [Fri Jan  5 17:43:25 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt19_001: directory leaf block found instead of index block
      [Fri Jan  5 17:43:25 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt19_001: directory leaf block found instead of index block
      [Fri Jan  5 17:43:25 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt18_001: directory leaf block found instead of index block
      [Fri Jan  5 17:43:25 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt07_004: directory leaf block found instead of index block
      [Fri Jan  5 17:43:25 2024] LDISKFS-fs error (device ultrapatha): dx_probe:1169: inode #61343829: block 151386: comm mdt20_000: directory leaf block found instead of index block

       

      We repeated the test many times and still got similar results (i.e., the LDISKFS-fs error on MDT0 or MDT1); the workload scale is as follows:

       

      [root@client01 lustre]# lfs quota -u root /lustre/
      Disk quotas for usr root (uid 0):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
             /lustre/ 30805903960       0       0       - 96453928       0       0       -

       

      We originally hit this issue with 2.15.0 and upgraded to 2.15.3, but the issue persists and blocks our testing.

      Attachments

        Activity

          [LU-17397] mdtest failed (Lustre became read-only) under high stress
          yzr95924 Zuoru Yang added a comment -

          @Andreas Dilger Sure, we have evaluated the same test case on AlmaLinux 8.8 + Lustre 2.15.3 with the newer kernel (4.18.0-477.10.1.el8_lustre.x86_64), and the issue no longer occurs. Thanks again!

          adilger Andreas Dilger added a comment -

          Time to upgrade your server kernel and rebuild in that case.
          yzr95924 Zuoru Yang added a comment -

          @Andreas Dilger Hi Andreas, thanks for your insights. We double-checked the Linux kernel in our environment (we installed the kernel package from the Whamcloud 2.15.0 repo, later upgrading the Lustre server to 2.15.3: https://downloads.whamcloud.com/public/lustre/lustre-2.15.0-ib/MOFED-5.6-1.0.3.3/el8.5.2111/server/RPMS/x86_64/), and we confirm that the kernel from that link does not include the patch.
          adilger Andreas Dilger added a comment - edited

          yzr95924, thank you for your Launchpad reference. Indeed, that bug looks like it could be related. The patch is reported as included in upstream kernel 5.14 and in the 5.11 stable series, fixing a bug originally introduced in kernel 5.11 (which was also backported to the RHEL kernel):

          commit 877ba3f729fd3d8ef0e29bc2a55e57cfa54b2e43
          Author:     Theodore Ts'o <tytso@mit.edu>
          AuthorDate: Wed Aug 4 14:23:55 2021 -0400
          
              ext4: fix potential htree corruption when growing large_dir directories
              
              Commit b5776e7524af ("ext4: fix potential htree index checksum
              corruption) removed a required restart when multiple levels of index
              nodes need to be split.  Fix this to avoid directory htree corruptions
              when using the large_dir feature.
              
              Cc: stable@kernel.org # v5.11
              Cc: Artem Blagodarenko <artem.blagodarenko@gmail.com>
              Fixes: b5776e7524af ("ext4: fix potential htree index checksum corruption)
              Reported-by: Denis <denis@voxelsoft.com>
              Signed-off-by: Theodore Ts'o <tytso@mit.edu>
          

          I can confirm that the patch is applied in 4.18.0-425.13.1.el8_7.x86_64 in fs/ext4/namei.c:

                                  if (err)
                                          goto journal_error;
                                  err = ext4_handle_dirty_dx_node(handle, dir,
                                                                  frame->bh);
                                  if (restart || err)
                                          goto journal_error;
          

          but I'm not sure whether it is applied in your kernel 4.18.0-348.2.1.el8_lustre.x86_64.
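A quick way to automate that check might be a simple string heuristic against fs/ext4/namei.c from the kernel source in question (the helper name and the string test are ours, not part of any tool; commit 877ba3f729fd changes the plain `if (err)` before the journal_error jump into `if (restart || err)`):

```python
def has_htree_restart_fix(namei_c_source: str) -> bool:
    """Heuristic: commit 877ba3f729fd adds an 'if (restart || err)' check
    before the journal_error jump in fs/ext4/namei.c. Presence of that
    exact condition suggests the fix is applied."""
    return "if (restart || err)" in namei_c_source

# Example against the two variants quoted above:
patched = (
    "err = ext4_handle_dirty_dx_node(handle, dir, frame->bh);\n"
    "if (restart || err)\n\tgoto journal_error;"
)
unpatched = (
    "err = ext4_handle_dirty_dx_node(handle, dir, frame->bh);\n"
    "if (err)\n\tgoto journal_error;"
)
print(has_htree_restart_fix(patched), has_htree_restart_fix(unpatched))  # True False
```

In practice one would read the file from an unpacked kernel source or SRPM; this only sketches the comparison.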

          yzr95924 Zuoru Yang added a comment -

          @Andreas Dilger BTW, the reason I initially suspected this issue is related to large_dir is this link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1933074

          which also reports "directory leaf block found instead of index block" when there are millions of files on ext4. In any case, we will test this issue with a newer kernel (e.g., AlmaLinux 8.8 + 2.15.3).
          yzr95924 Zuoru Yang added a comment -

          @Andreas Dilger Thanks Andreas! We will follow this direction and try the same test with a newer kernel.


          adilger Andreas Dilger added a comment -

          Also, have you tried updating to a newer kernel? It is possible that the ext4 in the kernel (and the ldiskfs that is generated from it) has a bug that has since been fixed.

          adilger Andreas Dilger added a comment -

          Lustre does not modify the on-disk data structures of ldiskfs directly, although it accesses the filesystem somewhat differently than a regular ext4 mount does. I don't think the issue is with large_dir, but more likely with parallel directory locking and updates. There would need to be some kind of bug in ext4 or the ldiskfs patches applied. It is not possible for the clients to corrupt the server filesystem directly.

          That said, it appears from the e2fsck output that the on-disk data structures are not corrupted, so it seems like this is some kind of in-memory corruption? The free blocks/inodes count and quota usage messages are normal for a filesystem that is in use.

          There is a tunable parameter to disable the parallel directory locking and updates: "lctl set_param osd-ldiskfs.lustre-MDT*.pdo=0" on the MDS nodes. Note that this path is not normally tested and could potentially have issues of its own, beyond being much slower, but it would be useful to see whether it avoids the problem.
          yzr95924 Zuoru Yang added a comment -

          @Andreas Dilger Sorry for my late reply; we spent some time checking our RAID to ensure this is not caused by the storage backend. We now suspect it might be a bug in ext4.

          Yes, there were some files in the filesystem from previous experiments; we removed them and reran the same test command. The issue still occurs, and the info is as follows (the counts are partial, since the failure prevents all files from being created):

          [root@client02 ~]# lfs quota -u root /lustre/
          Disk quotas for usr root (uid 0):
               Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
                 /lustre/ 185332124       0       0       - 45025839       0       0       -

          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt05_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] Aborting journal on device ultrapathb-8.
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs (ultrapathb): Remounting filesystem read-only
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): ldiskfs_journal_check_start:61: Detected aborted journal
          [Tue Jan  9 20:51:22 2024] LustreError: 260307:0:(osd_handler.c:1790:osd_trans_commit_cb()) transaction @0x0000000069cf0f59 commit error: 2
          [Tue Jan  9 20:51:22 2024] LustreError: 260307:0:(osd_handler.c:1790:osd_trans_commit_cb()) Skipped 52 previous similar messages
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt21_003: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt18_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt10_001: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt05_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt05_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt18_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt07_001: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt19_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:22 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt18_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error: 180 callbacks suppressed
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt20_004: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt05_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt18_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt18_002: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt10_003: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt20_004: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt07_003: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt19_000: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt10_003: directory leaf block found instead of index block
          [Tue Jan  9 20:51:27 2024] LDISKFS-fs error (device ultrapathb): dx_probe:1169: inode #104316384: block 149479: comm mdt05_002: directory leaf block found instead of index block

           

          Note that device ultrapathb is the backend of MDT1; the following is the session record of running e2fsck on device ultrapathb:

           

          Script started on 2024-01-09 21:09:49+08:00
          [root@server02 ~]# e2fsck -f /dev/ultrapathb
          e2fsck 1.46.6-wc1 (10-Jan-2023)
          MMP interval is 10 seconds and total wait time is 42 seconds. Please wait...
          l_lfs-MDT0001: recovering journal
          Pass 1: Checking inodes, blocks, and sizes
          Pass 2: Checking directory structure
          Pass 3: Checking directory connectivity
          Pass 4: Checking reference counts
          Pass 5: Checking group summary information
          Free blocks count wrong (142139109, counted=142167445).
          Fix<y>? yes
          Free inodes count wrong (412219565, counted=412247109).
          Fix<y>? yes
          [QUOTA WARNING] Usage inconsistent for ID 0:actual (72814075904, 17302001) != expected (72899485696, 17302001)
          Update quota info for quota type 0<y>? yes
          [QUOTA WARNING] Usage inconsistent for ID 0:actual (72814075904, 17302001) != expected (72899485696, 17302001)
          Update quota info for quota type 1<y>? yes
          [QUOTA WARNING] Usage inconsistent for ID 0:actual (72814075904, 17302001) != expected (72899485696, 17302001)
          Update quota info for quota type 2<y>? yes

          l_lfs-MDT0001: ***** FILE SYSTEM WAS MODIFIED *****
          l_lfs-MDT0001: 17302011/429549120 files (0.0% non-contiguous), 126261419/268428864 blocks
          [root@server02 ~]# exit
          exit

          Script done on 2024-01-09 21:16:33+08:00

           

          Could this possibly be an issue with ext4's large_dir feature?


          adilger Andreas Dilger added a comment -

          Just to confirm the test being run: each rank is creating 160000 files in a separate subdirectory from the other ranks, and there are 2^3 leaf subdirectories (branching factor 2, depth 3)? That would create about 82M files, but it looks like there are some existing files in the filesystem.

          What does e2fsck show when run on the corrupt MDT?

          People

            Assignee: wc-triage WC Triage
            Reporter: yzr95924 Zuoru Yang
            Votes: 0
            Watchers: 4