[LU-12268] LDISKFS-fs error: ldiskfs_find_dest_de:2066: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=201, rec_len=0, name_len=0 Created: 07/May/19  Updated: 02/Mar/23  Resolved: 24/May/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Matt Rásó-Barnett (Inactive) Assignee: Hongchao Zhang
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12265 LustreError: 141027:0:(osd_iam_lfix.c... Reopened
is related to LU-16610 ldiskfs_find_dest_de bad entry in dir... Resolved
is related to LU-15016 OI Scrub backup and rebuild Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hello,
I've hit an ldiskfs error today while doing more 'mdtest' benchmarking runs against our all-flash NVMe-based filesystem, running Lustre 2.12.0. Below is the syslog output from the affected server:

May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_find_dest_de:2066: inode #25165841: block 15874389: comm mdt00_054: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=201, rec_len=0, name_len=0
May 07 16:47:23 dac-e-3 kernel: Aborting journal on device dm-5-8.
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs (dm-5): Remounting filesystem read-only
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs (dm-5): Remounting filesystem read-only
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5) in ldiskfs_write_dquot:5425: IO failure
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LustreError: 275116:0:(osd_io.c:2059:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
May 07 16:47:23 dac-e-3 kernel: LustreError: 275209:0:(tgt_lastrcvd.c:1176:tgt_add_reply_data()) fs1-MDT0002: can't update reply_data file: rc = -30
May 07 16:47:23 dac-e-3 kernel: LustreError: 275116:0:(osd_io.c:2059:osd_ldiskfs_write_record()) Skipped 1 previous similar message
May 07 16:47:23 dac-e-3 kernel: LustreError: 275209:0:(osd_handler.c:2007:osd_trans_stop()) fs1-MDT0002: failed in transaction hook: rc = -30
May 07 16:47:23 dac-e-3 kernel: LustreError: 275116:0:(osd_handler.c:2017:osd_trans_stop()) fs1-MDT0002: failed to stop transaction: rc = -30
May 07 16:47:23 dac-e-3 kernel: LustreError: 275209:0:(osd_handler.c:2017:osd_trans_stop()) fs1-MDT0002: failed to stop transaction: rc = -30
May 07 16:47:23 dac-e-3 kernel: Quota error (device dm-5): qtree_write_dquot: dquota write failed
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: Quota error (device dm-5): qtree_write_dquot: dquota write failed
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: Quota error (device dm-5): qtree_write_dquot: dquota write failed
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: Quota error (device dm-5): qtree_write_dquot: dquota write failed
May 07 16:47:23 dac-e-3 kernel: Quota error (device dm-5): qtree_write_dquot: dquota write failed
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LustreError: 181259:0:(llog_cat.c:576:llog_cat_add_rec()) llog_write_rec -30: lh=ffff913ee0551b00
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LDISKFS-fs error (device dm-5): ldiskfs_journal_check_start:56: Detected aborted journal
May 07 16:47:23 dac-e-3 kernel: LustreError: 136266:0:(osd_handler.c:1707:osd_trans_commit_cb()) transaction @0xffff913cc5e73e00 commit error: 2

I described the environment of this filesystem in more detail in https://jira.whamcloud.com/browse/LU-12265. I'm running the exact same tests as in that ticket as part of an IO500 submission, and I'm specifically seeing problems when running mdtest, but this appeared to be a different error, so I thought I would raise a new ticket.

The filesystem itself is configured using DNE, and specifically we are using DNE2 striped directories for all mdtest runs. We are using a large number of MDTs, currently 24, one per server (which, other than this problem, is working excellently), and the directory stripe count is '-1', so all directories are striped over all 24 MDTs. Each server contains 12 NVMe drives, and we partition one of the drives so it holds both an OST partition and an MDT partition.

Lustre and Kernel versions are as follows:

Server: kernel-3.10.0-957.el7_lustre.x86_64
Server: lustre-2.12.0-1.el7.x86_64

Clients: kernel-3.10.0-957.10.1.el7.x86_64
Clients: lustre-client-2.10.7-1.el7.x86_64

The test itself:

mdtest-1.9.3 was launched with 4096 total task(s) on 128 node(s)
Command line used: /home/mjr208/projects/benchmarking/io-500-src-stonewall-fix/bin/mdtest "-C" "-n" "70000" "-u" "-L" "-F" "-d" "/dac/fs1/mjr208/job11335245-2019-05-07-1537/mdt_easy"
Path: /dac/fs1/mjr208/job11335245-2019-05-07-1537
FS: 412.6 TiB   Used FS: 24.2%   Inodes: 960.0 Mi   Used Inodes: 0.0%

4096 tasks, 286720000 files

Is there anything from the above log that indicates the cause of the problem? Do I perhaps have a bad device here, or is this perhaps a bug?
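For reference, the message in the log comes from ldiskfs's directory-entry sanity check, inherited from ext4. The following is a minimal Python sketch of that logic, not the actual ldiskfs code; the struct layout and helper names are assumptions based on ext4's `ext4_dir_entry_2` and `__ext4_check_dir_entry`:

```python
import struct

# Header of an ext4/ldiskfs directory entry (assumed layout, little-endian):
# __le32 inode, __le16 rec_len, __u8 name_len, __u8 file_type, then the name.
DIRENT_HDR = "<IHBB"
HDR_SIZE = struct.calcsize(DIRENT_HDR)  # 8 bytes

def dir_rec_len(name_len):
    """Minimal on-disk record length for a name of name_len bytes:
    header + name, rounded up to a 4-byte boundary (mirrors EXT4_DIR_REC_LEN)."""
    return (HDR_SIZE + name_len + 3) & ~3

def check_dir_entry(block, offset):
    """Simplified version of the sanity check that produced the
    'bad entry in directory' message; returns an error string or None."""
    inode, rec_len, name_len, file_type = struct.unpack_from(DIRENT_HDR, block, offset)
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    return None

# A dirent like the one in the log: inode=201, rec_len=0, name_len=0.
bad = struct.pack(DIRENT_HDR, 201, 0, 0, 1) + b"\0" * 8
print(check_dir_entry(bad, 0))  # -> rec_len is smaller than minimal
```

A rec_len of 0 can never advance the scan to the next entry, so the kernel has to treat the block as corrupt and abort rather than loop forever.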



 Comments   
Comment by Matt Rásó-Barnett (Inactive) [ 07/May/19 ]

Looking over my server logs I see another instance of this error from yesterday during similar runs, but on a different server/disk:

May 06 12:57:51 dac-e-9 kernel: LDISKFS-fs error (device dm-3): ldiskfs_find_dest_de:2066: inode #26738690: block 16857697: comm mdt01_053: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=191, rec_len=0, name_len=0
May 06 12:57:51 dac-e-9 kernel: Aborting journal on device dm-3-8.
May 06 12:57:51 dac-e-9 kernel: LDISKFS-fs (dm-3): Remounting filesystem read-only
May 06 12:57:51 dac-e-9 kernel: LDISKFS-fs error (device dm-3): ldiskfs_journal_check_start:56: Detected aborted journal
May 06 12:57:51 dac-e-9 kernel: LDISKFS-fs (dm-3): Remounting filesystem read-only
May 06 12:57:51 dac-e-9 kernel: LDISKFS-fs error (device dm-3) in iam_txn_add:575: Journal has aborted
May 06 12:57:51 dac-e-9 kernel: LustreError: 381727:0:(osd_io.c:2059:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
May 06 12:57:51 dac-e-9 kernel: LustreError: 381727:0:(llog_cat.c:576:llog_cat_add_rec()) llog_write_rec -30: lh=ffff9434a2333100
May 06 12:57:51 dac-e-9 kernel: LustreError: 381727:0:(tgt_lastrcvd.c:1176:tgt_add_reply_data()) fs1-MDT0008: can't update reply_data file: rc = -30
May 06 12:57:51 dac-e-9 kernel: LustreError: 381727:0:(osd_handler.c:2007:osd_trans_stop()) fs1-MDT0008: failed in transaction hook: rc = -30
May 06 12:57:51 dac-e-9 kernel: LustreError: 381727:0:(osd_handler.c:2017:osd_trans_stop()) fs1-MDT0008: failed to stop transaction: rc = -30
May 06 12:57:51 dac-e-9 kernel: LustreError: 112648:0:(osd_handler.c:1707:osd_trans_commit_cb()) transaction @0xffff9431c4a39500 commit error: 2
Comment by Joseph Gmitter (Inactive) [ 08/May/19 ]

Hi Hongchao,

Can you please advise?

Thanks.
Joe

Comment by Alexander Boyko [ 13/Jul/21 ]

I've been working on a similar issue and am adding my findings.

 
[2401623.056209] LDISKFS-fs error (device md1): ldiskfs_find_dest_de:2066: inode #1943585330: block 1932812092: comm mdt01_029: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=792330240, rec_len=0, name_len=0

The direntry contained the following data:

ffff8a9b27df3000:  00 00 3a 2f 00 00 00 00 2e 02 00 00 00 00 00 02   ..:/............
ffff8a9b27df3010:  40 00 71 f2 00 00 3a 84 00 00 00 00 32 02 00 00   @.q...:.....2...
ffff8a9b27df3020:  00 00 00 02 40 00 71 f2 00 00 3a d9 00 00 00 00   ....@.q...:.....
ffff8a9b27df3030:  35 02 00 00 00 00 00 02 40 00 71 f2 00 00 3b 2e   5.......@.q...;.
ffff8a9b27df3040:  00 00 00 00 38 02 00 00 00 00 00 02 40 00 71 f2   ....8.......@.q.
ffff8a9b27df3050:  00 00 3b 83 00 00 00 00 3b 02 00 00 00 00 00 02   ..;.....;.......

The memory was corrupted with a specific repeating pattern: an lu_fid plus a u32. For example, from offset 0xc: [0x2400071f2:0x3a84:0], then 0x0232.
This is very close to the dir_data FIDs, but it is not encapsulated in a direntry struct. I checked the previous memory page, and it contains the same pattern, as do two more pages before that.
So this is a simple buffer overflow. After some memory searches for the affected address, only one thread was found:
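The repeating pattern can be decoded directly from the hexdump above. A small sketch, assuming the lu_fid fields (u64 seq, u32 oid, u32 ver) are stored big-endian as Lustre's fid_cpu_to_be() swab produces, followed by a little-endian u32; the 20-byte stride and 0xc start offset are inferred from the dump, not from any Lustre source:

```python
import struct

# Bytes from the crash hexdump above (ffff8a9b27df3000..305f).
HEXDUMP = (
    "00003a2f000000002e02000000000002"
    "400071f200003a840000000032020000"
    "00000002400071f200003ad900000000"
    "3502000000000002400071f200003b2e"
    "000000003802000000000002400071f2"
    "00003b83000000003b02000000000002"
)
buf = bytes.fromhex(HEXDUMP)

def decode_record(buf, off):
    """Decode one 20-byte record: big-endian lu_fid (u64 seq, u32 oid,
    u32 ver) followed by a little-endian u32 (assumed layout)."""
    seq, oid, ver = struct.unpack_from(">QII", buf, off)
    (extra,) = struct.unpack_from("<I", buf, off + 16)
    return seq, oid, ver, extra

# Records repeat every 20 bytes; the first complete one starts at 0xc.
records = [decode_record(buf, off) for off in range(0xC, len(buf) - 20, 20)]
for seq, oid, ver, extra in records:
    print(f"[{seq:#x}:{oid:#x}:{ver:#x}] extra={extra:#x}")
```

Decoded this way, every record carries the same sequence 0x2400071f2 with steadily increasing OIDs (0x3a84, 0x3ad9, 0x3b2e, ...), which is what makes it look like a stream of dir_data-style FIDs sprayed past the end of a buffer rather than on-disk directory content.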

crash> bt 66295
PID: 66295  TASK: ffff8acb3e7830c0  CPU: 5   COMMAND: "mdt01_042"
 #0 [ffff8acbebf47028] __schedule at ffffffff9276ab17
 #1 [ffff8acbebf470b8] schedule at ffffffff9276b019
 #2 [ffff8acbebf470c8] schedule_timeout at ffffffff92768b21
 #3 [ffff8acbebf47170] io_schedule_timeout at ffffffff9276a6bd
 #4 [ffff8acbebf471a0] io_schedule at ffffffff9276a758
 #5 [ffff8acbebf471b0] bit_wait_io at ffffffff92769141
 #6 [ffff8acbebf471c8] __wait_on_bit_lock at ffffffff92768cf1
 #7 [ffff8acbebf47208] out_of_line_wait_on_bit_lock at ffffffff92768f41
 #8 [ffff8acbebf47280] __lock_buffer at ffffffff922792b2
 #9 [ffff8acbebf47290] ldiskfs_commit_super at ffffffffc1483444 [ldiskfs]
#10 [ffff8acbebf472d8] __ldiskfs_std_error at ffffffffc1483cbd [ldiskfs]
#11 [ffff8acbebf47320] iam_txn_add at ffffffffc151033c [osd_ldiskfs]
#12 [ffff8acbebf47340] iam_add_rec at ffffffffc1512e63 [osd_ldiskfs]
#13 [ffff8acbebf473c0] iam_insert at ffffffffc15139ce [osd_ldiskfs]
#14 [ffff8acbebf47530] osd_oi_iam_refresh at ffffffffc150a325 [osd_ldiskfs]
#15 [ffff8acbebf47580] osd_oi_insert at ffffffffc150ceec [osd_ldiskfs]
#16 [ffff8acbebf47600] osd_create at ffffffffc150558d [osd_ldiskfs]
#17 [ffff8acbebf47690] lod_sub_create at ffffffffc114cd25 [lod]
#18 [ffff8acbebf47738] lod_create at ffffffffc113d519 [lod]
#19 [ffff8acbebf47778] mdd_create_object_internal at ffffffffc11d7f23 [mdd]
#20 [ffff8acbebf477b0] mdd_create_object at ffffffffc11c12fb [mdd]
#21 [ffff8acbebf47828] mdd_create at ffffffffc11cb351 [mdd]
#22 [ffff8acbebf47920] mdt_reint_open at ffffffffc1278da3 [mdt]
#23 [ffff8acbebf47a20] mdt_reint_rec at ffffffffc126b663 [mdt]
#24 [ffff8acbebf47a48] mdt_reint_internal at ffffffffc1245543 [mdt]
#25 [ffff8acbebf47a88] mdt_intent_open at ffffffffc1251ea3 [mdt]

The IAM index frame had far more entries than its limit allowed.

crash> iam_iterator ffff8acbebf473c8
struct iam_iterator {
  ii_flags = 2,
  ii_state = IAM_IT_ATTACHED,
  ii_path = {
    ip_container = 0xffff8aaced341c08,
    ip_indirect = 1,
    ip_frames = {{
        bh = 0xffff8acc9c734208,
        entries = 0xffff8acca34b8010,
        at = 0xffff8acca34b8024,
        leaf = 203,
        curidx = 0,
        at_shifted = 0
      }, {
        bh = 0xffff8a9871fe7f48,
        entries = 0xffff8a9b27df0000,
        at = 0xffff8a9b27df10f4,
        leaf = 640,
        curidx = 203,
        at_shifted = 0
      }, {
        bh = 0x0,
...

dx_countlimit 0xffff8a9b27df0000
struct dx_countlimit {
  limit = 204,
  count = 640
}

The IAM index started at 0xffff8a9b27df0000 and ended at ffff8a9b27df3xxx, occupying 3+ pages instead of one. Based on b_end_io=0, the buffer head for it was not read from disk; it was newly allocated. So the root cause is not on-disk corruption. I have not found the root cause in the IAM sources, but all the evidence points to iam_add_rec()->split_index_node()/iam_new_leaf().
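The invariant violated in the crash dump above is simple: an index block's entry count must never exceed its limit. A toy check on the dx_countlimit header (layout assumed from ext4's struct dx_countlimit, two little-endian u16s with limit first; the helper name is hypothetical — the later patch enforces this in the kernel with a BUG_ON instead):

```python
import struct

def index_block_sane(raw):
    """Return True if an index header's entry count fits within its limit.
    Header layout (assumed): __le16 limit, __le16 count."""
    limit, count = struct.unpack_from("<HH", raw, 0)
    return 0 < count <= limit

# Values from the crash dump above: limit=204, count=640 -> corrupt.
corrupt_hdr = struct.pack("<HH", 204, 640)
print(index_block_sane(corrupt_hdr))  # -> False
```

With count=640 against limit=204, any code walking the frame would read entries far past the end of the one-block buffer, matching the 3+ page overrun described above.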
 

Comment by Andreas Dilger [ 13/Jul/21 ]

If this IAM code is based on the upstream ext4/ldiskfs htree code, then there were a few patches from Eric Sandeen that fixed bugs in the htree split code. It would be worthwhile to run "git log" against the htree code in fs/ext4/namei.c to see what has changed since this code was created (which was at least 10 years ago, maybe Lustre 2.0?).

Comment by Gerrit Updater [ 28/Sep/21 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45072
Subject: LU-12268 osd: BUG_ON for IAM corruption
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3cc624012d272088f98f3f91e155d562a0eff066

Comment by Gerrit Updater [ 10/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45072/
Subject: LU-12268 osd: BUG_ON for IAM corruption
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5daf86607877ea81d0295a9d49a1fe06572e0352

Generated at Sat Feb 10 02:51:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.