Lustre: LU-19105

e2fsck corruption after renaming a duplicate dentry

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • None
    • None
    • 3

    Description

      We hit the following bug on an MDS: LU-16405, a regression in create that may cause directory entries with the same name.

      e2fsck tried to fix this by renaming the duplicate entry in the directory, but the rename corrupted the entry's ext4 "dirdata" and made the MDS crash when reading the directory (with errors=panic):

      [267361.856504] LDISKFS-fs error (device loop0): htree_dirblock_to_tree:1283: inode #25046: block 16721: comm ls: bad entry in directory: rec_len is too small for name_len - offset=40, inode=25047, rec_len=16, name_len=8, size=4096
      [267361.857551] Aborting journal on device loop0-8.
      [267361.857810] Kernel panic - not syncing: LDISKFS-fs (device loop0): panic forced after error
      
      [267361.858346] CPU: 1 PID: 1685 Comm: ls Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-1160.59.1.el7.centos.plus.x86_64 #1
      [267361.858966] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [267361.859279] Call Trace:
      [267361.859601]  [<ffffffff983975b9>] dump_stack+0x19/0x1b
      [267361.859909]  [<ffffffff983912c1>] panic+0xe8/0x21f
      [267361.860238]  [<ffffffffc16449f1>] ldiskfs_handle_error.part.190+0x81/0xb0 [ldiskfs]
      [267361.860553]  [<ffffffffc1644d8a>] __ldiskfs_error_inode+0xaa/0x180 [ldiskfs]
      [267361.860859]  [<ffffffffc1608266>] __ldiskfs_check_dir_entry+0x176/0x180 [ldiskfs]
      [267361.861176]  [<ffffffffc1619612>] htree_dirblock_to_tree+0xe2/0x190 [ldiskfs]
      [267361.861482]  [<ffffffffc161bab5>] ldiskfs_htree_fill_tree+0xb5/0x2f0 [ldiskfs]
      [267361.861796]  [<ffffffff97dbe42b>] ? unlock_page+0x2b/0x30
      [267361.862108]  [<ffffffff97deee49>] ? do_read_fault.isra.63+0x139/0x1b0
      [267361.862410]  [<ffffffff97e293a6>] ? kmem_cache_alloc_trace+0x1d6/0x200
      [267361.862715]  [<ffffffffc16088f2>] ldiskfs_readdir+0x682/0x9f0 [ldiskfs]
      

      To put the filesystem back into production quickly, we simply removed the corrupted entry (an empty directory) with debugfs.

      Reproducer
      Create directories:

      [root@dev lustre]# mv dir1 dir_old
      [root@dev lustre]# lfs mkdir -i1 dir1         
      [root@dev lustre]# cd !$
      cd dir1
      [root@dev dir1]# mkdir subdir{1..3}           
      [root@dev dir1]# lfs path2fid . subdir*          
      .: [0x240000402:0x1:0x0]                         
      subdir1: [0x240000402:0x2:0x0]                   
      subdir2: [0x240000402:0x3:0x0]                   
      subdir3: [0x240000402:0x4:0x0]                   
      

      Get the dir block:

      [root@dev dir1]# debugfs -c -R 'ls -l REMOTE_PARENT_DIR/0x240000402:0x1:0x0' /dev/mapper/mds2_flakey  
      debugfs 1.47.1-wc2 (08-Nov-2024)                                                                      
        25046   40755 (2)      0      0    4096 12-Jun-2025 17:22 .                                         
        25001   40755 (18)      0      0    4096 12-Jun-2025 17:21 ..                                       
        25047   40755 (18)      0      0    4096 12-Jun-2025 17:22 subdir1                                  
        25048   40755 (18)      0      0    4096 12-Jun-2025 17:22 subdir2                                  
        25049   40755 (18)      0      0    4096 12-Jun-2025 17:22 subdir3                                  
      
      [root@dev dir1]# debugfs -c -R 'stat REMOTE_PARENT_DIR/0x240000402:0x1:0x0' /dev/mapper/mds2_flakey            
      debugfs 1.47.1-wc2 (08-Nov-2024)                                                                               
      Inode: 25046   Type: directory    Mode:  0755   Flags: 0x0                                                     
      Generation: 20094447    Version: 0x00000001:00000003                                                           
      User:     0   Group:     0   Project:     0   Size: 4096                                                       
      File ACL: 0                                                                                                    
      Links: 5   Blockcount: 8                                                                                       
      Fragment:  Address: 0    Number: 0    Size: 0                                                                  
       ctime: 0x684af0a8:00000000 -- Thu Jun 12 17:22:16 2025                                                        
       atime: 0x684af089:00000000 -- Thu Jun 12 17:21:45 2025                                                        
       mtime: 0x684af0a8:00000000 -- Thu Jun 12 17:22:16 2025                                                        
      crtime: 0x684af089:6e04d644 -- Thu Jun 12 17:21:45 2025                                                        
      Size of extra inode fields: 32                                                                                 
      Extended attributes:                                                                                           
        lma: fid=[0x240000402:0x1:0x0] compat=0 incompat=4                                                           
        trusted.dmv (48) = d0 0c d3 0c 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        user.job (5) = "lfs.0"                                                                                       
        linkea: idx=0 parent=[0x200000007:0x1:0x0] name='dir1'                                                       
      BLOCKS:                                                                                                        
      (0):16721                                                                                                      
      TOTAL: 1                                                                                                       
      [root@dev ~]# dd if=/dev/mapper/mds2_flakey bs=4K skip=16721 count=1 | xxd | vim -
      0000000: d661 0000 0c00 0102 2e00 0000 a961 0000  .a...........a..
      0000010: 1c00 0212 2e2e 0011 0000 0002 0000 0007  ................
      0000020: 0000 0001 0000 0000 d761 0000 2400 0712  .........a..$...
      0000030: 7375 6264 6972 3100 1100 0000 0240 0004  subdir1......@..
      0000040: 0200 0000 0200 0000 0000 0000 d861 0000  .............a..
      0000050: 2400 0712 7375 6264 6972 3200 1100 0000  $...subdir2.....
      0000060: 0240 0004 0200 0000 0300 0000 0000 0000  .@..............
      0000070: d961 0000 900f 0712 7375 6264 6972 3300  .a......subdir3.
      0000080: 1100 0000 0240 0004 0200 0000 0400 0000  .....@..........
      ....
      

      Create a duplicate entry by manually corrupting the directory block (the subdir2 entry is renamed to subdir1 in place):

      [root@dev ~]# less /tmp/dir_corrupt.xxd
      0000000: d661 0000 0c00 0102 2e00 0000 a961 0000  .a...........a..
      0000010: 1c00 0212 2e2e 0011 0000 0002 0000 0007  ................
      0000020: 0000 0001 0000 0000 d761 0000 2400 0712  .........a..$...
      0000030: 7375 6264 6972 3100 1100 0000 0240 0004  subdir1......@..
      0000040: 0200 0000 0200 0000 0000 0000 d861 0000  .............a..
      0000050: 2400 0712 7375 6264 6972 3100 1100 0000  $...subdir1.....   <------
      0000060: 0240 0004 0200 0000 0300 0000 0000 0000  .@..............
      0000070: d961 0000 900f 0712 7375 6264 6972 3300  .a......subdir3.
      0000080: 1100 0000 0240 0004 0200 0000 0400 0000  .....@..........
      ....
      [root@dev ~]# dd if=<(xxd -r /tmp/dir_corrupt.xxd) of=/dev/mapper/mds2_flakey bs=4K seek=16721 count=1
      [root@dev ~]# debugfs -c -R 'ls -D REMOTE_PARENT_DIR/0x240000402:0x1:0x0' /dev/mapper/mds2_flakey                                                                                                                                             
      debugfs 1.47.1-wc2 (08-Nov-2024)
       25046  (12) .    25001  (28) ..    25047  (36) subdir1   
       25048  (36) subdir1    25049  (3984) subdir3   
      

      Run the latest e2fsck version:

      [root@dev ~]# ~eaujames/e2fsprogs/build/e2fsck/e2fsck -vvvf /dev/mapper/mds2_flakey                          
      e2fsck 1.47.2-wc1 (14-Jan-2025)
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Duplicate entry 'subdir1' found.
              Marking /REMOTE_PARENT_DIR/0x240000402:0x1:0x0 (25046) to be rebuilt.
      
      Pass 3: Checking directory connectivity
      Pass 3A: Optimizing directories
      Entry 'subdir1' in /REMOTE_PARENT_DIR/0x240000402:0x1:0x0 (25046) has a non-unique filename.
      Rename to subdir~0<y>? yes
      Pass 4: Checking reference counts
      Pass 5: Checking group summary information
      
      lustre-MDT0001: ***** FILE SYSTEM WAS MODIFIED *****
      ...
      

      The dirdata of subdir~0 is gone, but its FID extension flag is still set:

      [root@dev ~]# debugfs -c -R 'ls -D REMOTE_PARENT_DIR/0x240000402:0x1:0x0' /dev/mapper/mds2_flakey 
      debugfs 1.47.1-wc2 (08-Nov-2024)
       25046  (12) .    25001  (28) ..    25047  (16) subdir~0   
       25048  (36) subdir1    25049  (4004) subdir3   
      [root@dev ~]# dd if=/dev/mapper/mds2_flakey bs=4K skip=16721 count=1 | xxd | vim -
      0000000: d661 0000 0c00 0102 2e00 0000 a961 0000  .a...........a..
      0000010: 1c00 0212 2e2e 0011 0000 0002 0000 0007  ................
      0000020: 0000 0001 0000 0000 d761 0000 1000 0812  .........a......
      0000030: 7375 6264 6972 7e30 d861 0000 2400 0712  subdir~0.a..$... <----
      0000040: 7375 6264 6972 3100 1100 0000 0240 0004  subdir1......@..
      0000050: 0200 0000 0300 0000 0000 0000 d961 0000  .............a..
      0000060: a40f 0712 7375 6264 6972 3300 1100 0000  ....subdir3.....
      0000070: 0240 0004 0200 0000 0400 0000 0000 0000  .@..............
      ....
      d761 0000 1000 0812
      inode: 0x000061d7 (25047)
      rec_len: 0x0010 (16)
      name_len: 0x08
      file_type: 0x12  (dirdata extension flag: 0x10 (EXT2_DIRENT_LUFID), type: 0x2)
      

      I cannot reproduce this with every entry name.

              People

              Etienne Aujames (eaujames)
              Votes: 0
              Watchers: 6
