Lustre / LU-15555

Directories may become corrupted when more than 10 million files are created


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Lustre 2.15.0
    • Severity: 3

    Description

      Typical symptoms include missing files, e.g.:

      [root@test-vm tests]# ./createmany -o /mnt/tmp/createmany-file- 14000000
       - open/close 350000 (time 1644568594.64 total 2.00 last 174814.88)
       - open/close 650000 (time 1644568596.67 total 4.04 last 147388.92)
       - open/close 930000 (time 1644568598.73 total 6.09 last 136137.04)
       - open/close 1210000 (time 1644568600.74 total 8.11 last 138965.69)
       - open/close 1500000 (time 1644568602.76 total 10.12 last 143914.53)
       - open/close 1780000 (time 1644568604.82 total 12.19 last 135819.94)
       - open/close 2060000 (time 1644568606.88 total 14.25 last 135719.88)
       - open/close 2330000 (time 1644568608.90 total 16.27 last 133777.80)
       - open/close 2600000 (time 1644568610.95 total 18.31 last 132000.68)
       - open/close 2870000 (time 1644568612.96 total 20.33 last 133912.64)
       - open/close 3130000 (time 1644568614.98 total 22.35 last 128862.02)
       - open/close 3390000 (time 1644568617.01 total 24.38 last 128003.34)
       - open/close 3650000 (time 1644568619.08 total 26.44 last 125926.41)
       - open/close 3910000 (time 1644568621.13 total 28.50 last 126551.23)
       - open/close 4170000 (time 1644568623.19 total 30.56 last 125985.41)
       - open/close 4420000 (time 1644568625.19 total 32.56 last 124969.56)
       - open/close 4680000 (time 1644568627.23 total 34.60 last 127712.23)
       - open/close 4940000 (time 1644568629.26 total 36.63 last 128046.32)
       - open/close 5200000 (time 1644568631.28 total 38.64 last 129023.11)
       - open/close 5460000 (time 1644568633.31 total 40.68 last 127510.29)
       - open/close 5720000 (time 1644568635.33 total 42.70 last 128703.38)
       - open/close 5970000 (time 1644568637.35 total 44.71 last 124299.51)
       - open/close 6220000 (time 1644568639.39 total 46.76 last 122203.14)
       - open/close 6470000 (time 1644568641.43 total 48.79 last 122784.30)
       - open/close 6720000 (time 1644568643.51 total 50.87 last 120314.82)
       - open/close 6960000 (time 1644568645.51 total 52.87 last 119883.77)
       - open/close 7210000 (time 1644568647.53 total 54.89 last 123773.22)
       - open/close 7460000 (time 1644568649.53 total 56.90 last 124582.72)
       - open/close 7690000 (time 1644568651.54 total 58.90 last 114817.96)
       - open/close 7940000 (time 1644568653.61 total 60.97 last 120831.67)
       - open/close 8180000 (time 1644568655.68 total 63.04 last 116004.68)
       - open/close 8400000 (time 1644568657.72 total 65.08 last 107834.20)
       - open/close 8620000 (time 1644568659.77 total 67.13 last 107313.41)
       - open/close 8840000 (time 1644568661.84 total 69.21 last 105934.61)
       - open/close 9050000 (time 1644568663.85 total 71.22 last 104573.29)
       - open/close 9260000 (time 1644568665.93 total 73.29 last 101112.23)
       - open/close 9470000 (time 1644568668.01 total 75.37 last 100937.03)
       - open/close 9680000 (time 1644568670.03 total 77.40 last 103621.84)
       - open/close 9910000 (time 1644568672.07 total 79.44 last 112830.38)
       - open/close 10130000 (time 1644568674.11 total 81.48 last 107944.73)
       - open/close 10350000 (time 1644568676.19 total 83.56 last 105920.89)
       - open/close 10580000 (time 1644568678.21 total 85.57 last 114009.48)
       - open/close 10820000 (time 1644568680.27 total 87.64 last 116104.81)
       - open/close 11030000 (time 1644568682.29 total 89.66 last 103861.11)
       - open/close 11260000 (time 1644568684.37 total 91.74 last 110763.36)
       - open/close 11490000 (time 1644568686.48 total 93.85 last 108965.23)
       - open/close 11690000 (time 1644568688.49 total 95.86 last 99409.99)
       - open/close 11910000 (time 1644568690.53 total 97.90 last 107825.48)
       - open/close 12110000 (time 1644568692.60 total 99.96 last 96926.18)
       - open/close 12320000 (time 1644568694.62 total 101.98 last 103975.48)
       - open/close 12540000 (time 1644568696.69 total 104.06 last 106035.08)
       - open/close 12730000 (time 1644568698.73 total 106.09 last 93327.57)
       - open/close 12940000 (time 1644568700.80 total 108.17 last 101114.63)
       - open/close 13160000 (time 1644568702.83 total 110.20 last 108580.25)
       - open/close 13410000 (time 1644568704.90 total 112.26 last 121061.98)
       - open/close 13650000 (time 1644568706.97 total 114.34 last 115627.60)
       - open/close 13880000 (time 1644568708.98 total 116.35 last 114575.15)
      total: 14000000 open/close in 117.35 seconds: 119301.22 ops/second
      [root@test-vm tests]# ls -1U /mnt/tmp/ | wc -l
      858342

      and e2fsck errors such as:

      Problem in HTREE directory inode 2: block #141889 not referenced
      Problem in HTREE directory inode 2: block #141890 has invalid depth (2)
      Problem in HTREE directory inode 2: block #141890 has bad max hash
      Problem in HTREE directory inode 2: block #141890 not referenced
      Problem in HTREE directory inode 2: block #141891 has invalid depth (2)
      Problem in HTREE directory inode 2: block #141891 has bad max hash
      Problem in HTREE directory inode 2: block #141891 not referenced
      Problem in HTREE directory inode 2: block #141892 has invalid depth (2)
      Problem in HTREE directory inode 2: block #141892 has bad max hash
      Problem in HTREE directory inode 2: block #141892 not referenced
      Problem in HTREE directory inode 2: block #141893 has invalid depth (2)
      Problem in HTREE directory inode 2: block #141893 has bad max hash
      Problem in HTREE directory inode 2: block #141893 not referenced
      Problem in HTREE directory inode 2: block #141894 has invalid depth (2)
      Problem in HTREE directory inode 2: block #141894 has bad max hash
      Problem in HTREE directory inode 2: block #141894 not referenced
      Problem in HTREE directory inode 2: block #141895 has invalid depth (2)
      Problem in HTREE directory inode 2: block #141895 has bad max hash
      Problem in HTREE directory inode 2: block #141895 not referenced 
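
      The missing-file symptom can also be checked programmatically. Below is a minimal C sketch of the same test as the `ls -1U | wc -l` command above, assuming a POSIX system; `count_visible_entries()` and `missing_entries()` are hypothetical helpers, not part of createmany or the Lustre test suite.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count directory entries the way `ls -1U | wc -l`
 * does, i.e. in unsorted readdir() order with hidden names skipped.
 * Returns -1 if the directory cannot be opened. */
static long count_visible_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	long n = 0;

	if (dir == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		/* ls without -a hides names starting with '.', incl. "." and ".." */
		if (de->d_name[0] == '.')
			continue;
		n++;
	}
	closedir(dir);
	return n;
}

/* Hypothetical helper: how many of the `expected` created files are not
 * returned by readdir(); -1 if the directory cannot be opened. */
static long missing_entries(const char *path, long expected)
{
	long found = count_visible_entries(path);

	return found < 0 ? -1 : expected - found;
}
```

      For the run above, `missing_entries("/mnt/tmp", 14000000)` would return 13141658 (14000000 - 858342), assuming the directory is not modified in between.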

      The bug is caused by an incorrect update of indirect_levels in ext4-kill-dx-root.patch:

      @@ -2360,8 +2364,9 @@ again:
                              /* Set up root */
                              dx_set_count(entries, 1);
                              dx_set_block(entries + 0, newblock);
      -                       dxroot = (struct dx_root *)frames[0].bh->b_data;
      -                       dxroot->info.indirect_levels += 1;
      +                       info = dx_get_dx_info((struct ext4_dir_entry_2 *)
      +                                             frames[0].bh->b_data);
      +                       info->indirect_levels = 1;
                              dxtrace(printk(KERN_DEBUG
                                             "Creating %d level index...\n",
                                             dxroot->info.indirect_levels));

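      Because the patched code assigns 1 instead of incrementing the existing value, indirect_levels can never exceed 1, even after the htree grows an additional index level.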

      On pre-RHEL8 kernels, ext4-large-dir.patch subsequently restores the +1 increment of indirect_levels, so they are not affected. RHEL8 and later kernels, however, appear to be affected.
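
      To illustrate why assigning a constant breaks deeper trees, here is a deliberately simplified model in C. It assumes the tree starts with indirect_levels == 0 and that every root split adds exactly one index level; `struct htree_model` and its helpers are hypothetical and do not mirror the real ext4 structures. `add_level_fixed()` follows the removed `+= 1` line and is not necessarily the change that was eventually merged for this issue.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, heavily simplified model of an htree root: only the
 * on-disk level counter and the real number of index levels are kept. */
struct htree_model {
	unsigned indirect_levels;	/* value stored in the root's dx info */
	unsigned actual_levels;		/* index levels that really exist */
};

/* Root split as in the buggy hunk of ext4-kill-dx-root.patch:
 * a new index level is created, but the counter is overwritten with 1. */
static void add_level_buggy(struct htree_model *t)
{
	t->actual_levels += 1;
	t->indirect_levels = 1;
}

/* Root split following the removed `indirect_levels += 1` line:
 * the counter tracks the real depth. */
static void add_level_fixed(struct htree_model *t)
{
	t->actual_levels += 1;
	t->indirect_levels += 1;
}

/* A lookup that trusts indirect_levels descends the wrong number of
 * index blocks whenever the counter differs from the real depth. */
static int depth_consistent(const struct htree_model *t)
{
	return t->indirect_levels == t->actual_levels;
}
```

      After a single split both variants agree, so in this model the difference only appears once the root is split a second time: the buggy counter still reads 1 while two index levels exist. That is consistent with the bug showing up only in very large directories and with the "invalid depth (2)" reports from e2fsck.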


            People

            Assignee: Andrew Perepechko (panda)
            Reporter: Andrew Perepechko (panda)
            Votes: 0
            Watchers: 8
