LU-15720: imbalanced file creation in 'crush' striped directory


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.15.0
    • Labels: None
    • Severity: 3

    Description

      Performance regressions in striped directories on 2.15.0 (commit: 4d93fd7) were found against b2_14 (commit: d4b9557).
      Here is the configuration.

      4 x MDS (1 x MDT per MDS)
      4 x OSS (2 x OST per OSS)
      40 x client
      
      [root@ec01 ~]# mkdir -p /exafs/d0/d1/d2/mdt_stripe/
      [root@ec01 ~]# lfs setdirstripe -c 4 -D /exafs/d0/d1/d2/mdt_stripe/
      [root@ec01 ~]# salloc -p 40n -N 40 --ntasks-per-node=16 mpirun --allow-run-as-root -oversubscribe -mca btl_openib_if_include mlx5_1:1 -x UCX_NET_DEVICES=mlx5_1:1 /work/tools/bin/mdtest -n 2000 -F -i 3 -p 10 -v -d /exafs/d0/d1/d2/mdt_stripe/
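
      For reference, the stripe count and hash type actually in effect on the test directory can be double-checked with lfs getdirstripe; the -D option prints the default layout that new entries inherit, which should report the hash type in use ("crush" being the 2.15 default):

      [root@ec01 ~]# lfs getdirstripe -D /exafs/d0/d1/d2/mdt_stripe/
      [root@ec01 ~]# lfs getdirstripe /exafs/d0/d1/d2/mdt_stripe/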
      

      Here are the test results.

      server: version=2.15.0_RC2_22_g4d93fd7
      client: version=2.15.0_RC2_22_g4d93fd7
      
      SUMMARY rate: (of 3 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation              103733.203      76276.410      93728.713      15168.101
         File stat                  693152.731     656461.448     671671.960      19132.425
         File read                  259081.462     247951.008     253393.168       5569.308
         File removal               145137.390     142142.699     143590.068       1499.846
         Tree creation                  48.035          1.922         17.475         26.467
         Tree removal                   35.643         15.861         24.045         10.323
      
      server: version=2.14.0_21_gd4b9557
      client: version=2.14.0_21_gd4b9557
      
      SUMMARY rate: (of 3 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation              138939.425      81336.388     117014.695      31167.261
         File stat                 1678888.952    1580356.340    1645190.276      56162.463
         File read                  569731.788     528830.155     546121.363      21170.387
         File removal               191837.291     186597.900     188595.661       2832.527
         Tree creation                 120.108          0.986         51.078         61.778
         Tree removal                   40.863         33.203         37.987          4.171
      

      As far as I observed, this seems to be a server-side regression, since performance with lustre-2.15 clients + lustre-2.14 servers was fine, as shown below.

      server: version=2.14.0_21_gd4b9557
      client: version=2.15.0_RC2_22_g4d93fd7
      
      SUMMARY rate: (of 3 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation              132009.360      74074.615     106514.108      29585.056
         File stat                 1570754.679    1457120.401    1532703.082      65457.038
         File read                  563710.286     540228.432     553871.772      12194.544
         File removal               189557.092     186065.253     187536.946       1809.374
         Tree creation                  54.678          1.883         19.576         30.399
         Tree removal                   42.065         41.677         41.875          0.194
      

      It seems that the regression started with the following patch:

          LU-14459 lmv: change default hash type to crush
          
          Change the default hash type to CRUSH to minimize the number
          of directory entries that need to be migrated.
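
      For context: in a hash-striped directory the MDT for a new file is picked deterministically from the new file's name (within a given directory), so if the hash in use does not spread a given name pattern evenly, creates pile up on a subset of MDTs. The snippet below is only a rough sketch of that idea (md5 of the name modulo the stripe count, on made-up mdtest-style names); it is not the actual Lustre crush code:

      stripe_count=4
      for rank in 0 1 2 3; do
          name="file.mdtest.${rank}.0"                      # made-up mdtest-style file name
          hash=$(printf '%s' "$name" | md5sum | cut -c1-8)  # first 32 bits of the md5 digest
          echo "$name -> MDT$(( 16#$hash % stripe_count ))"
      done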
      
      server: version=2.14.51_197_gf269497
      client: version=2.15.0_RC2_22_g4d93fd7
      
      SUMMARY rate: (of 3 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation              148072.690      87600.145     127000.919      34149.618
         File stat                 1523849.471    1388808.972    1441253.182      72393.681
         File read                  562840.721     505515.837     538333.864      29552.364
         File removal               197259.873     191117.823     194934.244       3331.372
         Tree creation                 111.869          1.707         39.426         62.755
         Tree removal                   44.113         30.518         36.562          6.922
      
      server: version=2.14.51_198_gbb60caa
      client: version=2.15.0_RC2_22_g4d93fd7
      
      SUMMARY rate: (of 3 iterations)
         Operation                     Max            Min           Mean        Std Dev
         ---------                     ---            ---           ----        -------
         File creation               86531.781      63506.794      72790.003      12142.761
         File stat                  808075.643     746570.771     784071.104      32898.551
         File read                  260064.500     249212.881     256291.924       6135.058
         File removal               159592.539     155603.788     157752.556       2012.224
         Tree creation                 120.060          1.138         41.069         68.410
         Tree removal                   37.780         37.263         37.450          0.287
      

      I just found that MDT load balancing does not seem to work well after the patch: file distribution across MDTs is unbalanced at create time. For instance, here is a create-only file test in a striped directory.

      Before patch (commit:f269497)

      mpirun -np 640 mdtest -n 2000 -F -C -i 1 -p 10 -v -d /exafs/d0/d1/d2/mdt_stripe/
      
      [root@ec01 ~]# lfs df -i | grep MDT
      exafs-MDT0000_UUID      83050496      320298    82730198   1% /exafs[MDT:0] 
      exafs-MDT0001_UUID      83050496      320283    82730213   1% /exafs[MDT:1] 
      exafs-MDT0002_UUID      83050496      320334    82730162   1% /exafs[MDT:2] 
      exafs-MDT0003_UUID      83050496      320293    82730203   1% /exafs[MDT:3]  

      After patch (commit:bb60caa)

      [root@ec01 ~]# lfs df -i | grep MDT
      exafs-MDT0000_UUID      83050496      192404    82858092   1% /exafs[MDT:0] 
      exafs-MDT0001_UUID      83050496      190698    82859798   1% /exafs[MDT:1] 
      exafs-MDT0002_UUID      83050496      177266    82873230   1% /exafs[MDT:2] 
      exafs-MDT0003_UUID      83050496      720852    82329644   1% /exafs[MDT:3] 
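
      The skew can also be counted per file with lfs getstripe -m, which prints the MDT index holding each file's inode (the path and name pattern below are only a guess at mdtest's output tree; adjust to the actual run):

      [root@ec01 ~]# find /exafs/d0/d1/d2/mdt_stripe/ -type f -name 'file.mdtest.*' \
                         -exec lfs getstripe -m {} \; | sort -n | uniq -c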
      

      That's why mdtest's numbers were slower: one MDS/MDT (MDT3 in this case, with ~720k used inodes versus under 200k on each of the others) keeps working long after the rest have finished, so mdtest's elapsed time ends up longer than in the balanced case.


            People

              Assignee: adilger (Andreas Dilger)
              Reporter: adilger (Andreas Dilger)
