
Dir operations on DNE2 are slower than DNE1 or non-DNE

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • 2.12.0
    • 3

    Description

      Directory operations (creation and removal) on a striped directory with DNE2 are significantly slower (roughly 15x to 70x in the results below) than with a non-DNE configuration or with DNE1. Here are the test results.

      Non DNE configuration

       [root@c01 ~]# mkdir /scratch1/nodne 
       [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/nodne 
      SUMMARY: (of 1 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation:      97259.037      97259.037      97259.037          0.000
         Directory stat    :     347749.775     347749.775     347749.775          0.000
         Directory removal :      96064.306      96064.306      96064.306          0.000
         File creation     :     111509.920     111509.920     111509.920          0.000
         File stat         :     317489.856     317489.856     317489.856          0.000
         File read         :     183132.719     183132.719     183132.719          0.000
         File removal      :     166205.620     166205.620     166205.620          0.000
         Tree creation     :         29.571         29.571         29.571          0.000
         Tree removal      :         26.353         26.353         26.353          0.000
      V-1: Entering print_timestamp...
      

      DNE1 configuration

       [root@c01 ~]# lfs mkdir -i 0 /scratch1/mdt0
       [root@c01 ~]# lfs mkdir -i 1 /scratch1/mdt1
       [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/mdt0@/scratch1/mdt1
       SUMMARY: (of 1 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation:     189546.945     189546.945     189546.945          0.000
         Directory stat    :     688947.817     688947.817     688947.817          0.000
         Directory removal :     255838.417     255838.417     255838.417          0.000
         File creation     :     203077.460     203077.460     203077.460          0.000
         File stat         :     692292.941     692292.941     692292.941          0.000
         File read         :     350911.938     350911.938     350911.938          0.000
         File removal      :     339358.198     339358.198     339358.198          0.000
         Tree creation     :         38.326         38.326         38.326          0.000
         Tree removal      :         43.928         43.928         43.928          0.000
      V-1: Entering print_timestamp...
      

      DNE2 configuration

       [root@c01 ~]# lfs setdirstripe -c 2 /scratch1/stripedir
       [root@c01 ~]# lfs setdirstripe -c 2 -D /scratch1/stripedir
       [root@c01 ~]# salloc -N 32 --ntasks-per-node=24 mpirun -np 768 --allow-run-as-root /work/tools/bin/mdtest -n 1000 -u -vv -d /scratch1/stripedir
       SUMMARY: (of 1 iterations)
         Operation                      Max            Min           Mean        Std Dev
         ---------                      ---            ---           ----        -------
         Directory creation:       6585.023       6585.023       6585.023          0.000  <----- 
         Directory stat    :     247222.630     247222.630     247222.630          0.000
         Directory removal :       3554.469       3554.469       3554.469          0.000  <-----
         File creation     :     250751.232     250751.232     250751.232          0.000
         File stat         :     566094.009     566094.009     566094.009          0.000
         File read         :     362051.872     362051.872     362051.872          0.000
         File removal      :     298873.347     298873.347     298873.347          0.000
         Tree creation     :         13.632         13.632         13.632          0.000
         Tree removal      :          4.830          4.830          4.830          0.000
      V-1: Entering print_timestamp...
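
      To make the gap concrete, the slowdown factors can be derived directly from the three SUMMARY tables above. The following is a minimal sketch, not part of the original report: the rates are copied from the tables, mdtest rates are assumed to be operations per second, and the names RATES and slowdown are chosen here purely for illustration.

```python
# Rates copied from the mdtest SUMMARY tables above (assumed to be ops/sec).
RATES = {
    "non-DNE": {"Directory creation": 97259.037, "Directory removal": 96064.306},
    "DNE1": {"Directory creation": 189546.945, "Directory removal": 255838.417},
    "DNE2": {"Directory creation": 6585.023, "Directory removal": 3554.469},
}


def slowdown(baseline: str, operation: str, target: str = "DNE2") -> float:
    """Return how many times slower `target` is than `baseline` for `operation`."""
    return RATES[baseline][operation] / RATES[target][operation]


def report() -> list[str]:
    """Build one line per (operation, baseline) pair comparing against DNE2."""
    lines = []
    for operation in ("Directory creation", "Directory removal"):
        for baseline in ("non-DNE", "DNE1"):
            factor = slowdown(baseline, operation)
            lines.append(f"{operation}: DNE2 is {factor:.1f}x slower than {baseline}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

      Only directory creation and removal are compared, because those are the two rows flagged in the DNE2 table; file operations and directory stat stay in the same range as the other configurations.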
      

          Activity

            [LU-11854] Dir operations on DNE2 are slower than DNE1 or non-DNE

            rdruon Raphael Druon added a comment -

            Do we have an update for this?
            laisiyao Lai Siyao added a comment -

            Okay, I'll implement it and add a tunable option for this.

            ofaaland Olaf Faaland added a comment -

            if the involved MDTs are not located on the same MDS, this may be optimized ... <redacted> ... But this does not seem to be true of current deployments, where there is often more than one MDT on an MDS.

            Hi Lai,
            I'm not sure whose deployment you're referring to, but at least some sites would benefit from that optimization. Our file systems have 16 MDTs each on their own MDS. And if that optimization were implemented and worked well, we would take it into account when designing new file systems.

            laisiyao Lai Siyao added a comment -

            Striped directory creation and removal start a distributed transaction. If the involved MDTs are not located on the same MDS, this may be optimized: if one MDT fails, distributed transactions can be recovered from logs on the other MDTs, so the dependencies between distributed transactions can be removed. This means that a local transaction needs to be committed only if a distributed transaction depends on it. But this does not seem to be true of current deployments, where there is often more than one MDT on an MDS.


            sihara Shuichi Ihara added a comment -

            LU-11999 still doesn't solve the problem. Directory creation and removal in a striped directory are still much slower than in a remote directory.

            adilger Andreas Dilger added a comment -

            Could this be fixed by LU-11999?
            laisiyao Lai Siyao added a comment -

            It's hard to improve this in a short time, because creating a striped directory often triggers a commit of older transactions (previous creations) to make DNE recovery easier, which means it causes a sync on all MDTs. Without changing DNE recovery, we can't do much right now.

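
            As a rough cross-check of the sync explanation above, the measured rates can be turned into an average per-operation latency with Little's law (concurrency = throughput x latency). This is only a sketch under stated assumptions: each of the 768 MPI tasks from the mpirun command is assumed to keep exactly one synchronous operation in flight, the mdtest rates are assumed to be aggregate operations per second, and the names CLIENT_TASKS and mean_latency_ms are illustrative, not part of mdtest or Lustre.

```python
# Number of concurrent mdtest tasks, taken from "mpirun -np 768" in the report.
CLIENT_TASKS = 768


def mean_latency_ms(ops_per_sec: float, concurrency: int = CLIENT_TASKS) -> float:
    """Little's law: latency = concurrency / throughput, returned in milliseconds.

    Assumes every task always has exactly one operation outstanding.
    """
    if ops_per_sec <= 0:
        raise ValueError("ops_per_sec must be positive")
    return concurrency / ops_per_sec * 1000.0


if __name__ == "__main__":
    # Directory creation and removal rates from the non-DNE and DNE2 runs.
    measurements = {
        "non-DNE directory creation": 97259.037,
        "DNE2 directory creation": 6585.023,
        "non-DNE directory removal": 96064.306,
        "DNE2 directory removal": 3554.469,
    }
    for label, rate in measurements.items():
        print(f"{label}: about {mean_latency_ms(rate):.1f} ms per operation")
```

            Under these assumptions, a striped directory creation or removal takes on the order of a hundred milliseconds or more per task, against single-digit milliseconds without DNE. That is consistent with each operation waiting on a commit, but it does not by itself show where the time is spent.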

            sihara Shuichi Ihara added a comment -

            OK, but is a drop from 100K to 6K dir creations and from 110K to 3K dir removals the expected level of performance loss? If it is expected, that's fine when striped directories are enabled by default, but I wonder if we could get somewhat better, more reasonable performance for directory operations when striped directories are enabled.

            adilger Andreas Dilger added a comment -

            Ihara, setting lfs setdirstripe -c 2 -D /scratch1/stripedir causes every subdirectory to also be created with 2 stripes, which triggers distributed transactions and is definitely slower than creating a local 1-stripe directory. That is why we are working on dynamic restriping, so that a directory can be created with 1 stripe at the start and only move to DNE2 striping if it is needed.
            pjones Peter Jones added a comment -

            Lai

            Could you please investigate?

            Ihara

            Could you please share some more details about the configuration and the test script used?

            Peter


            People

              laisiyao Lai Siyao
              sihara Shuichi Ihara