lctl pool_add is slow when using individual OST

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • None
    • None

    Description

      Using lctl pool_add with individually listed OSTs (lctl pool_add FS OST0 OST1 ...) is much slower than using a hostlist expression such as OST[0-7]. See the example below:

      [root@localhost ~]# time lctl pool_add fs.pool_test fs-OST[0-7]
      OST fs-OST0000_UUID added to pool fs.pool_test
      OST fs-OST0001_UUID added to pool fs.pool_test
      OST fs-OST0002_UUID added to pool fs.pool_test
      OST fs-OST0003_UUID added to pool fs.pool_test
      OST fs-OST0004_UUID added to pool fs.pool_test
      OST fs-OST0005_UUID added to pool fs.pool_test
      OST fs-OST0006_UUID added to pool fs.pool_test
      OST fs-OST0007_UUID added to pool fs.pool_test
      
      real    0m9.008s
      user    0m0.000s
      sys     0m0.007s
      [root@localhost ~]# time lctl pool_add fs.pool_test OST0000 OST0001 OST0002 OST0003 OST0004 OST0005 OST0006 OST0007
      OST fs-OST0000_UUID added to pool fs.pool_test
      OST fs-OST0001_UUID added to pool fs.pool_test
      OST fs-OST0002_UUID added to pool fs.pool_test
      OST fs-OST0003_UUID added to pool fs.pool_test
      OST fs-OST0004_UUID added to pool fs.pool_test
      OST fs-OST0005_UUID added to pool fs.pool_test
      OST fs-OST0006_UUID added to pool fs.pool_test
      OST fs-OST0007_UUID added to pool fs.pool_test
      
      real    1m7.024s
      user    0m0.004s
      sys     0m0.014s
      

      One would expect both commands to have the same runtime.

        Activity

          [LU-17182] lctl pool_add is slow when using individual OST

          gerrit Gerrit Updater added a comment -

          "Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55810
          Subject: LU-17182 utils: pool_add send OSTs in one batch
          Project: fs/lustre-release
          Branch: b2_15
          Current Patch Set: 1
          Commit: 1dbb4a80fcba73d960d2c76389b3ddb17bb2a1a7

          pjones Peter Jones added a comment -

          Landed for 2.16


          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52654/
          Subject: LU-17182 utils: pool_add send OSTs in one batch
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c8963c4935168c749896664e40aa4d11be90e0c3

          gerrit Gerrit Updater added a comment -

          "Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52654
          Subject: LU-17182 utils: pool_add send OSTs in one batch
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: c6e77f54c450d2ca5283af05e54d171006e25a70

          adilger Andreas Dilger added a comment -

          I was going to say that the main problem is that the "add one at a time" case doesn't know whether later commands will be run or not. They are all treated separately, and the command explicitly waits for the pool layout to be updated. That wait was added because the pool command would otherwise return before the actual update, giving the admin the false impression that the pool had been successfully updated when that wasn't always the case.

          One option would be to add an "--async" option to the pool commands that skips calling check_pool_cmd_result(). If you know that a number of commands will be executed (e.g. from EMF), the preliminary ones can then run without waiting (with a low chance of being used incorrectly by a user), and EMF can check the results afterward.

          However, I now see that the second case is not "lctl pool_add ...; lctl pool_add ...; ..." but rather a single execution with multiple OSTs on the command line. It should be possible to handle this with a single check at the end. It looks like jt_pool_cmd() would need to build up the OST list and execute all of the adds (removes) at once, instead of calling check_pool_cmd_result() for each argument separately.

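
The restructuring suggested above (queue every add, then check once at the end) can be illustrated with a toy model. This is only a sketch and not the lctl code: FakePool, send_add() and wait_for_update() are hypothetical stand-ins for the request and for the blocking check, and the model counts waits instead of sleeping.

```python
from dataclasses import dataclass, field


@dataclass
class FakePool:
    """Hypothetical stand-in for a pool; counts how often the caller blocks."""
    members: list = field(default_factory=list)
    waits: int = 0

    def send_add(self, ost: str) -> None:
        # Hypothetical: issue the add request without waiting for it to land.
        self.members.append(ost)

    def wait_for_update(self) -> None:
        # Hypothetical: models the blocking check (several seconds in practice).
        self.waits += 1


def add_one_at_a_time(pool: FakePool, osts: list) -> None:
    # Behaviour described in the report: one blocking check per argument.
    for ost in osts:
        pool.send_add(ost)
        pool.wait_for_update()


def add_in_one_batch(pool: FakePool, osts: list) -> None:
    # Suggested behaviour: send every add first, then check once at the end.
    for ost in osts:
        pool.send_add(ost)
    pool.wait_for_update()


osts = [f"fs-OST{i:04x}_UUID" for i in range(8)]
slow, fast = FakePool(), FakePool()
add_one_at_a_time(slow, osts)
add_in_one_batch(fast, osts)
print(slow.waits, fast.waits)  # 8 1
```

Both variants end with the same pool membership; only the number of blocking checks differs, which is where the extra runtime would come from if each check costs several seconds.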
          joe.grund Joe Grund added a comment -

          adilger Possible to treat the second version (individual OST lists) the same as the first (indexset)? I.E. both versions run in parallel?


          adilger Andreas Dilger added a comment -

          Yes, this is kind of a "known" problem: the "lctl pool*" commands all wait for the pool status update to arrive at the client, which is asynchronous and takes several seconds to complete. You can see that the individual OST list takes 67s, which is roughly 8x the 9s taken to add all of them at once.

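
As a quick check of that estimate, the arithmetic can be redone with the "real" timings from the description. The assumption here (taken from the comment, not measured) is that each individually listed OST costs about one full wait.

```python
batch_s = 9.008             # real time of the fs-OST[0-7] run
individual_s = 60 + 7.024   # real time of the eight-argument run (1m7.024s)
n_osts = 8

# If every OST paid one full wait, the individual run would take n_osts * batch_s.
predicted_s = n_osts * batch_s
ratio = individual_s / batch_s

print(f"predicted {predicted_s:.1f}s, observed {individual_s:.1f}s, ratio {ratio:.1f}x")
# predicted 72.1s, observed 67.0s, ratio 7.4x
```

The observed run is somewhat faster than eight full waits, but the scaling with the number of arguments is clear.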
          pjones Peter Jones added a comment -

          Feng Lei

          Could you please investigate?

          Thanks

          Peter


          People

            flei Feng Lei
            rdruon Raphael Druon