[LU-7689] limit lu_site hash table size on clients

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0

    Description

      lu_site_init() allocates a hash table during setup on both the client and the OSD,
      using the same default formula: the lu_site cache is assumed to be able to use up to 20% of total memory.

      This makes sense for the OSD, but on a client it means allocating a ~128M hash table per mount on a 32G box. To make it worse, we mount multiple Lustre file systems on a box, so it can take ~128M * (number of mounts) of memory.

      We already have lu_cache_percent as a module parameter, but we should limit the hash table size by default on clients.
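
      As a point of reference, lu_cache_percent can already be lowered per node. A minimal sketch, assuming the parameter is exported by the obdclass module (the sysfs path and config file name below are illustrative); since the hash table is sized in lu_site_init(), the setting has to be in place before the file system is mounted:

      # check the current value (the 20% default formula corresponds to 20)
      cat /sys/module/obdclass/parameters/lu_cache_percent
      # lower it before mounting, e.g. via a modprobe option in /etc/modprobe.d/lustre.conf:
      options obdclass lu_cache_percent=5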


        Activity


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18048/
          Subject: LU-7689 obdclass: limit lu_site hash table size on clients
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 522c1eb4d2f5faf1fa87be07d9617df1439fc0d6


          yong.fan nasf (Inactive) added a comment -

          Dongyang, thanks for sharing the test results.

          lidongyang Li Dongyang (Inactive) added a comment -

          Here are the results for 1 and 4 stripes:

          1-stripe
          baseline:
          bash-4.1# time ls -l testdir > /dev/null

          real 25m22.670s
          user 1m4.831s
          sys 13m41.840s
          bash-4.1# time ls -l testdir > /dev/null

          real 27m5.056s
          user 1m5.908s
          sys 15m2.135s

          patched:
          bash-4.1# time ls -l testdir > /dev/null

          real 25m42.354s
          user 1m9.786s
          sys 13m57.352s
          bash-4.1# time ls -l testdir > /dev/null

          real 27m3.739s
          user 1m10.196s
          sys 14m59.850s

          4-stripe:

          baseline:
          bash-4.1# time ls -l testdir > /dev/null

          real 41m45.328s
          user 1m14.652s
          sys 32m12.239s
          bash-4.1# time ls -l testdir > /dev/null

          real 46m57.302s
          user 1m15.295s
          sys 36m12.470s

          patched:
          bash-4.1# time ls -l testdir > /dev/null

          real 39m48.663s
          user 1m15.712s
          sys 29m54.248s
          bash-4.1# time ls -l testdir > /dev/null

          real 42m28.805s
          user 1m13.754s
          sys 33m8.527s

          It beats me that with 4 stripes the patched client is actually faster; I did multiple runs to verify that.


          yong.fan nasf (Inactive) added a comment -

          Looking into testing with 1-stripe and 4-stripe files, but I do have a question: how can I create 1-stripe and 4-stripe files using 'createmany'?

          "lfs setstripe -c 1 $TDIR", then all the regular files created under $TDIR (after the setstripe) will be 1-striped unless you specify the file layout manually. Similarly for 4-striped case, specify "-c 4".


          lidongyang Li Dongyang (Inactive) added a comment -

          Hi all,
          I followed the comment from nasf, created 10M files with createmany, and here are the results for 0-stripe files:
          without the patch:
          [12:04:58 root@r3:z00] # time ls -l dyl900 > /dev/null

          real 26m57.756s
          user 1m7.876s
          sys 15m39.102s
          [12:32:54 root@r3:z00] # time ls -l dyl900 > /dev/null

          real 27m19.028s
          user 1m6.301s
          sys 15m55.731s

          with the patch:
          [14:15:29 root@r3:z00] # time ls -l dyl900 > /dev/null

          real 25m10.092s
          user 1m6.833s
          sys 13m53.716s
          [14:40:51 root@r3:z00] # time ls -l dyl900 > /dev/null

          real 28m52.916s
          user 1m8.438s
          sys 16m9.032s

          Looking into testing with 1-stripe and 4-stripe files, but I do have a question: how can I create 1-stripe and 4-stripe files using 'createmany'?

          yong.fan nasf (Inactive) added a comment (edited) -

          mdtest cannot handle the client-side cache properly, since it is a general test tool rather than Lustre-specific. I would suggest the following test for measuring directory-traversal performance:

          1) Assume the test directory is TDIR. Under TDIR, generate 10M regular files; you can do that via 'touch' (slow) or via the Lustre test tool 'createmany' (faster than touch).
          2) After generating the test data set, remount the client to drop the client-side cache, then run "ls -l TDIR" to measure the performance without cache.
          3) Then run "ls -l TDIR" again to measure the performance with cache.

          You can measure the performance with and without your patch and compare the results (a sketch of steps 2 and 3 follows below). Usually we use multiple clients to generate the test data set in parallel to increase the create speed. The 0-stripe case is the fastest. Depending on your test environment and time, you can also consider measuring the 1-stripe and 4-stripe cases.
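
          A minimal sketch of steps 2) and 3) on a single client, assuming the file system is mounted at /mnt/testfs and the data set from step 1) lives in /mnt/testfs/tdir (the MGS nid and fsname below are placeholders):

          umount /mnt/testfs
          mount -t lustre mgsnode@tcp:/testfs /mnt/testfs   # remount to drop the client-side cache
          time ls -l /mnt/testfs/tdir > /dev/null           # step 2: traversal without cache
          time ls -l /mnt/testfs/tdir > /dev/null           # step 3: traversal with cache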


          lidongyang Li Dongyang (Inactive) added a comment -

          I've done mdtest with 16 processes, 262144 objects each, 3 iterations.
          That gives us 4 million objects each iteration.

          without the patch:

          mdtest-1.9.3 was launched with 16 total task(s) on 1 node(s)
          Command line used: /system/Benchmarks/mdtest/1.9.3/mdtest -d /mnt/testfs/z00/dyl900/mdtest -n 262144 -i 3
          Path: /mnt/testfs/z00/dyl900
          FS: 854.4 TiB Used FS: 0.6% Inodes: 854.5 Mi Used Inodes: 0.0%

          16 tasks, 4194304 files/directories

          SUMMARY: (of 3 iterations)
          Operation                 Max        Min       Mean    Std Dev
          ---------                 ---        ---       ----    -------
          Directory creation:  4169.451   4049.008   4113.665     49.569
          Directory stat:      8553.647   7678.810   8243.482    399.929
          Directory removal:   2312.643   1684.909   1941.188    268.901
          File creation:       3714.316   3666.937   3686.580     20.171
          File stat:           7936.584   7511.260   7787.747    195.697
          File read:           8772.322   7397.213   8055.384    562.922
          File removal:        2814.176   2177.893   2579.797    285.497
          Tree creation:       3472.106   2173.215   2956.409    562.996
          Tree removal:           7.550      6.808      7.093      0.326

          with the patch applied:
          mdtest-1.9.3 was launched with 16 total task(s) on 1 node(s)
          Command line used: /system/Benchmarks/mdtest/1.9.3/mdtest -d /mnt/testfs/z00/dyl900/mdtest -n 262144 -i 3
          Path: /mnt/testfs/z00/dyl900
          FS: 854.4 TiB Used FS: 0.6% Inodes: 854.5 Mi Used Inodes: 0.0%

          16 tasks, 4194304 files/directories

          SUMMARY: (of 3 iterations)
          Operation                 Max        Min       Mean    Std Dev
          ---------                 ---        ---       ----    -------
          Directory creation:  4258.673   4128.302   4178.702     57.184
          Directory stat:      8203.775   8060.866   8147.118     61.982
          Directory removal:   2356.021   1868.107   2173.262    217.180
          File creation:       3683.373   3608.999   3655.445     33.067
          File stat:           8058.992   7698.923   7898.330    149.529
          File read:           8838.700   8575.945   8680.637    113.714
          File removal:        2857.897   2262.444   2657.039    279.036
          Tree creation:       4132.319   2755.784   3630.101    620.513
          Tree removal:           7.673      6.568      7.151      0.453

          Also, hash_bd_depmax is 130 without the patch and 221 with it applied.

          Is there any other particular benchmark I should run?
          Thanks


          gerrit Gerrit Updater added a comment -

          Li Dongyang (dongyang.li@anu.edu.au) uploaded a new patch: http://review.whamcloud.com/18048
          Subject: LU-7689 obdclass: limit lu_site hash table size on clients
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: d715c3ad1be054f52c9787472fae9d16b5e6f94e


          People

            Assignee: wc-triage WC Triage
            Reporter: lidongyang Li Dongyang (Inactive)
            Votes: 0
            Watchers: 4
