[LU-6800] Significant performance regression with patch LU-5264 Created: 04/Jul/15  Updated: 13/Jun/18  Resolved: 14/Sep/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Critical
Reporter: Shuichi Ihara (Inactive) Assignee: Bruno Faccini (Inactive)
Resolution: Fixed Votes: 0
Labels: patch
Environment:

master


Issue Links:
Duplicate
Related
is related to LU-5264 ASSERTION( info->oti_r_locks == 0 ) a... Resolved
is related to LU-8130 Migrate from libcfs hash to rhashtable Open
is related to LU-6823 Performance regression on servers wit... Resolved
is related to LU-11089 Performance improvements for lu_objec... Resolved
Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

During our performance testing, we found a significant metadata performance regression with LU-5264 on master.

# mpirun -np 128 -ppn 4 -hostfile ./hostfile /work/tools/bin/mdtest -n 1000 -p 10 -i 5 -d /scratch1/mdtest.out

master

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      39552.671      33039.129      37024.828       2875.617
   Directory stat    :      33462.417      29340.691      31662.586       1384.330
   Directory removal :      40938.777      40238.677      40571.960        283.701
   File creation     :      17696.663      17209.531      17542.185        171.470
   File stat         :      33892.041      33429.312      33680.603        170.577
   File read         :      11284.121      11012.694      11220.417        104.978
   File removal      :      39718.200      39449.348      39556.254         90.590
   Tree creation     :       4583.939        700.335       3652.356       1487.449
   Tree removal      :        170.563        156.738        162.935          5.172

Keeping the same client version, but reverting patch 42fdf8355791cb682c6120f7950bb2ecd50f97aa (LU-5264 obdclass: fix race during key quiescency) on the servers:

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      44937.511      42117.095      43780.402       1335.927
   Directory stat    :     135310.427     129560.951     133625.293       2077.128
   Directory removal :      51525.499      46852.534      49965.297       1611.759
   File creation     :      42978.506      41435.145      42413.409        586.294
   File stat         :     135882.699     133344.886     134466.144        977.577
   File read         :     121788.787     111332.613     116374.190       3351.730
   File removal      :      84827.815      78120.995      80378.741       2522.662
   Tree creation     :       4650.004       3788.893       4268.099        336.241
   Tree removal      :        198.059        129.234        179.980         25.563


 Comments   
Comment by Bruno Faccini (Inactive) [ 06/Jul/15 ]

I am trying to setup a test platform to understand the impact of my original patch from LU-5264 on lu_keys_guard lock usage.
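To make the contention concrete: the lock in question is the global lu_keys_guard spinlock, which the LU-5264 change takes on paths hit by every server thread. A minimal sketch of that kind of pattern, using hypothetical names (KEY_NR, init_one_key, context_keys_fill) and not the actual Lustre code:

#include <linux/spinlock.h>

#define KEY_NR 40                           /* illustrative table size */

static DEFINE_SPINLOCK(lu_keys_guard);      /* one global lock */

struct lu_context;                          /* per-request context (opaque here) */
void init_one_key(struct lu_context *ctx, int i);   /* hypothetical helper */

static void context_keys_fill(struct lu_context *ctx)
{
        int i;

        /* Hot path: taken by every service thread for every request,
         * so all threads serialize on this one spinlock. */
        spin_lock(&lu_keys_guard);
        for (i = 0; i < KEY_NR; i++)
                init_one_key(ctx, i);
        spin_unlock(&lu_keys_guard);
}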

Comment by James A Simmons [ 06/Jul/15 ]

I noticed this performance regression as well. In fact, so far when testing a DNE2 directory striped across two MDSes, performance is worse than when using just one MDS.

Comment by Gerrit Updater [ 10/Jul/15 ]

Gu Zheng (gzheng@ddn.com) uploaded a new patch: http://review.whamcloud.com/15558
Subject: LU-6800 obdclass: change spinlock of key to rwlock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 56ee901b05afbe3eab2fda3eb2a055025f3a7779
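The conversion named in the patch subject helps because the request-handling paths only need shared access to the key table, while only rare operations (key registration, quiescency) need exclusive access. Reusing the hypothetical names from the sketch above, the change looks roughly like this (illustrative only, not the actual patch):

#include <linux/spinlock.h>

static DEFINE_RWLOCK(lu_keys_guard);        /* was DEFINE_SPINLOCK() */

static void context_keys_fill(struct lu_context *ctx)
{
        int i;

        /* Hot path: readers no longer exclude each other, so service
         * threads can fill their contexts concurrently. */
        read_lock(&lu_keys_guard);
        for (i = 0; i < KEY_NR; i++)
                init_one_key(ctx, i);
        read_unlock(&lu_keys_guard);
}

static void key_quiesce(void)
{
        /* Rare path: still exclusive while the key table is modified. */
        write_lock(&lu_keys_guard);
        /* ... mark the key quiescent and clear it from live contexts ... */
        write_unlock(&lu_keys_guard);
}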

Comment by Gerrit Updater [ 19/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15558/
Subject: LU-6800 obdclass: change spinlock of key to rwlock
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 96d773d993cd48a069da4098b87da7d9ef0dd52e

Comment by Peter Jones [ 19/Jul/15 ]

Landed for 2.8

Comment by Shuichi Ihara (Inactive) [ 19/Jul/15 ]

As far as we know, http://review.whamcloud.com/15558/ is not a complete fix. It helps to get performance back on most metadata operations, but the file read operation is still slower than before LU-5264 was applied.
I will post benchmark results soon.

Comment by Aurelien Degremont (Inactive) [ 20/Jul/15 ]

FYI, at CEA we faced heavy load on the MDT with several codes. This caused poor performance and instability on the filesystem, so we decided to revert the patch from LU-5264 for now, until we get something better.

Comment by Li Xi (Inactive) [ 20/Jul/15 ]

Hi Aurelien,

Did you test with or without patch 15558? Does it help, or do you still have the same problem?

Comment by Aurelien Degremont (Inactive) [ 20/Jul/15 ]

Unfortunately, we did not test with 15558. Not sure we will be able to do this on the production system.

Comment by Gerrit Updater [ 20/Jul/15 ]

Grégoire Pichon (gregoire.pichon@bull.net) uploaded a new patch: http://review.whamcloud.com/15648
Subject: LU-6800 obdclass: change spinlock of key to rwlock
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: 5adcce4242802b6be3441b425220bc422926a822

Comment by Bruno Travouillon (Inactive) [ 20/Jul/15 ]

Aurélien,

The issue in the build for bullx has already been reported in duplicate LU-6823. Bull is currently looking at LU-6800 carefully.

Comment by Shuichi Ihara (Inactive) [ 21/Jul/15 ]

Please re-open LU-6800. We understand that http://review.whamcloud.com/15558 helps a lot, but it still does not bring all of the performance back. Here are the test results: 32 clients, 128 mdtest processes.

test1 : master branch (commit-id: fe60e0135ee2334440247cde167b707b223cf11d; includes LU-5264 and patch 15558)

# mpirun -np 128 -ppn 4 -hostfile ./hostfile /work/tools/bin/mdtest -i 3 -n 1000 -d /scratch1/mdtest.out

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      45237.210      36692.398      40159.293       3669.695
   Directory stat    :     132371.575     129820.230     131383.164       1118.004
   Directory removal :      53873.775      50985.149      52790.576       1285.107
   File creation     :      42732.503      37298.342      40070.221       2219.840
   File stat         :     131527.304     129333.170     130765.529       1013.515
   File read         :      87588.987      67919.964      80344.389       8825.741
   File removal      :      84046.477      80418.268      82668.050       1604.248
   Tree creation     :       4364.520       4032.985       4164.502        143.755
   Tree removal      :        203.587        194.749        200.008          3.799

test2 : master + revert 15558

# mpirun -np 128 -ppn 4 -hostfile ./hostfile /work/tools/bin/mdtest -i 3 -n 1000 -d /scratch1/mdtest.out

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      40422.683      20650.668      30457.842       8072.661
   Directory stat    :      33032.600      27110.270      30459.575       2479.308
   Directory removal :      41611.362      39640.289      40887.059        885.442
   File creation     :      17622.819      17537.572      17581.070         34.824
   File stat         :      33991.557      33935.386      33959.396         23.645
   File read         :      11241.112      10994.112      11104.383        102.558
   File removal      :      40024.327      39973.169      39998.669         20.886
   Tree creation     :       4185.932       3705.216       4007.822        215.092
   Tree removal      :        170.327        164.689        167.062          2.386

test3 : master + revert 15558 + revert 13103

# mpirun -np 128 -ppn 4 -hostfile ./hostfile /work/tools/bin/mdtest -i 3 -n 1000 -d /scratch1/mdtest.out

   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      46423.406      37490.161      43188.774       4041.792
   Directory stat    :     134178.816     126241.328     130085.996       3245.214
   Directory removal :      53737.981      44389.098      50171.405       4125.732
   File creation     :      44199.169      37398.927      40834.020       2776.628
   File stat         :     135524.181     130626.893     132934.894       2009.179
   File read         :     100767.654      76374.732      91483.603      10776.519
   File removal      :      86318.162      82618.862      85021.870       1700.945
   Tree creation     :       4634.590       3557.510       4167.598        451.208
   Tree removal      :        201.814        194.397        197.894          3.043

Comparing the test3 and test2 results, the test2 results are significantly worse, which means patch 13103 caused this performance regression.
Gu Zheng at DDN pushed patch 15558, and as the test1 results show, performance came back except for the "file read" operation.
So patch 15558 helps a lot, but even with it we still see a performance regression on "file read". We need to investigate this further to bring all of the performance back.

Comment by Bruno Faccini (Inactive) [ 21/Jul/15 ]

Since I am the author of the patch for LU-5264, and thus the unfortunate guilty party in this situation, and since the DDN team has already produced a very good but partial fix, I would like to work more actively to fix this last read performance regression.

Aurelien, Bruno, since multi-client contention seems to be the main trigger of the issue, would it be possible for me to work directly with you on a site where you hit this problem heavily?

Comment by Bruno Travouillon (Inactive) [ 04/Aug/15 ]

We have removed the patch for LU-5264 from all our file systems. We will discuss the possibility of trying the current fix for LU-6800 on a test file system by the end of the month.

I will keep you posted.

Comment by Bruno Faccini (Inactive) [ 28/Aug/15 ]

First tests run with patch #15558, at the TGCC site, do not show the same read performance regression.
The site will soon provide their numbers for this ticket.
More instrumentation will be done.

Comment by Bruno Travouillon (Inactive) [ 31/Aug/15 ]

mdtest has been run in the restricted 2 and restricted 3 states, one without the patch and one with all patches.

In the current state (patches for LU-5264 and LU-6049 reverted):

$ mpirun -n 128 xx/mdtest -n 1000 -p 10 -i 5 -d xx/run_MDTest_repro3
-- started at 08/13/2015 15:26:03 --

mdtest-1.9.3 was launched with 128 total task(s) on 8 node(s)
Command line used: ./mdtest -n 1000 -p 10 -i 5 -d ./run_MDTest_repro3
Path: xxxxxxxxxxx
FS: 155.1 TiB   Used FS: 7.9%   Inodes: 154.1 Mi   Used Inodes: 0.3%

128 tasks, 128000 files/directories

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      12044.248       3622.254       7693.558       2787.514
   Directory stat    :      29808.509      28605.578      29428.433        434.277
   Directory removal :      16316.172      15596.360      16069.509        271.041
   File creation     :       8304.285       2475.372       5950.888       2378.499
   File stat         :      28493.314      28090.363      28265.330        130.886
   File read         :      15694.955      15170.723      15435.999        181.395
   File removal      :      15253.714      14426.384      14981.075        305.384
   Tree creation     :       3077.259       1170.939       1855.926        653.607
   Tree removal      :         95.066         61.637         77.186         11.245

-- finished at 08/13/2015 15:34:26 --

With LU-5264, LU-6049 and LU-6800:

$ mpirun -n 128 xx/mdtest -n 1000 -p 10 -i 5 -d xx/run_MDTest_repro2
 -- started at 08/13/2015 15:04:09 --

mdtest-1.9.3 was launched with 128 total task(s) on 8 node(s)
Command line used: ./mdtest -n 1000 -p 10 -i 5 -d ./run_MDTest_repro2
Path: xxxxxxxxxxxxx
FS: 155.1 TiB   Used FS: 7.9%   Inodes: 154.0 Mi   Used Inodes: 0.3%

128 tasks, 128000 files/directories

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      11815.599       6041.768       8297.495       2021.031
   Directory stat    :      29708.108      29290.724      29475.438        147.864
   Directory removal :      16459.019      16182.934      16283.041         93.778
   File creation     :       8561.213       8407.310       8496.989         57.227
   File stat         :      28579.728      28018.041      28328.611        184.786
   File read         :      15066.452      14786.594      14943.476         98.652
   File removal      :      14821.486      14289.802      14645.054        190.972
   Tree creation     :       2746.761       1234.708       1675.881        558.742
   Tree removal      :         63.032         51.565         58.417          3.900

-- finished at 08/13/2015 15:11:16 --

We do not observe a significant difference, but the tests were launched with only 8 nodes.
We expect to run a test with 32 nodes or more by the end of the month.

Comment by Andreas Dilger [ 31/Aug/15 ]

Ihara, looking at your test results it seems that the mean performance of the original results (before LU-5264) and the results after the LU-6800 patch are very close, within the standard deviation for the tests:
BEFORE

master + revert 15558 + revert 13103
# mpirun -np 128 -ppn 4 -hostfile ./hostfile /work/tools/bin/mdtest -i 3 -n 1000 -d /scratch1/mdtest.out

   Operation                      Mean        Std Dev
   ---------                      ----        -------
   Directory creation:       40159.293       3669.695
   Directory stat    :      131383.164       1118.004
   Directory removal :       52790.576       1285.107
   File creation     :       40070.221       2219.840
   File stat         :      130765.529       1013.515
   File read         :       80344.389       8825.741
   File removal      :       82668.050       1604.248
   Tree creation     :        4164.502        143.755
   Tree removal      :         200.008          3.799

AFTER

master (commit-id: fe60e0135ee2334440247cde167b707b223cf11d, includes LU-5264 and patch 15558)
# mpirun -np 128 -ppn 4 -hostfile ./hostfile /work/tools/bin/mdtest -i 3 -n 1000 -d /scratch1/mdtest.out

   Operation                      Mean        Std Dev
   ---------                      ----        -------
   Directory creation:       43188.774       4041.792
   Directory stat    :      130085.996       3245.214
   Directory removal :       50171.405       4125.732
   File creation     :       40834.020       2776.628
   File stat         :      132934.894       2009.179
   File read         :       91483.603      10776.519
   File removal      :       85021.870       1700.945
   Tree creation     :        4167.598        451.208
   Tree removal      :         197.894          3.043

The mean Directory removal and Directory stat operations are somewhat slower, but this is within the standard deviation of the three test runs. Conversely, the Directory create, File create, and File removal operations are faster, but are also within the standard deviation of the three test runs.

For File read it appears that the results are highly variable (stddev more than 10% of the mean). Is this performance loss seen with I/O benchmarks like IOR, or only with mdtest? What size of files is mdtest using?

Comment by Shuichi Ihara (Inactive) [ 02/Sep/15 ]

We only hit this performance regression with mdtest, and all test files are zero bytes in size.
We agree that patch http://review.whamcloud.com/15558 helped and that performance came back even with the LU-5264 patch, except for the "file read" operation.
We still don't know why read performance does not come back with patch 15558.

Comment by Peter Jones [ 14/Sep/15 ]

OK, then let's close this ticket for now; if we need to make future improvements to read operations, we can track that separately.

Comment by James A Simmons [ 07/May/18 ]

With the potential move to rhashtable, which has lockless lookups, we might be able to resolve these performance issues.
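For reference, the attraction of rhashtable (tracked in LU-8130) is that lookups are RCU-protected and take no lock at all, so readers stop contending on a shared lock entirely. A rough sketch of the in-kernel API, with hypothetical structure names (obj_entry, obj_table) rather than actual Lustre code:

#include <linux/rhashtable.h>
#include <linux/types.h>

struct obj_entry {
        u64               key;
        struct rhash_head linkage;
        /* ... payload ... */
};

static const struct rhashtable_params obj_params = {
        .key_len     = sizeof(u64),
        .key_offset  = offsetof(struct obj_entry, key),
        .head_offset = offsetof(struct obj_entry, linkage),
};

/* initialized elsewhere with rhashtable_init(&obj_table, &obj_params) */
static struct rhashtable obj_table;

static struct obj_entry *obj_lookup(u64 key)
{
        /* Lockless lookup: rhashtable_lookup_fast() takes the RCU read
         * lock internally; no spinlock or rwlock is involved. */
        return rhashtable_lookup_fast(&obj_table, &key, obj_params);
}

static int obj_insert(struct obj_entry *obj)
{
        return rhashtable_insert_fast(&obj_table, &obj->linkage, obj_params);
}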
