[LU-9972] Performance regressions on unique directory removal Created: 11/Sep/17  Updated: 01/Mar/18  Resolved: 06/Feb/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Major
Reporter: Shuichi Ihara (Inactive) Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None
Environment:

2.10 (and 2.11)


Attachments: LU-9972 (2.7.64_X vs 2.10.54) 2017-10-26.xlsx, LU-9972 (2.7.64_X vs 2.10.54) Oct 19th 2017.xlsx
Issue Links:
Related
is related to LU-7053 Do not use osd_object_find in osd_ind... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There is a performance regression in directory removal.

Server and client : RHEL7.3
Lustre version : 2.10.52
Backend filesystem: ldiskfs

mpirun --allow-run-as-root /work/tools/bin/mdtest -n 5000 -v -d /scratch0/mdtest.out -D -i 3 -p 10 -w 0 -u
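
For reference, my reading of the mdtest flags used above (hedged; consult mdtest's usage output for the authoritative descriptions):

# -n 5000          : each process creates/stats/removes 5000 items (directories here)
# -d /scratch0/... : directory on the Lustre client mount where the test runs
# -D               : run the directory tests only
# -i 3             : three iterations
# -p 10            : 10-second pre-iteration delay
# -w 0             : write zero bytes to created files (no data phase)
# -v               : increase verbosity
# -u               : unique working directory per task, i.e. the case that regresses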

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      89757.381      65618.928      74607.900      10774.356
   Directory stat    :     320946.433     319888.242     320294.264        465.749
   Directory removal :      19028.569      17837.487      18351.200        499.838
   Tree creation     :        434.446        158.826        318.943        116.860
   Tree removal      :         27.018         25.210         26.281          0.775


 Comments   
Comment by Andreas Dilger [ 12/Sep/17 ]

Compared to which version/kernel?

Comment by Shuichi Ihara (Inactive) [ 12/Sep/17 ]

For example, lustre-2.7 (IEEL 3.0) on CentOS 7.3:

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      46577.991      42249.894      44871.081       1881.494
   Directory stat    :     373243.136     367643.706     370043.774       2354.791
   Directory removal :      78530.701      66152.245      72781.092       5091.584
   File creation     :     107283.764      96953.405     103118.187       4447.973
   File stat         :     385082.155     375112.919     379387.910       4191.828
   File read         :     185463.654     177089.199     182367.310       3750.818
   File removal      :     127467.768     113218.809     122566.251       6612.256
   Tree creation     :        349.409         91.996        262.234        120.388
   Tree removal      :         20.765         18.039         19.132          1.176

I'm going to test lustre-2.9 to compare.

Comment by Shuichi Ihara (Inactive) [ 16/Sep/17 ]

Sorry for the delayed response. I needed to change the hardware configuration, but here are new results on b2_10 (2.10.1_RC1).
rmdir of unique directories is clearly slow compared to the same benchmark against a shared directory.

mpirun -np 128 mdtest -n 5000 -v -d /scratch0/mdtest.out -i 3 -p 30 -D (for shared directory )
mpirun -np 128 mdtest -n 5000 -v -d /scratch0/mdtest.out -i 3 -p 30 -D -u (for unique directory )
32 clients, 128 processes. Both runs were collected on exactly the same hardware configuration.

Here are the directory operation results for a shared directory.

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      91979.485      69249.863      79842.797       9343.315
   Directory stat    :     197008.811     180039.999     189342.439       7023.422
   Directory removal :     140527.764     128798.718     133567.639       5032.803
   Tree creation     :       5462.720       1034.229       3084.207       1822.788
   Tree removal      :         92.639         74.702         86.019          8.041

And here are the unique-directory results.

  
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      84094.691      75575.177      80444.764       3583.407
   Directory stat    :     463370.743     431285.266     448299.685      13170.724
   Directory removal :      18722.965      18461.182      18558.573        116.903
   Tree creation     :        593.577        310.356        472.213        119.117
   Tree removal      :         37.275         33.999         35.691          1.340
Comment by Andreas Dilger [ 18/Sep/17 ]

Cliff, do we have similar mdtest results from the performance test cluster, in particular 2.10.0/1, 2.10.52/53, and 2.9.x? That would give us a ballpark of where this performance regression was introduced, and allow a git bisect to narrow it down to a particular patch.

Comment by Shuichi Ihara (Inactive) [ 19/Sep/17 ]

I think the problem has existed since at least b2_9.
Here are the results of the same test on b2_9 (the clients were kept at 2.10.1; only the server was changed to b2_9).

mpirun -np 128 mdtest -n 5000 -v -d /scratch0/mdtest.out -i 3 -p 30 -D -u (unique directory)

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      91409.935      72314.781      84242.568       8491.267
   Directory stat    :     184806.326     183688.542     184367.927        487.111
   Directory removal :      20718.518      20303.157      20555.893        181.147
   Tree creation     :        552.285        400.441        473.117         62.160
   Tree removal      :         40.413         29.341         35.321          4.563

mpirun -np 128 mdtest -n 5000 -v -d /scratch0/mdtest.out -i 3 -p 30 -D (shared directory)

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      70310.000      45717.790      58926.161      10122.282
   Directory stat    :     178080.331     175913.598     176783.485        934.667
   Directory removal :      86194.900      72838.446      79018.261       5498.119
   Tree creation     :       5527.274       2804.821       3744.496       1261.231
   Tree removal      :         80.959         24.059         61.936         26.784
Comment by Cliff White (Inactive) [ 19/Sep/17 ]

Our hardware config has changed a bit since 2.9, and we have seen noticeable improvements since changing the tuned-adm profile. All of our old results are on SharePoint.
If you look at the most current spreadsheet, you will see the jump in directory removal with the tuned-adm change:
http://tinyurl.com/ydzx7gxp

If you look at our last EE 3.0 runs from June 2017 (b_ieel3_0 build 214), you will see that directory removal is 4x better, so I would look at the deltas there: http://tinyurl.com/yanedznq

Comment by Andreas Dilger [ 19/Sep/17 ]

Results from master builds (18 threads, 16 clients, mdtestfpp):

Build          Version        Dir create   Dir stat      Dir rm
master-3596    2.9.58_22      21514        188773        11822
master-3598    2.9.58_57      21570        209599        11653
master-3601    2.9.59         21063        223101        11879
master-2607    2.8.59-35      19328        211813        11797
master-3637    2.10.52_83     25987        234954        15033

Results from EE builds:

Build          Version        Dir create   Dir stat      Dir rm
b_ieel3_0-105  2.7.18         21239        167421        73112
b_ieel3_0-89   2.7.16.1       19758        169050        77119
b_ieel3_0-204  2.7.19.12      28330        288267        59444
b_ieel3_0-214  2.7.20.2       28136        331563        60515
Comment by Peter Jones [ 20/Sep/17 ]

Saurabh

Please can you narrow down where the change occurred?

Thanks

Peter

Comment by Andreas Dilger [ 20/Sep/17 ]

Discussed this with Saurabh and Cliff. Cliff thinks the problem may date back to the DNE2 landings, since EE 2.7 predates the DNE2 changes, and those changes appeared as early as 2.8.0.

Saurabh will try a git bisect starting with v2_7_50 (== 2.7.0) to see if that has good performance on our test cluster (good ~= 70k rmdir/sec) and go from there. We would like to keep the kernel version the same, at RHEL 7.4, to avoid potential interference with the results from changing the kernel or other configuration options.
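
A minimal sketch of that bisect workflow, assuming the usual vX_Y_Z tags in fs/lustre-release and a hypothetical run_mdtest.sh wrapper for the benchmark (build, install, and test steps are site-specific):

cd lustre-release
git bisect start
git bisect bad v2_10_52        # any known-bad build; tag name assumes the usual vX_Y_Z scheme
git bisect good v2_7_50        # expected-good baseline (~70k rmdir/sec)
# At each step: rebuild, reinstall on the servers, rerun mdtest with -u,
# then mark the checked-out revision based on the measured rmdir rate:
./run_mdtest.sh                # hypothetical benchmark wrapper
git bisect good                # or: git bisect bad
# Repeat until git reports the first bad commit, then clean up:
git bisect reset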

Comment by Gerrit Updater [ 21/Sep/17 ]

Saurabh Tandan (saurabh.tandan@intel.com) uploaded a new patch: https://review.whamcloud.com/29126
Subject: LU-9972 tests: Build required for LU-9972
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 708fec1e34058ec735be819217498f8cc90aa924

Comment by Shuichi Ihara (Inactive) [ 10/Oct/17 ]

Any progress on finding regression point?

Comment by Saurabh Tandan (Inactive) [ 10/Oct/17 ]

Still working on it Shuichi, will soon post some findings.

Comment by Peter Jones [ 20/Oct/17 ]

Alex

I daresay that Saurabh may elaborate, but I understand that he has found that your patch for LU-7053 ("osd: don't lookup object at insert", https://review.whamcloud.com/#/c/17092/) is the one that introduced the performance regression in directory removal.

Do you have any ideas on how to avoid this?

Peter

Comment by Saurabh Tandan (Inactive) [ 20/Oct/17 ]

There was approximately a 90% drop in "Dir removal" performance in the "mdtestfpp" results starting at tag 2.7.65. Following is the data for all the runs:

Tag                             Dir removal
2.7.56                          18298
2.7.57                         121954
2.7.61                          64849
2.7.64                         111655 good
2.7.64-g63a3e412 (LU-7419)      74384 good
2.7.64-gc965fc8a (LU-7450)      72374 good
2.7.64-g6765d785 (LU-7408)      92029 good
2.7.64-g9ae3a289 (LU-7053)      11517 bad
2.7.64-g0d3a07a8 (LU-7430)      15114 bad
2.7.64-g959f8f78 (LU-7573)      11530 bad 
2.7.65                          11375 bad
2.7.66                          11403 bad
2.10.53                         12473
2.10.54                          9649


Comment by John Hammond [ 20/Oct/17 ]

There must have been more runs than just these if you were able to isolate https://review.whamcloud.com/#/c/17092/.

Comment by Andreas Dilger [ 20/Oct/17 ]

I've updated the results above to show the test results for the bisect in commit order (not bisect order); this makes clear that there is a break between LU-7408 and the next patch, LU-7053.

Comment by Alex Zhuravlev [ 23/Oct/17 ]

I tried to revert LU-7053 on master and got no difference for rmdir:
with LU-7053: total: 100000 unlinks in 16 seconds: 6250.000000 unlinks/second
without LU-7053: total: 100000 unlinks in 16 seconds: 6250.000000 unlinks/second
The load was simulated with createmany/unlinkmany; a sketch of that kind of load is included below.

I'm going to proceed with mdtest, but that will take some time.
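
A rough sketch of that createmany/unlinkmany load, assuming the helpers shipped in lustre/tests and a client mount at /mnt/lustre (paths and exact options are assumptions; check the tools' usage output):

mkdir /mnt/lustre/rmdir_test
createmany -d /mnt/lustre/rmdir_test/d%d 100000    # create 100000 directories
unlinkmany -d /mnt/lustre/rmdir_test/d%d 100000    # remove them; prints an "unlinks/second" total like the one quoted above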

Comment by Gerrit Updater [ 23/Oct/17 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: https://review.whamcloud.com/29709
Subject: LU-9972 osd: cache OI mapping in dt_ref_{del|add}()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ccb8803749e9234e4bd6e5e451a3333d6b17e594

Comment by Alex Zhuravlev [ 23/Oct/17 ]

Please try with the patch above.

Comment by Saurabh Tandan (Inactive) [ 26/Oct/17 ]

Performance results with patch above:

MDTEST RESULTS
000: SUMMARY: (of 3 iterations)
000:    Operation                      Max            Min           Mean        Std Dev
000:    ---------                      ---            ---           ----        -------
000:    Directory creation:      24200.233      15693.923      18542.903       4000.371
000:    Directory stat    :     235384.333     230225.191     233266.050       2204.926
000:    Directory removal :      82809.228      30178.490      51633.084      22559.256
000:    File creation     :      50508.075      33979.054      41331.172       6870.202
000:    File stat         :     232294.752     226222.165     228931.104       2521.978
000:    File read         :     124620.945     110484.961     119398.361       6333.672
000:    File removal      :     128159.287      77027.802     106534.012      21605.386
000:    Tree creation     :        124.922         65.746        103.702         26.901
000:    Tree removal      :          9.709          7.841          8.650          0.783

Results for directory removal with this patch have certainly improved, but still not to the level they were at before LU-7053 landed.
A test run on the Lustre version just before LU-7053 landed showed a mean directory removal rate of 92029.569 ops/sec.

Comment by Andreas Dilger [ 26/Oct/17 ]
Build          Version             Dir create   Dir stat   Dir rm
b_ieel3_0-89   2.7.16.1                 19758     169050    77119 
b_ieel3_0-105  2.7.18                   21239     167421    73112
b_ieel3_0-214  2.7.20.2                 28136     331563    60515
master-3601    2.9.59                   21063     223101    11879
master-2607    2.8.59-35                19328     211813    11797
master-3637    2.10.52_83               25987     234954    15033
master         2.10.54                  18676     210767    10282
review-51618   2.10.54-20-g66bb2d1      18542     233266    51633

So it looks like the rmdir performance is significantly improved. The mkdir performance is down a bit from 2.10.53 but about on par with 2.10.54, and the file create/stat/unlink performance looks a bit lower than 2.10.54, but definitely still a lot better than 2.7.64.

Alex, would it make sense to only add directory objects to the cache, instead of adding all objects?  That may give us the best of both worlds.

Comment by Alex Zhuravlev [ 27/Oct/17 ]

Andreas, at the moment I don't quite understand how the cache can decrease performance, as it's thread-local (TLS), lockless, and tiny, and it replaces a lookup in the LU cache, which in contrast is much larger and needs locking. I'll keep investigating.

Comment by Alex Zhuravlev [ 27/Oct/17 ]

Please benchmark master with https://review.whamcloud.com/#/c/29821/; this patch reverts just LU-7053.

Comment by John Hammond [ 27/Oct/17 ]

Before LU-7053, when calling osd_index_declare_ea_insert() from orph_declare_index_insert(), we only looked up the FID of mdd->mdd_orphans in the FLDB to determine whether it was local or remote (so that we could declare additional credits in the remote case). Now we do osd_idc_find_or_init(), which includes an OI lookup in the local case. I understand that we may need the extra information if osd_index_ea_insert() gets called to add the directory to the orphan dir, but I assume we are not hitting this case from mdtest. So maybe the extra cost of a full IDC initialization compared to an FLDB lookup (and/or contention effects) could explain the regression.

Comment by Alex Zhuravlev [ 27/Oct/17 ]

IDC is lockless and per-thread, while FLDB is a shared structure.

Comment by John Hammond [ 27/Oct/17 ]

Yes, but I'm not talking about the cost of accessing the IDC cache. I mean the extra cost of OI lookup to initialize the IDC entry.

Comment by Alex Zhuravlev [ 27/Oct/17 ]

Usually it is initialized from other preceding methods (like osd_declare_ref_{add|del} in https://review.whamcloud.com/29709), with no extra lookups.

Comment by Alex Zhuravlev [ 27/Oct/17 ]

To verify that, I added a printk() to osd_idc_find_or_init() and got zero calls to osd_remote_fid() and osd_oi_lookup() during rmdir.
There are a few calls to osd_remote_fid() (which I hope to fix in a separate patch), but those weren't introduced by LU-7053.

Comment by Saurabh Tandan (Inactive) [ 27/Oct/17 ]

Results of patch https://review.whamcloud.com/#/c/29821/ which reverts just LU-7053:

MDTEST RESULTS
000: SUMMARY: (of 3 iterations)
000:    Operation                      Max            Min           Mean        Std Dev
000:    ---------                      ---            ---           ----        -------
000:    Directory creation:      21177.729      16443.602      18498.262       1982.554
000:    Directory stat    :     232409.626     229897.859     230874.272       1098.955
000:    Directory removal :     111897.307      40906.454      75055.964      29044.331
000:    File creation     :      44061.464      38438.052      41863.997       2454.596
000:    File stat         :     225941.782     193254.640     210777.086      13448.211
000:    File read         :     146598.308      86669.775     126562.125      28208.247
000:    File removal      :     155512.154     106886.927     136397.357      21168.452
000:    Tree creation     :        120.178         56.129         96.113         28.468
000:    Tree removal      :          8.906          8.122          8.620          0.354
000:

Mean directory removal for this run was 75055.964 ops/sec, approximately 45% higher than with the patch https://review.whamcloud.com/29709 under exactly the same conditions and number of iterations.

Comment by Alex Zhuravlev [ 29/Oct/17 ]

Thanks for the data. I've updated https://review.whamcloud.com/29709.
Please benchmark master with that. The patch collects additional stats and dumps them at umount in the following form:
+       if (atomic_read(&o->od_idc_remotes) ||
+           atomic_read(&o->od_idc_oi))
+               printk("%s: %d checks for remote, %d OI lookups\n",
+                      o->od_svname,
+                      atomic_read(&o->od_idc_remotes),
+                      atomic_read(&o->od_idc_oi));

Please attach that output here if it is printed (a sketch of how to capture it is included below).
Thanks in advance.

I'm working on a follow-up patch to optimize the calls to osd_remote_fid(); that isn't directly related to LU-7053, but hopefully it can fix the issue.
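
A minimal sketch of capturing that output, assuming the stats line goes to the kernel log on the MDS at unmount (the mount point and service name below are placeholders):

# Run the benchmark, then stop the MDT so the patched code prints its stats:
umount /mnt/mds0
# The printk() output lands in the kernel log; grab the matching line:
dmesg | grep 'checks for remote'
# Expected form: "<osd service name>: N checks for remote, M OI lookups"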

Comment by Alex Zhuravlev [ 14/Nov/17 ]

Please try with the updated patch.

Comment by Joseph Gmitter (Inactive) [ 04/Jan/18 ]

Hi Ihara,

Have you been able to confirm that Alex's patch resolves the issue?

Thanks.
Joe

Comment by Gerrit Updater [ 06/Feb/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29709/
Subject: LU-9972 osd: cache OI mapping in dt_declare_ref_add
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b6e718def348c53759a12afee9450207fc7ab56f

Comment by Peter Jones [ 06/Feb/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 07/Feb/18 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31211
Subject: LU-9972 osd: cache OI mapping in dt_declare_ref_add
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: d334fd8aa117d9a957365bdfb6792bcaaf6533cc

Comment by Gerrit Updater [ 01/Mar/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31211/
Subject: LU-9972 osd: cache OI mapping in dt_declare_ref_add
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 025d0412599ed9381be4a0ab84d190b59fc2c451
