[LU-4367] unlink performance regression on lustre-2.5.52 client Created: 09/Dec/13  Updated: 13/Oct/16  Resolved: 12/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Critical
Reporter: Shuichi Ihara (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: HB

Attachments: Microsoft Word LU-4367.xlsx     File debugfile     Zip Archive unlinkmany-result.zip    
Issue Links:
Related
is related to LU-5426 Add more controls for open file cache Resolved
is related to LU-4906 rm -rf triggers too much MDS_READPAGE Resolved
is related to LU-5197 A performance regression of "FileRead... Resolved
is related to LU-8019 Openlock breakage Resolved
Epic/Theme: Performance
Severity: 2
Rank (Obsolete): 11951

 Description   

The lustre-2.5.52 client (and possibly other client versions as well) shows a metadata performance regression when unlinking files in a single shared directory.
Here are test results for lustre-2.5.52 clients and lustre-2.4.1 clients; lustre-2.5.52 is running on all servers.

1 x MDS, 4 x OSS (32 x OST) and 16 clients (64 processes, 20000 files per process)

lustre-2.4.1 client

4.1-take2.log
-- started at 12/09/2013 07:31:29 --

mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      58200.265      56783.559      57589.448        594.589
   File stat         :     123351.857     109571.584     114223.612       6455.043
   File read         :     109917.788      83891.903      99965.718      11472.968
   File removal      :      60825.889      59066.121      59782.774        754.599
   Tree creation     :       4048.556       1971.934       3082.293        853.878
   Tree removal      :         21.269         15.069         18.204          2.532

-- finished at 12/09/2013 07:34:53 --
lustre-2.5.52 client

-- started at 12/09/2013 07:13:42 --

mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      58286.631      56689.423      57298.286        705.112
   File stat         :     127671.818     116429.261     121610.854       4631.684
   File read         :     173527.817     158205.242     166676.568       6359.445
   File removal      :      46818.194      45638.851      46118.111        506.151
   Tree creation     :       3844.458       2576.354       3393.050        578.560
   Tree removal      :         21.383         18.329         19.844          1.247

-- finished at 12/09/2013 07:17:07 --

46K ops/sec (lustre-2.5.52) vs 60K ops/sec (lustre-2.4.1): roughly a 25% performance drop on Lustre 2.5.52 compared to Lustre 2.4.1.



 Comments   
Comment by Oleg Drokin [ 09/Dec/13 ]

Did this happen only on 2.5.52, as in 2.5.51 servers were fine? Any chance you can identify the patch that introduced this with a bit of git bisect?

Comment by Peter Jones [ 09/Dec/13 ]

Cliff

Have you seen any performance drops like this on Hyperion?

Peter

Comment by Shuichi Ihara (Inactive) [ 10/Dec/13 ]

At least 2.5.0 and 2.5.51 are also fine. It seems something happened between 2.5.51 and 2.5.52. I will try git bisect to find the exact commit that caused this performance difference.

2.5.0 client

-- started at 12/09/2013 15:41:13 --

mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      56576.814      56173.806      56435.397        185.176
   File stat         :     122978.552     108868.929     115211.059       5847.741
   File read         :     108518.269      86626.909      94978.533       9660.755
   File removal      :      61474.088      59462.447      60343.718        839.925
   Tree creation     :       4253.858       2061.083       3124.005        896.447
   Tree removal      :         22.261         14.862         19.262          3.179

-- finished at 12/09/2013 15:44:39 --

2.5.51 client

-- started at 12/09/2013 16:10:46 --

mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      57207.432      56112.732      56627.502        449.278
   File stat         :     122587.505     110561.252     115014.601       5382.466
   File read         :     105060.899      90757.318      99241.371       6135.844
   File removal      :      61824.540      59560.836      60470.541        976.093
   Tree creation     :       4096.000       1602.715       3181.058       1120.772
   Tree removal      :         20.478         17.985         19.354          1.032

-- finished at 12/09/2013 16:14:10 --
Comment by Shuichi Ihara (Inactive) [ 10/Dec/13 ]

Here is "git bisect" results.

File removal operation to shared directory

commit                                            Removal (ops/sec)  result
98ac0fe3a45dde62759ecaa4c84e6250ac2067f8 (HEAD)               46818  bad
2.5.51                                                        61824  good
e9a1f308b5359c2de1fda67816ef662ce727d275                      45919  bad
cbab0aa32ed2d21f59aae3a28285b49802b734f2                      46917  bad
2b13169cd86b4868730f2c45432645b7d2cc0073                      62137  good
a9ae2181f3efd811e17843ebf951b00fb9ea0366                      63721  good
12d2b04f2204bc087f380cb214a29c126f50d709                      63157  good
b17d23fd01557c0e23f5c3b4eeea237c08fe2bc5                      44786  bad
55989b17c7391266740d68e3c62418e184364ed7                      46392  bad

55989b17c7391266740d68e3c62418e184364ed7 "LU-3544 llite: simplify dentry revalidate"
This commit is exactly the point where the metadata performance regression starts.

And, to double-check, I also tested the current HEAD of the master branch with the LU-3544 patch reverted. Here is the result; the removal performance is back.

-- started at 12/09/2013 21:28:02 --

mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      59437.920      56476.490      58310.121       1307.967
   File stat         :     127083.044     115640.003     120232.454       4936.949
   File read         :     110833.651     100376.278     105721.983       4272.411
   File removal      :      64267.994      63221.494      63591.734        478.906
   Tree creation     :       3533.533       1503.874       2724.054        878.023
   Tree removal      :         21.026         18.468         20.149          1.189

-- finished at 12/09/2013 21:31:17 --
Comment by Peter Jones [ 10/Dec/13 ]

Lai

Are you able to comment?

Thanks

Peter

Comment by Lai Siyao [ 11/Dec/13 ]

Hi Ihara, could you test with createmany and unlinkmany? I suspect this is not an unlink performance drop, but rather that mdtest causes dentry revalidation failures and re-lookups, because 55989b17c7391266740d68e3c62418e184364ed7 "LU-3544 llite: simplify dentry revalidate" only touches the dentry revalidation code path. I'll run mdtest locally to reproduce this.

Comment by Shuichi Ihara (Inactive) [ 12/Jan/14 ]

Hi Lai,

Sorry for the delayed response on this. I have tested with createmany and unlinkmany on 16 clients, with 64 processes in total running simultaneously.
Here is a summary of the results. The numbers were not affected by whether the LU-3544 patch was applied or not.

                            iteration 1  iteration 2  iteration 3
2.5.52                            45491        44416        44200
2.5.52 w/o LU-3544 patch          44157        43648        44182
Comment by Shuichi Ihara (Inactive) [ 28/Jan/14 ]

Hi Lai, any advice or updates on this?

Comment by Lai Siyao [ 13/Feb/14 ]

I haven't found any clue yet and will need more time for testing; I'll update the progress next week.

Comment by Lai Siyao [ 27/Feb/14 ]

I tested on a different setup, but I didn't see the unlink performance drop. If possible, could you use oprofile to find which function consumes more time on the 2.5.52 client?

I noticed that you only tested with a small set of files (20000 total files) and iterated three times. Could you test with more files and only one iteration? And could you also test with one client to see if unlink gets slow?

Comment by Andreas Dilger [ 04/Mar/14 ]

Lai,
what kind of system did you test on? I suspect that this slowdown is only visible with a fast MDS and IB network and enough OSTs so that unlinking the OST objects is not the bottleneck. I don't know that changing the parameters of what is being tested is needed, since there is clearly a slowdown in this test which is significantly larger than the standard deviation between tests (-15000 unlinks/sec with stddev 1000).

Also, I think it is important to note that this is only an issue during unlink, and in fact, once the LU-3544 patch is reverted, master is about 3000 unlink/sec faster than 2.5.0/2.5.51. However, it does appear that the open-for-read ("File read" with 0 bytes read) performance is about 60000 opens/sec faster with the LU-3544 patch applied, which is also important not to lose.

I suspect there is some subtle difference in the new ll_revalidate_dentry() code that is only triggering in the unlink case, possibly forcing an extra RPC to the MDS to revalidate the dentry just before it is being unlinked? Rather than spending time trying to reproduce the performance loss, it might make more sense to just get a debug log of unlink with and without the 55989b17c73912 patch applied and see what the difference is in the callpath and RPCs sent. Hopefully, there is just a minor change that can be done to fix the unlink path and not impact the other performance.

Comment by Lai Siyao [ 07/Mar/14 ]

I tested on three test nodes in Toro: one client, one MDS, and two OSS on the same OST.

I was suspecting that with the LU-3544 patch .revalidate just returns 0 if the dentry is invalid and lets .lookup do the real lookup, instead of looking up inside .revalidate as in the old code, and this may introduce a small overhead. I'll double-check the call trace of unlink to see whether there are extra lookups.

Comment by Lai Siyao [ 17/Mar/14 ]

A command like `mdtest -d /lustre/dir.0 -n 20000 -F -i 3` executes the following syscalls on each file (a minimal sketch of this sequence follows the list):
1. creat
2. close
3. stat
4. open
5. I/O
6. close
7. unlink
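
As a rough illustration, here is a minimal user-space C sketch of that per-file sequence (the path and buffer are made up for the example; mdtest's real code differs):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/lustre/dir.0/file.0";  /* hypothetical test file */
        struct stat st;
        char buf[4096];
        int fd;

        fd = creat(path, 0644);                     /* 1. creat */
        if (fd < 0) { perror("creat"); exit(1); }
        close(fd);                                  /* 2. close */

        if (stat(path, &st) < 0) {                  /* 3. stat */
                perror("stat"); exit(1);
        }

        fd = open(path, O_RDONLY);                  /* 4. open (second open of the same file) */
        if (fd < 0) { perror("open"); exit(1); }
        if (read(fd, buf, sizeof(buf)) < 0) {       /* 5. I/O (mdtest -F reads 0 bytes here) */
                perror("read"); exit(1);
        }
        close(fd);                                  /* 6. close */

        if (unlink(path) < 0) {                     /* 7. unlink */
                perror("unlink"); exit(1);
        }
        return 0;
}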

With the old code, the open syscall in step 4 called .revalidate(IT_OPEN), which opened the file, and close in step 6 called .release and actually closed the file.
With the new code, .revalidate no longer executes the intent but returns 1 directly; later ll_intent_file_open() opens the file with MDS_INODELOCK_OPEN, so close in step 6 does not really close the file because the open lock is cached. Then in step 7 unlink needs to close the file before unlinking it, and this is the cause of the unlink performance drop.

IMHO this is not a real bug, because no extra RPC is sent; it is just that mdtest opens the file twice, so with the new code the open lock is fetched. A possible fix might be to add a timestamp so .open can know that .revalidate(IT_OPEN) was just called and skip fetching the open lock, but I'm not sure this is necessary.

Comment by Shuichi Ihara (Inactive) [ 18/Mar/14 ]

Yes, even if this might not be a bug, we see a performance drop in the mdtest I/O scenario at least. mdtest is one of the major metadata benchmark tools and represents a common metadata scenario; we would like to keep (at least) the same performance with newer versions of Lustre. If you have any ideas for a workaround, please share them with us. I would like to test them.

Comment by Lai Siyao [ 18/Mar/14 ]

During testing I saw other places that can be improved to increase file creation, stat, and maybe read performance, and I composed two patches:
http://review.whamcloud.com/#/c/9696/
http://review.whamcloud.com/#/c/9697/

Would you apply these two patches and get some results?

Comment by Shuichi Ihara (Inactive) [ 18/Mar/14 ]

Sure, I will test those patches very soon and keep you updated! Thanks a lot, again!

Comment by Shuichi Ihara (Inactive) [ 18/Apr/14 ]

Lai, these patches are broken; I can't copy a file from the local filesystem to Lustre.

[root@r21 tmp]# touch /tmp/a
[root@r21 tmp]# cp /tmp/a /lustre/
cp: cannot create regular file `/lustre/a': File exists

This worked.

[root@r21 tmp]# touch /lustre/a
Comment by Shuichi Ihara (Inactive) [ 18/Apr/14 ]

This is the debug file captured when the problem happens.

echo "+trace" > /proc/sys/lnet/debug
lctl debug_daemon start /tmp/debuglog 100
touch /tmp/a
cp /tmp/a /lustre
lctl debug_daemon stop
echo "-trace" > /proc/sys/lnet/debug

Comment by Lai Siyao [ 21/Apr/14 ]

Thanks Ihara, the patches are updated. Previously I only tested with mdtest and didn't run a full test, because the patches are intended to gather mdtest performance data and may not be the final patches yet. Sorry for the trouble.

Comment by Andreas Dilger [ 25/Apr/14 ]

Lai, it looks like the patches http://review.whamcloud.com/9696 and http://review.whamcloud.com/9697 are improving the open performance, but do not address the unlink performance. Is there something that can be done to improve the unlink performance back to the 2.5.0 level so that 2.6.0 does not have a performance regression?

Comment by Lai Siyao [ 28/Apr/14 ]

The root cause is that revalidate(IT_OPEN) enqueues an open lock, so the close is deferred until unlink, which causes the unlink performance drop, but in total there is no extra RPC. I don't see a clear way to handle this, so I think if we can improve open and stat performance a lot, it's worthwhile keeping the status quo.

Comment by Andreas Dilger [ 06/May/14 ]

It might be possible to combine the close and unlink RPCs (unlink with close flag, or close with unlink flag?) so that the number of RPCs is actually reduced? We already do something similar with early lock cancellation, so it might be possible to do something similar with the close.

Comment by Lai Siyao [ 07/May/14 ]

I've thought of that, but considering the complications of open replay, and possibly SOM, I think it's not trivial work. I'll think about it more and do some testing later (maybe next week).

Comment by Lai Siyao [ 21/May/14 ]

Patch to combine the close into the unlink RPC: http://review.whamcloud.com/#/c/10398/

Ihara, could you apply only this patch and get results from mdtest?

Comment by Shuichi Ihara (Inactive) [ 30/May/14 ]

Hi Lai,
It seems that there have been several updates since you posted the initial patches. Please advise which patches should be applied.

Comment by Andreas Dilger [ 30/May/14 ]

Lai should confirm, but I think the most important patch for addressing the unlink regression is http://review.whamcloud.com/10398 so that one should be tested first.

There is also a potential improvement in http://review.whamcloud.com/9696, which would be next, but it doesn't affect unlink. I think http://review.whamcloud.com/9697 is too complex to land for 2.6.0 at this point, but if it gives a significant improvement then it could be landed for 2.7.0 and IEEL.

Comment by Andreas Dilger [ 05/Jun/14 ]

Ihara, did you get a chance to test if 10398 fixes the unlink regression? We are ready to land that patch.

Comment by Shuichi Ihara (Inactive) [ 09/Jun/14 ]

I'm testing the patches and will post results shortly.

Comment by Andreas Dilger [ 16/Jun/14 ]

Ihara, any chance to post the results from your tests?

Comment by Andreas Dilger [ 23/Jun/14 ]

Hi Ihara, is there a chance for you to post the mdtest results for the testing you did on 06-09 for patch http://review.whamcloud.com/10398 ?

Comment by Shuichi Ihara (Inactive) [ 24/Jun/14 ]

Andreas, sorry for the delay on this... Here are our recent test results.

Configuration
1 x MDS, 10 x SSD (RAID10) for MDT, 2 x OSS, 10 x OST (100 x NL-SAS)
32 clients, 64 mdtest threads, 2.56M files in total for creation/stat/removal
Tested builds:
master branch (47cde804ddc9019ff0793229030211d536d0612f)
master branch (47cde804ddc9019ff0793229030211d536d0612f) + patch 10426 + patch 10398

Unique Directory Operation
master branch(47cde804ddc9019ff0793229030211d536d0612f)

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -u -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      48811.145      39252.347      42446.699       4500.354
   Directory stat    :     299207.829     290254.504     293619.032       3979.199
   Directory removal :      89250.695      86672.466      88049.098       1059.809
   File creation     :      80325.602      71720.354      76539.450       3588.203
   File stat         :     202533.695     202312.144     202430.663         91.108
   File read         :     224391.556     222667.559     223733.260        760.494
   File removal      :      93977.310      81732.593      89128.915       5313.644
   Tree creation     :        487.540        255.237        408.701        108.529
   Tree removal      :          7.483          7.376          7.416          0.048

Unique Directory Operation
master branch(47cde804ddc9019ff0793229030211d536d0612f) + patch 10426 + patch 10398

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -u -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      43529.024      38432.682      40492.505       2192.203
   Directory stat    :     295567.203     248965.236     278082.284      20727.046
   Directory removal :      99851.600      97510.819      98692.187        955.746
   File creation     :      76464.252      61260.049      69836.770       6358.281
   File stat         :     210322.996     203751.172     206953.520       2685.537
   File read         :     227658.211     225535.341     226317.238        952.564
   File removal      :      99144.730      98371.321      98765.310        315.911
   Tree creation     :        454.766        187.656        357.198        120.339
   Tree removal      :          7.494          7.383          7.438          0.045

Shared Directory Operation
master branch(47cde804ddc9019ff0793229030211d536d0612f)

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      28513.564      27700.587      28038.288        345.860
   Directory stat    :     142617.694     139431.318     141316.628       1364.858
   Directory removal :      60164.271      56562.712      58927.059       1672.450
   File creation     :      34568.359      34000.466      34304.269        233.536
   File stat         :     143387.629     140366.792     141459.265       1367.577
   File read         :     229820.877     222497.139     225426.481       3164.288
   File removal      :      66583.172      58133.175      61494.514       3659.539
   Tree creation     :       4132.319       3398.950       3773.387        299.598
   Tree removal      :         11.422          3.327          7.825          3.365

Shared Directory Operation
master branch(47cde804ddc9019ff0793229030211d536d0612f) + patch 10426 + patch 10398

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      28132.040      26630.642      27487.773        631.154
   Directory stat    :     136965.055     135597.500     136440.164        601.823
   Directory removal :      58149.733      55110.750      56638.405       1240.713
   File creation     :      33170.783      32710.907      32931.837        188.175
   File stat         :     138870.777     136286.854     137743.643       1080.330
   File read         :     234861.197     224503.115     228594.555       4499.710
   File removal      :      77518.626      69571.564      73940.211       3292.142
   Tree creation     :       4116.098       1102.314       2711.725       1238.885
   Tree removal      :          9.879          4.938          7.854          2.114

We see performance improvements with the patches for unlink operations in unique directories as well as in the shared directory. I also want to check against lustre-2.5 for comparison. By the way, file/directory creation in the shared directory is lower than I expected; I will check other Lustre versions (e.g. b2_5) later as well.

Comment by Andreas Dilger [ 24/Jun/14 ]

It appears that the unlink performance has gone up, but the create and stat rates have gone down. Can you please test those two patches separately? If the 10398 patch fixes the unlink performance without hurting the other operations, it could land. It might be that the 10426 patch is changing the other performance and needs to be reworked.

Comment by Shuichi Ihara (Inactive) [ 24/Jun/14 ]

First, I tried only the 10398 patch, but the build fails since OBD_CONNECT_UNLINK_CLOSE is defined in the 10426 patch, so I needed both patches at the same time to compile.

By the way, here is the same mdtest benchmark on the same hardware, but with Lustre version 2.5.2RC2.

Unique Directory Operation

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -u -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      44031.310      41420.993      43125.128       1205.815
   Directory stat    :     346144.788     329854.059     335352.348       7631.863
   Directory removal :      87592.556      86416.906      87118.114        506.033
   File creation     :      82518.567      64962.637      76375.141       8077.749
   File stat         :     215570.997     209551.901     212205.919       2508.198
   File read         :     151377.930     144487.897     147463.085       2890.255
   File removal      :     105964.879      93215.798     101520.782       5877.335
   Tree creation     :        628.925        410.522        542.680         94.889
   Tree removal      :          8.583          8.013          8.284          0.233

Shared Directory Operation

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      39463.778      38496.147      38986.389        395.138
   Directory stat    :     143006.039     134919.914     138809.226       3308.300
   Directory removal :      78711.817      76206.632      77846.563       1160.196
   File creation     :      75154.225      70792.633      72674.025       1830.264
   File stat         :     142431.366     138650.545     140623.793       1547.953
   File read         :     134643.457     132249.733     133383.879        981.251
   File removal      :      94311.826      83231.516      89991.676       4841.388
   Tree creation     :       4048.556       3437.954       3743.808        249.278
   Tree removal      :          9.098          4.048          6.792          2.084

For unique directory metadata operations, overall, the results of master + 10398 + 10426 are close to the 2.5.2RC2 results except for directory stat (for the stat operation, 2.5 is better than master).
However, for metadata operations in a shared directory, most of 2.5.2RC2's numbers are still much higher than those of master or master + 10398 + 10426. That is the original issue of this ticket, and there is still a big performance gap there. For the read operation, the master branch is much improved compared to the 2.5 branch.

Comment by Oleg Drokin [ 30/Jun/14 ]

So it looks like we have all of this extra file handle caching that should not really be happening at all.

Originally, when the opencache was implemented, it cached everything, and that resulted in a performance drop specifically due to slow lock cancellation.
That's when we decided to restrict this caching to NFS-originated requests and some races, by only setting the flag in ll_file_intent_open, where we could only arrive via NFS.
Now it appears that this assumption is broken?

I am planning to take a deeper look to understand what is happening with the cache now.

Comment by Andreas Dilger [ 30/Jun/14 ]

Sorry about my earlier confusion with 10426 - I thought that was a different patch, but I see now that it is required for 10398 to work.

It looks like the 10398 patch does improve the unlink performance, but at the expense of almost every other operation. Since unlink is already faster than create, it doesn't make sense to speed it up and slow down create. It looks like there is also some other change(s) that slowed down the create and stat operations on master compared to 2.5.2.

It doesn't seem reasonable to land 10398 for 2.6.0 at this point.

Comment by Lai Siyao [ 02/Jul/14 ]

Oleg, the cause is the simplified revalidate (see 7475). Originally revalidate would execute IT_OPEN, but this code duplicated the lookup path, and the opened handle could be lost if another client canceled the lock. So 7475 simplified revalidate to just return 1 if the dentry is valid and let .open really open the file, but this open can't be differentiated from an NFS export open, so both an open after revalidate and an NFS export open take the open lock.

Comment by Oleg Drokin [ 11/Jul/14 ]

So, it looks like we can still infer whether the open originated from the VFS or not.

When we come from do_filp_open (the real open path), we go through filename_lookup with LOOKUP_OPEN set; when we go through dentry_open, LOOKUP_OPEN is not set.

As such, the most brute-force way I see to address this is for ll_revalidate_dentry to always return 0 if LOOKUP_OPEN is set and LOOKUP_CONTINUE is NOT set (i.e. we are looking up the last component); a schematic sketch is shown below.
We already do a similar trick for LOOKUP_OPEN|LOOKUP_CONTINUE.
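
A minimal sketch of that check, assuming the flag names used above and a simplified revalidate hook (this is not the actual change; see http://review.whamcloud.com/11062 for the real patch):

static int ll_revalidate_dentry_sketch(struct dentry *dentry,
                                       unsigned int lookup_flags)
{
        /*
         * Last component of a real open(2): LOOKUP_OPEN set and
         * LOOKUP_CONTINUE clear.  Return 0 to force a fresh lookup so the
         * open goes through the intent path instead of caching an open lock
         * that would defer the close until unlink time.
         */
        if ((lookup_flags & LOOKUP_OPEN) && !(lookup_flags & LOOKUP_CONTINUE))
                return 0;

        /* Otherwise the dentry stays valid, as in the simplified revalidate. */
        return 1;
}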

BTW, while looking at the ll_revalidate_dentry logic, I think we can also improve it quite a bit in the area of intermediate path component lookup.

All of this is in this patch: http://review.whamcloud.com/11062
Ihara-san, please give it a try to see if it helps for your workload.
This patch passes a medium level of my testing (which does not include any performance testing).

Comment by Shuichi Ihara (Inactive) [ 11/Jul/14 ]

All of this is in this patch: http://review.whamcloud.com/11062
Ihara-san, please give it a try to see if it helps for your workload?

Sure, I will test that patch as soon as I can run the benchmark, maybe early next week. Thanks!

Comment by Cliff White (Inactive) [ 22/Jul/14 ]

I ran the patch on Hyperion with 1, 32, 64, and 100 clients, running mdtest in dir-per-process and single-shared-dir modes.
A spreadsheet with graphs is attached.

Comment by Jodi Levi (Inactive) [ 12/Nov/14 ]

Patches landed to Master. Please reopen ticket if more work is needed.
