[LU-4367] unlink performance regression on lustre-2.5.52 client Created: 09/Dec/13 Updated: 13/Oct/16 Resolved: 12/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB | ||
| Attachments: |
|
||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||
| Epic/Theme: | Performance | ||||||||||||||||||||
| Severity: | 2 | ||||||||||||||||||||
| Rank (Obsolete): | 11951 | ||||||||||||||||||||
| Description |
|
The lustre-2.5.52 client (and maybe older clients as well) causes a metadata performance regression (unlinking files in a single shared directory). Setup: 1 x MDS, 4 x OSS (32 x OST) and 16 clients (64 processes, 20000 files per process).

lustre-2.4.1 client (4.1-take2.log):

-- started at 12/09/2013 07:31:29 --
mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation           Max          Min          Mean       Std Dev
   ---------           ---          ---          ----       -------
   File creation :   58200.265    56783.559    57589.448     594.589
   File stat     :  123351.857   109571.584   114223.612    6455.043
   File read     :  109917.788    83891.903    99965.718   11472.968
   File removal  :   60825.889    59066.121    59782.774     754.599
   Tree creation :    4048.556     1971.934     3082.293     853.878
   Tree removal  :      21.269       15.069       18.204       2.532
-- finished at 12/09/2013 07:34:53 --

lustre-2.5.52 client:

-- started at 12/09/2013 07:13:42 --
mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation           Max          Min          Mean       Std Dev
   ---------           ---          ---          ----       -------
   File creation :   58286.631    56689.423    57298.286     705.112
   File stat     :  127671.818   116429.261   121610.854    4631.684
   File read     :  173527.817   158205.242   166676.568    6359.445
   File removal  :   46818.194    45638.851    46118.111     506.151
   Tree creation :    3844.458     2576.354     3393.050     578.560
   Tree removal  :      21.383       18.329       19.844       1.247
-- finished at 12/09/2013 07:17:07 --

File removal: 46K ops/sec (lustre-2.5.52) vs 60K ops/sec (lustre-2.4.1), a 25% performance drop on Lustre-2.5.52 compared to Lustre-2.4.1. |
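For reference, a minimal sketch of how a 64-task, 16-node mdtest run like the ones above is typically launched; the MPI launcher, hostfile name, and per-node task count are assumptions, while the mdtest arguments are the ones reported above.

# hypothetical launch: 64 MPI ranks spread over the 16 client nodes listed in clients.txt
mpirun -np 64 -hostfile clients.txt -npernode 4 \
    /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3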
| Comments |
| Comment by Oleg Drokin [ 09/Dec/13 ] | ||||||||||||||||||||||||||||||
|
Did this happen on only 2.5.52, as in 2.5.51 servers were fine? Any chance you can arrive at the patch that introduced this with a bit of git bisect? | ||||||||||||||||||||||||||||||
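A minimal sketch of such a bisect run, assuming the lustre-release tree; the tag names are assumptions, and each step would be validated by rebuilding the client and rerunning the mdtest command from the description.

git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git bisect start
git bisect bad  2.5.52     # client showing ~46K unlinks/sec (tag name assumed)
git bisect good 2.5.51     # client showing ~60K unlinks/sec (tag name assumed)
# at each bisect step: rebuild/reinstall the client, rerun
#   mdtest -d /lustre/dir.0 -n 20000 -F -i 3
# and mark the result:
git bisect good            # or: git bisect bad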
| Comment by Peter Jones [ 09/Dec/13 ] | ||||||||||||||||||||||||||||||
|
Cliff, have you seen any performance drops like this on Hyperion? Peter
| Comment by Shuichi Ihara (Inactive) [ 10/Dec/13 ] | ||||||||||||||||||||||||||||||
|
At least 2.5.0 and 2.5.51 are also fine. It seems something happened between 2.5.51 and 2.5.52. I will try git bisect to find the exact commit which caused this performance difference.

2.5.0 client:

-- started at 12/09/2013 15:41:13 --
mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation           Max          Min          Mean       Std Dev
   ---------           ---          ---          ----       -------
   File creation :   56576.814    56173.806    56435.397     185.176
   File stat     :  122978.552   108868.929   115211.059    5847.741
   File read     :  108518.269    86626.909    94978.533    9660.755
   File removal  :   61474.088    59462.447    60343.718     839.925
   Tree creation :    4253.858     2061.083     3124.005     896.447
   Tree removal  :      22.261       14.862       19.262       3.179
-- finished at 12/09/2013 15:44:39 --

2.5.51 client:

-- started at 12/09/2013 16:10:46 --
mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation           Max          Min          Mean       Std Dev
   ---------           ---          ---          ----       -------
   File creation :   57207.432    56112.732    56627.502     449.278
   File stat     :  122587.505   110561.252   115014.601    5382.466
   File read     :  105060.899    90757.318    99241.371    6135.844
   File removal  :   61824.540    59560.836    60470.541     976.093
   Tree creation :    4096.000     1602.715     3181.058    1120.772
   Tree removal  :      20.478       17.985       19.354       1.032
-- finished at 12/09/2013 16:14:10 --
| Comment by Shuichi Ihara (Inactive) [ 10/Dec/13 ] | ||||||||||||||||||||||||||||||
|
Here are the "git bisect" results. The regression in the file removal operation to a shared directory was introduced by commit:

55989b17c7391266740d68e3c62418e184364ed7

And, as a double check, I also tested the current HEAD of the master branch with a revert of that commit:

-- started at 12/09/2013 21:28:02 --
mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation           Max          Min          Mean       Std Dev
   ---------           ---          ---          ----       -------
   File creation :   59437.920    56476.490    58310.121    1307.967
   File stat     :  127083.044   115640.003   120232.454    4936.949
   File read     :  110833.651   100376.278   105721.983    4272.411
   File removal  :   64267.994    63221.494    63591.734     478.906
   Tree creation :    3533.533     1503.874     2724.054     878.023
   Tree removal  :      21.026       18.468       20.149       1.189
-- finished at 12/09/2013 21:31:17 --
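For completeness, a sketch of the revert check described above, assuming a clean master checkout (build and install steps omitted):

git checkout master
git revert 55989b17c7391266740d68e3c62418e184364ed7
# rebuild/reinstall the client, then rerun the same workload:
#   mdtest -d /lustre/dir.0 -n 20000 -F -i 3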
| Comment by Peter Jones [ 10/Dec/13 ] | ||||||||||||||||||||||||||||||
|
Lai, are you able to comment? Thanks, Peter
| Comment by Lai Siyao [ 11/Dec/13 ] | ||||||||||||||||||||||||||||||
|
Hi Ihara, could you test with createmany and unlinkmany? I'm afraid it's not an unlink performance drop; rather, mdtest may be causing file revalidation failures and relookups because of 55989b17c7391266740d68e3c62418e184364ed7.
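A minimal sketch of such a createmany/unlinkmany run from one client; the utilities ship with lustre-tests, and the directory, file-name prefix, and count here simply mirror the mdtest setup above (exact option syntax may differ between versions).

# create and then unlink 20000 files in the shared directory from this client
createmany -o /lustre/dir.0/f.$(hostname). 20000
unlinkmany    /lustre/dir.0/f.$(hostname). 20000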
| Comment by Shuichi Ihara (Inactive) [ 12/Jan/14 ] | ||||||||||||||||||||||||||||||
|
Hi Lai, sorry for the delayed response on this. I have tested with createmany and unlinkmany on 16 clients with a total of 64 processes running simultaneously.
| ||||||||||||||||||||||||||||||
| Comment by Shuichi Ihara (Inactive) [ 28/Jan/14 ] | ||||||||||||||||||||||||||||||
|
Hi Lai, any advice or updates on this?
| Comment by Lai Siyao [ 13/Feb/14 ] | ||||||||||||||||||||||||||||||
|
I haven't found any clue yet and will need more time for testing; I'll update the progress next week.
| Comment by Lai Siyao [ 27/Feb/14 ] | ||||||||||||||||||||||||||||||
|
I tested on a different setup, but I didn't see the unlink performance drop. If possible, could you use oprofile to find which function consumes more time on the 2.5.52 client? I noticed that you only tested with a small set of files (20000 total files) and iterated three times. Could you test with more files and only one iteration? And could you also test with one client to see if unlink gets slow?
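A sketch of such an oprofile capture on one client, assuming the opcontrol-based tooling of that era and that the kernel debuginfo vmlinux is installed (the vmlinux path and output file name are assumptions):

opcontrol --init
opcontrol --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
opcontrol --start --separate=kernel
# ... run the unlink phase (mdtest or unlinkmany) on this client ...
opcontrol --dump
opreport --symbols --merge=all > /tmp/unlink-profile.txt   # top time consumers
opcontrol --shutdown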
| Comment by Andreas Dilger [ 04/Mar/14 ] | ||||||||||||||||||||||||||||||
|
Lai, also, I think it is important to note that this is only an issue during unlink, and in fact it is +3000 unlink/sec faster than 2.5.0/2.5.51 once the offending patch is reverted.

I suspect there is some subtle difference in the new ll_revalidate_dentry() code that is only triggering in the unlink case, possibly forcing an extra RPC to the MDS to revalidate the dentry just before it is being unlinked?

Rather than spending time trying to reproduce the performance loss, it might make more sense to just get a debug log of unlink with and without the 55989b17c73912 patch applied and see what the difference is in the callpath and RPCs sent. Hopefully, there is just a minor change that can be done to fix the unlink path and not impact the other performance.
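A sketch of collecting such a debug log on one client; the debug flags and file names are assumptions, the idea being to capture one unlink with the patch applied and one with it reverted and then compare the call paths and RPCs.

touch /lustre/dir.0/testfile      # file to be unlinked
lctl set_param debug=+rpctrace    # trace RPCs sent to the servers
lctl set_param debug=+dlmtrace    # and DLM lock traffic (open-lock behaviour)
lctl clear                        # empty the kernel debug buffer
rm /lustre/dir.0/testfile         # the unlink being examined
lctl dk > /tmp/unlink-with-patch.dbg   # dump and decode the debug buffer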
| Comment by Lai Siyao [ 07/Mar/14 ] | ||||||||||||||||||||||||||||||
|
I tested on three test nodes in Toro: one client, one MDS, and two OSS on the same OST. I was suspecting that in
| Comment by Lai Siyao [ 17/Mar/14 ] | ||||||||||||||||||||||||||||||
|
A command like `mdtest -d /lustre/dir.0 -n 20000 -F -i 3` executes the following syscalls on each file:

For the old code, the open syscall in step 4 called .revalidate(IT_OPEN), which opened the file, and the close in step 6 called .release, which did close the file. IMHO this is not a real bug, because no extra RPC is sent; but because mdtest opens the file twice, in the new code an open lock is fetched. A possible fix might be to add a timestamp so that .open can know .revalidate(IT_OPEN) was just called and there is no need to fetch the open lock. But I'm not sure this is necessary.
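The step numbers above refer to mdtest's per-file sequence; assuming the usual ordering for a -F run (create/close, stat, open/read/close, unlink, so that step 4 is the second open and step 6 the second close), one quick way to confirm it empirically is to strace a tiny single-task run outside of mpirun:

# trace only the syscalls of interest for a 10-file, single-iteration run
strace -f -e trace=open,close,stat,read,unlink -o /tmp/mdtest.strace \
    ./mdtest -d /lustre/dir.0 -n 10 -F -i 1
grep 'dir.0' /tmp/mdtest.strace | head -30   # shows the double open per file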
| Comment by Shuichi Ihara (Inactive) [ 18/Mar/14 ] | ||||||||||||||||||||||||||||||
|
Yes, even if this might not be a bug, we see a performance drop under the mdtest I/O scenario at least. mdtest is one of the major benchmark tools for metadata, and this is a common metadata scenario; we would like to keep (at least) the same performance with newer versions of Lustre. If you have any ideas for a workaround, please share them with us. I would like to test them.
| Comment by Lai Siyao [ 18/Mar/14 ] | ||||||||||||||||||||||||||||||
|
During testing I saw other places that can be improved to increase file creation, stat, and maybe read performance, and I composed two patches: http://review.whamcloud.com/9696 and http://review.whamcloud.com/9697. Would you apply these two patches and get some results?
| Comment by Shuichi Ihara (Inactive) [ 18/Mar/14 ] | ||||||||||||||||||||||||||||||
|
Sure, I will test those patches very soon and keep you updated! Thanks a lot, again!
| Comment by Shuichi Ihara (Inactive) [ 18/Apr/14 ] | ||||||||||||||||||||||||||||||
|
Lai, these patches are broken; I can't copy a file from a local filesystem to Lustre.

[root@r21 tmp]# touch /tmp/a
[root@r21 tmp]# cp /tmp/a /lustre/
cp: cannot create regular file `/lustre/a': File exists

This worked:

[root@r21 tmp]# touch /lustre/a
| Comment by Shuichi Ihara (Inactive) [ 18/Apr/14 ] | ||||||||||||||||||||||||||||||
|
This is the debug file captured when the problem happens, with trace debugging enabled via: echo "+trace" > /proc/sys/lnet/debug
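For completeness, a sketch of how such a capture is typically dumped to a file once the mask is widened; the output file name is an assumption.

echo "+trace" > /proc/sys/lnet/debug   # widen the debug mask, as above
lctl clear                             # start from an empty debug buffer
cp /tmp/a /lustre/                     # reproduce the "File exists" failure
lctl dk > /tmp/lu-4367-cp.dbg          # dump and decode the kernel debug buffer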
| Comment by Lai Siyao [ 21/Apr/14 ] | ||||||||||||||||||||||||||||||
|
Thanks Ihara, the patches are updated. Previously I only tested mdtest and didn't do a full test, because they are intended to get mdtest performance data and may not be the final patches yet; sorry for the trouble.
| Comment by Andreas Dilger [ 25/Apr/14 ] | ||||||||||||||||||||||||||||||
|
Lai, it looks like the patches http://review.whamcloud.com/9696 and http://review.whamcloud.com/9697 are improving the open performance, but do not address the unlink performance. Is there something that can be done to improve the unlink performance back to the 2.5.0 level so that 2.6.0 does not have a performance regression? | ||||||||||||||||||||||||||||||
| Comment by Lai Siyao [ 28/Apr/14 ] | ||||||||||||||||||||||||||||||
|
The root cause is that revalidate(IT_OPEN) enqueues an open lock, so the close is deferred to unlink, which causes the unlink performance drop; but in total there is no extra RPC. I don't see a clear way to handle this, so I think if we can improve open and stat performance a lot, it's worthwhile keeping the status quo.
| Comment by Andreas Dilger [ 06/May/14 ] | ||||||||||||||||||||||||||||||
|
It might be possible to combine the close and unlink RPCs (unlink with close flag, or close with unlink flag?) so that the number of RPCs is actually reduced? We already do something similar with early lock cancellation, so it might be possible to do something similar with the close. | ||||||||||||||||||||||||||||||
| Comment by Lai Siyao [ 07/May/14 ] | ||||||||||||||||||||||||||||||
|
I've thought of that, but considering the complications of open replay, and possibly SOM, I think it's not trivial work. I'll think about it more and do some tests later (maybe next week).
| Comment by Lai Siyao [ 21/May/14 ] | ||||||||||||||||||||||||||||||
|
Patch to combine the close into the unlink RPC: http://review.whamcloud.com/#/c/10398/ — Ihara, could you apply this one only and get results from mdtest?
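A sketch of pulling the change from Gerrit for testing; the project path and patch-set number in the refspec are assumptions and should be taken from the download box on the review page.

# fetch and apply patch set 1 of change 10398 (refspec assumed)
git fetch http://review.whamcloud.com/lustre-release refs/changes/98/10398/1
git cherry-pick FETCH_HEAD
# then rebuild the client and rerun the mdtest workload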
| Comment by Shuichi Ihara (Inactive) [ 30/May/14 ] | ||||||||||||||||||||||||||||||
|
Hi Lai, | ||||||||||||||||||||||||||||||
| Comment by Andreas Dilger [ 30/May/14 ] | ||||||||||||||||||||||||||||||
|
Lai should confirm, but I think the most important patch for addressing the unlink regression is http://review.whamcloud.com/10398 so that one should be tested first. There is also a potential improvement in http://review.whamcloud.com/9696 that is next, but it doesn't affect unlink. I think the http://review.whamcloud.com/9697 is too complex to land for 2.6.0 at this point, but if it gives a significant improvement then it could be landed for 2.7.0 and IEEL. | ||||||||||||||||||||||||||||||
| Comment by Andreas Dilger [ 05/Jun/14 ] | ||||||||||||||||||||||||||||||
|
Ihara, did you get a chance to test if 10398 fixes the unlink regression? We are ready to land that patch. | ||||||||||||||||||||||||||||||
| Comment by Shuichi Ihara (Inactive) [ 09/Jun/14 ] | ||||||||||||||||||||||||||||||
|
I'm testing the patches and will post results shortly.
| Comment by Andreas Dilger [ 16/Jun/14 ] | ||||||||||||||||||||||||||||||
|
Ihara, any chance to post the results from your tests? | ||||||||||||||||||||||||||||||
| Comment by Andreas Dilger [ 23/Jun/14 ] | ||||||||||||||||||||||||||||||
|
Hi Ihara, is there a chance for you to post the mdtest results for the testing you did on 06-09 for patch http://review.whamcloud.com/10398 ? | ||||||||||||||||||||||||||||||
| Comment by Shuichi Ihara (Inactive) [ 24/Jun/14 ] | ||||||||||||||||||||||||||||||
|
Andreas, sorry for the delay on this... Here are our recent test results.

Configuration: 1 x MDS, 10 x SSD (RAID10) for MDT, 2 x OSS, 10 x OST (100 x NL-SAS), 32 clients, 64 mdtest threads, and a total of 2.56M files for creation/stat/removal.

Compared: master branch (47cde804ddc9019ff0793229030211d536d0612f) vs. master branch (47cde804ddc9019ff0793229030211d536d0612f) + patch 10426 + patch 10398.

Unique Directory Operation, master branch:

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -u -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation              Max          Min          Mean       Std Dev
   ---------              ---          ---          ----       -------
   Directory creation:   48811.145    39252.347    42446.699    4500.354
   Directory stat    :  299207.829   290254.504   293619.032    3979.199
   Directory removal :   89250.695    86672.466    88049.098    1059.809
   File creation     :   80325.602    71720.354    76539.450    3588.203
   File stat         :  202533.695   202312.144   202430.663      91.108
   File read         :  224391.556   222667.559   223733.260     760.494
   File removal      :   93977.310    81732.593    89128.915    5313.644
   Tree creation     :     487.540      255.237      408.701     108.529
   Tree removal      :       7.483        7.376        7.416       0.048

Unique Directory Operation, master branch + patch 10426 + patch 10398:

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -u -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation              Max          Min          Mean       Std Dev
   ---------              ---          ---          ----       -------
   Directory creation:   43529.024    38432.682    40492.505    2192.203
   Directory stat    :  295567.203   248965.236   278082.284   20727.046
   Directory removal :   99851.600    97510.819    98692.187     955.746
   File creation     :   76464.252    61260.049    69836.770    6358.281
   File stat         :  210322.996   203751.172   206953.520    2685.537
   File read         :  227658.211   225535.341   226317.238     952.564
   File removal      :   99144.730    98371.321    98765.310     315.911
   Tree creation     :     454.766      187.656      357.198     120.339
   Tree removal      :       7.494        7.383        7.438       0.045

Shared Directory Operation, master branch:

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation              Max          Min          Mean       Std Dev
   ---------              ---          ---          ----       -------
   Directory creation:   28513.564    27700.587    28038.288     345.860
   Directory stat    :  142617.694   139431.318   141316.628    1364.858
   Directory removal :   60164.271    56562.712    58927.059    1672.450
   File creation     :   34568.359    34000.466    34304.269     233.536
   File stat         :  143387.629   140366.792   141459.265    1367.577
   File read         :  229820.877   222497.139   225426.481    3164.288
   File removal      :   66583.172    58133.175    61494.514    3659.539
   Tree creation     :    4132.319     3398.950     3773.387     299.598
   Tree removal      :      11.422        3.327        7.825       3.365

Shared Directory Operation, master branch + patch 10426 + patch 10398:

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation              Max          Min          Mean       Std Dev
   ---------              ---          ---          ----       -------
   Directory creation:   28132.040    26630.642    27487.773     631.154
   Directory stat    :  136965.055   135597.500   136440.164     601.823
   Directory removal :   58149.733    55110.750    56638.405    1240.713
   File creation     :   33170.783    32710.907    32931.837     188.175
   File stat         :  138870.777   136286.854   137743.643    1080.330
   File read         :  234861.197   224503.115   228594.555    4499.710
   File removal      :   77518.626    69571.564    73940.211    3292.142
   Tree creation     :    4116.098     1102.314     2711.725    1238.885
   Tree removal      :       9.879        4.938        7.854       2.114

We see performance improvements with the patches for unlink operations in unique directories as well as in the shared directory. I also want to compare against lustre-2.5. BTW, file/directory creation in the shared directory is lower than I expected; I will check other Lustre versions (e.g. b2_5) later as well.
| Comment by Andreas Dilger [ 24/Jun/14 ] | ||||||||||||||||||||||||||||||
|
It appears that the unlink performance has gone up, but the create and stat rate have gone down. Can you please test those two patches separately? If the 10398 patch is fixing the unlink performance without hurting the other performance it could land. It might be that 10426 patch is changing the other performance and needs to be reworked. | ||||||||||||||||||||||||||||||
| Comment by Shuichi Ihara (Inactive) [ 24/Jun/14 ] | ||||||||||||||||||||||||||||||
|
First, I tried only the 10398 patch, but the build fails since OBD_CONNECT_UNLINK_CLOSE is defined in the 10426 patch, so I needed both patches at the same time to compile. BTW, here is the same mdtest benchmark on the same hardware, but with Lustre version 2.5.2RC2.

Unique Directory Operation:

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -u -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation              Max          Min          Mean       Std Dev
   ---------              ---          ---          ----       -------
   Directory creation:   44031.310    41420.993    43125.128    1205.815
   Directory stat    :  346144.788   329854.059   335352.348    7631.863
   Directory removal :   87592.556    86416.906    87118.114     506.033
   File creation     :   82518.567    64962.637    76375.141    8077.749
   File stat         :  215570.997   209551.901   212205.919    2508.198
   File read         :  151377.930   144487.897   147463.085    2890.255
   File removal      :  105964.879    93215.798   101520.782    5877.335
   Tree creation     :     628.925      410.522      542.680      94.889
   Tree removal      :       8.583        8.013        8.284       0.233

Shared Directory Operation:

mdtest-1.9.3 was launched with 64 total task(s) on 32 node(s)
Command line used: ./mdtest -i 3 -n 40000 -d /lustre_test/mdtest.out
Path: /lustre_test
FS: 39.0 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%
64 tasks, 2560000 files/directories

SUMMARY: (of 3 iterations)
   Operation              Max          Min          Mean       Std Dev
   ---------              ---          ---          ----       -------
   Directory creation:   39463.778    38496.147    38986.389     395.138
   Directory stat    :  143006.039   134919.914   138809.226    3308.300
   Directory removal :   78711.817    76206.632    77846.563    1160.196
   File creation     :   75154.225    70792.633    72674.025    1830.264
   File stat         :  142431.366   138650.545   140623.793    1547.953
   File read         :  134643.457   132249.733   133383.879     981.251
   File removal      :   94311.826    83231.516    89991.676    4841.388
   Tree creation     :    4048.556     3437.954     3743.808     249.278
   Tree removal      :       9.098        4.048        6.792       2.084

For unique-directory metadata operations, overall the master + 10398 + 10426 results are close to the 2.5.2RC2 results, except for directory stat (for the stat operation, 2.5 is better than master).
| Comment by Oleg Drokin [ 30/Jun/14 ] | ||||||||||||||||||||||||||||||
|
So it looks like we have all of this extra file handle caching that should not really be happening at all. Originally, when the opencache was implemented, it did cache everything, and that resulted in a performance drop specifically due to slow lock cancellation. I am planning to take a deeper look to understand what is happening with the cache now.
| Comment by Andreas Dilger [ 30/Jun/14 ] | ||||||||||||||||||||||||||||||
|
Sorry about my earlier confusion with 10426 - I thought that was a different patch, but I see now that it is required for 10398 to work. It looks like the 10398 patch does improve the unlink performance, but at the expense of almost every other operation. Since unlink is already faster than create, it doesn't make sense to speed it up and slow down create. It looks like there is also some other change(s) that slowed down the create and stat operations on master compared to 2.5.2. It doesn't seem reasonable to land 10398 for 2.6.0 at this point. | ||||||||||||||||||||||||||||||
| Comment by Lai Siyao [ 02/Jul/14 ] | ||||||||||||||||||||||||||||||
|
Oleg, the cause is the simplified revalidate (see 7475). Originally revalidate would execute IT_OPEN, but this code was a replica of lookup, and the opened handle could be lost if another client cancelled the lock. So 7475 simplified revalidate to just return 1 if the dentry is valid and let .open really open the file; but that open can't be differentiated from an NFS export open, so both an open after revalidate and an NFS export open take the open lock.
| Comment by Oleg Drokin [ 11/Jul/14 ] | ||||||||||||||||||||||||||||||
|
So, it looks like we can still infer whether the open originated from the VFS or not. When we come from do_filp_open (the real open path), we go through filename_lookup with LOOKUP_OPEN set; when we go through dentry_open, LOOKUP_OPEN is not set. As such, the most brute-force way I see to address this is to always return 0 in ll_revalidate_dentry if LOOKUP_OPEN is set and LOOKUP_CONTINUE is NOT set (i.e. we are looking up the last component). BTW, while looking at the ll_revalidate_dentry logic, I think we can also improve it quite a bit in the area of intermediate path component lookup. All of this is in this patch: http://review.whamcloud.com/11062
| Comment by Shuichi Ihara (Inactive) [ 11/Jul/14 ] | ||||||||||||||||||||||||||||||
Sure, I will test that patch as soon as I can run the benchmark, maybe early next week, thanks!
| Comment by Cliff White (Inactive) [ 22/Jul/14 ] | ||||||||||||||||||||||||||||||
|
I ran the patch on Hyperion with 1, 32, 64, and 100 clients, running mdtest in dir-per-process and single-shared-dir modes.
| Comment by Jodi Levi (Inactive) [ 12/Nov/14 ] | ||||||||||||||||||||||||||||||
|
Patches landed to Master. Please reopen ticket if more work is needed. |