Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.5.0
- 2
- 11951
Description
The lustre-2.5.52 client (and possibly older clients as well) causes a metadata performance regression when unlinking files in a single shared directory.
Here are test results with lustre-2.5.52 clients and lustre-2.4.1 clients; lustre-2.5.52 is running on all servers.
1 x MDS, 4 x OSS (32 x OST) and 16 clients (64 processes, 20000 files per process)
lustre-2.4.1 client (4.1-take2.log)

-- started at 12/09/2013 07:31:29 --

mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation         Max          Min          Mean        Std Dev
   ---------         ---          ---          ----        -------
   File creation  :   58200.265    56783.559    57589.448     594.589
   File stat      :  123351.857   109571.584   114223.612    6455.043
   File read      :  109917.788    83891.903    99965.718   11472.968
   File removal   :   60825.889    59066.121    59782.774     754.599
   Tree creation  :    4048.556     1971.934     3082.293     853.878
   Tree removal   :      21.269       15.069       18.204       2.532

-- finished at 12/09/2013 07:34:53 --
lustre-2.5.52 client

-- started at 12/09/2013 07:13:42 --

mdtest-1.9.1 was launched with 64 total task(s) on 16 node(s)
Command line used: /work/tools/bin/mdtest -d /lustre/dir.0 -n 20000 -F -i 3
Path: /lustre
FS: 1141.8 TiB   Used FS: 0.0%   Inodes: 50.0 Mi   Used Inodes: 0.0%

64 tasks, 1280000 files

SUMMARY: (of 3 iterations)
   Operation         Max          Min          Mean        Std Dev
   ---------         ---          ---          ----        -------
   File creation  :   58286.631    56689.423    57298.286     705.112
   File stat      :  127671.818   116429.261   121610.854    4631.684
   File read      :  173527.817   158205.242   166676.568    6359.445
   File removal   :   46818.194    45638.851    46118.111     506.151
   Tree creation  :    3844.458     2576.354     3393.050     578.560
   Tree removal   :      21.383       18.329       19.844       1.247

-- finished at 12/09/2013 07:17:07 --
File removal drops to ~46K ops/sec on lustre-2.5.52 vs ~60K ops/sec on lustre-2.4.1, roughly a 25% performance drop on lustre-2.5.52 compared to lustre-2.4.1.
So it looks like we can still infer whether the open originated from the VFS or not.
When we come from do_filp_open() (the real open path), we go through filename_lookup() with LOOKUP_OPEN set; when we come through dentry_open(), LOOKUP_OPEN is not set.
As such, the most brute-force way I see to address this is to have ll_revalidate_dentry() always return 0 when LOOKUP_OPEN is set and LOOKUP_CONTINUE is NOT set (i.e. we are looking up the last path component); see the sketch below.
We already do a similar trick for LOOKUP_OPEN|LOOKUP_CONTINUE.
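For illustration only, here is a minimal sketch of that check, not the actual patch (the real change is in the gerrit link below). It assumes the target kernel still defines both LOOKUP_OPEN and LOOKUP_CONTINUE, that ll_revalidate_dentry() receives the lookup flags directly, and the rationale in the comments (forcing the intent-based lookup path for opens) is my reading of the discussion above.

#include <linux/dcache.h>
#include <linux/namei.h>

static int ll_revalidate_dentry(struct dentry *dentry,
                                unsigned int lookup_flags)
{
        /*
         * Last component of a real open: LOOKUP_OPEN is set but
         * LOOKUP_CONTINUE is not.  Returning 0 ("not valid") makes the
         * VFS fall through to a full lookup, so the client can pack the
         * open intent into the lookup RPC instead of doing a separate
         * revalidate followed by an open.
         */
        if ((lookup_flags & LOOKUP_OPEN) && !(lookup_flags & LOOKUP_CONTINUE))
                return 0;

        /* Otherwise keep the cached dentry (the real code does more checks). */
        return 1;
}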
BTW, while looking at the ll_revalidate_dentry() logic, I think we can also improve it quite a bit in the area of intermediate path component lookup.
All of this is in this patch: http://review.whamcloud.com/11062
Ihara-san, could you please give it a try to see if it helps your workload?
This patch passes a medium level of my testing (which does not include any performance testing).