[LU-9840] LU-3529 causes 25% metadata performance regressions even without DNE Created: 07/Aug/17  Updated: 15/Nov/17  Resolved: 21/Sep/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.2

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

master


Attachments: Text File LU-9840.patch    
Severity: 3

 Description   

We finally found the commit and root cause of a 25% metadata performance regression (file creation in a single shared directory).
This regression was introduced between lustre-2.5 and lustre-2.6, and it still exists.
Our investigation with "git bisect" shows that the following patch causes the performance regression.

5f3e926ac9ff8ad134ad920d0e8545e16395ef3b is the first bad commit
commit 5f3e926ac9ff8ad134ad920d0e8545e16395ef3b
Author: wang di <di.wang@intel.com>
Date:   Wed Jul 31 00:00:40 2013 -0700

    LU-3529 lod: create striped directory
    
    1. Add "lfs setdirstripe -i -c" to create striped
    directory.
    
    2. client send create request to the master MDT, which
    will allocate FIDs and create slaves. for all of slaves.
    
    3. Client needs to revalidate slaves during intent getattr
    and open request.
    
    4. lmv_stripe_md will include attributes(size, nlink etc)
    from all of stripe, which will be protected by UPDATE lock.
    client needs to merge these attributes when update inode.
    
    5. send create request to the MDT where the file is located,
    which can help creating master stripe of striped directory.
    
    Signed-off-by: wang di <di.wang@intel.com>
    Change-Id: I7ac560e39dcb415e310dc5e6ade531d76227ffae
    Reviewed-on: http://review.whamcloud.com/7196
    Tested-by: Jenkins
    Tested-by: Maloo <hpdd-maloo@intel.com>
    Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
    Reviewed-by: John L. Hammond <john.hammond@intel.com>

Here is the test configuration:
1 x MDS (2 x E5-2690 v3, 128GB memory)
32 x Client(2 x CPU E5-2650, 128GB memory)
4 x OSS and 40 OST
RHEL6.5

  1. mpirun -np 128 -ppn 4 -hostfile ./hostfile.32 mdtest -n 5000 -v -d /scratch/mdtest.out -p 30 -i 3 -F
[e19b51372ad94818a7a79b1fbae5b55c665ba59f] LU-4196 build: Reenable OFED-3.5 support on SLES11
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      88017.734      82493.520      84509.613       2489.825
   File stat         :     151997.740     142649.444     148656.020       4256.270
   File read         :     162847.716     154605.697     158734.759       3364.809
   File removal      :      84993.971      78063.127      80600.677       3118.973
   Tree creation     :       3692.169       2931.030       3262.108        318.519
   Tree removal      :         51.245         47.758         49.678          1.446
V-1: Entering timestamp...

[5f3e926ac9ff8ad134ad920d0e8545e16395ef3b] LU-3529 lod: create striped directory
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      66695.074      66242.158      66524.596        201.138
   File stat         :     152026.405     143681.866     148948.529       3741.740
   File read         :     165470.364     163291.085     164307.211        895.740
   File removal      :      86953.641      82117.776      84285.373       2005.726
   Tree creation     :       4165.148       2841.669       3603.150        558.417
   Tree removal      :         59.119         52.690         55.581          2.664
V-1: Entering timestamp...

Even without DNE, we are losing 25% of file creation performance.
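
For reference, the size of the drop can be recomputed from the file-creation rates in the two SUMMARY tables above. The following is a minimal Python sketch; the helper name `percent_change` is hypothetical and is not part of mdtest or Lustre.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Return the relative change from baseline to candidate, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# File-creation rates (ops/sec) copied from the two SUMMARY tables above.
before = {"mean": 84509.613, "max": 88017.734}  # e19b513 (LU-4196), baseline build
after = {"mean": 66524.596, "max": 66695.074}   # 5f3e926 (LU-3529), first bad commit

for key in ("mean", "max"):
    print(f"File creation {key}: {percent_change(before[key], after[key]):+.1f}%")
```

On these numbers the mean rate falls by roughly 21% and the best iteration by roughly 24%, which is in line with the approximately 25% quoted in this ticket.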



 Comments   
Comment by Peter Jones [ 07/Aug/17 ]

Lai

Can you please advise?

Thanks

Peter

Comment by Di Wang [ 07/Aug/17 ]

Ihara,

Thanks for finding this, but I could not find anything in the patch that adds overhead to the open/create path in the non-DNE case (except for adding a few bytes to the LOD object, which should not cause such a big performance drop).

Could you please try client (without patch) vs server (with patch), and client (with patch) vs server (without patch), to see which side causes the trouble?

Does mdtest support zero-stripe creation? If not, could you please try mdsrate (lustre/tests/mpi/mdsrate.c) to see if mknod performance also drops?

Thanks

Comment by Shuichi Ihara (Inactive) [ 07/Aug/17 ]

I don't think this is a client-side problem. My client was Lustre-2.5 for all testing, which means the patch was not included on the client side. I have also tested a Lustre-2.7 client (which should include the patch) against servers with and without the patch, and the behavior is the same: still a ~20% regression.
Here are the results with and without the patch on the server, tested with a lustre-2.7 client that includes the patch.

[e19b51372ad94818a7a79b1fbae5b55c665ba59f] LU-4196 build: Reenable OFED-3.5 support on SLES11
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      82368.780      82025.716      82246.435        156.378
   File stat         :     155345.312     149657.195     152438.609       2323.853
   File read         :     153267.971     146148.288     150669.345       3208.767
   File removal      :      84368.998      81469.899      82617.512       1258.223
   Tree creation     :       4236.671       1364.445       3221.788       1315.225
   Tree removal      :         58.109         57.026         57.622          0.449
V-1: Entering timestamp...

[5f3e926ac9ff8ad134ad920d0e8545e16395ef3b] LU-3529 lod: create striped directory
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      71733.913      69104.187      70276.415       1092.367
   File stat         :     154928.974     143628.505     150433.332       4893.833
   File read         :     150986.271     149548.465     150302.120        589.036
   File removal      :      90852.671      83976.575      86282.682       3231.517
   Tree creation     :       3938.314       1353.438       2962.975       1146.604
   Tree removal      :         57.287         54.732         56.202          1.078
V-1: Entering timestamp...

Let me try zero-striping.

Comment by Shuichi Ihara (Inactive) [ 07/Aug/17 ]

BTW, the regression I described is for file (maybe directory too) creation in a single shared directory.
We don't see the same regression in the unique-directory case.
Please see the test results for the unique-directory case.

mpirun -np 128 -ppn 4 -hostfile ./hostfile.32 mdtest -n 5000 -v -d /scratch/mdtest.out -p 30 -i 3 -F -u

[e19b51372ad94818a7a79b1fbae5b55c665ba59f] LU-4196 build: Reenable OFED-3.5 support on SLES11
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     102493.883      96797.852      99298.922       2376.595
   File stat         :     334530.859     322827.499     329729.583       5003.479
   File read         :     153050.474     150524.026     152070.676       1106.564
   File removal      :     106763.376     102263.541     104794.116       1879.438
   Tree creation     :        456.002        280.988        389.842         77.565
   Tree removal      :         33.068         32.685         32.915          0.166
V-1: Entering timestamp...
[5f3e926ac9ff8ad134ad920d0e8545e16395ef3b] LU-3529 lod: create striped directory
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     103803.524      99922.978     101450.587       1688.300
   File stat         :     340341.346     331701.230     335322.537       3663.119
   File read         :     150589.967     148096.413     149200.655       1037.755
   File removal      :     102280.943      93578.046      97385.654       3635.234
   Tree creation     :        312.007         43.829        221.429        125.590
   Tree removal      :         32.941         32.516         32.795          0.198
V-1: Entering timestamp...

So, it seems something is wrong with shared-directory operations.

Comment by Di Wang [ 08/Aug/17 ]

Hello, Ihara

Thanks for this information. Unfortunately, I do not have a 2.5 environment myself.
Could you please try this patch?

--- a/lustre/lod/lod_lov.c
+++ b/lustre/lod/lod_lov.c
@@ -860,17 +860,6 @@ int lod_load_striping(const struct lu_env *env, struct lod_object *lo)
                info->lti_buf.lb_buf = info->lti_ea_store;
                info->lti_buf.lb_len = info->lti_ea_store_size;
                rc = lod_parse_striping(env, lo, &info->lti_buf);
-       } else if (lu_object_attr(lod2lu_obj(lo)) & S_IFDIR) {
-               rc = lod_get_lmv_ea(env, lo);
-               if (rc <= 0)
-                       GOTO(out, rc);
-               /*
-                * there is LOV EA (striping information) in this object
-                * let's parse it and create in-core objects for the stripes
-                */
-               info->lti_buf.lb_buf = info->lti_ea_store;
-               info->lti_buf.lb_len = info->lti_ea_store_size;
-               rc = lod_parse_dir_striping(env, lo, &info->lti_buf);
        }
 out:
        dt_write_unlock(env, next);

Could you also please collect a debug log (-1 mask) on the MDS and post it here?

Btw: what kernel do you use for the test? rhel-6.3?

Comment by Di Wang [ 08/Aug/17 ]

Ihara
Just to be clear, the patch in my last comment is not a real fix; it just helps me figure out what causes the performance drop. So if you can collect the debug log for me, that would be helpful.

Comment by Shuichi Ihara (Inactive) [ 09/Aug/17 ]

WangDi, sure, I understand what you pointed out. I will test them and give you feedback soon.

Comment by Shuichi Ihara (Inactive) [ 14/Aug/17 ]

Could you please try this patch ?

The patch improved file creation by ~15%, but there is still a ~5% performance regression. The patch helped file creation, but it caused a regression on unlink.
Here are the results.

[e19b51372ad94818a7a79b1fbae5b55c665ba59f] LU-4196 build: Reenable OFED-3.5 support on SLES11
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      85568.624      84412.561      84814.381        533.711
   File stat         :     156095.695     145840.993     152336.768       4612.121
   File read         :     161833.820     157226.792     160061.457       2025.264
   File removal      :      89463.742      83852.552      85970.898       2488.413
   Tree creation     :       4185.932       2597.092       3482.057        661.160
   Tree removal      :         58.671         47.865         52.494          4.545
V-1: Entering timestamp...
[5f3e926ac9ff8ad134ad920d0e8545e16395ef3b] LU-3529 lod: create striped directory
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      72231.370      71757.985      71941.350        207.453
   File stat         :     156218.994     146766.402     152100.221       3953.490
   File read         :     151082.612     148981.339     149687.568        986.470
   File removal      :      86520.243      85361.115      85847.648        491.161
   Tree creation     :       3731.587       1801.677       3001.790        855.194
   Tree removal      :         59.288         55.627         57.821          1.581
V-1: Entering timestamp...
[5f3e926ac9ff8ad134ad920d0e8545e16395ef3b] LU-3529 lod: create striped directory + patch
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      82139.647      80821.907      81312.553        588.222
   File stat         :     157364.652     151950.461     153937.210       2433.799
   File read         :     153314.454     151299.630     152332.901        823.361
   File removal      :      79767.310      77557.914      78420.017        965.027
   Tree creation     :       3905.311       3134.756       3433.015        337.792
   Tree removal      :         53.836         50.449         51.712          1.511
V-1: Entering timestamp...
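
The three runs above can be compared mechanically. The sketch below is a minimal Python illustration that parses rows of the mdtest SUMMARY table and reports each build's mean rate relative to the baseline. The function name `parse_summary` and the regular expression are assumptions written for this ticket's output format, not an mdtest API, and only the file creation and file removal rows are reproduced.

```python
import re

# Matches rows such as "   File creation     :      85568.624      84412.561 ..."
ROW = re.compile(r"^\s*(?P<op>[A-Za-z ]+?)\s*:\s+(?P<nums>[\d.]+(?:\s+[\d.]+){3})\s*$")


def parse_summary(text: str) -> dict:
    """Map each operation name to its (max, min, mean, stddev) rates."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            table[match.group("op")] = tuple(float(v) for v in match.group("nums").split())
    return table


# Rows copied from the three SUMMARY tables above.
baseline = parse_summary("""
   File creation     :      85568.624      84412.561      84814.381        533.711
   File removal      :      89463.742      83852.552      85970.898       2488.413
""")
regressed = parse_summary("""
   File creation     :      72231.370      71757.985      71941.350        207.453
   File removal      :      86520.243      85361.115      85847.648        491.161
""")
patched = parse_summary("""
   File creation     :      82139.647      80821.907      81312.553        588.222
   File removal      :      79767.310      77557.914      78420.017        965.027
""")

MEAN = 2  # column index of the mean rate
for op in ("File creation", "File removal"):
    for label, run in (("LU-3529", regressed), ("LU-3529 + patch", patched)):
        delta = (run[op][MEAN] - baseline[op][MEAN]) / baseline[op][MEAN] * 100.0
        print(f"{op:14s} {label:16s} {delta:+6.1f}% vs baseline")
```

With the diagnostic patch, mean file creation comes back to within about 4% of the baseline, while mean file removal ends up roughly 9% below it, which matches the summary given in this comment.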

Comment by Di Wang [ 14/Aug/17 ]

Ihara:

Could you please try this patch? Thanks

LU-9840.patch

Comment by Di Wang [ 22/Aug/17 ]

Any news?

Comment by Shuichi Ihara (Inactive) [ 12/Sep/17 ]

WangDi,
I wonder if your patch could be adapted to the latest master branch? It seems the code has changed a lot.

Comment by Di Wang [ 12/Sep/17 ]

Sure, Ihara

Comment by Gerrit Updater [ 13/Sep/17 ]

wangdi (di.wang@intel.com) uploaded a new patch: https://review.whamcloud.com/28962
Subject: LU-9840 lod: add lod_dir_nonstripe
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d173f7738f5c8053f5e07a685c428f3e263224c3

Comment by Shuichi Ihara (Inactive) [ 13/Sep/17 ]

Thanks WangDi. I did a quick test of your patch against the latest master.

Test case: mdtest in a shared directory, 32 clients, 128 processes, and 10,000 files per process, which means 1.28 million files in total.
mpirun -np 128 /work/tools/bin/mdtest -n 10000 -v -d /scratch0/mdtest.out -F -i 3 -p 30 -w 0

master: 1fc4ed3ac40ab0e11b1c59d7d147a100636cbda0 without any patches.

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      87877.948      77161.656      81981.331       4440.689
   File stat         :     146500.548     139206.464     143471.400       3103.363
   File read         :     157850.695     155719.694     157052.803        948.731
   File removal      :     109180.271     103474.828     105885.648       2411.618
   Tree creation     :       4812.597       1625.810       3218.841       1301.001
   Tree removal      :         41.205         38.867         39.730          1.048
V-1: Entering timestamp...

master with patch 28962 (this run is unlabeled in the original; inferred from the comparison below):

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      94691.817      87748.601      90106.652       3242.642
   File stat         :     145824.020     139999.453     142233.530       2564.019
   File read         :     154804.912     152446.911     153784.582        988.457
   File removal      :     110741.797     105265.586     107834.815       2248.374
   Tree creation     :       4708.453       2226.195       3100.325       1138.556
   Tree removal      :         40.618         21.520         33.672          8.622
V-1: Entering timestamp...

Somehow, the master branch without patches is a bit better than the lustre-2.7 results we saw before. The gap between 2.5 and current master is small.
Also, patch 28962 improves performance too. We are seeing an average ~10% improvement in file creation.

Comment by Di Wang [ 13/Sep/17 ]

Ihara, thanks for testing. Hmm, I did not expect master to be better here; maybe there are some other optimizations. Anyway, it is good news.

Could you please also try https://jira.hpdd.intel.com/secure/attachment/28000/28000_LU-9840.patch directly on top of [5f3e926ac9ff8ad134ad920d0e8545e16395ef3b] LU-3529 lod: create striped directory? Thanks.

Comment by Gerrit Updater [ 21/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28962/
Subject: LU-9840 lod: add ldo_dir_stripe_loaded
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 744fe412fa101609a38f7ccc77efc4f1c540e008
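
The merged change itself is not quoted in this ticket. Its subject line, together with the diagnostic diff earlier in the thread, suggests a per-object flag that records whether directory striping has already been loaded, so that a non-striped directory is not re-examined on every operation. The following Python sketch illustrates only that general load-once pattern under this assumption; the class, attribute, and function names are hypothetical and do not correspond to Lustre code.

```python
class DirObject:
    """Toy stand-in for a directory object; not a Lustre data structure."""

    def __init__(self, read_stripe_ea):
        # Hypothetical callable standing in for the costly stripe EA lookup.
        self._read_stripe_ea = read_stripe_ea
        self.stripes = None
        # Set once the lookup has run, even when it found no stripes.
        self.stripe_loaded = False

    def load_striping(self):
        # Skip the lookup entirely if it has already been done for this object.
        if not self.stripe_loaded:
            self.stripes = self._read_stripe_ea()
            self.stripe_loaded = True
        return self.stripes


calls = []


def fake_read_ea():
    """Record each lookup and report 'no stripes', as for a plain directory."""
    calls.append(1)
    return None


shared_dir = DirObject(fake_read_ea)
for _ in range(1000):  # e.g. 1000 file creates in one shared directory
    shared_dir.load_striping()

print(len(calls))  # number of EA lookups actually performed
```

Without the flag, a check such as `if self.stripes is None` would repeat the lookup on every call for a non-striped directory, because "no stripes" and "not loaded yet" would be indistinguishable.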

Comment by Peter Jones [ 21/Sep/17 ]

This patch has landed on master, but is there more work still to come?

Comment by Peter Jones [ 21/Sep/17 ]

As per Ihara - this issue is resolved by the patch

Comment by Gerrit Updater [ 21/Sep/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29143
Subject: LU-9840 lod: add ldo_dir_stripe_loaded
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 91f613ee25e5a39e28b1c2ad45fff15cc65b79de

Comment by Gerrit Updater [ 15/Nov/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29143/
Subject: LU-9840 lod: add ldo_dir_stripe_loaded
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 507852d2fb49da5da6ed792e25964dd14169b30d
