[LU-10005] File creation to slave MDT is much slower than primary MDT on DNE1 configuration Created: 19/Sep/17  Updated: 05/Sep/18  Resolved: 17/Dec/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4

Type: Bug Priority: Major
Reporter: Shuichi Ihara (Inactive) Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

b2_10


Issue Links:
Related
is related to LU-10406 sanity-lfsck test_31c: (4) Fail to re... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There is one MDS with two MDTs. The hardware setup of both MDTs is symmetric.
This is a DNE1 setup with a static MDT allocation for each directory, as shown below.

[root@c01 ~]# lfs mkdir -i 0 /scratch0/dir0
[root@c01 ~]# lfs mkdir -i 1 /scratch0/dir1

When mdtest is run against each MDT separately, file creation on the slave MDT (MDT0001) is much slower than on the primary MDT (MDT0000).
Here is a quick summary:

1. MDT0000 on MDT0 : 154K ops/sec
2. MDT0001 on MDT1 : 94K ops/sec

I also tested with the devices swapped: the MDT1 device was reformatted as MDT0000 and the MDT0 device was reformatted as MDT0001.

3. MDT0000 on original MDT1 device : 151K ops/sec
4. MDT0001 on original MDT0 device : 106K ops/sec

These benchmark results show that the MDT device and backend storage are not the problem; it does not matter which device is used. In either case, file creation on MDT0001 is slower than on MDT0000.
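
To put a number on the gap, the relative slowdown can be computed from the four rounded figures in the summary above. This is only a small arithmetic sketch; the dictionary layout and the `slowdown` helper are illustrative names, not part of mdtest or Lustre.

```python
# Rounded file-creation rates (K ops/sec) taken from the summary above.
results = {
    "original layout": {"MDT0000": 154.0, "MDT0001": 94.0},
    "devices swapped": {"MDT0000": 151.0, "MDT0001": 106.0},
}


def slowdown(primary: float, secondary: float) -> float:
    """Fraction by which the secondary rate falls short of the primary rate."""
    return 1.0 - secondary / primary


for layout, rates in results.items():
    pct = 100.0 * slowdown(rates["MDT0000"], rates["MDT0001"])
    print(f"{layout}: MDT0001 is {pct:.1f}% slower than MDT0000")
```

In both layouts the slower result follows the MDT index rather than the physical device, which is the point of the swap test.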

Here are the full mdtest results.
mpirun -np 128 /work/tools/bin/mdtest -n 10000 -v -d /scratch0/dir1 -i 3 -p 60 -F -u

Format MDT0 device with MDT0000
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     155589.552     152790.159     154009.399       1170.995
   File stat         :     454932.894     444775.351     449009.216       4315.516
   File read         :     233628.858     230038.744     232081.775       1507.029
   File removal      :     188460.588     184435.235     186712.008       1685.251
   Tree creation     :        551.714        444.141        493.856         44.292
   Tree removal      :         19.593         18.601         18.984          0.436
V-1: Entering timestamp...

Format MDT1 device with MDT0001
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      97428.133      92734.086      94657.494       2007.797
   File stat         :     463844.746     439627.133     450037.946      10174.240
   File read         :     234910.249     232565.024     233533.717        999.923
   File removal      :     186289.259     181171.423     184208.010       2195.839
   Tree creation     :        476.266         32.866        325.249        206.784
   Tree removal      :         19.429         14.144         17.055          2.191
V-1: Entering timestamp...

Reformat MDT1 device as MDT0000
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     155432.973     145656.215     151697.151       4311.335
   File stat         :     436363.906     420914.320     428509.377       6309.935
   File read         :     231848.424     229879.273     230823.486        805.926
   File removal      :     189856.501     186441.697     187710.599       1525.794
   Tree creation     :        564.044        432.872        504.217         54.166
   Tree removal      :         18.839         17.053         17.802          0.757
V-1: Entering timestamp...

Reformat MDT0 device as MDT0001
SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     110312.440     103512.106     106042.905       3036.285
   File stat         :     443284.493     425246.521     435923.695       7728.341
   File read         :     226239.692     225898.388     226067.629        139.351
   File removal      :     186702.519     181944.612     184773.293       2043.883
   Tree creation     :        533.233         28.863        342.123        223.290
   Tree removal      :         17.901         17.260         17.650          0.280
V-1: Entering timestamp...



 Comments   
Comment by Andreas Dilger [ 19/Sep/17 ]

Hi Ihara,
I'm wondering if there is some extra overhead in looking up the scratch0->dir1 directory entry that is causing the MDT0001 operations to be slower? It would be useful to run mdtest inside the dir1 directory to avoid the extra lookup, and determine if that is the cause of the slowdown:

# cd /scratch0/dir1
# mpirun -np 128 /work/tools/bin/mdtest -n 10000 -v -d . -i 3 -p 60 -F -u

The scratch directory should always be cached on the client, but I'm wondering if there is some problem with the locking on dir1 that is preventing it from being cached?

Comment by Di Wang [ 19/Sep/17 ]

Another possible cause is that the default LOV striping cache does not work correctly, which might cause each file open/create on a non-root MDT to try to get the default striping from the root MDT (an extra RPC). See lod_ah_init()->lod_get_default_lov_striping().

I am not sure whether the OSP cache works in this case. I will check.
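
The concern above is a lookup that is repeated on every create because the answer "this attribute is not set" is never remembered. A generic way to avoid that is negative caching, sketched below. This is not the Lustre OSP code: the class, the `fake_remote_lookup` stand-in, and the RPC counter are all hypothetical, and the attribute name is used only as an illustrative key.

```python
from typing import Callable, Dict, Optional


class XattrCache:
    """Caches attribute lookups; optionally remembers confirmed absences too."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]],
                 cache_missing: bool = True) -> None:
        self._fetch = fetch                  # hypothetical remote lookup, one RPC per call
        self._cache_missing = cache_missing
        self._cache: Dict[str, Optional[bytes]] = {}
        self.rpc_count = 0

    def get(self, name: str) -> Optional[bytes]:
        if name in self._cache:
            # A cached None means "known not to exist": no RPC needed.
            return self._cache[name]
        self.rpc_count += 1
        value = self._fetch(name)
        if value is not None or self._cache_missing:
            self._cache[name] = value
        return value

    def invalidate(self, name: str) -> None:
        """Must be called when the attribute is set or removed remotely."""
        self._cache.pop(name, None)


def fake_remote_lookup(name: str) -> Optional[bytes]:
    # Stand-in for a remote target on which the attribute is not set.
    return None


positive_only = XattrCache(fake_remote_lookup, cache_missing=False)
with_negative = XattrCache(fake_remote_lookup, cache_missing=True)
for _ in range(10000):
    positive_only.get("trusted.lov")
    with_negative.get("trusted.lov")

print("RPCs without negative caching:", positive_only.rpc_count)
print("RPCs with negative caching:   ", with_negative.rpc_count)
```

The trade-off is that a cached absence has to be invalidated whenever the attribute is later created, otherwise stale "not found" answers would be returned.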

Comment by Gerrit Updater [ 19/Sep/17 ]

wangdi (di.wang@intel.com) uploaded a new patch: https://review.whamcloud.com/29078
Subject: LU-10005 osp: cache non-exist EA
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8a7e2c23d4bd35d0d544e91bdc51ce0abff895ac

Comment by Di Wang [ 19/Sep/17 ]

Ihara, please try this patch, thanks.

Comment by Shuichi Ihara (Inactive) [ 20/Sep/17 ]

Thanks, WangDi.
The patch gives better results.

MDT0000

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     145472.642     137093.283     140274.791       3706.089
   File stat         :     443154.312     431557.793     436764.649       4807.570
   File read         :     233326.068     229897.041     231796.549       1424.131
   File removal      :     186842.911     186376.008     186627.793        192.368
   Tree creation     :        572.336        436.418        499.243         55.961
   Tree removal      :         19.798         18.276         19.165          0.647
V-1: Entering timestamp...

MDT0001

SUMMARY: (of 3 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :     153350.909     135825.706     143892.288       7222.027
   File stat         :     462687.460     449961.013     457977.445       5697.362
   File read         :     230307.880     224196.385     226475.436       2726.092
   File removal      :     192887.031     187816.726     189799.023       2212.682
   Tree creation     :        514.550        399.017        451.076         47.852
   Tree removal      :         18.876         17.538         18.222          0.546

V-1: Entering timestamp...

BTW, I didn't see such performance differences with IEEL3.0. Is this something we did in lustre-2.7 that was missed or changed after lustre-2.7 and caused this issue to show up?

Comment by Shuichi Ihara (Inactive) [ 20/Sep/17 ]

BTW, after applying patch https://review.whamcloud.com/29078,
overall average file creation performance on the primary MDT (MDT0000) drops.

run mdtest 10 times with 60 sec interval
mpirun --allow-run-as-root /work/tools/bin/mdtest -n 10000 -v -d /scratch0/dir0 -i 10 -p 60 -u -F

without patch

[root@c01 ~]# grep 'V-1:   File creation' mdtest-default-dir0-loop.log 
V-1:   File creation     :          8.723 sec,     146739.418 ops/sec
V-1:   File creation     :          8.855 sec,     144543.110 ops/sec
V-1:   File creation     :          8.829 sec,     144978.787 ops/sec
V-1:   File creation     :          8.803 sec,     145404.742 ops/sec
V-1:   File creation     :          8.637 sec,     148192.295 ops/sec
V-1:   File creation     :          9.084 sec,     140911.792 ops/sec
V-1:   File creation     :          8.837 sec,     144853.049 ops/sec
V-1:   File creation     :          9.288 sec,     137808.205 ops/sec
V-1:   File creation     :          9.046 sec,     141502.448 ops/sec
V-1:   File creation     :          9.392 sec,     136287.278 ops/sec

with patch

[root@c01 ~]# grep 'V-1:   File creation' mdtest-LU10005-dir0-loop.log 
V-1:   File creation     :          8.874 sec,     144246.332 ops/sec
V-1:   File creation     :          8.552 sec,     149675.160 ops/sec
V-1:   File creation     :          9.211 sec,     138957.104 ops/sec
V-1:   File creation     :          9.058 sec,     141315.265 ops/sec
V-1:   File creation     :          9.297 sec,     137673.095 ops/sec
V-1:   File creation     :          9.263 sec,     138185.327 ops/sec
V-1:   File creation     :          9.469 sec,     135184.898 ops/sec
V-1:   File creation     :          9.266 sec,     138134.736 ops/sec
V-1:   File creation     :          9.373 sec,     136563.934 ops/sec
V-1:   File creation     :          9.486 sec,     134930.710 ops/sec
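
To compare the two runs above more directly, the per-iteration rates can be parsed from the grep output and summarized. The sketch below embeds the twenty lines shown above; the regular expression and helper names are illustrative and only assume the line format seen in this log.

```python
import re
import statistics

# File-creation lines copied from the two grep outputs above.
WITHOUT_PATCH = """\
V-1:   File creation     :          8.723 sec,     146739.418 ops/sec
V-1:   File creation     :          8.855 sec,     144543.110 ops/sec
V-1:   File creation     :          8.829 sec,     144978.787 ops/sec
V-1:   File creation     :          8.803 sec,     145404.742 ops/sec
V-1:   File creation     :          8.637 sec,     148192.295 ops/sec
V-1:   File creation     :          9.084 sec,     140911.792 ops/sec
V-1:   File creation     :          8.837 sec,     144853.049 ops/sec
V-1:   File creation     :          9.288 sec,     137808.205 ops/sec
V-1:   File creation     :          9.046 sec,     141502.448 ops/sec
V-1:   File creation     :          9.392 sec,     136287.278 ops/sec
"""

WITH_PATCH = """\
V-1:   File creation     :          8.874 sec,     144246.332 ops/sec
V-1:   File creation     :          8.552 sec,     149675.160 ops/sec
V-1:   File creation     :          9.211 sec,     138957.104 ops/sec
V-1:   File creation     :          9.058 sec,     141315.265 ops/sec
V-1:   File creation     :          9.297 sec,     137673.095 ops/sec
V-1:   File creation     :          9.263 sec,     138185.327 ops/sec
V-1:   File creation     :          9.469 sec,     135184.898 ops/sec
V-1:   File creation     :          9.266 sec,     138134.736 ops/sec
V-1:   File creation     :          9.373 sec,     136563.934 ops/sec
V-1:   File creation     :          9.486 sec,     134930.710 ops/sec
"""

# Captures the ops/sec figure from one verbose "File creation" line.
RATE_RE = re.compile(r"File creation\s*:\s*[\d.]+ sec,\s*([\d.]+) ops/sec")


def creation_rates(log_text):
    """Return the file-creation rates (ops/sec) found in the log text."""
    return [float(m.group(1)) for m in RATE_RE.finditer(log_text)]


def summarize(label, rates):
    """Print and return the mean and sample standard deviation."""
    mean = statistics.mean(rates)
    stdev = statistics.stdev(rates)
    print(f"{label}: n={len(rates)} mean={mean:.0f} stdev={stdev:.0f} ops/sec")
    return mean, stdev


mean_without, sd_without = summarize("without patch", creation_rates(WITHOUT_PATCH))
mean_with, sd_with = summarize("with patch", creation_rates(WITH_PATCH))
drop = 1.0 - mean_with / mean_without
print(f"relative drop in mean: {100 * drop:.1f}%")
```

Setting the difference in means against the run-to-run standard deviation gives a rough indication of whether the drop is distinguishable from noise. Ten iterations per configuration is a small sample, so a repeat run would be needed before drawing a firm conclusion.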

Comment by Di Wang [ 20/Sep/17 ]

BTW, I didn't see such performance differences with IEEL3.0. Is this something we did in lustre-2.7 that was missed or changed after lustre-2.7 and caused this issue to show up?

I think this was introduced by LU-8998 (https://review.whamcloud.com/#/c/24823), which landed in 2.10, so 2.7 should be fine.

BTW, after applying patch https://review.whamcloud.com/29078,
overall average file creation performance on the primary MDT (MDT0000) drops.

Hmm, I did not touch anything in the path of local file open/creation. Is it repeatable? This drop seems unlikely to be related to the patch, IMHO. I will check again. Thanks.

Comment by Gerrit Updater [ 17/Dec/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29078/
Subject: LU-10005 osp: cache non-exist EA
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b2fa448050aff0b5230c8cc94e8baf848fbb4ded

Comment by Peter Jones [ 17/Dec/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 18/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30585
Subject: LU-10005 osp: cache non-exist EA
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 96e35818b3c09978f7f05020a8f2641e0de0c92c

Comment by Gerrit Updater [ 22/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30643
Subject: Revert "LU-10005 osp: cache non-exist EA"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8da4d30a18ce1371624730225ad1b324e5128db1

Comment by Gerrit Updater [ 12/Apr/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30585/
Subject: LU-10005 osp: cache non-exist EA
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 3a59349d931250c3ea008a68f8f0121500d984a4
