[LU-11999] DNE performance improvement Created: 24/Feb/19  Updated: 21/Feb/20  Resolved: 21/Mar/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.1

Type: Improvement Priority: Critical
Reporter: Jinshan Xiong Assignee: Jinshan Xiong
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6864 DNE3: Support multiple modify RPCs in... Resolved
is related to LU-9436 DNE2 - performance improvement with w... Open
Rank (Obsolete): 9223372036854775807

 Description   

In one case, DNE creates files under a remote directory slowly. When a remote directory has been created and a process has to walk the path to create files under it, excessive UPDATE lock requests and revocations are seen. To reproduce this problem:

 # lfs mkdir -i 0 /mnt/lustre/dir1
 # lfs mkdir -i 1 /mnt/lustre/dir1/dir2
 # touch /mnt/lustre/dir1/dir2/f1
 # touch /mnt/lustre/dir1/dir2/f2
 

Each time a file is created under /mnt/lustre/dir1/dir2, the client walks the path and requests an UPDATE lock in lmv_intent_remote(), since this is a remote directory, as the code shows:

        /*
         * Unfortunately, we have to lie to MDC/MDS to retrieve
         * attributes llite needs and provide proper locking.
         */
        if (it->it_op & IT_LOOKUP)
                it->it_op = IT_GETATTR;

The above code causes an UPDATE + PERM lock to be returned.

Then, when the create intent RPC is sent to the MDS, the MDT revokes the UPDATE lock. This pattern repeats for every file created under the remote directory.

This piece of code was written before the PERM lock was introduced, and it causes drastic performance degradation. Now that the PERM lock is in place, the code is no longer required.
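
The request/revoke cycle described above can be illustrated with a toy model. This is only a sketch: the names lock_bits, sim_stats and simulate_creates are hypothetical and are not Lustre APIs, and the model assumes that a lookup which takes only the PERM bit is not affected by a create in the directory.

```c
/* Toy model only: these names are hypothetical, not Lustre APIs. */
enum lock_bits {
	BIT_PERM   = 1 << 0,	/* stands in for the PERM lock bit */
	BIT_UPDATE = 1 << 1,	/* stands in for the UPDATE lock bit */
};

struct sim_stats {
	unsigned lock_requests;	/* lock requests sent by the client */
	unsigned revocations;	/* locks revoked by the server */
};

/*
 * Simulate nfiles creates under one remote directory.
 * lookup_bits is the set of bits the path-walk lookup ends up holding.
 */
static struct sim_stats simulate_creates(unsigned nfiles, unsigned lookup_bits)
{
	struct sim_stats st = { 0, 0 };
	unsigned cached = 0;	/* bits currently cached on the directory */
	unsigned i;

	for (i = 0; i < nfiles; i++) {
		/* Path walk: request a lock only if the cache does not cover it. */
		if ((cached & lookup_bits) != lookup_bits) {
			st.lock_requests++;
			cached = lookup_bits;
		}
		/* Assumption: a create revokes a cached UPDATE bit, not PERM. */
		if (cached & BIT_UPDATE) {
			st.revocations++;
			cached = 0;	/* the whole lock is dropped in this model */
		}
	}
	return st;
}
```

In this model, simulate_creates(10000, BIT_UPDATE | BIT_PERM) costs one lock request and one revocation per file, while simulate_creates(10000, BIT_PERM) needs a single lock request and no revocations.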

Thanks to Di for providing insight into the DNE code.

Patch and test are coming soon.



 Comments   
Comment by Gerrit Updater [ 24/Feb/19 ]

Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/34291
Subject: LU-11999 dne: performance improvement for file creation
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5ebe2b12f3e9a623b2b8188ed78792bdbf598352

Comment by Jinshan Xiong [ 24/Feb/19 ]

Di provided the above patch to fix the problem. The test result is as follows:

Test case:
rm -rf /mnt/lustre_purple/testdir
lfs mkdir -i 0 /mnt/lustre_purple/testdir
lfs mkdir -i 2 /mnt/lustre_purple/testdir/dir2
./lustre-release/lustre/tests/createmany -o \
	/mnt/lustre_purple/testdir/dir2/f 10000

Before the patch is applied:
total: 10000 open/close in 12.82 seconds: 780.22 ops/second

After the patch is applied:
total: 10000 open/close in 4.89 seconds: 2044.75 ops/second
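
To put the two results side by side, the following small helpers compute the throughput ratio and the average time per operation from the figures reported above. The function names are illustrative only.

```c
/* Throughput ratio between two runs, given their ops/second figures. */
static double speedup(double before_ops, double after_ops)
{
	return after_ops / before_ops;
}

/* Average wall-clock time per operation, in milliseconds. */
static double ms_per_op(double seconds, unsigned ops)
{
	return seconds * 1000.0 / ops;
}
```

With the numbers above, speedup(780.22, 2044.75) is roughly 2.6, and the average open/close time drops from about 1.28 ms (ms_per_op(12.82, 10000)) to about 0.49 ms (ms_per_op(4.89, 10000)).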

Comment by Gerrit Updater [ 21/Mar/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34291/
Subject: LU-11999 dne: performance improvement for file creation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bfbd062e6b177cf934b75d6be2db695b9fe1648b

Comment by Peter Jones [ 21/Mar/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 26/Mar/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34505
Subject: LU-11999 dne: performance improvement for file creation
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 9b71ee07073b1316e7949c2f93076c5a89ab592c

Comment by Andreas Dilger [ 27/Mar/19 ]

Another possible performance issue is that for DNE cross-MDT operations (e.g. mkdir() or rename()) there is only a single OSP modify RPC in flight per remote target, since the patch https://review.whamcloud.com/14375 "LU-6864 osp: manage number of modify RPCs in flight" has not landed yet.
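
As a rough illustration of why one modify RPC in flight limits throughput, the sketch below applies Little's law: at most in_flight requests can complete per round trip. The function name and the round-trip time used in the note after it are hypothetical and are not measurements from this ticket.

```c
/*
 * Upper bound on modify operations per second to one remote target,
 * assuming each RPC takes rtt_seconds and at most in_flight RPCs
 * are outstanding at a time (Little's law).
 */
static double max_ops_per_sec(unsigned in_flight, double rtt_seconds)
{
	return (double)in_flight / rtt_seconds;
}
```

For example, with an assumed 0.5 ms round trip, one RPC in flight caps the rate at about 2000 ops/second per remote target, whereas eight in flight would raise the bound to about 16000 ops/second.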

Comment by Gerrit Updater [ 08/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34505/
Subject: LU-11999 dne: performance improvement for file creation
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 182731f92c4a61d2b7793a798950439a467d9365
