[LU-11999] DNE performance improvement Created: 24/Feb/19 Updated: 21/Feb/20 Resolved: 21/Mar/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.1 |
| Type: | Improvement | Priority: | Critical |
| Reporter: | Jinshan Xiong | Assignee: | Jinshan Xiong |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
There is a case in which DNE creates remote files slowly. When a remote directory is created and a process has to walk the path to create files under that remote directory, excessive UPDATE lock requests and revocations are seen. To reproduce this problem:
# lfs mkdir -i 0 /mnt/lustre/dir1
# lfs mkdir -i 1 /mnt/lustre/dir1/dir2
# touch /mnt/lustre/dir1/dir2/f1
# touch /mnt/lustre/dir1/dir2/f2
Each time a file is created under /mnt/lustre/dir1/dir2, the client walks the path, and an UPDATE lock is requested in lmv_intent_remote() because dir2 is a remote directory, as the code shows:
/*
 * Unfortunately, we have to lie to MDC/MDS to retrieve
 * attributes llite needs and provide proper locking.
 */
if (it->it_op & IT_LOOKUP)
	it->it_op = IT_GETATTR;
This code causes an UPDATE + PERM lock to be returned. When the create intent RPC is then sent to the MDS, the MDT revokes the UPDATE lock, and this pattern repeats for every file the operation creates. The code was written before the PERM lock was introduced, and it is causing drastic performance degradation. Now that the PERM lock is in place, the code is no longer required. Thanks to Di for providing insight into the DNE code. Patch and test are coming soon. |
| Comments |
| Comment by Gerrit Updater [ 24/Feb/19 ] |
|
Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/34291 |
| Comment by Jinshan Xiong [ 24/Feb/19 ] |
|
Di provided the above patch to fix the problem. The test results are as follows.
Test case:
rm -rf /mnt/lustre_purple/testdir
lfs mkdir -i 0 /mnt/lustre_purple/testdir
lfs mkdir -i 2 /mnt/lustre_purple/testdir/dir2
./lustre-release/lustre/tests/createmany -o \
    /mnt/lustre_purple/testdir/dir2/f 10000
Before the patch is applied:
total: 10000 open/close in 12.82 seconds: 780.22 ops/second
After the patch is applied:
total: 10000 open/close in 4.89 seconds: 2044.75 ops/second |
| Comment by Gerrit Updater [ 21/Mar/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34291/ |
| Comment by Peter Jones [ 21/Mar/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 26/Mar/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34505 |
| Comment by Andreas Dilger [ 27/Mar/19 ] |
|
Another possible performance issue is that for DNE cross-MDT operations (e.g. mkdir() or rename()) there is only a single OSP modify RPC in flight per remote target, since the patch https://review.whamcloud.com/14375 " |
| Comment by Gerrit Updater [ 08/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34505/ |