Description
The DOM lock bit is taken opportunistically at open to optimize possible read/write access, but it uses PW lock mode and can be combined with other lock bits. The proposed improvements are (see also the sketch after this list):
1. when the file is closed and there was no read/write, drop the DOM bit and downgrade the lock to a less strict mode (CR/CW)
2. if there was a write, then write the data back on close (LU-11428) and also downgrade the lock mode to PR
3. if the DOM bit is dropped and other bits remain in the lock, downgrade the lock mode as well
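A minimal way to observe the resulting lock mode on a client could look like the following sketch; the directory and file names are placeholders, and the grep assumes the DLM lock dump prints a "mode:" field (the exact format may vary by version):

# create a DoM-striped test directory (same layout as used later in this ticket)
lfs setstripe -E 1M -L mdt /mnt/lustre/domdir
echo data > /mnt/lustre/domdir/f0    # open + write + close

# dump client DLM locks and inspect which mode the MDT lock was left in after close
lctl set_param -n ldlm.dump_namespaces 1
lctl dk > locks_dumped.txt
grep 'mode:' locks_dumped.txt | head

With the proposed changes, the lock for f0 would be expected to end up in PR after a write-close (or CR/CW after a close with no I/O) instead of staying in PW.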
Attachments
Issue Links
- is related to LU-11428 Writeback on close for DoM (Open)
Activity
"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51239
Subject: LU-12325 ldlm: mode convert client changes
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ca37a7c2ced28d66ca966d73f242b25cf626ff42
When it takes 3x locks, could you repeat that with fewer files (~2-3) and dump the locks on the client:
lctl set_param -n ldlm.dump_namespaces 1
lctl dk > locks_dumped.txt
and attach locks_dumped.txt? I believe these locks differ by inodebits, e.g. some may have the XATTR bit set and so on.
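As a quick way to summarize the dump, something like the greps below could help; the field names ("bits 0x...", "mode: ...") are assumptions about the debug-log format and may differ between versions:

# count IBITS locks per inodebits value in the dumped client namespace
grep -o 'bits 0x[0-9a-f]*' locks_dumped.txt | sort | uniq -c

# count locks per granted mode
grep -o 'mode: [A-Z-]*/[A-Z-]*' locks_dumped.txt | sort | uniq -c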
Hmm... dom_lock=always also changed behavior:
[root@ai400x2-1-vm1 ~]# lctl set_param mdt.*.dom_lock=always
mdt.exafs-MDT0000.dom_lock=always
[root@ec01 ~]# mpirun -np 1 --allow-run-as-root /work/tools/bin/mdtest -F -d /exafs/mdtest.out/ -n 10000 -u -C
[root@ec01 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff98e1eefd9800.lock_count=10005
[root@ec01 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff98e1eefd9800.lock_count=10005
[root@ec02 ~]# mpirun -np 1 --allow-run-as-root /work/tools/bin/mdtest -F -d /exafs/mdtest.out/ -n 10000 -u -T
[root@ec02 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff9d914c191000.lock_count=30004
[root@ec01 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff98e1eefd9800.lock_count=5
It seems that behavior with dom_lock=never didn't change:
[root@ai400x2-1-vm1 ~]# lctl set_param mdt.*.dom_lock=never
mdt.exafs-MDT0000.dom_lock=never
[root@ec01 ~]# mpirun -np 1 --allow-run-as-root /work/tools/bin/mdtest -F -d /exafs/mdtest.out/ -n 10000 -u -C
[root@ec01 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff98e1eefd9800.lock_count=10004
[root@ec02 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff9d914c191000.lock_count=0
[root@ec02 ~]# mpirun -np 1 --allow-run-as-root /work/tools/bin/mdtest -F -d /exafs/mdtest.out/ -n 10000 -u -T
[root@ec02 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff9d914c191000.lock_count=10004
[root@ec01 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff98e1eefd9800.lock_count=10004
In the end, dom_lock=never gives better performance today, and we really need https://review.whamcloud.com/37088 and https://review.whamcloud.com/37105 in conjunction with dom_lock=trylock; otherwise the performance is worse than with dom_lock=never.
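For anyone reproducing this, switching the policy is just the mdt.*.dom_lock parameter; the persistent "-P" variant below is an assumption about the test setup and has to be run against the MGS:

# transient setting on the MDS, as used in the tests above (reverts on restart)
lctl set_param mdt.*.dom_lock=trylock

# persistent setting, run on the MGS node
lctl set_param -P mdt.*.dom_lock=trylock

# verify the active policy
lctl get_param mdt.*.dom_lock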
It might be related, but has dom_lock=trylock behavior changed in recent code?
I'm testing on 2.15.0-RC3.
DoM lock mode is "trylock", which is the default.
[root@ai400x2-1-vm1 ~]# lctl get_param mdt.*.dom_lock
mdt.exafs-MDT0000.dom_lock=trylock
Create a new directory and configure DoM for all files < 1MB.
[root@ec01 ~]# mkdir /exafs/mdtest.out/
[root@ec01 ~]# lfs setdirstripe -i 0 /exafs/mdtest.out/
[root@ec01 ~]# lfs setstripe -E 1m -L mdt /exafs/mdtest.out
Create 10000 files on client ec01.
[root@ec01 ~]# mpirun -np 1 --allow-run-as-root /work/tools/bin/mdtest -F -d /exafs/mdtest.out/ -n 10000 -u -C
Operation       Max        Min        Mean       Std Dev
---------       ---        ---        ----       -------
File creation   5047.056   5047.056   5047.056   0.000
File stat       0.000      0.000      0.000      0.000
File read       0.000      0.000      0.000      0.000
File removal    0.000      0.000      0.000      0.000
Tree creation   900.065    900.065    900.065    0.000
Tree removal    0.000      0.000      0.000      0.000
[root@ec01 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff98e1eefd9800.lock_count=10004
Locks were taken for the 10000 files, which is expected.
Another client, ec02, issues "stat" to all 10000 files that client ec01 created above.
[root@ec02 ~]# mpirun -np 1 --allow-run-as-root /work/tools/bin/mdtest -F -d /exafs/mdtest.out/ -n 10000 -u -T
Operation       Max        Min        Mean       Std Dev
---------       ---        ---        ----       -------
File creation   0.000      0.000      0.000      0.000
File stat       3353.370   3353.370   3353.370   0.000
File read       0.000      0.000      0.000      0.000
File removal    0.000      0.000      0.000      0.000
Tree creation   0.000      0.000      0.000      0.000
Tree removal    0.000      0.000      0.000      0.000
[root@ec01 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff98e1eefd9800.lock_count=4
All locks for the 10000 files on ec01 were canceled, which is also expected.
[root@ec02 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.exafs-MDT0000-mdc-ffff9d914c191000.lock_count=30004
But why does it require 3x locks against 10000 files for "stat"?
This is the same scenario I originally tested in LU-12325, but with different results and behavior.
When I tested before, the second client took the same number of locks as target files for "stat". But now it takes 3x.
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37105
Subject: LU-12325 ldlm: mode downgrade, wire changes
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 074008cdfc8e3e7efffc30e744896cda0da327d5
Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37088
Subject: LU-12325 ldlm: lock convert with mode downgrade
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f872a4cbdaa4f7d52ef8fc9e0cbdcee041e50dad
Yes, for writes we need OPEN to return a PW lock, which is what the "always" mode of the dom_lock parameter is for.
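To illustrate, a rough sketch of exercising that path; the file name below is just a placeholder inside the DoM-striped test directory used in this ticket:

# have the MDT grant a PW DoM lock at open (transient setting, run on the MDS)
lctl set_param mdt.*.dom_lock=always

# open-create plus a small write then lands under the PW lock granted at open
dd if=/dev/zero of=/exafs/mdtest.out/f0 bs=4k count=1

# the writing client should now hold the MDT lock for the file
lctl get_param ldlm.namespaces.*MDT0000*.lock_count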
Patch https://review.whamcloud.com/35031 + dom_lock=trylock works for the open-create + stat workload, but the locks are still incompatible for the "open-write + stat" workload, right?
Patch https://review.whamcloud.com/35031 worked well, and it can optimize performance for the following workload.
client1: open-create (zero-byte file)
client2: stat
client1: stat
Here are the test results.
dom_lock=always
[root@c082 ~]# /work/home/sihara/io-500-dev/bin/mdtest -C -F -d /es90/out -n 10000 -u
[root@c082 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.es90-MDT0000-mdc-ffff92bf94e4d800.lock_count=10004
[root@c083 ~]# /work/home/sihara/io-500-dev/bin/mdtest -T -F -d /es90/out -n 10000 -u
[root@c082 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.es90-MDT0000-mdc-ffff92bf94e4d800.lock_count=4
[root@c082 ~]# /work/home/sihara/io-500-dev/bin/mdtest -T -F -d /es90/out -n 10000 -u
SUMMARY rate: (of 1 iterations)
Operation         Max        Min        Mean       Std Dev
---------         ---        ---        ----       -------
File creation :   0.000      0.000      0.000      0.000
File stat     :   9145.261   9145.261   9145.261   0.000
File read     :   0.000      0.000      0.000      0.000
File removal  :   0.000      0.000      0.000      0.000
Tree creation :   0.000      0.000      0.000      0.000
Tree removal  :   0.000      0.000      0.000      0.000
dom_lock=never
[root@c082 ~]# /work/home/sihara/io-500-dev/bin/mdtest -C -F -d /es90/out -n 10000 -u
[root@c082 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.es90-MDT0000-mdc-ffff92bf94e4d800.lock_count=10004
[root@c083 ~]# /work/home/sihara/io-500-dev/bin/mdtest -T -F -d /es90/out -n 10000 -u
[root@c082 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.es90-MDT0000-mdc-ffff92bf94e4d800.lock_count=10004   <--- c082 can keep locks, but it's still incompatible for stat
[root@c082 ~]# /work/home/sihara/io-500-dev/bin/mdtest -T -F -d /es90/out -n 10000 -u
SUMMARY rate: (of 1 iterations)
Operation         Max         Min         Mean        Std Dev
---------         ---         ---         ----        -------
File creation :   0.000       0.000       0.000       0.000
File stat     :   12923.465   12923.465   12923.465   0.000
File read     :   0.000       0.000       0.000       0.000
File removal  :   0.000       0.000       0.000       0.000
Tree creation :   0.000       0.000       0.000       0.000
Tree removal  :   0.000       0.000       0.000       0.000
patch https://review.whamcloud.com/35031 + dom_lock=trylock
[root@c082 ~]# /work/home/sihara/io-500-dev/bin/mdtest -C -F -d /es90/out -n 10000 -u
[root@c082 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.es90-MDT0000-mdc-ffff92bf94e4d800.lock_count=10004
[root@c083 ~]# /work/home/sihara/io-500-dev/bin/mdtest -T -F -d /es90/out -n 10000 -u
[root@c082 ~]# lctl get_param ldlm.namespaces.*MDT0000*.lock_count
ldlm.namespaces.es90-MDT0000-mdc-ffff92bf94e4d800.lock_count=10004
[root@c082 ~]# /work/home/sihara/io-500-dev/bin/mdtest -T -F -d /es90/out -n 10000 -u
SUMMARY rate: (of 1 iterations)
Operation         Max          Min          Mean         Std Dev
---------         ---          ---          ----         -------
File creation :   0.000        0.000        0.000        0.000
File stat     :   101680.001   101680.001   101680.001   0.000   <-- 10x faster
File read     :   0.000        0.000        0.000        0.000
File removal  :   0.000        0.000        0.000        0.000
Tree creation :   0.000        0.000        0.000        0.000
Tree removal  :   0.000        0.000        0.000        0.000
"Mikhail Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51262
Subject: LU-12325 mdc: reduce DoM lock mode on file close
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 306444123a484729bf6cb435b1c2fb84b6170158