[LU-4398] mdt_object_open_lock() may not flush conflicting handles Created: 19/Dec/13 Updated: 05/Dec/19 Resolved: 06/Sep/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Hammond | Assignee: | Gu Zheng (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mdt | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 12075 |
| Description |
|
Calls to mdt_object_open_lock() which do not have MDS_OPEN_LOCK in their open flags may fail to flush conflicting handles:

t:~# rm /mnt/lustre/*
t:~# ls /mnt/lustre/
t:~# cp /bin/echo /mnt/lustre/echo
t:~# lfs path2fid /mnt/lustre/echo
[0x280000401:0x9:0x0]
t:~# /mnt/lustre/echo Hi
Hi
t:~# lctl clear
t:~# echo Bye > /mnt/lustre2/echo
-bash: /mnt/lustre2/echo: Text file busy
t:~# lctl dk > 2.dk

00000004:00000002:0.0:1387473999.026142:0:15064:0:(mdt_open.c:1610:mdt_reint_open()) I am going to open [0x200000007:0x1:0x0]/(echo->[0x280000402:0x4:0x0]) cr_flag=01102 mode=0100666 msg_flag=0x0
(Note: there is a typo in the CDEBUG for the next message; it is the open count, not the lease count, that is really printed.)
CDEBUG(D_INODE, "normal open:"DFID" lease count: %d, lm: %d\n",
PFID(mdt_object_fid(obj)),
atomic_read(&obj->mot_open_count), lm);
00000004:00000002:0.0:1387473999.026365:0:15064:0:(mdt_open.c:1242:mdt_object_open_lock()) normal open:[0x280000401:0x9:0x0] lease count: 1, lm: 16
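A corrected message (a sketch of the obvious label fix, not a landed patch) would name the value that is actually printed:

    /* Sketch of the label fix: the value printed is the open count. */
    CDEBUG(D_INODE, "normal open:"DFID" open count: %d, lm: %d\n",
           PFID(mdt_object_fid(obj)),
           atomic_read(&obj->mot_open_count), lm);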
A CR LOOKUP LAYOUT is granted:
00000004:00000002:0.0:1387473999.026430:0:15064:0:(mdt_open.c:1269:mdt_object_open_lock()) Requested bits lock:[0x280000401:0x9:0x0], ibits = 0x9, open_flags = 01102, try_layout = 1, rc = 0
00000004:00000001:0.0:1387473999.026433:0:15064:0:(mdt_open.c:1332:mdt_object_open_lock()) Process leaving via out (rc=0 : 0 : 0x0)
00000004:00000001:0.0:1387473999.026456:0:15064:0:(mdt_open.c:536:mdt_write_get()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
00000004:00000001:0.0:1387473999.026458:0:15064:0:(mdt_open.c:723:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
00000004:00000001:0.0:1387473999.026459:0:15064:0:(mdt_open.c:994:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
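The rc in the last three lines is -26, i.e. -ETXTBSY ("Text file busy"), printed as unsigned decimal, signed decimal, and hex. The same semantic can be seen locally, outside Lustre, by opening a currently-executing binary for write. A minimal userspace sketch (the /tmp/mysleep path and setup are hypothetical; a 64-bit host is assumed):

    /* Minimal sketch of the ETXTBSY semantic seen above.  Hypothetical
     * setup:  cp /bin/sleep /tmp/mysleep && /tmp/mysleep 60 &
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            int fd = open("/tmp/mysleep", O_WRONLY);

            if (fd < 0 && errno == ETXTBSY) {
                    long rc = -ETXTBSY;  /* ETXTBSY == 26 on Linux */

                    /* Prints the same triple as the trace above:
                     * 18446744073709551590 : -26 : ffffffffffffffe6 */
                    printf("open: %s (rc=%lu : %ld : %lx)\n",
                           strerror(errno), (unsigned long)rc, rc,
                           (unsigned long)rc);
            } else {
                    printf("open: fd=%d errno=%d\n", fd, errno);
            }
            return 0;
    }

In the ticket's reproducer the same rc comes back from the MDS because the conflicting exec open handle from the first mount was never flushed.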
|
| Comments |
| Comment by John Hammond [ 30/Jan/14 ] |
|
Please see http://review.whamcloud.com/#/c/9063/. |
| Comment by Jodi Levi (Inactive) [ 24/Mar/14 ] |
|
Patch landed to Master. Please reopen ticket if more work is needed. |
| Comment by Manish Patel (Inactive) [ 26/Mar/14 ] |
|
Hi Jodi, we are seeing similar issues with Lustre 2.4.1 and 2.4.2. We opened a ticket. Thank you, |
| Comment by Bob Glossman (Inactive) [ 27/Mar/14 ] |
|
backport to b2_4: |
| Comment by Andreas Dilger [ 05/May/14 ] |
|
Cherry pick to b2_5: http://review.whamcloud.com/10218 |
| Comment by Jodi Levi (Inactive) [ 23/Jun/14 ] |
|
Patch was reverted due to a performance regression. |
| Comment by Cédric Dufour [ 30/Oct/14 ] |
|
Running 2.5.2 server-side with the patch reverted re-introduced the problem described in this ticket. We ran performance tests before and after applying the patch and found no significant performance regression.

BEFORE

$ mpirun -np 1 -host localhost ./mdtest-1.9.3 -n 100000 -i 1 -p 5 -u -v -F -d ./mdtest.out

SUMMARY: (of 1 iterations)
   Operation            Max        Min       Mean   Std Dev
   ---------            ---        ---       ----   -------
   File creation :  1191.533   1191.533   1191.533    0.000
   File stat     :   975.334    975.334    975.334    0.000
   File read     :  2502.797   2502.797   2502.797    0.000
   File removal  :  1299.404   1299.404   1299.404    0.000
   Tree creation :  1286.991   1286.991   1286.991    0.000
   Tree removal  :   320.616    320.616    320.616    0.000

$ mpirun -np 66 -hostfile ./hostfile ./mdtest-1.9.3 -n 10000 -i 1 -p 5 -u -v -F -d ./mdtest.out

SUMMARY: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation :  12310.369   12310.369   12310.369    0.000
   File stat     :  46494.079   46494.079   46494.079    0.000
   File read     :  75171.339   75171.339   75171.339    0.000
   File removal  :  11508.120   11508.120   11508.120    0.000
   Tree creation :     41.568      41.568      41.568    0.000
   Tree removal  :     18.342      18.342      18.342    0.000

AFTER

$ mpirun -np 1 -host localhost ./mdtest-1.9.3 -n 100000 -i 1 -p 5 -u -v -F -d ./mdtest.out

SUMMARY: (of 1 iterations)
   Operation            Max        Min       Mean   Std Dev
   ---------            ---        ---       ----   -------
   File creation :  1179.328   1179.328   1179.328    0.000
   File stat     :   948.171    948.171    948.171    0.000
   File read     :  2472.216   2472.216   2472.216    0.000
   File removal  :  1147.826   1147.826   1147.826    0.000
   Tree creation :   689.739    689.739    689.739    0.000
   Tree removal  :   131.545    131.545    131.545    0.000

$ mpirun -np 66 -hostfile ./hostfile ./mdtest-1.9.3 -n 10000 -i 1 -p 5 -u -v -F -d ./mdtest.out

SUMMARY: (of 1 iterations)
   Operation             Max         Min        Mean   Std Dev
   ---------             ---         ---        ----   -------
   File creation :  13519.022   13519.022   13519.022    0.000
   File stat     :  46112.439   46112.439   46112.439    0.000
   File read     :  78350.583   78350.583   78350.583    0.000
   File removal  :  11139.655   11139.655   11139.655    0.000
   Tree creation :     51.435      51.435      51.435    0.000
   Tree removal  :     29.340      29.340      29.340    0.000

Shouldn't the patch be re-applied? |
| Comment by Roland Fehrenbacher [ 30/Oct/14 ] |
|
Sorry to sound a bit harsh, but I think such not-very-well-investigated last-minute commits, which reintroduce previously fixed bugs, should not happen. |
| Comment by Vitaly Fertman [ 17/Jun/15 ] |
|
Is it still a problem after landing |
| Comment by Andrew Perepechko [ 04/Apr/16 ] |
|
After some testing and profiling, we've found that the main cause of the observable performance drop [with our code base] is the loop over the directory cache entries in ll_invalidate_negative_children():

1) for an open request, the client creates an UPDATE lock for the directory (see mdc_enqueue())

The directory cache is populated by the mdtest_stat phase, which happens right before mdtest_read.
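For context, here is a simplified sketch of the kind of walk such a function performs. This is not the actual Lustre source: the function name is made up and the dcache field names vary across kernel versions. The point is that the loop is linear in the number of dentries cached under the directory, which is exactly what a stat phase populates.

    /* Simplified sketch, not the actual Lustre source: walk every
     * cached child of the directory and unhash the negative
     * (inode-less) entries.  Cost is linear in the number of dentries
     * cached under the directory. */
    static void invalidate_negative_children_sketch(struct dentry *parent)
    {
            struct dentry *child, *tmp;

            spin_lock(&parent->d_lock);
            list_for_each_entry_safe(child, tmp, &parent->d_subdirs, d_child) {
                    if (child->d_inode == NULL) {   /* negative dentry */
                            spin_lock_nested(&child->d_lock,
                                             DENTRY_D_LOCK_NESTED);
                            __d_drop(child);        /* unhash it */
                            spin_unlock(&child->d_lock);
                    }
            }
            spin_unlock(&parent->d_lock);
    }

With many thousands of entries cached by the mdtest_stat phase, taking this walk on every subsequent open of a child would plausibly account for the drop described above. |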
| Comment by Gerrit Updater [ 17/Apr/18 ] |
|
Gu Zheng (gzheng@ddn.com) uploaded a new patch: https://review.whamcloud.com/32020 |
| Comment by Shuichi Ihara (Inactive) [ 25/Apr/18 ] |
|
New performance validation test results.

Single thread (Unique Dir), without patch:
mdtest -n 100000 -u -d /scratch1/mdtest.out -vv -p 10 -i 5

SUMMARY: (of 5 iterations)
   Operation                 Max        Min       Mean   Std Dev
   ---------                 ---        ---       ----   -------
   Directory creation:  8595.078   7781.167   8182.740   304.545
   Directory stat    :  9535.046   9511.142   9525.650     9.634
   Directory removal :  8354.321   8182.212   8298.823    61.522
   File creation     :  3915.182   3720.051   3778.728    70.700
   File stat         :  2613.702   2493.075   2558.991    42.179
   File read         :  4888.819   4808.772   4852.502    27.414
   File removal      :  6210.268   6107.493   6160.375    40.613
   Tree creation     :  3740.316   2256.135   3190.418   545.341
   Tree removal      :   625.867    519.335    581.505    41.098

Single thread (Unique Dir), with patch https://review.whamcloud.com/32020:

SUMMARY: (of 5 iterations)
   Operation                 Max        Min       Mean   Std Dev
   ---------                 ---        ---       ----   -------
   Directory creation:  8699.744   8526.927   8616.514    56.854
   Directory stat    :  9570.373   9416.036   9464.414    54.610
   Directory removal :  8379.018   8242.002   8322.859    46.589
   File creation     :  4014.823   3941.957   3965.598    25.432
   File stat         :  2692.839   2539.824   2604.053    60.191
   File read         :  4845.603   4786.409   4812.256    23.379
   File removal      :  6264.305   6127.145   6177.107    50.328
   Tree creation     :  3685.373   2451.131   3107.921   404.678
   Tree removal      :   622.043    547.926    592.604    26.152

Single thread (Shared Dir), without patch:

SUMMARY: (of 5 iterations)
   Operation                 Max        Min       Mean   Std Dev
   ---------                 ---        ---       ----   -------
   Directory creation:  8343.542   7633.126   7951.672   257.494
   Directory stat    :  9662.455   9451.991   9524.349    78.840
   Directory removal :  8433.412   8349.430   8409.764    31.368
   File creation     :  3821.459   3651.260   3730.709    60.543
   File stat         :  2560.355   2504.332   2523.879    19.464
   File read         :  4872.563   4815.895   4848.758    21.397
   File removal      :  6184.575   6077.141   6132.553    37.831
   Tree creation     :  3660.403   2843.131   3295.895   340.903
   Tree removal      :   635.018    549.101    606.398    30.889

Single thread (Shared Dir), with patch https://review.whamcloud.com/32020:

SUMMARY: (of 5 iterations)
   Operation                 Max        Min       Mean   Std Dev
   ---------                 ---        ---       ----   -------
   Directory creation:  8765.643   8633.499   8696.714    52.255
   Directory stat    :  9628.551   9475.161   9522.981    54.698
   Directory removal :  8417.031   8354.338   8393.621    26.821
   File creation     :  4009.460   3919.571   3948.138    32.438
   File stat         :  2703.470   2495.517   2581.902    72.822
   File read         :  4862.837   4798.651   4831.803    26.651
   File removal      :  6199.040   6103.446   6134.598    34.802
   Tree creation     :  3844.778   2212.522   3423.913   620.257
   Tree removal      :   631.484    591.656    613.972    14.408

I didn't see any performance regression with https://review.whamcloud.com/32020. |
| Comment by Shuichi Ihara (Inactive) [ 25/Apr/18 ] |
|
More test results.

Multiple threads (Unique Dir), without patch:

SUMMARY: (of 5 iterations)
   Operation                  Max          Min         Mean     Std Dev
   ---------                  ---          ---         ----     -------
   Directory creation:   85738.040    48215.179    64299.844   13178.781
   Directory stat    :  466394.960   422832.513   439886.545   15451.794
   Directory removal :  110023.446    91823.376    97835.719    6354.000
   File creation     :  151871.035   134663.671   141730.993    7939.270
   File stat         :  442389.625   337812.992   399920.271   44641.353
   File read         :  205108.969   162572.210   188515.094   18793.151
   File removal      :  167529.376   131252.310   157454.414   13315.442
   Tree creation     :      70.777       44.100       59.640       8.891
   Tree removal      :      14.663       12.669       14.087       0.739

Multiple threads (Unique Dir), with patch https://review.whamcloud.com/32020:

SUMMARY: (of 5 iterations)
   Operation                  Max          Min         Mean     Std Dev
   ---------                  ---          ---         ----     -------
   Directory creation:   82652.610    44091.794    63787.686   14880.388
   Directory stat    :  458097.169   422083.254   439946.300   11681.468
   Directory removal :  105728.113    93508.243    98876.882    4884.346
   File creation     :  159377.676   135557.207   143571.336    9767.230
   File stat         :  428955.890   356154.397   408412.207   26973.229
   File read         :  211791.753   198140.228   201966.601    5047.480
   File removal      :  162460.647   148476.247   157409.212    4936.475
   Tree creation     :     117.207       62.022       85.996      21.154
   Tree removal      :      14.477       10.220       13.435       1.615

Multiple threads (Shared Dir), without patch:

SUMMARY: (of 5 iterations)
   Operation                  Max          Min         Mean     Std Dev
   ---------                  ---          ---         ----     -------
   Directory creation:   70293.387    50636.259    57773.291    6601.573
   Directory stat    :  194831.080   191888.767   193959.691    1081.169
   Directory removal :  103340.293    90382.180    96926.031    4897.878
   File creation     :  100293.806    88347.843    93043.994    4061.926
   File stat         :  190135.471   188264.680   188906.038     650.702
   File read         :  217199.991   210154.673   214794.152    2750.552
   File removal      :  115820.668   111482.599   114100.211    1600.743
   Tree creation     :    3568.114     1769.035     2862.177     679.311
   Tree removal      :      37.178       32.179       35.844       1.862

Multiple threads (Shared Dir), with patch https://review.whamcloud.com/32020:

SUMMARY: (of 5 iterations)
   Operation                  Max          Min         Mean     Std Dev
   ---------                  ---          ---         ----     -------
   Directory creation:   77052.637    44096.251    64670.828   11525.211
   Directory stat    :  195513.946   188466.102   192211.917    2755.850
   Directory removal :  101404.091    87970.363    94185.517    4870.481
   File creation     :   95458.096    85660.725    91935.194    3301.302
   File stat         :  192118.268   187095.123   190492.049    1752.886
   File read         :  213193.456   206416.942   211264.017    2566.669
   File removal      :  120906.202   112162.616   115835.603    2884.534
   Tree creation     :    3579.879     2028.805     3009.914     518.948
   Tree removal      :      36.566       35.289       36.036       0.422

So far, https://review.whamcloud.com/32020 doesn't reproduce performance regressions. |
| Comment by Gerrit Updater [ 03/May/18 ] |
|
Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/32264 |
| Comment by Gerrit Updater [ 03/May/18 ] |
|
Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/32265 |
| Comment by Colin Faber [X] (Inactive) [ 28/Nov/18 ] |
|
Has this been abandoned? |
| Comment by Andreas Dilger [ 04/Dec/18 ] |
|
Colin, I agree that the patches on this ticket have not made any progress lately. The most recent version of https://review.whamcloud.com/32020 indicates that the patch is causing a regression in sanity-hsm, and it needs to be refreshed in any case. There is also a test case in https://pastebin.com/GHj0rqxT that would be useful to include. |
| Comment by Gu Zheng (Inactive) [ 07/Jun/19 ] |
|
I back-ported Jinshan's new fix patch (https://review.whamcloud.com/#/c/32265) to master, which always sends a real MDS close from the client if the executable file was opened in write mode. Strangely, the NFS test always fails on aarch64 clients (clients on other arches are fine), and I can easily reproduce it on trevis-80vm3.

+ systemctl restart nfs-server.service
+ mount -t lustre 10.9.6.161@tcp:/lustre /mnt/lustre
+ mkdir /mnt/lustre/nfs_test
+ exportfs -orw,no_root_squash localhost:/mnt/lustre/nfs_test
+ mount -t nfs localhost:/mnt/lustre/nfs_test /tmp/nfs-60b4UP/
+ cp /bin/true /tmp/nfs-60b4UP/
+ /mnt/lustre/nfs_test/true
LU-4398.sh: line 10: /mnt/lustre/nfs_test/true: Text file busy
+ umount /tmp/nfs-60b4UP/
+ systemctl stop nfs-server.service
+ rm -rf /mnt/lustre/nfs_test
+ umount /mnt/lustre

kernel: 4.14.0-115.2.2.el7a.aarch64
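For readers following the approach, here is a schematic sketch of the condition described above ("executable file opened in write mode"). This is not the actual 32265 patch, and the helper name is hypothetical:

    /* Schematic sketch only, not the actual patch from 32265; the
     * helper name is hypothetical.  The idea: a write-mode open of an
     * executable file should trigger a real close RPC to the MDS
     * rather than a cached open handle, so that a later execve() on
     * another client does not hit ETXTBSY. */
    static bool open_needs_real_mds_close(const struct inode *inode,
                                          fmode_t fmode)
    {
            return (fmode & FMODE_WRITE) &&
                   (inode->i_mode & (S_IXUSR | S_IXGRP | S_IXOTH));
    }

In the reproducer above, /bin/true copied through the NFS export is exactly such a file: executable and written in the same open. |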
| Comment by Gu Zheng (Inactive) [ 22/Jul/19 ] |
|
It seems NFS itself is the problem on aarch64; the script failed even when run without Lustre.

[root@trevis-80vm4 ~]# bash run_test.sh
+ systemctl restart nfs-server.service
+ rm -rf /tmp/nfs_test /tmp/nfs-60b4UP/
+ mkdir /tmp/nfs_test /tmp/nfs-60b4UP/
+ exportfs -o rw,no_root_squash localhost:/tmp/nfs_test
+ mount -t nfs localhost:/tmp/nfs_test /tmp/nfs-60b4UP/
+ cp /bin/true /tmp/nfs-60b4UP/
+ /tmp/nfs_test/true
run_test.sh: line 14: /tmp/nfs_test/true: Text file busy
+ umount /tmp/nfs-60b4UP/
+ systemctl stop nfs-server.service

The same failure occurs even with NFSv3:

[root@trevis-80vm4 ~]# bash run_test.sh
+ systemctl restart nfs-server.service
+ rm -rf /tmp/nfs_test /tmp/nfs-60b4UP/
+ mkdir /tmp/nfs_test /tmp/nfs-60b4UP/
+ exportfs -o rw,no_root_squash localhost:/tmp/nfs_test
+ mount -t nfs -o vers=3 localhost:/tmp/nfs_test /tmp/nfs-60b4UP/
+ cp /bin/true /tmp/nfs-60b4UP/
+ /tmp/nfs_test/true
run_test.sh: line 13: /tmp/nfs_test/true: Text file busy
+ umount /tmp/nfs-60b4UP/
+ systemctl stop nfs-server.service

I will skip the test case on aarch64 temporarily. |
| Comment by Gerrit Updater [ 30/Jul/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32265/ |
| Comment by Gu Zheng (Inactive) [ 06/Sep/19 ] |
|
The fix patch has landed; it's fine to close this ticket. |
| Comment by Gerrit Updater [ 10/Sep/19 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36142 |
| Comment by Gerrit Updater [ 05/Nov/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36680 |
| Comment by Gerrit Updater [ 05/Dec/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36680/ |