[LU-4398] mdt_object_open_lock() may not flush conflicting handles Created: 19/Dec/13  Updated: 05/Dec/19  Resolved: 06/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.1
Fix Version/s: Lustre 2.13.0, Lustre 2.12.4

Type: Bug Priority: Critical
Reporter: John Hammond Assignee: Gu Zheng (Inactive)
Resolution: Fixed Votes: 0
Labels: mdt

Issue Links:
Duplicate
duplicates LU-4773 "Text file busy" error when creating ... Resolved
is duplicated by LU-4996 Interop 2.5.1<->2.6 failure on test s... Resolved
Related
is related to LU-4152 layout locks can cause deadlock Resolved
is related to LU-10300 Can the Lustre 2.10.x clients support... Resolved
is related to LU-12661 sanity test_817: FAIL: failed to exec... Open
is related to LU-5197 A performance regression of "FileRead... Resolved
is related to LU-4520 Text file busy error -- mainline 3.12... Resolved
is related to LU-6232 Text file busy error -- lustre 2.6.0 ... Resolved
is related to LU-10457 open_by_handle_at() in write mode tri... Resolved
is related to LU-4671 sanity test_43: multiop still running Resolved
is related to LU-10948 client cache open lock after N opens Open
Severity: 3
Rank (Obsolete): 12075

 Description   

Calls to mdt_object_open_lock() that do not have MDS_OPEN_LOCK set in their open flags may fail to flush conflicting open handles. In the reproduction below, /mnt/lustre and /mnt/lustre2 are two client mounts of the same file system: the binary is run through the first mount and then opened for write through the second.

t:~# rm /mnt/lustre/*
t:~# ls /mnt/lustre/
t:~# cp /bin/echo /mnt/lustre/echo
t:~# lfs path2fid /mnt/lustre/echo
[0x280000401:0x9:0x0]
t:~# /mnt/lustre/echo Hi
Hi
t:~# lctl clear
t:~# echo Bye > /mnt/lustre2/echo
-bash: /mnt/lustre2/echo: Text file busy
t:~# lctl dk > 2.dk
00000004:00000002:0.0:1387473999.026142:0:15064:0:(mdt_open.c:1610:mdt_reint_open()) I am going to open [0x200000007:0x1:0x0]/(echo->[0x280000402:0x4:0x0]) cr_flag=01102 mode=0100666 msg_flag=0x0

** Note: there is a typo in the CDEBUG for the next message; despite the
** "lease count" label, it is really the open count that is printed.

        CDEBUG(D_INODE, "normal open:"DFID" lease count: %d, lm: %d\n",
               PFID(mdt_object_fid(obj)),
               atomic_read(&obj->mot_open_count), lm);

00000004:00000002:0.0:1387473999.026365:0:15064:0:(mdt_open.c:1242:mdt_object_open_lock()) normal open:[0x280000401:0x9:0x0] lease count: 1, lm: 16
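
For reference, a corrected format string (labeling the value as the open count it actually prints) would look like:

        CDEBUG(D_INODE, "normal open:"DFID" open count: %d, lm: %d\n",
               PFID(mdt_object_fid(obj)),
               atomic_read(&obj->mot_open_count), lm);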

A CR LOOKUP LAYOUT is granted:

00000004:00000002:0.0:1387473999.026430:0:15064:0:(mdt_open.c:1269:mdt_object_open_lock()) Requested bits lock:[0x280000401:0x9:0x0], ibits = 0x9, open_flags = 01102, try_layout = 1, rc = 0
00000004:00000001:0.0:1387473999.026433:0:15064:0:(mdt_open.c:1332:mdt_object_open_lock()) Process leaving via out (rc=0 : 0 : 0x0)

mdt_write_get() then fails with -ETXTBSY (-26) because the execute handle from the first open was never flushed, so the write open is denied:

00000004:00000001:0.0:1387473999.026456:0:15064:0:(mdt_open.c:536:mdt_write_get()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
00000004:00000001:0.0:1387473999.026458:0:15064:0:(mdt_open.c:723:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
00000004:00000001:0.0:1387473999.026459:0:15064:0:(mdt_open.c:994:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
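
A minimal user-space model of the server-side accounting (hypothetical names; not the actual mdt code) illustrates why the write open keeps failing as long as the execute handle is not revoked. A positive count models "opened for write", a negative count models "opened for execute":

#include <errno.h>
#include <stdio.h>

/* Toy model of per-file open accounting (hypothetical). */
struct toy_object {
        int writecount;
};

/* Deny a write open while the file is open for execute. */
static int toy_write_get(struct toy_object *o)
{
        if (o->writecount < 0)
                return -ETXTBSY;
        o->writecount++;
        return 0;
}

/* Deny an execute open while the file is open for write. */
static int toy_exec_get(struct toy_object *o)
{
        if (o->writecount > 0)
                return -ETXTBSY;
        o->writecount--;
        return 0;
}

int main(void)
{
        struct toy_object o = { 0 };
        int rc;

        toy_exec_get(&o);        /* first client executes the binary */
        rc = toy_write_get(&o);  /* second client opens it for write */
        printf("write open: rc = %d\n", rc);  /* prints -26 (-ETXTBSY) */

        /* Unless the server flushes (revokes) the execute handle, the
         * write open keeps failing until the running binary exits. */
        return 0;
}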


 Comments   
Comment by John Hammond [ 30/Jan/14 ]

Please see http://review.whamcloud.com/#/c/9063/.

Comment by Jodi Levi (Inactive) [ 24/Mar/14 ]

Patch landed to Master. Please reopen ticket if more work is needed.

Comment by Manish Patel (Inactive) [ 26/Mar/14 ]

Hi Jodi,

We are seeing similar issues with Lustre 2.4.1 and 2.4.2. We opened ticket LU-4773 and it was marked as a duplicate of this one, so could we have a patch for the 2.4 branch?

Thank You,
Manish

Comment by Bob Glossman (Inactive) [ 27/Mar/14 ]

backport to b2_4:
http://review.whamcloud.com/9826

Comment by Andreas Dilger [ 05/May/14 ]

Cherry pick to b2_5: http://review.whamcloud.com/10218

Comment by Jodi Levi (Inactive) [ 23/Jun/14 ]

Patch was reverted due to performance regression

Comment by Cédric Dufour [ 30/Oct/14 ]

Running 2.5.2 server-side with the patch reverted re-introduced LU-4520 (ETXTBSY errors); roughly put, there was no way to edit or use scripts on Lustre (!...).
So we re-patched 2.5.2 in order to get rid of LU-4520 again.

We ran performance tests before and after applying the patch and found no LU-5197-like significant performance regression:

BEFORE
$ mpirun -np 1 -host localhost ./mdtest-1.9.3 -n 100000 -i 1 -p 5 -u -v -F -d ./mdtest.out
SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       1191.533       1191.533       1191.533          0.000
   File stat         :        975.334        975.334        975.334          0.000
   File read         :       2502.797       2502.797       2502.797          0.000
   File removal      :       1299.404       1299.404       1299.404          0.000
   Tree creation     :       1286.991       1286.991       1286.991          0.000
   Tree removal      :        320.616        320.616        320.616          0.000

$ mpirun -np 66 -hostfile ./hostfile ./mdtest-1.9.3 -n 10000 -i 1 -p 5 -u -v -F -d ./mdtest.out
SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      12310.369      12310.369      12310.369          0.000
   File stat         :      46494.079      46494.079      46494.079          0.000
   File read         :      75171.339      75171.339      75171.339          0.000
   File removal      :      11508.120      11508.120      11508.120          0.000
   Tree creation     :         41.568         41.568         41.568          0.000
   Tree removal      :         18.342         18.342         18.342          0.000

AFTER
$ mpirun -np 1 -host localhost ./mdtest-1.9.3 -n 100000 -i 1 -p 5 -u -v -F -d ./mdtest.out
SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :       1179.328       1179.328       1179.328          0.000
   File stat         :        948.171        948.171        948.171          0.000
   File read         :       2472.216       2472.216       2472.216          0.000
   File removal      :       1147.826       1147.826       1147.826          0.000
   Tree creation     :        689.739        689.739        689.739          0.000
   Tree removal      :        131.545        131.545        131.545          0.000

$ mpirun -np 66 -hostfile ./hostfile ./mdtest-1.9.3 -n 10000 -i 1 -p 5 -u -v -F -d ./mdtest.out
SUMMARY: (of 1 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   File creation     :      13519.022      13519.022      13519.022          0.000
   File stat         :      46112.439      46112.439      46112.439          0.000
   File read         :      78350.583      78350.583      78350.583          0.000
   File removal      :      11139.655      11139.655      11139.655          0.000
   Tree creation     :         51.435         51.435         51.435          0.000
   Tree removal      :         29.340         29.340         29.340          0.000

Shouldn't the LU-4520 bug fix have priority over the LU-5197 performance regression?

Comment by Roland Fehrenbacher [ 30/Oct/14 ]

Sorry to sound a bit harsh, but I think such insufficiently investigated last-minute commits, which reintroduce known problems, should really be avoided at all costs. In my view, delaying the release and properly investigating the issue would have been the right way to handle this.

Comment by Vitaly Fertman [ 17/Jun/15 ]

Is this still a problem after landing "LU-4367 llite: Make revalidate return 0 for opens"? If not, please consider re-landing.

Comment by Andrew Perepechko [ 04/Apr/16 ]

After some testing and profiling, we've found that the main cause of the observed performance drop [with our code base] is the loop over the directory cache entries in ll_invalidate_negative_children():

1) for an open request, the client creates an UPDATE lock for the directory (see mdc_enqueue())
2) the MDS takes and releases an OPEN lock for the requested file (see mdt_object_open_lock()/mdt_object_open_unlock())
3) the client receives the lock abort from the server and calls failed_lock_cleanup() for the previously created UPDATE lock
4) ll_md_blocking_ast() truncates the directory and calls ll_invalidate_negative_children(), which loops over the whole directory and takes more time than the rest of the open processing on both the client and the server (see the sketch after this list)

The directory cache is populated by the mdtest_stat phase, which happens right before mdtest_read.
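
A rough user-space model of step 4 (hypothetical structures standing in for the kernel's dentry lists; not the actual llite code) shows why the cost is linear in the number of cached entries, which the preceding stat phase makes large:

#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cached directory entry (the kernel's struct
 * dentry); "negative" means a cached "name does not exist" result. */
struct toy_dentry {
        struct toy_dentry *next;  /* sibling list under the parent */
        bool negative;
        bool hashed;              /* still visible to lookups */
};

/* Model of ll_invalidate_negative_children(): every cancellation of
 * the directory UPDATE lock walks ALL cached children just to unhash
 * the negative ones, so the cost grows with the size of the cache. */
static void toy_invalidate_negative_children(struct toy_dentry *children)
{
        struct toy_dentry *d;

        for (d = children; d != NULL; d = d->next)
                if (d->negative)
                        d->hashed = false;  /* force a fresh lookup */
}

int main(void)
{
        struct toy_dentry b = { NULL, true, true };  /* negative entry */
        struct toy_dentry a = { &b, false, true };   /* regular entry  */

        toy_invalidate_negative_children(&a);  /* O(n) over all children */
        return 0;
}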

Comment by Gerrit Updater [ 17/Apr/18 ]

Gu Zheng (gzheng@ddn.com) uploaded a new patch: https://review.whamcloud.com/32020
Subject: LU-4398 mdt: acquire an open lock for write or execute
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9fd2111b898bbe73c15b21ac919a97baf5ffbcee

Comment by Shuichi Ihara (Inactive) [ 25/Apr/18 ]

New performance validation test results.
MDS: (1 x Platinum 8160, 96GB memory, FDR Infiniband)
MDT: 4 x SSD(2 x RAID0)
OSS: 4 x OSS(1 x E5-2695 v4, 90GB memory, EDR Infiniband)
OST: 32 x OST(SSD)
OS: CentOS7.4
branch: master (commit 68add0d98d2299a73ee846d141fa78c146a4e206)

Single thread (Unique Dir), without patch

mdtest -n 100000 -u -d /scratch1/mdtest.out -vv -p 10 -i 5
SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:       8595.078       7781.167       8182.740        304.545
   Directory stat    :       9535.046       9511.142       9525.650          9.634
   Directory removal :       8354.321       8182.212       8298.823         61.522
   File creation     :       3915.182       3720.051       3778.728         70.700
   File stat         :       2613.702       2493.075       2558.991         42.179
   File read         :       4888.819       4808.772       4852.502         27.414
   File removal      :       6210.268       6107.493       6160.375         40.613
   Tree creation     :       3740.316       2256.135       3190.418        545.341
   Tree removal      :        625.867        519.335        581.505         41.098

Single thread (Unique Dir), with patch https://review.whamcloud.com/32020

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:       8699.744       8526.927       8616.514         56.854
   Directory stat    :       9570.373       9416.036       9464.414         54.610
   Directory removal :       8379.018       8242.002       8322.859         46.589
   File creation     :       4014.823       3941.957       3965.598         25.432
   File stat         :       2692.839       2539.824       2604.053         60.191
   File read         :       4845.603       4786.409       4812.256         23.379
   File removal      :       6264.305       6127.145       6177.107         50.328
   Tree creation     :       3685.373       2451.131       3107.921        404.678
   Tree removal      :        622.043        547.926        592.604         26.152

Single thread (Shared Dir), without patch

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:       8343.542       7633.126       7951.672        257.494
   Directory stat    :       9662.455       9451.991       9524.349         78.840
   Directory removal :       8433.412       8349.430       8409.764         31.368
   File creation     :       3821.459       3651.260       3730.709         60.543
   File stat         :       2560.355       2504.332       2523.879         19.464
   File read         :       4872.563       4815.895       4848.758         21.397
   File removal      :       6184.575       6077.141       6132.553         37.831
   Tree creation     :       3660.403       2843.131       3295.895        340.903
   Tree removal      :        635.018        549.101        606.398         30.889

Single thread (Shared Dir) with patch https://review.whamcloud.com/32020

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:       8765.643       8633.499       8696.714         52.255
   Directory stat    :       9628.551       9475.161       9522.981         54.698
   Directory removal :       8417.031       8354.338       8393.621         26.821
   File creation     :       4009.460       3919.571       3948.138         32.438
   File stat         :       2703.470       2495.517       2581.902         72.822
   File read         :       4862.837       4798.651       4831.803         26.651
   File removal      :       6199.040       6103.446       6134.598         34.802
   Tree creation     :       3844.778       2212.522       3423.913        620.257
   Tree removal      :        631.484        591.656        613.972         14.408

I didn't see any performance regression with https://review.whamcloud.com/32020

Comment by Shuichi Ihara (Inactive) [ 25/Apr/18 ]

More test results.
This is aggregated metadata performance from 640 processes on 32 clients.
mdtest -u -n 2000 -d /scratch1/mdtest.out -vv -p 10 -i 5

multiple threads (Unique Dir) without patch

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      85738.040      48215.179      64299.844      13178.781
   Directory stat    :     466394.960     422832.513     439886.545      15451.794
   Directory removal :     110023.446      91823.376      97835.719       6354.000
   File creation     :     151871.035     134663.671     141730.993       7939.270
   File stat         :     442389.625     337812.992     399920.271      44641.353
   File read         :     205108.969     162572.210     188515.094      18793.151
   File removal      :     167529.376     131252.310     157454.414      13315.442
   Tree creation     :         70.777         44.100         59.640          8.891
   Tree removal      :         14.663         12.669         14.087          0.739

multiple threads (Unique Dir) with patch https://review.whamcloud.com/32020

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      82652.610      44091.794      63787.686      14880.388
   Directory stat    :     458097.169     422083.254     439946.300      11681.468
   Directory removal :     105728.113      93508.243      98876.882       4884.346
   File creation     :     159377.676     135557.207     143571.336       9767.230
   File stat         :     428955.890     356154.397     408412.207      26973.229
   File read         :     211791.753     198140.228     201966.601       5047.480
   File removal      :     162460.647     148476.247     157409.212       4936.475
   Tree creation     :        117.207         62.022         85.996         21.154
   Tree removal      :         14.477         10.220         13.435          1.615

multiple threads (Shared Dir) without patch

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      70293.387      50636.259      57773.291       6601.573
   Directory stat    :     194831.080     191888.767     193959.691       1081.169
   Directory removal :     103340.293      90382.180      96926.031       4897.878
   File creation     :     100293.806      88347.843      93043.994       4061.926
   File stat         :     190135.471     188264.680     188906.038        650.702
   File read         :     217199.991     210154.673     214794.152       2750.552
   File removal      :     115820.668     111482.599     114100.211       1600.743
   Tree creation     :       3568.114       1769.035       2862.177        679.311
   Tree removal      :         37.178         32.179         35.844          1.862

multiple threads (Shared Dir) with patch https://review.whamcloud.com/32020

SUMMARY: (of 5 iterations)
   Operation                      Max            Min           Mean        Std Dev
   ---------                      ---            ---           ----        -------
   Directory creation:      77052.637      44096.251      64670.828      11525.211
   Directory stat    :     195513.946     188466.102     192211.917       2755.850
   Directory removal :     101404.091      87970.363      94185.517       4870.481
   File creation     :      95458.096      85660.725      91935.194       3301.302
   File stat         :     192118.268     187095.123     190492.049       1752.886
   File read         :     213193.456     206416.942     211264.017       2566.669
   File removal      :     120906.202     112162.616     115835.603       2884.534
   Tree creation     :       3579.879       2028.805       3009.914        518.948
   Tree removal      :         36.566         35.289         36.036          0.422

So far, https://review.whamcloud.com/32020 does not reproduce the performance regression.

Comment by Gerrit Updater [ 03/May/18 ]

Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/32264
Subject: LU-4398 llite: do not cache write open lock for exec file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: aa9c711b7d6e459e5c259a0e7b5721aa8f0ade80

Comment by Gerrit Updater [ 03/May/18 ]

Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/32265
Subject: LU-4398 llite: do not cache write open lock for exec file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ba79fe03a7fcdbc1c7e8ef186c987905c2cb6e85

Comment by Colin Faber [X] (Inactive) [ 28/Nov/18 ]

Has this been abandoned?

Comment by Andreas Dilger [ 04/Dec/18 ]

Colin, I agree that the patches on this ticket have not made any progress lately. It looks like the most recent version of https://review.whamcloud.com/32020 indicates that the patch is causing a regression in sanity-hsm, and it needs to be refreshed in any case. There is also a test case in https://pastebin.com/GHj0rqxT that would be useful to include.

Comment by Gu Zheng (Inactive) [ 07/Jun/19 ]

I back-ported Jinshan's new fix patch (https://review.whamcloud.com/#/c/32265) to master, which always sends a real MDS close from the client if the executable file was opened in write mode. But it's strange that the NFS test always fails on aarch64 clients (clients on other arches are fine), and I can easily reproduce it on trevis-80vm3:

+ systemctl restart nfs-server.service
+ mount -t lustre 10.9.6.161@tcp:/lustre /mnt/lustre
+ mkdir /mnt/lustre/nfs_test
+ exportfs -orw,no_root_squash localhost:/mnt/lustre/nfs_test
+ mount -t nfs localhost:/mnt/lustre/nfs_test /tmp/nfs-60b4UP/
+ cp /bin/true /tmp/nfs-60b4UP/
+ /mnt/lustre/nfs_test/true
LU-4398.sh: line 10: /mnt/lustre/nfs_test/true: Text file busy
+ umount /tmp/nfs-60b4UP/
+ systemctl stop nfs-server.service
+ rm -rf /mnt/lustre/nfs_test
+ umount /mnt/lustre

kernel: 4.14.0-115.2.2.el7a.aarch64

Comment by Gu Zheng (Inactive) [ 22/Jul/19 ]

It seems NFS itself is the problem on aarch64; the script fails even when run without Lustre:

[root@trevis-80vm4 ~]# bash run_test.sh
+ systemctl restart nfs-server.service
+ rm -rf /tmp/nfs_test /tmp/nfs-60b4UP/
+ mkdir /tmp/nfs_test /tmp/nfs-60b4UP/
+ exportfs -o rw,no_root_squash localhost:/tmp/nfs_test
+ mount -t nfs localhost:/tmp/nfs_test /tmp/nfs-60b4UP/
+ cp /bin/true /tmp/nfs-60b4UP/
+ /tmp/nfs_test/true
run_test.sh: line 14: /tmp/nfs_test/true: Text file busy
+ umount /tmp/nfs-60b4UP/
+ systemctl stop nfs-server.service

Even using NFSv3:

[root@trevis-80vm4 ~]# bash run_test.sh
+ systemctl restart nfs-server.service
+ rm -rf /tmp/nfs_test /tmp/nfs-60b4UP/
+ mkdir /tmp/nfs_test /tmp/nfs-60b4UP/
+ exportfs -o rw,no_root_squash localhost:/tmp/nfs_test
+ mount -t nfs -o vers=3 localhost:/tmp/nfs_test /tmp/nfs-60b4UP/
+ cp /bin/true /tmp/nfs-60b4UP/
+ /tmp/nfs_test/true
run_test.sh: line 13: /tmp/nfs_test/true: Text file busy
+ umount /tmp/nfs-60b4UP/
+ systemctl stop nfs-server.service

I will temporarily skip the test case on aarch64.

Comment by Gerrit Updater [ 30/Jul/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32265/
Subject: LU-4398 llite: do not cache write open lock for exec file
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6dd9d57bc006a37731d34409ce43de13c192e0cc
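
The core idea of the landed patch, as a hedged user-space sketch (hypothetical helper; the real check lives in the llite open/close path): when an executable file is opened for write, the client must not cache the write open lock, so that close() immediately reaches the MDS and the "open for write" state that would make later execs fail with ETXTBSY is dropped:

#include <stdbool.h>
#include <sys/stat.h>
#include <fcntl.h>

/* Sketch of the policy in "llite: do not cache write open lock for
 * exec file" (hypothetical helper, not the actual llite code). */
static bool toy_may_cache_open_lock(mode_t mode, int open_flags)
{
        bool for_write = (open_flags & O_ACCMODE) != O_RDONLY;
        bool executable = (mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;

        /* Caching the write open lock on an executable file would keep
         * the MDS "open for write" state alive after close(), making a
         * later exec fail with ETXTBSY.  Send a real close instead. */
        if (for_write && executable)
                return false;
        return true;
}

int main(void)
{
        /* rw- open of a non-executable file: safe to cache the lock. */
        bool a = toy_may_cache_open_lock(0644, O_RDWR);
        /* rwx file opened for write: must do a real MDS close. */
        bool b = toy_may_cache_open_lock(0755, O_WRONLY);
        return (a && !b) ? 0 : 1;
}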

Comment by Gu Zheng (Inactive) [ 06/Sep/19 ]

The fix patch has landed; it's fine to close this ticket.

Comment by Gerrit Updater [ 10/Sep/19 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36142
Subject: LU-4398 tests: re-enable sanity test 817
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e800ae379b506fb6f5f4917fe4e896ff5873dfb3

Comment by Gerrit Updater [ 05/Nov/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36680
Subject: LU-4398 llite: do not cache write open lock for exec file
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 639b51789ebcae59ba8ed1ba383bdc9e3126d67d

Comment by Gerrit Updater [ 05/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36680/
Subject: LU-4398 llite: do not cache write open lock for exec file
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 07f782635c7b5de59a13371c6d14aa2c4910d257
