[LU-4398] mdt_object_open_lock() may not flush conflicting handles

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Severity: 3
    • Rank (Obsolete): 12075

    Description

      Calls to mdt_object_open_lock() which do not have MDS_OPEN_LOCK in open flags may fail to flush conflicting handles.

      t:~# rm /mnt/lustre/*
      t:~# ls /mnt/lustre/
      t:~# cp /bin/echo /mnt/lustre/echo
      t:~# lfs path2fid /mnt/lustre/echo
      [0x280000401:0x9:0x0]
      t:~# /mnt/lustre/echo Hi
      Hi
      t:~# lctl clear
      t:~# echo Bye > /mnt/lustre2/echo
      -bash: /mnt/lustre2/echo: Text file busy
      t:~# lctl dk > 2.dk
      
      00000004:00000002:0.0:1387473999.026142:0:15064:0:(mdt_open.c:1610:mdt_reint_open()) I am going to open [0x200000007:0x1:0x0]/(echo->[0x280000402:0x4:0x0]) cr_flag=01102 mode=0100666 msg_flag=0x0
      
      ** There is a typo in the CDEBUG for the next message: it says "lease
      ** count", but the value actually printed is the open count.
      
              CDEBUG(D_INODE, "normal open:"DFID" lease count: %d, lm: %d\n",
                     PFID(mdt_object_fid(obj)),
                     atomic_read(&obj->mot_open_count), lm);
      
      00000004:00000002:0.0:1387473999.026365:0:15064:0:(mdt_open.c:1242:mdt_object_open_lock()) normal open:[0x280000401:0x9:0x0] lease count: 1, lm: 16
      
      A CR lock with the LOOKUP and LAYOUT inode bits is granted:
      
      00000004:00000002:0.0:1387473999.026430:0:15064:0:(mdt_open.c:1269:mdt_object_open_lock()) Requested bits lock:[0x280000401:0x9:0x0], ibits = 0x9, open_flags = 01102, try_layout = 1, rc = 0
      00000004:00000001:0.0:1387473999.026433:0:15064:0:(mdt_open.c:1332:mdt_object_open_lock()) Process leaving via out (rc=0 : 0 : 0x0)
      
      00000004:00000001:0.0:1387473999.026456:0:15064:0:(mdt_open.c:536:mdt_write_get()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
      00000004:00000001:0.0:1387473999.026458:0:15064:0:(mdt_open.c:723:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
      00000004:00000001:0.0:1387473999.026459:0:15064:0:(mdt_open.c:994:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
      


          Activity


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36680/
            Subject: LU-4398 llite: do not cache write open lock for exec file
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 07f782635c7b5de59a13371c6d14aa2c4910d257

            Added by Gerrit Updater.

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36680
            Subject: LU-4398 llite: do not cache write open lock for exec file
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 639b51789ebcae59ba8ed1ba383bdc9e3126d67d

            Added by Gerrit Updater.

            James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/36142
            Subject: LU-4398 tests: re-enable sanity test 817
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e800ae379b506fb6f5f4917fe4e896ff5873dfb3

            Added by Gerrit Updater.

            The fix patch has landed, so it's fine to close this ticket.

            Added by Gu Zheng (Inactive).

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32265/
            Subject: LU-4398 llite: do not cache write open lock for exec file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6dd9d57bc006a37731d34409ce43de13c192e0cc

            Added by Gerrit Updater.

            It seems NFS itself is the problem on aarch64: the script fails even when run without Lustre.

            [root@trevis-80vm4 ~]# bash run_test.sh
            + systemctl restart nfs-server.service
            + rm -rf /tmp/nfs_test /tmp/nfs-60b4UP/
            + mkdir /tmp/nfs_test /tmp/nfs-60b4UP/
            + exportfs -o rw,no_root_squash localhost:/tmp/nfs_test
            + mount -t nfs localhost:/tmp/nfs_test /tmp/nfs-60b4UP/
            + cp /bin/true /tmp/nfs-60b4UP/
            + /tmp/nfs_test/true
            run_test.sh: line 14: /tmp/nfs_test/true: Text file busy
            + umount /tmp/nfs-60b4UP/
            + systemctl stop nfs-server.service
            

            The same happens even with NFSv3:

            [root@trevis-80vm4 ~]# bash run_test.sh
            + systemctl restart nfs-server.service
            + rm -rf /tmp/nfs_test /tmp/nfs-60b4UP/
            + mkdir /tmp/nfs_test /tmp/nfs-60b4UP/
            + exportfs -o rw,no_root_squash localhost:/tmp/nfs_test
            + mount -t nfs -o vers=3 localhost:/tmp/nfs_test /tmp/nfs-60b4UP/
            + cp /bin/true /tmp/nfs-60b4UP/
            + /tmp/nfs_test/true
            run_test.sh: line 13: /tmp/nfs_test/true: Text file busy
            + umount /tmp/nfs-60b4UP/
            + systemctl stop nfs-server.service
            

            I will skip the test case on aarch64 for now.

            Added by Gu Zheng (Inactive).

            I backported Jinshan's new fix patch (https://review.whamcloud.com/#/c/32265) to master; it always sends a real MDS close from the client if the executable file was opened in write mode. Strangely, the NFS test always fails on the aarch64 client (clients on other architectures are fine), and I can easily reproduce it on trevis-80vm3.

            + systemctl restart nfs-server.service
            + mount -t lustre 10.9.6.161@tcp:/lustre /mnt/lustre
            + mkdir /mnt/lustre/nfs_test
            + exportfs -orw,no_root_squash localhost:/mnt/lustre/nfs_test
            + mount -t nfs localhost:/mnt/lustre/nfs_test /tmp/nfs-60b4UP/
            + cp /bin/true /tmp/nfs-60b4UP/
            + /mnt/lustre/nfs_test/true
            LU-4398.sh: line 10: /mnt/lustre/nfs_test/true: Text file busy
            + umount /tmp/nfs-60b4UP/
            + systemctl stop nfs-server.service
            + rm -rf /mnt/lustre/nfs_test
            + umount /mnt/lustre
            

            kernel: 4.14.0-115.2.2.el7a.aarch64

            Added by Gu Zheng (Inactive).

            Colin, I agree it seems the patches on this ticket have not made any progress lately. It looks like the most recent review of https://review.whamcloud.com/32020 indicates that the patch causes a regression in sanity-hsm, and it needs to be refreshed in any case. There is also a test case in https://pastebin.com/GHj0rqxT that would be useful to include.

            Added by Andreas Dilger.

            Has this been abandoned?

            Added by Colin Faber (Inactive).

            Jinshan Xiong (jinshan.xiong@gmail.com) uploaded a new patch: https://review.whamcloud.com/32265
            Subject: LU-4398 llite: do not cache write open lock for exec file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ba79fe03a7fcdbc1c7e8ef186c987905c2cb6e85

            Added by Gerrit Updater.

            People

              Assignee: Gu Zheng (Inactive)
              Reporter: John Hammond
              Votes: 0
              Watchers: 22
