open_by_handle_at() in write mode triggers ETXTBSY

Details

    • Bug
    • Resolution: Duplicate
    • Major

    Description

      If open_by_handle_at() is called in O_WRONLY or O_RDWR mode and the file descriptor is then closed, other Lustre clients still get ETXTBSY when they try to execute the file.

      Example:

      On cn16
      =======
      bschubert@cn16 ~>sudo ~/src/test/open-test /mnt/lustre_client-ES24/bschubert/ime/test7 1
      Opened /mnt/lustre_client-ES24/bschubert/ime/test7/test7, fd: 4
      Closed d: 4

      Now on cn41
      =========
      bschubert@cn41 ~>/mnt/lustre_client-ES24/bschubert/ime//test7
      -bash: /mnt/lustre_client-ES24/bschubert/ime//test7: Text file busy

      test7 is just any file that has the execute bit set.
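For comparison, the behaviour expected once the writer has closed its descriptor can be illustrated on a single Linux host. The sketch below is not the open-test program from the report (its source is not shown here) and does not call open_by_handle_at(); it only assumes a local filesystem where executables may be run from the temporary directory. The file name and the helper names try_exec and demo are made up for illustration. On most Linux kernels, executing a file that is open for writing fails with ETXTBSY, and the error goes away as soon as the last write descriptor is closed, which is the step that does not take effect across Lustre clients in this report.

```python
import errno
import os
import subprocess
import tempfile


def try_exec(path):
    """Execute path directly; return 0 on success, else the errno from exec."""
    try:
        subprocess.run([path], check=True, stdout=subprocess.DEVNULL)
        return 0
    except OSError as exc:
        return exc.errno


def demo():
    """Return (result while a write fd is open, result after it is closed)."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "test7")  # hypothetical stand-in file
        with open(path, "w") as script:
            script.write("#!/bin/sh\necho hello\n")
        os.chmod(path, 0o755)

        fd = os.open(path, os.O_WRONLY)  # a writer exists from here on
        while_open = try_exec(path)      # typically errno.ETXTBSY on Linux
        os.close(fd)                     # last writer is gone
        after_close = try_exec(path)     # expected to run normally
        return while_open, after_close


if __name__ == "__main__":
    while_open, after_close = demo()
    print("while open :", errno.errorcode.get(while_open, while_open))
    print("after close:", errno.errorcode.get(after_close, after_close))
```

Some kernel versions and sandboxed environments may not return ETXTBSY in the first step, so the result while the file is open should be treated as typical rather than guaranteed.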

          Activity

            [LU-10457] open_by_handle_at() in write mode triggers ETXTBSY

            adilger Andreas Dilger added a comment -

            This might have been fixed by patch https://review.whamcloud.com/36641 "LU-8585 llite: don't cache MDS_OPEN_LOCK for volatile files".

            green Oleg Drokin added a comment -

            The unlink issue is fairly trivially fixable by ensuring that unlink also revokes the open bit (we already revoke the lookup bit). The slowdown would only happen for files that actually have an open lock cached on a client.

            ihara Shuichi Ihara (Inactive) added a comment -

            I had not tested the quota issue. As Bernd mentioned earlier, in the end we might need Patrick's patch to solve both problems.

            ihara Shuichi Ihara (Inactive) added a comment -

            There is a similar fix in https://review.whamcloud.com/32265 for LU-4398, and I have already confirmed that the patch solves the ETXTBSY issue without performance regressions.

            Here are the test results:

            [root@c01 ~]#  cat /scratch1/test 
            echo hello
            
            [root@c02 ~]# /scratch1/a.out /scratch1/test 1
            Opened /scratch1/test/test, fd: 4
            Closed d: 4
            
            [root@c01 ~]# /scratch1/test 
            hello
            [root@c01 ~]# /scratch1/test 
            hello
            [root@c01 ~]# /scratch1/test 
            hello
            
            16 clients, 256 processes (mdtest -n 2000 -u -vv -F -d /scratch1/mdtest.out/ -i 3 -p 10)
            
            master
            SUMMARY: (of 3 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :      89389.726      78571.859      83605.731       4448.114
               File stat         :     263650.433     221026.947     238222.946      18348.631
               File read         :     113141.781     111882.782     112494.749        514.582
               File removal      :     121785.424     109912.532     114749.674       5090.305
               Tree creation     :        204.674         27.510        140.893         80.383
               Tree removal      :         30.096         28.858         29.363          0.531
            V-1: Entering timestamp...
            
            master + patch https://review.whamcloud.com/32265
            SUMMARY: (of 3 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :      84851.987      82996.117      83819.233        772.021
               File stat         :     262610.064     215623.544     244595.023      20687.611
               File read         :     115494.747     112069.322     113774.145       1398.468
               File removal      :     121395.003     115276.620     118372.989       2498.373
               Tree creation     :        223.484         65.453        156.498         66.722
               Tree removal      :         28.673         17.037         22.827          4.751
            V-1: Entering timestamp...
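To make the two mdtest summaries easier to compare, the relative change of each mean rate can be computed directly. The sketch below only uses the Mean column copied from the tables in this comment; the names MASTER, PATCHED, relative_change and compare are illustrative, and the rates are assumed to be operations per second, so a positive change means the patched build was faster.

```python
# Mean rates copied from the "master" mdtest summary in this comment.
MASTER = {
    "File creation": 83605.731,
    "File stat": 238222.946,
    "File read": 112494.749,
    "File removal": 114749.674,
    "Tree creation": 140.893,
    "Tree removal": 29.363,
}

# Mean rates copied from the "master + patch" mdtest summary.
PATCHED = {
    "File creation": 83819.233,
    "File stat": 244595.023,
    "File read": 113774.145,
    "File removal": 118372.989,
    "Tree creation": 156.498,
    "Tree removal": 22.827,
}


def relative_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100.0


def compare(master, patched):
    """Map each operation to the percentage change of its mean rate."""
    return {op: relative_change(master[op], patched[op]) for op in master}


if __name__ == "__main__":
    for op, pct in compare(MASTER, PATCHED).items():
        print(f"{op:<14} {pct:+7.2f}%")
```

With these figures the four file operations differ by only a few percent between the two runs. The tree operations move more (tree removal comes out roughly 22% lower with the patch), but they are measured at far lower rates and show large standard deviations over three iterations, so that difference is hard to interpret from this data alone.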

            paf Patrick Farrell (Inactive) added a comment -

            Ah. Interesting, I see your point. Yes, that seems possible.

            aakef Bernd Schubert added a comment -

            Without any proof that the root cause of LU-10457 is also the root cause of our quota issue, I would assume that a file that gets unlinked but is still open on the MDS will not release its data, and so will not release its quota either.

            paf Patrick Farrell (Inactive) added a comment -

            It's a pretty significant optimization, but that aside, yes, I think it should be OK.

            We have (at least for now) decided to live with the problem.

            Why do you think this affects quotas, though? I'm a little puzzled about the connection.

            aakef Bernd Schubert added a comment -

            Hi all, I think there is another implication of this issue. Our customer is complaining that quotas are not correctly released. We have mostly worked around the ETXTBSY issue, but I don't think we can do anything about quotas on our side.
            Looking at the patches, I think the patch https://review.whamcloud.com/32020 will not help, as it will try to release conflicting locks on an O_EXEC attempt. The alternative patch from Patrick, https://review.whamcloud.com/#/c/31304/, should work, as it always sends an MDS close from the client if the file was opened in write mode. Is there any side effect? It should just remove an NFS optimization?

            cengku9660 Gu Zheng (Inactive) added a comment -

            Hi all, I just resubmitted LU-4398 (https://review.whamcloud.com/32020) as Jinshan suggested. With it applied, the problem is gone, and some simple tests found no significant regression. Still, please feel free to try it and test it more. Thanks.

            paf Patrick Farrell (Inactive) added a comment -

            Oleg pointed me at this. I reported a duplicate and contributed a patch and test case:
            https://review.whamcloud.com/#/c/31304/

            If we limited my patch to executable files as Oleg suggested, that might fit the bill. Curious what others think. I'll refresh tomorrow.

            People

              green Oleg Drokin
              diegom Diego Moreno (Inactive)