Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10457

open_by_handle_at() in write mode triggers ETXTBSY

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      If open_by_handle_at() is called in O_WRONLY or O_RDWR mode and then the file descriptor is closed, other lustre clients will still report ETXTBSY.

      Example:

      On cn16
      =======
      bschubert@cn16 ~>sudo ~/src/test/open-test /mnt/lustre_client-ES24/bschubert/ime/test7 1
      Opened /mnt/lustre_client-ES24/bschubert/ime/test7/test7, fd: 4
      Closed d: 4

      Now on cn41
      =========
      bschubert@cn41 ~>/mnt/lustre_client-ES24/bschubert/ime//test7
      -bash: /mnt/lustre_client-ES24/bschubert/ime//test7: Text file busy

      test7 is just any file which has the the execution bit set.

      Attachments

        Issue Links

          Activity

            [LU-10457] open_by_handle_at() in write mode triggers ETXTBSY
            green Oleg Drokin added a comment -

            Hm, I tested this on the latest master on rhel7.2 (disregard the centos6 in the hostname) and don't see any problems, what version are you testing on, what kernel:

            [root@centos6-16 tests]# cat /tmp/test.sh
            #!/bin/bash
            
            cp /bin/ls /mnt/lustre
            /mnt/lustre/ls -ld .
            /tmp/open-test /mnt/lustre/ls 2
            
            TIME=0
            while ! /mnt/lustre2/ls -ld . ; do echo nope ; TIME=$((TIME + 1)) ; sleep 1 ; done
            
            echo Waited $TIME seconds for the open to clear
            [root@centos6-16 tests]# bash /tmp/test.sh
            drwxrwxr-x 12 green green 12288 Jan  4 01:37 .
            Opened /mnt/lustre/ls/ls, fd: 4
            Closed d: 4
            /tmp/test.sh: line 8: /mnt/lustre2/ls: Text file busy
            nope
            drwxrwxr-x 12 green green 12288 Jan  4 01:37 .
            Waited 1 seconds for the open to clear
            
            green Oleg Drokin added a comment - Hm, I tested this on the latest master on rhel7.2 (disregard the centos6 in the hostname) and don't see any problems, what version are you testing on, what kernel: [root@centos6-16 tests]# cat /tmp/test.sh #!/bin/bash cp /bin/ls /mnt/lustre /mnt/lustre/ls -ld . /tmp/open-test /mnt/lustre/ls 2 TIME=0 while ! /mnt/lustre2/ls -ld . ; do echo nope ; TIME=$((TIME + 1)) ; sleep 1 ; done echo Waited $TIME seconds for the open to clear [root@centos6-16 tests]# bash /tmp/test.sh drwxrwxr-x 12 green green 12288 Jan 4 01:37 . Opened /mnt/lustre/ls/ls, fd: 4 Closed d: 4 /tmp/test.sh: line 8: /mnt/lustre2/ls: Text file busy nope drwxrwxr-x 12 green green 12288 Jan 4 01:37 . Waited 1 seconds for the open to clear

            The open-test get the LDLM lock when opening file because ldd->lld_nfs_dentry is set to 1 in ll_iget_for_nfs().

             

            lixi Li Xi (Inactive) added a comment - The open-test get the LDLM lock when opening file because ldd->lld_nfs_dentry is set to 1 in ll_iget_for_nfs().  

            So following things happend in sequence:

            1. open-test got LDLM lock of the file ./test
            2. open-test closed the file, yet do not call ll_md_real_close() because holding the LDLM lock
            3. ./test program tried to open the file with EXEC yet got -ETXTBUSY in mdt_write_deny()
            4. The mdt_blocking_ast() was triggered, it cancels of the LDLM lock got by open-test program, and mdt_mfd_close()/mdt_write_put() is called to release the reference
            5. The next run of open-test can succeed because of step 4.
            lixi Li Xi (Inactive) added a comment - So following things happend in sequence: open-test got LDLM lock of the file ./test open-test closed the file, yet do not call ll_md_real_close() because holding the LDLM lock ./test program tried to open the file with EXEC yet got -ETXTBUSY in mdt_write_deny() The mdt_blocking_ast() was triggered, it cancels of the LDLM lock got by open-test program, and mdt_mfd_close()/mdt_write_put() is called to release the reference The next run of open-test can succeed because of step 4.

            When open-test closing the file, no mdt_mfd_close() is called on MDS side. And the trace on client side confirmed that  ll_md_close() dosn't call ll_md_real_close() because md_lock_match() returns 8. So, this program holds some LDLM lock when closing?

            00000002:00000001:0.0:1515116985.549766:0:22146:0:(mdc_locks.c:149:mdc_lock_match()) Process leaving (rc=8 : 8 : 8)
            00800000:00000001:0.0:1515116985.549766:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8)
            00800000:00000001:0.0:1515116985.549767:0:22146:0:(lmv_obd.c:2965:lmv_lock_match()) Process leaving (rc=8 : 8 : 8)
            00000080:00000001:0.0:1515116985.549768:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8)
            00000080:00000001:0.0:1515116985.549771:0:22146:0:(file.c:314:ll_md_close()) Process leaving (rc=0 : 0 : 0)

            lixi Li Xi (Inactive) added a comment - When open-test closing the file, no mdt_mfd_close() is called on MDS side. And the trace on client side confirmed that  ll_md_close() dosn't call ll_md_real_close() because md_lock_match() returns 8. So, this program holds some LDLM lock when closing? 00000002:00000001:0.0:1515116985.549766:0:22146:0:(mdc_locks.c:149:mdc_lock_match()) Process leaving (rc=8 : 8 : 8) 00800000:00000001:0.0:1515116985.549766:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8) 00800000:00000001:0.0:1515116985.549767:0:22146:0:(lmv_obd.c:2965:lmv_lock_match()) Process leaving (rc=8 : 8 : 8) 00000080:00000001:0.0:1515116985.549768:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8) 00000080:00000001:0.0:1515116985.549771:0:22146:0:(file.c:314:ll_md_close()) Process leaving (rc=0 : 0 : 0)

            As expected, mdt_write_deny() returns -ETXTBUSY when the failure happens

             
            00000004:00000001:0.0:1515116333.246172:0:12336:0:(mdt_open.c:342:mdt_mfd_open()) Process entered
            00000004:00000001:0.0:1515116333.246173:0:12336:0:(mdt_open.c:174:mdt_write_deny()) Process entered
            00000004:00000001:0.0:1515116333.246174:0:12336:0:(mdt_open.c:181:mdt_write_deny()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
            00000004:00000001:0.0:1515116333.246175:0:12336:0:(mdt_open.c:391:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
            00000004:00000001:0.0:1515116333.246176:0:12336:0:(mdt_open.c:615:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
            00000004:00000001:0.0:1515116333.246178:0:12336:0:(mdt_open.c:1589:mdt_reint_open()) Process leaving

            lixi Li Xi (Inactive) added a comment - As expected, mdt_write_deny() returns -ETXTBUSY when the failure happens   00000004:00000001:0.0:1515116333.246172:0:12336:0:(mdt_open.c:342:mdt_mfd_open()) Process entered 00000004:00000001:0.0:1515116333.246173:0:12336:0:(mdt_open.c:174:mdt_write_deny()) Process entered 00000004:00000001:0.0:1515116333.246174:0:12336:0:(mdt_open.c:181:mdt_write_deny()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6) 00000004:00000001:0.0:1515116333.246175:0:12336:0:(mdt_open.c:391:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6) 00000004:00000001:0.0:1515116333.246176:0:12336:0:(mdt_open.c:615:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6) 00000004:00000001:0.0:1515116333.246178:0:12336:0:(mdt_open.c:1589:mdt_reint_open()) Process leaving

            I reproduced the problem easily. Interesting, the first run would return ETXTBUSY, but the next run succeeds.

            [root@server17-el7-vm1 LU-10457]# ./open-test /mnt/lustre/file 1
            Opened /mnt/lustre/file/file, fd: 4
            Closed d: 4

            [root@server17-el7-vm3 lustre]# ./file
            -bash: ./file: /bin/bash: bad interpreter: Text file busy
            [root@server17-el7-vm3 lustre]# ./file
            hello!

            lixi Li Xi (Inactive) added a comment - I reproduced the problem easily. Interesting, the first run would return ETXTBUSY, but the next run succeeds. [root@server17-el7-vm1 LU-10457] # ./open-test /mnt/lustre/file 1 Opened /mnt/lustre/file/file, fd: 4 Closed d: 4 [root@server17-el7-vm3 lustre] # ./file -bash: ./file: /bin/bash: bad interpreter: Text file busy [root@server17-el7-vm3 lustre] # ./file hello!

            People

              green Oleg Drokin
              diegom Diego Moreno (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: