Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10457

open_by_handle_at() in write mode triggers ETXTBSY

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      If open_by_handle_at() is called in O_WRONLY or O_RDWR mode and then the file descriptor is closed, other lustre clients will still report ETXTBSY.

      Example:

      On cn16
      =======
      bschubert@cn16 ~>sudo ~/src/test/open-test /mnt/lustre_client-ES24/bschubert/ime/test7 1
      Opened /mnt/lustre_client-ES24/bschubert/ime/test7/test7, fd: 4
      Closed d: 4

      Now on cn41
      =========
      bschubert@cn41 ~>/mnt/lustre_client-ES24/bschubert/ime//test7
      -bash: /mnt/lustre_client-ES24/bschubert/ime//test7: Text file busy

      test7 is just any file which has the the execution bit set.

      Attachments

        Issue Links

          Activity

            [LU-10457] open_by_handle_at() in write mode triggers ETXTBSY
            green Oleg Drokin added a comment -

            I guess I did not read it far enough, yes there's one ETXTBUSY report due to the open lock.

            it appears that the name_to_handle_at/open_by_handle_at use nfs-encoded export operation leading to the nfs detecting logic to trigger so the system sort of operates as designed.

            It's going to be tricky to separate real nfs from these users I guess and we don't want the extra lock hit when opening the file. I guess the new downgrade logic might help us here to get a bigger lock and then just drop the unneeded bit.

            green Oleg Drokin added a comment - I guess I did not read it far enough, yes there's one ETXTBUSY report due to the open lock. it appears that the name_to_handle_at/open_by_handle_at use nfs-encoded export operation leading to the nfs detecting logic to trigger so the system sort of operates as designed. It's going to be tricky to separate real nfs from these users I guess and we don't want the extra lock hit when opening the file. I guess the new downgrade logic might help us here to get a bigger lock and then just drop the unneeded bit.
            green Oleg Drokin added a comment -

            Hm, I tested this on the latest master on rhel7.2 (disregard the centos6 in the hostname) and don't see any problems, what version are you testing on, what kernel:

            [root@centos6-16 tests]# cat /tmp/test.sh
            #!/bin/bash
            
            cp /bin/ls /mnt/lustre
            /mnt/lustre/ls -ld .
            /tmp/open-test /mnt/lustre/ls 2
            
            TIME=0
            while ! /mnt/lustre2/ls -ld . ; do echo nope ; TIME=$((TIME + 1)) ; sleep 1 ; done
            
            echo Waited $TIME seconds for the open to clear
            [root@centos6-16 tests]# bash /tmp/test.sh
            drwxrwxr-x 12 green green 12288 Jan  4 01:37 .
            Opened /mnt/lustre/ls/ls, fd: 4
            Closed d: 4
            /tmp/test.sh: line 8: /mnt/lustre2/ls: Text file busy
            nope
            drwxrwxr-x 12 green green 12288 Jan  4 01:37 .
            Waited 1 seconds for the open to clear
            
            green Oleg Drokin added a comment - Hm, I tested this on the latest master on rhel7.2 (disregard the centos6 in the hostname) and don't see any problems, what version are you testing on, what kernel: [root@centos6-16 tests]# cat /tmp/test.sh #!/bin/bash cp /bin/ls /mnt/lustre /mnt/lustre/ls -ld . /tmp/open-test /mnt/lustre/ls 2 TIME=0 while ! /mnt/lustre2/ls -ld . ; do echo nope ; TIME=$((TIME + 1)) ; sleep 1 ; done echo Waited $TIME seconds for the open to clear [root@centos6-16 tests]# bash /tmp/test.sh drwxrwxr-x 12 green green 12288 Jan 4 01:37 . Opened /mnt/lustre/ls/ls, fd: 4 Closed d: 4 /tmp/test.sh: line 8: /mnt/lustre2/ls: Text file busy nope drwxrwxr-x 12 green green 12288 Jan 4 01:37 . Waited 1 seconds for the open to clear

            The open-test get the LDLM lock when opening file because ldd->lld_nfs_dentry is set to 1 in ll_iget_for_nfs().

             

            lixi Li Xi (Inactive) added a comment - The open-test get the LDLM lock when opening file because ldd->lld_nfs_dentry is set to 1 in ll_iget_for_nfs().  

            So following things happend in sequence:

            1. open-test got LDLM lock of the file ./test
            2. open-test closed the file, yet do not call ll_md_real_close() because holding the LDLM lock
            3. ./test program tried to open the file with EXEC yet got -ETXTBUSY in mdt_write_deny()
            4. The mdt_blocking_ast() was triggered, it cancels of the LDLM lock got by open-test program, and mdt_mfd_close()/mdt_write_put() is called to release the reference
            5. The next run of open-test can succeed because of step 4.
            lixi Li Xi (Inactive) added a comment - So following things happend in sequence: open-test got LDLM lock of the file ./test open-test closed the file, yet do not call ll_md_real_close() because holding the LDLM lock ./test program tried to open the file with EXEC yet got -ETXTBUSY in mdt_write_deny() The mdt_blocking_ast() was triggered, it cancels of the LDLM lock got by open-test program, and mdt_mfd_close()/mdt_write_put() is called to release the reference The next run of open-test can succeed because of step 4.

            When open-test closing the file, no mdt_mfd_close() is called on MDS side. And the trace on client side confirmed that  ll_md_close() dosn't call ll_md_real_close() because md_lock_match() returns 8. So, this program holds some LDLM lock when closing?

            00000002:00000001:0.0:1515116985.549766:0:22146:0:(mdc_locks.c:149:mdc_lock_match()) Process leaving (rc=8 : 8 : 8)
            00800000:00000001:0.0:1515116985.549766:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8)
            00800000:00000001:0.0:1515116985.549767:0:22146:0:(lmv_obd.c:2965:lmv_lock_match()) Process leaving (rc=8 : 8 : 8)
            00000080:00000001:0.0:1515116985.549768:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8)
            00000080:00000001:0.0:1515116985.549771:0:22146:0:(file.c:314:ll_md_close()) Process leaving (rc=0 : 0 : 0)

            lixi Li Xi (Inactive) added a comment - When open-test closing the file, no mdt_mfd_close() is called on MDS side. And the trace on client side confirmed that  ll_md_close() dosn't call ll_md_real_close() because md_lock_match() returns 8. So, this program holds some LDLM lock when closing? 00000002:00000001:0.0:1515116985.549766:0:22146:0:(mdc_locks.c:149:mdc_lock_match()) Process leaving (rc=8 : 8 : 8) 00800000:00000001:0.0:1515116985.549766:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8) 00800000:00000001:0.0:1515116985.549767:0:22146:0:(lmv_obd.c:2965:lmv_lock_match()) Process leaving (rc=8 : 8 : 8) 00000080:00000001:0.0:1515116985.549768:0:22146:0:(obd_class.h:1740:md_lock_match()) Process leaving (rc=8 : 8 : 8) 00000080:00000001:0.0:1515116985.549771:0:22146:0:(file.c:314:ll_md_close()) Process leaving (rc=0 : 0 : 0)

            As expected, mdt_write_deny() returns -ETXTBUSY when the failure happens

             
            00000004:00000001:0.0:1515116333.246172:0:12336:0:(mdt_open.c:342:mdt_mfd_open()) Process entered
            00000004:00000001:0.0:1515116333.246173:0:12336:0:(mdt_open.c:174:mdt_write_deny()) Process entered
            00000004:00000001:0.0:1515116333.246174:0:12336:0:(mdt_open.c:181:mdt_write_deny()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
            00000004:00000001:0.0:1515116333.246175:0:12336:0:(mdt_open.c:391:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
            00000004:00000001:0.0:1515116333.246176:0:12336:0:(mdt_open.c:615:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
            00000004:00000001:0.0:1515116333.246178:0:12336:0:(mdt_open.c:1589:mdt_reint_open()) Process leaving

            lixi Li Xi (Inactive) added a comment - As expected, mdt_write_deny() returns -ETXTBUSY when the failure happens   00000004:00000001:0.0:1515116333.246172:0:12336:0:(mdt_open.c:342:mdt_mfd_open()) Process entered 00000004:00000001:0.0:1515116333.246173:0:12336:0:(mdt_open.c:174:mdt_write_deny()) Process entered 00000004:00000001:0.0:1515116333.246174:0:12336:0:(mdt_open.c:181:mdt_write_deny()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6) 00000004:00000001:0.0:1515116333.246175:0:12336:0:(mdt_open.c:391:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6) 00000004:00000001:0.0:1515116333.246176:0:12336:0:(mdt_open.c:615:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6) 00000004:00000001:0.0:1515116333.246178:0:12336:0:(mdt_open.c:1589:mdt_reint_open()) Process leaving

            I reproduced the problem easily. Interesting, the first run would return ETXTBUSY, but the next run succeeds.

            [root@server17-el7-vm1 LU-10457]# ./open-test /mnt/lustre/file 1
            Opened /mnt/lustre/file/file, fd: 4
            Closed d: 4

            [root@server17-el7-vm3 lustre]# ./file
            -bash: ./file: /bin/bash: bad interpreter: Text file busy
            [root@server17-el7-vm3 lustre]# ./file
            hello!

            lixi Li Xi (Inactive) added a comment - I reproduced the problem easily. Interesting, the first run would return ETXTBUSY, but the next run succeeds. [root@server17-el7-vm1 LU-10457] # ./open-test /mnt/lustre/file 1 Opened /mnt/lustre/file/file, fd: 4 Closed d: 4 [root@server17-el7-vm3 lustre] # ./file -bash: ./file: /bin/bash: bad interpreter: Text file busy [root@server17-el7-vm3 lustre] # ./file hello!

            People

              green Oleg Drokin
              diegom Diego Moreno (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: