Lustre / LU-10457

open_by_handle_at() in write mode triggers ETXTBSY

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major

    Description

      If open_by_handle_at() is called with O_WRONLY or O_RDWR and the resulting file descriptor is then closed, other Lustre clients will still report ETXTBSY when they try to execute the file.

      Example:

      On cn16
      =======
      bschubert@cn16 ~>sudo ~/src/test/open-test /mnt/lustre_client-ES24/bschubert/ime/test7 1
      Opened /mnt/lustre_client-ES24/bschubert/ime/test7/test7, fd: 4
      Closed d: 4

      Now on cn41
      =========
      bschubert@cn41 ~>/mnt/lustre_client-ES24/bschubert/ime//test7
      -bash: /mnt/lustre_client-ES24/bschubert/ime//test7: Text file busy

      test7 is just an arbitrary file which has the execution bit set.
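
      For illustration, here is a minimal sketch of what such an open-test style reproducer can look like (the actual open-test source is not attached; the program name, argument handling, and the numeric mode mapping are assumptions). It encodes the file into a handle with name_to_handle_at() and reopens it in write mode with open_by_handle_at(), the same path an NFS re-export or IME server would use:

      #define _GNU_SOURCE
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      /* Hypothetical reproducer sketch: open a file by handle in the requested
       * mode, then close it again. Assumed mode mapping: 0 = O_RDONLY,
       * 1 = O_WRONLY, 2 = O_RDWR. */
      int main(int argc, char **argv)
      {
              int modes[] = { O_RDONLY, O_WRONLY, O_RDWR };
              struct file_handle *fh;
              int mount_id, mount_fd, fd, flags;

              if (argc < 3) {
                      fprintf(stderr, "usage: %s <path> <0|1|2>\n", argv[0]);
                      return 1;
              }
              flags = modes[atoi(argv[2]) % 3];

              /* Encode the file into an opaque handle, as an NFS server or IME would. */
              fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
              fh->handle_bytes = MAX_HANDLE_SZ;
              if (name_to_handle_at(AT_FDCWD, argv[1], fh, &mount_id, 0) < 0) {
                      perror("name_to_handle_at");
                      return 1;
              }

              /* Any descriptor on the same mount serves as the mount reference;
               * open_by_handle_at() itself needs CAP_DAC_READ_SEARCH, hence sudo. */
              mount_fd = open(argv[1], O_PATH);
              if (mount_fd < 0) {
                      perror("open");
                      return 1;
              }

              fd = open_by_handle_at(mount_fd, fh, flags);
              if (fd < 0) {
                      perror("open_by_handle_at");
                      return 1;
              }
              printf("Opened %s, fd: %d\n", argv[1], fd);

              close(fd);
              printf("Closed fd: %d\n", fd);

              /* Even after the close, executing the file from a second Lustre
               * client may fail with ETXTBSY, as shown above. */
              return 0;
      }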

          Activity

            [LU-10457] open_by_handle_at() in write mode triggers ETXTBSY

            aakef Bernd Schubert added a comment -

            Hi all, I think there is another implication of this issue. Our customer is complaining that quotas are not correctly released. We have mostly worked around the ETXTBSY issue, but I don't think we can do anything about the quotas on our side.
            Looking at the patches, I think https://review.whamcloud.com/32020 will not help, as it will try to release conflicting locks on an O_EXEC attempt. The alternative patch from Patrick, https://review.whamcloud.com/#/c/31304/, should work, as it always sends an MDS close from the client if the file was opened in write mode. Is there any side effect? It should just remove an NFS optimization?

            cengku9660 Gu Zheng (Inactive) added a comment -

            Hi all, I just resubmitted the LU-4398 patch (https://review.whamcloud.com/32020) as Jinshan suggested. With it applied the problem is gone, and in some simple tests no significant regression was found, but please feel free to test it further, thanks.

            paf Patrick Farrell (Inactive) added a comment -

            Oleg pointed me at this; I reported a duplicate and contributed a patch and test case:
            https://review.whamcloud.com/#/c/31304/

            If we limited my patch to executable files as Oleg suggested, that might fit the bill. Curious what others think. I'll refresh it tomorrow.

            cengku9660 Gu Zheng (Inactive) added a comment -

            It seems https://review.whamcloud.com/#/c/9063/ (LU-4398 mdt: acquire an open lock for write or execute) can resolve the problem; after applying it back onto the latest master, I never reproduced the issue.

            [root@vm3 ~]# ./open_test /mnt/lustre/file 1
            Opened /mnt/lustre/file/file, fd: 4
            Closed d: 4

            [root@vm6 ~]# /mnt/lustre/file
            hello lustre
            [root@vm6 ~]# /mnt/lustre/file
            hello lustre
            [root@vm6 ~]# /mnt/lustre/file
            hello lustre
            [root@vm6 ~]# /mnt/lustre/file
            hello lustre
            [root@vm6 ~]# /mnt/lustre/file
            hello lustre

            lixi Li Xi (Inactive) added a comment -

            > Which Lustre version is this with?

            I was testing on the master branch. I guess you are using IEEL3 (2.7)? Something might have changed between them.

            jhammond John Hammond added a comment -

            This is resolved by https://review.whamcloud.com/#/c/9063/ (LU-4398 mdt: acquire an open lock for write or execute). But that change was reverted from master due to the metadata performance impact described by DDN in LU-5197.

            Perhaps the existing functionality of open leases could be used in the open-by-handle path to address this issue without incurring the performance drop.

            green Oleg Drokin added a comment -

            Please note that /mnt/lustre and /mnt/lustre2 are different mountpoints, so they act the same as two different nodes, just more convenient for testing.

            aakef Bernd Schubert added a comment -

            Ah, actually Li was also using two different nodes. Sorry, I only saw server17-el7 and didn't notice the differentiation between -vm1 and -vm3.

            Which Lustre version is this with? On the systems I tested with, executing the file on the other node would never succeed until either

            • the node that had opened the file executed it itself,
            • the node that had opened the file unmounted Lustre, or
            • I was patient and waited for a very long time (> 30 min).

            aakef Bernd Schubert added a comment -

            Hmm, I can't imagine how this works as it is supposed to, even in the NFS case. Maybe I should have pointed this out in more detail, but in my initial example I used two different nodes.
            For NFS or any other overlay file system, one can expect that there are multiple nodes involved. With NFS, users typically create or modify files on their desktop and later execute them natively on Lustre.
            For the IME use case, the file is opened in RW mode on the IME server for multiple reasons, but users also later want to use the files natively on Lustre.

            green Oleg Drokin added a comment -

            I guess I did not read it far enough; yes, there's one ETXTBSY report due to the open lock.

            It appears that name_to_handle_at/open_by_handle_at use the NFS-encoded export operations, which causes the NFS-detection logic to trigger, so the system sort of operates as designed.

            It's going to be tricky to separate real NFS from these users, I guess, and we don't want the extra lock hit when opening the file. The new downgrade logic might help us here to get a bigger lock and then just drop the unneeded bit.

            green Oleg Drokin added a comment -

            Hm, I tested this on the latest master on RHEL 7.2 (disregard the centos6 in the hostname) and don't see any problems. What version are you testing on, and what kernel?

            [root@centos6-16 tests]# cat /tmp/test.sh
            #!/bin/bash
            
            cp /bin/ls /mnt/lustre
            /mnt/lustre/ls -ld .
            /tmp/open-test /mnt/lustre/ls 2
            
            TIME=0
            while ! /mnt/lustre2/ls -ld . ; do echo nope ; TIME=$((TIME + 1)) ; sleep 1 ; done
            
            echo Waited $TIME seconds for the open to clear
            [root@centos6-16 tests]# bash /tmp/test.sh
            drwxrwxr-x 12 green green 12288 Jan  4 01:37 .
            Opened /mnt/lustre/ls/ls, fd: 4
            Closed d: 4
            /tmp/test.sh: line 8: /mnt/lustre2/ls: Text file busy
            nope
            drwxrwxr-x 12 green green 12288 Jan  4 01:37 .
            Waited 1 seconds for the open to clear
            

            People

              Assignee: green Oleg Drokin
              Reporter: diegom Diego Moreno (Inactive)
              Votes: 0
              Watchers: 13