[LU-146] executable files on NFS share failed with "Text file busy" when executed Created: 18/Mar/11  Updated: 03/Oct/11  Due: 31/Mar/11  Resolved: 03/Oct/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: Lustre 1.8.7

Type: Bug Priority: Blocker
Reporter: Lai Siyao Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None
Environment:

Lustre is used as the filesystem exported by NFSd.


Story Points: 2
Severity: 3
Bugzilla ID: 24437
Epic: NFS, export, ldlm
Rank (Obsolete): 9725

 Description   

We exported the Lustre filesystem over NFS from one of the Lustre clients (say c1) and mounted
the NFS filesystem on another node (say n1). On node n1, we create an executable script file (f1)
and make a copy of it, f1.cpy. Both files run on nodes n1 and c1. On other Lustre clients that
have the Lustre filesystem mounted, however, f1.cpy does not execute, whereas f1 does.

Running the copy on one of those clients fails with the following error:

[root@sfsclient8 testfs]# ./hi_world_copy
./hi_world_copy: Text file busy

The same problem occurs if the file is a compiled binary rather than a script.
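
For context, "Text file busy" is the message for ETXTBSY, which Linux returns when a file is executed while it is still open for writing, or opened for writing while it is being executed. The sketch below is a minimal model of that accounting using one signed counter, in the spirit of the kernel's i_writecount. The struct and function names are hypothetical and this is not Lustre or kernel code. Whether the 1.8 MDS tracks writers in exactly this way is an assumption, as is the reading that the copy is still counted as open for writing when another client executes it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: > 0 means open writers, < 0 means active executors. */
struct file_model {
        int writecount;
};

/* Register a writer; refused while the file is being executed. */
static int model_get_write_access(struct file_model *f)
{
        assert(f != NULL);
        if (f->writecount < 0)
                return -ETXTBSY;
        f->writecount++;
        return 0;
}

/* Drop a writer that was registered earlier. */
static void model_put_write_access(struct file_model *f)
{
        assert(f != NULL && f->writecount > 0);
        f->writecount--;
}

/* Register an executor; refused while any writer is still open. */
static int model_deny_write_access(struct file_model *f)
{
        assert(f != NULL);
        if (f->writecount > 0)
                return -ETXTBSY;
        f->writecount--;
        return 0;
}

/* Drop an executor that was registered earlier. */
static void model_allow_write_access(struct file_model *f)
{
        assert(f != NULL && f->writecount < 0);
        f->writecount++;
}
```

In this model an execute request succeeds only after the last writer has released the file, which would be consistent with f1.cpy failing while f1 runs.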



 Comments   
Comment by Oleg Drokin [ 18/Mar/11 ]

There is a patch in bz 24437 for that, though I am not entirely happy with it.

Comment by Build Master (Inactive) [ 22/Mar/11 ]

Integrated in reviews-centos5 #535
LU-146 executable files on NFS share failed with "Text file busy"

Lai Siyao : cb21d418e2a56413e47dd70304cd87334902a6a4
Files :

  • lustre/mds/mds_open.c
Comment by Build Master (Inactive) [ 23/Mar/11 ]

Integrated in reviews-centos5 #537
LU-146 executable files on NFS share failed with "Text file busy"

Lai Siyao : d0d5945f8ed6d15bf028a26d02f09cfc69abcf4f
Files :

  • lustre/mds/mds_open.c
Comment by Build Master (Inactive) [ 23/Mar/11 ]

Integrated in reviews-centos5 #545
LU-146 executable files on NFS share failed with "Text file busy"

Lai Siyao : ebea6122a9feca28fb82de066d432dd829a02704
Files :

  • lustre/mds/mds_open.c
Comment by Peter Jones [ 25/Mar/11 ]

Lai

Could you please attach this patch to the bz ticket so that Oracle can land it upstream?

Thanks

Peter

Comment by Lai Siyao [ 25/Mar/11 ]

Yes, Peter.

Comment by Peter Jones [ 29/Mar/11 ]

Lai

Landing permission has been granted by Oracle for this change. Can you please send the patch to lustre-gate-18@sun.com?

Thanks

Peter

Comment by Peter Jones [ 30/Mar/11 ]

Lai,

Oracle have landed this fix upstream for 1.8.6. Does the same change need to be made to master?

Peter

Comment by Oleg Drokin [ 30/Mar/11 ]

No, I checked and 2.x is fine, it already always gets the lock.

Comment by Peter Jones [ 30/Mar/11 ]

Great, then we can resolve this one.

Comment by Cory Spitz [ 12/Aug/11 ]

Vladimir S. thinks that the fix landed to 1.8.6 is breaking lock ordering.

See bz 24437 comment #50.

Comment by Peter Jones [ 12/Aug/11 ]

Lai

Can you please look into the reported issues with this patch as your top priority?

Thanks

Peter

Comment by Oleg Drokin [ 12/Aug/11 ]

When inspecting the original patch, the lock ordering did not match what I thought would be happening (and, I think, did not match what I had originally done).

Anyway, even without that patch the race is still there; the window is just narrower.

The proper way to address all of this is to drop the second child-lock acquisition further down and instead add all the conditions before the original mds_get_parent_child_locked() call, which does the proper ordering of the parent/child locks. We just need to extend the cases where we request the child lock there.
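
The ordering concern can be illustrated with a small sketch. Two threads that take the same pair of locks in opposite orders can deadlock; always taking them in one global order (here, smaller resource ID first) avoids that. The types and names below are hypothetical and do not reproduce mds_get_parent_child_locked(); the sketch only shows the child lock being requested together with the parent, in a fixed order, rather than separately at a later point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a lock resource identifier. */
struct res_id {
        uint64_t ino;
        uint64_t gen;
};

/* Return nonzero if a sorts strictly after b. */
static int res_after(const struct res_id *a, const struct res_id *b)
{
        if (a->ino != b->ino)
                return a->ino > b->ino;
        return a->gen > b->gen;
}

/*
 * Decide the acquisition order for a parent/child pair: order[0] is
 * locked first, order[1] second. If no child lock is requested
 * (child == NULL) or both IDs are equal, only one lock is taken.
 * Returns the number of locks to take.
 */
static size_t lock_order(const struct res_id *parent,
                         const struct res_id *child,
                         const struct res_id *order[2])
{
        assert(parent != NULL && order != NULL);
        order[0] = parent;
        order[1] = NULL;
        if (child == NULL ||
            (child->ino == parent->ino && child->gen == parent->gen))
                return 1;
        if (res_after(parent, child)) {
                order[0] = child;
                order[1] = parent;
        } else {
                order[1] = child;
        }
        return 2;
}
```

Because the order depends only on the two IDs, every caller that needs both locks takes them in the same sequence regardless of which one is the parent.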

Comment by Lai Siyao [ 17/Aug/11 ]

Review is on http://review.whamcloud.com/#change,1259

Comment by Vladimir V. Saveliev [ 02/Sep/11 ]

With this patch (http://review.whamcloud.com/#change,1259), racer fails with the following LBUG on the MDS:

2011-09-02 07:10:12 LustreError: 3959:0:(handler.c:2512:mds_intent_policy()) ASSERTION(new_lock != NULL) failed: op 0x8 lockh 0x0

More details in https://bugzilla.lustre.org/show_bug.cgi?id=24525#c13

Comment by Lai Siyao [ 02/Sep/11 ]

Good catch. The patch may skip fetching the child lock if it does not find the child inode on the first attempt; see mds_get_parent_child_locked() (mds_reint.c, line 1649):

        if (inode == NULL) {
                child_lockh = NULL;
                goto retry_locks;
        }

child_lockh should be set to NULL only when (it_op == IT_OPEN && !(flags & MDS_OPEN_LOCK)). I will submit a patch for review soon.
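
The condition described here can be written as a small predicate. This is a sketch, not the landed patch: the numeric values of the SKETCH_ constants are placeholders chosen for illustration rather than taken from the Lustre headers, and want_child_lock() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only; not from the Lustre headers. */
#define SKETCH_IT_OPEN        0x1u
#define SKETCH_IT_GETATTR     0x8u
#define SKETCH_MDS_OPEN_LOCK  0x4000u

/*
 * Hypothetical helper: returns 1 if the child lock must be taken and
 * 0 if it may be skipped. Only a plain open (the open intent without
 * the open-lock flag) may skip it.
 */
static int want_child_lock(uint32_t it_op, uint64_t flags)
{
        /* Sanity check: exactly one intent bit is expected. */
        assert(it_op != 0 && (it_op & (it_op - 1)) == 0);
        if (it_op == SKETCH_IT_OPEN && !(flags & SKETCH_MDS_OPEN_LOCK))
                return 0;
        return 1;
}
```

Under this rule, the retry path taken when the inode is not found would clear child_lockh only when want_child_lock() returns 0, so every other intent still comes back with a child lock.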

Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/handler.c
  • lustre/mds/mds_reint.c
  • lustre/mds/mds_open.c
  • lustre/mds/mds_internal.h
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_open.c
  • lustre/mds/mds_internal.h
  • lustre/mds/handler.c
  • lustre/mds/mds_reint.c
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_reint.c
  • lustre/mds/handler.c
  • lustre/mds/mds_internal.h
  • lustre/mds/mds_open.c
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/handler.c
  • lustre/mds/mds_reint.c
  • lustre/mds/mds_open.c
  • lustre/mds/mds_internal.h
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_reint.c
  • lustre/mds/mds_open.c
  • lustre/mds/mds_internal.h
  • lustre/mds/handler.c
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/handler.c
  • lustre/mds/mds_internal.h
  • lustre/mds/mds_open.c
  • lustre/mds/mds_reint.c
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_reint.c
  • lustre/mds/handler.c
  • lustre/mds/mds_open.c
  • lustre/mds/mds_internal.h
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_reint.c
  • lustre/mds/mds_internal.h
  • lustre/mds/handler.c
  • lustre/mds/mds_open.c
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_internal.h
  • lustre/mds/mds_open.c
  • lustre/mds/mds_reint.c
  • lustre/mds/handler.c
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_open.c
  • lustre/mds/mds_reint.c
  • lustre/mds/handler.c
  • lustre/mds/mds_internal.h
Comment by Build Master (Inactive) [ 03/Oct/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #131
LU-146 mds_open() may deadlock

Johann Lombardi : bec818434c27bb390b4c8866e73d1afb0dd9e884
Files :

  • lustre/mds/mds_reint.c
  • lustre/mds/mds_open.c
  • lustre/mds/handler.c
  • lustre/mds/mds_internal.h
Comment by Peter Jones [ 03/Oct/11 ]

Landed for 1.8.7. Not needed for 2.x.
