[LU-872] 'Text file busy' error when creating executable on NFS share and then running it on Lustre node. Created: 21/Nov/11  Updated: 07/Dec/11  Resolved: 07/Dec/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.7
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Evgeny Repekto Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Attachments: File repr.sh    
Severity: 3
Epic: export, ldlm, nfs
Rank (Obsolete): 6517

 Description   

We have exported a Lustre file system via NFS (say, from n1).

The attached script 'repr.sh' creates the executable 'test.sh' in the Lustre folder, then connects to n1 by ssh and executes the created script 'test.sh' on n1.

The script 'repr.sh' always fails with the error "bash: /folder_on_lustre/test.sh: Text file busy".

Reproducing script:
cd /folder_on_lustre
rm -rf test.sh
echo "ls -l" >> test.sh
chmod +x test.sh
echo "ls" >> test.sh
ssh -o 'StrictHostKeyChecking no' n1 '/folder_on_lustre/test.sh'
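
For context, here is a minimal local sketch of how "Text file busy" (ETXTBSY) usually arises on Linux: execve() refuses to run a file that some process still holds open for writing. It is an illustration only and does not reproduce the Lustre/NFS behaviour reported here. The scratch directory and the variable names are hypothetical, and some kernel versions may not enforce this check.

```shell
#!/bin/bash
# Hypothetical scratch location; nothing here touches Lustre or NFS.
workdir=$(mktemp -d)
script="$workdir/test.sh"

printf '#!/bin/sh\necho ok\n' > "$script"
chmod +x "$script"

# Keep a write descriptor open on the script, then try to execute it.
exec 3>>"$script"
"$script" >/dev/null 2>&1
busy_status=$?
exec 3>&-

# With the writer closed, the same file executes normally.
after=$("$script")

echo "status while open for write: $busy_status"
echo "output after close: $after"
rm -rf "$workdir"
```

When the kernel rejects the exec, Bash normally reports exit status 126 for the first attempt; the second attempt runs once the write descriptor is closed.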



 Comments   
Comment by Peter Jones [ 22/Nov/11 ]

Lai

Could you please look at this one?

Thanks

Peter

Comment by Lai Siyao [ 05/Dec/11 ]
[root@vivaldi tests]# cat /tmp/repr.sh 
cd /mnt/lustre
rm -rf test.sh
echo "ls -l" >> test.sh
chmod +x test.sh
echo "ls" >> test.sh
ssh chopin '/mnt/lustre/test.sh'
[root@vivaldi tests]# sh /tmp/repr.sh 
root@chopin's password: 
total 56
-rw-------. 1 root root  1891 Nov 30 05:35 anaconda-ks.cfg
-rw-r--r--. 1 root root 40354 Nov 30 05:35 install.log
-rw-r--r--. 1 root root  8168 Nov 30 05:34 install.log.syslog
anaconda-ks.cfg
install.log
install.log.syslog

I tested this in my environment and it passed. Evgeny, could you verify that http://review.whamcloud.com/#change,1259 is included in your Lustre code?

Comment by Evgeny Repekto [ 05/Dec/11 ]

Yes, we are using this fix as part of the 1.8.7 release. In case it matters, 1.8.7 is deployed on n1 only; the other Lustre nodes run 1.8.2.

The only difference I see in your scenario output is that we use a password-less ssh connection.

Comment by Lai Siyao [ 06/Dec/11 ]

I made ssh password-less, and it showed the same result:

[root@vivaldi tests]# mount|grep /mnt/lustre
vivaldi@tcp:/lustre on /mnt/lustre type lustre (rw,user_xattr,acl,flock)
[root@vivaldi tests]# ssh chopin mount|grep /mnt/lustre
vivaldi:/mnt/lustre on /mnt/lustre type nfs (rw,nolock,addr=192.168.111.129)
[root@vivaldi tests]# cat /tmp/repr.sh 
cd /mnt/lustre
rm -rf test.sh
echo "ls -l" >> test.sh
chmod +x test.sh
echo "ls" >> test.sh
ssh chopin '/mnt/lustre/test.sh'
[root@vivaldi tests]# sh /tmp/repr.sh
total 56
-rw-------. 1 root root  1891 Nov 30 05:35 anaconda-ks.cfg
-rw-r--r--. 1 root root 40354 Nov 30 05:35 install.log
-rw-r--r--. 1 root root  8168 Nov 30 05:34 install.log.syslog
anaconda-ks.cfg
install.log
install.log.syslog

I want to verify one thing: on node 'n1', is /folder_on_lustre an NFS mountpoint or a Lustre mountpoint? Did you use any NFS share in your test?
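
One way to answer this kind of question on a given host is to ask for the filesystem type backing the path. The sketch below assumes GNU coreutils `stat`, which typically prints `nfs` or `lustre` for such mounts. The helper name `fs_type` is hypothetical, and the default path is simply the one named in this ticket.

```shell
#!/bin/bash
# Print the filesystem type backing a directory, as reported by GNU stat.
fs_type() {
    stat -f -c '%T' "$1"
}

# Hypothetical usage; /folder_on_lustre is the path named in this ticket.
dir="${1:-/folder_on_lustre}"
if [ -d "$dir" ]; then
    echo "$dir is on: $(fs_type "$dir")"
else
    echo "$dir does not exist on this host"
fi
```

Running it on both n1 and the remote client would show which side sees the folder through NFS and which through Lustre.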

Comment by Evgeny Repekto [ 06/Dec/11 ]

Sorry if I confused you.

Clarifying our case:

1) n1 is a Lustre node running version 1.8.7 that contains the folder /folder_on_lustre.
2) Besides n1, this folder is also available on other Lustre nodes running version 1.8.2 (we deployed Lustre 1.8.7 on n1 specifically to see whether the resolution of LU-146 fixes our problem).
3) c1 is the remote client that mounts the Lustre folder '/folder_on_lustre' from n1 over NFS at a mountpoint with the same name, i.e. '/folder_on_lustre'.
4) When we run repr.sh on c1 (from /folder_on_lustre), we get the described behaviour.
5) This is what we have in /etc/fstab on c1:
n1_ip:/folder_on_lustre /folder_on_lustre nfs rw,hard,intr 0 0

Comment by Lai Siyao [ 07/Dec/11 ]

Evgeny, I tested the setup you described. I'm afraid you didn't upgrade the MDS to Lustre 1.8.7; LU-146 is an MDS fix, so upgrading only the client won't help.

Comment by Evgeny Repekto [ 07/Dec/11 ]

Lai,

I checked with our IT guys and they confirmed that n1 is not our MDS. We'll retest this issue as soon as our MDS is upgraded to 1.8.7.

I think this ticket may be closed. If it reproduces after the upgrade, I'll reopen it (I think I have permission to?).

Thank you, and sorry for the bother.

Comment by Peter Jones [ 07/Dec/11 ]

No problem.

Generated at Sat Feb 10 01:11:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.