[LU-2277] Text file busy Created: 05/Nov/12  Updated: 17/Mar/14  Resolved: 11/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.3
Fix Version/s: Lustre 2.1.5

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: None
Environment:

server-2.1.3-1nas (centos6.3)
client 2.1.3-1nas Suse11-1


Attachments: Text File client.inode.dlmtrace.debug     Text File debug.duringcompile.gz     Text File debug.duringldd.gz     File mds.inode.dlmtrace.debug     Text File simple-2277.c    
Issue Links:
Duplicate
duplicates LU-1571 It should not update last xid for rep... Resolved
duplicates LU-971 Re-sent open requests cause mod_count... Closed
Severity: 3
Rank (Obsolete): 5443

 Description   
  ldd a.out
    /usr/bin/ldd: line 118: ./a.out: Text file busy

see attached client and server debug traces.



 Comments   
Comment by Oleg Drokin [ 05/Nov/12 ]

This is supposed to happen when somebody holds a file opened for write.

How was a.out produced? Was somebody holding it open still?
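
For reference, a minimal local-filesystem sketch (not Lustre-specific, and not from this ticket) of that behaviour: exec() fails with ETXTBSY while the binary is still held open for writing. The ./a.out path is just a placeholder for an existing executable.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hold a write handle on the executable, as a lingering writer would. */
    int fd = open("./a.out", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    execl("./a.out", "a.out", (char *)NULL);
    /* Only reached because exec failed; expect ETXTBSY ("Text file busy"). */
    fprintf(stderr, "exec with write handle open: %s\n", strerror(errno));

    close(fd);                               /* drop the write handle */
    execl("./a.out", "a.out", (char *)NULL); /* now this should succeed */
    perror("exec after close");
    return 1;
}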

Comment by Mahmoud Hanafi [ 06/Nov/12 ]

a.out was compiled in place. It seems that the error only occurs on some of the clients.

Comment by Oleg Drokin [ 06/Nov/12 ]

Is there any NFS reexport going on by any chance?
Was the file created via NFS in particular? Are the clients where you are able to execute it accessing it over NFS, while the ones that fail are on Lustre?

I also see that you have standard debug enabled, if you can still reproduce this and there is no NFS involved, please increase debug level on your mds as follows:
cat /proc/sys/lnet/debug
remember the value it gave you

try to minimize all other activity going on on the cluster

echo -1 >/proc/sys/lnet/debug ; echo -trace >/proc/sys/lnet/debug

do the reproducer

It is important to run the lctl command below as quickly as possible after the reproducer:
    lctl dk >/tmp/lustre/log

echo "YOUR SAVED VALUE FROM ABOVE" >/proc/sys/lnet/debug

Thanks

Comment by Mahmoud Hanafi [ 07/Nov/12 ]

The Lustre filesystem is not reexported via NFS. There are NFS mounts on the client for home directories, etc.

I was able to track this issue down to the stripe width of the directory. If the stripe width is > 54, the error is reproduced.

How it was reproduced:
mhanafi@pfe1:/nobackupp5/mhanafi/60> cat test.c++
#include <stdio.h>
int
main(int argc, char *argv[])
{
return 0;
}

mhanafi@pfe1:/nobackupp5/mhanafi/60> icpc -O test.c++
mhanafi@pfe1:/nobackupp5/mhanafi/60> ldd a.out
/usr/bin/ldd: line 118: ./a.out: Text file busy

see attached debug logs.

Comment by Nathaniel Clark [ 07/Dec/12 ]

Have reproduction case:
1 Client - CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6.x86_64
gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
1 MDS - CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64 (1 mds/mgs partition)
1 OSS - CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64 (60 osts on loop devices)

All TCP interconnect
All Server partitions created & mounted with standard autotest tools (auster -c PATH/TO/custom.sh sanity.sh --only MOUNT)

In mounted FS:
cd /mnt/lustre
mkdir 58
lfs setstripe -c 58 58
cd 58
cat << EOF > test.c
#include <stdio.h>
int
main(int argc, char *argv[])
{
return 0;
}
EOF
gcc test.c
./a.out

Expected Results:
Should run w/o error. For directories with a stripe width less than 54, there is no error.

Actual Results:
For stripe widths of 54 and larger (at least up to 60) the following error results:
./a.out: Text file busy

Other Test Results:

  • If the filesystem is remounted then all a.out's will run correctly.
  • If any a.out is copied (either w/in or between directories, regardless of stripe width) it will run fine
  • If "bad" a.out is moved it will still illicit the same error

With 2 Clients (both CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6.x86_64)
1) First client generates "bad" a.out
2) Second client mounts FS
3) 2nd client gets error running a.out
4) First client unmounts FS
5) 2nd Client no longer gets error running a.out

WAG of cause:
When compiling test.c, client 1 somehow continues to hold a writecount ref (mot_writecount > 0) after compilation finishes. This ref is cleaned up when the client fully disconnects.

Comment by Nathaniel Clark [ 10/Dec/12 ]

Attached is a C file (simple-2277.c) which can recreate this bug in directories with a default stripe width greater than 53, by just doing a modified copy of an executable.

What causes the executable to be in a bad state is the following order of events:
1. Create file (O_RDWR or O_WRONLY)
2. write to file
3. close file (up to here, everything is fine – this is just a copy)
4. open file (O_RDWR or O_WRONLY) - O_RDONLY does not cause this issue
5. close file
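
A minimal sketch of this sequence (the attached simple-2277.c is the authoritative reproducer; the source and destination paths below are placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    const char *src = argc > 1 ? argv[1] : "/bin/true";  /* placeholder source binary */
    const char *dst = argc > 2 ? argv[2] : "./a.out";    /* copy in the striped directory */
    char buf[65536];
    ssize_t n;

    /* Steps 1-3: create, write, close -- an ordinary copy. */
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (in < 0 || out < 0) {
        perror("open");
        return 1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0)
        if (write(out, buf, n) != n) {
            perror("write");
            return 1;
        }
    close(in);
    close(out);

    /* Steps 4-5: re-open the copy for write, then close it again.
     * On an affected directory, running dst afterwards reports
     * "Text file busy". */
    out = open(dst, O_WRONLY);
    if (out < 0) {
        perror("reopen");
        return 1;
    }
    close(out);
    return 0;
}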

Comment by Nathaniel Clark [ 13/Dec/12 ]

After more careful inspection of the logs, it appears that in the failing cases, the second open is being processed twice:

These are log lines matching the grep expression '((mdt_close|mdt_mfd_open).*leaving|incoming)' from logs of a double-open copy to a directory with the given stripe width:

53
(events.c:284:request_in_callback()) incoming req@ffff8800189bc850 x1421171244404975 msgsize 192
(events.c:284:request_in_callback()) incoming req@ffff8800189bc050 x1421171249691684 msgsize 192
(events.c:284:request_in_callback()) incoming req@ffff880014590800 x1421167733816518 msgsize 544
(mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff880012109400 x1421167733816520 msgsize 360
(mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff880015236050 x1421167733816521 msgsize 544
(events.c:284:request_in_callback()) incoming req@ffff880012043000 x1421167733816522 msgsize 544
(mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff8800152ca050 x1421167733816523 msgsize 360
(mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff8800189bd850 x1421167733816525 msgsize 192
54
(events.c:284:request_in_callback()) incoming req@ffff8800189bd050 x1421167797775378 msgsize 192
(events.c:284:request_in_callback()) incoming req@ffff880012109400 x1421167797775379 msgsize 192
(events.c:284:request_in_callback()) incoming req@ffff8800189be850 x1421171244404976 msgsize 192
(events.c:284:request_in_callback()) incoming req@ffff880014590800 x1421167733816584 msgsize 544
(mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff8800148c9800 x1421167733816584 msgsize 544
(events.c:284:request_in_callback()) incoming req@ffff88001458dc00 x1421167733816586 msgsize 360
(mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff880015238850 x1421167733816587 msgsize 544
(events.c:284:request_in_callback()) incoming req@ffff880014689800 x1421167733816588 msgsize 544
(mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff880012109400 x1421167733816588 msgsize 544
(mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
(events.c:284:request_in_callback()) incoming req@ffff880014590800 x1421167733816589 msgsize 360
(mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)

Notice the repeated x1421167733816588 in the 54-stripe case: both occurrences are followed by mdt_mfd_open(), but there is only one matching mdt_close(). This means the same open request is being processed twice.
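
A toy model (not Lustre code; all names and values below are illustrative only) of why a double-processed open leaves a stale writer count:

#include <stdio.h>

static int writecount;                  /* stands in for mot_writecount */

static void open_for_write(long xid, int dedup_resends)
{
    static long last_xid;
    if (dedup_resends && xid == last_xid)
        return;                         /* resend recognised, not re-counted */
    last_xid = xid;
    writecount++;
}

static void close_write(void)
{
    writecount--;
}

int main(void)
{
    /* Observed behaviour: the resent open is counted twice but closed
     * only once, so the count never returns to zero and exec keeps
     * failing with "Text file busy". */
    open_for_write(1, 0);
    open_for_write(1, 0);               /* the duplicate (resent) request */
    close_write();
    printf("without resend handling: writecount = %d\n", writecount);

    /* If the resend is detected, the single close balances the single
     * counted open. */
    writecount = 0;
    open_for_write(2, 1);
    open_for_write(2, 1);               /* resend, ignored this time */
    close_write();
    printf("with resend handling:    writecount = %d\n", writecount);
    return 0;
}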

Comment by Nathaniel Clark [ 17/Dec/12 ]

Note on what goes over the wire:
Items listed are from a wireshark dump of the traffic; all items are requests from client -> MDS, and each item is replied to before the next item is sent.

For stripe sizes 50-53: (example is stripe 50)
1) LDLM_ENQUEUE request [Concurrent Write][ intent : open create ] filename : a2
2) MDS_CLOSE [REINT_SETATTR]
3) LDLM_ENQUEUE request [Concurrent Read][ intent : lookup ] filename : 50
4) LDLM_ENQUEUE request [Concurrent Read][ intent : open ][REINT_OPEN] filename : a2
5) MDS_CLOSE [REINT_SETATTR]

For stripe sizes 54-59: (example is stripe 54)
1) LDLM_ENQUEUE request [Concurrent Write][ intent : open create ] filename : a2
2) LDLM_ENQUEUE request [Concurrent Write][ intent : open create ] filename : a2
3) MDS_CLOSE [REINT_SETATTR]
4) LDLM_ENQUEUE request [Concurrent Read][ intent : lookup ] filename : 54
5) LDLM_ENQUEUE request [Concurrent Read][ intent : open ] filename : a2
6) LDLM_ENQUEUE request [Concurrent Read][ intent : open ] filename : a2
7) MDS_CLOSE [REINT_SETATTR]

Comment by Nathaniel Clark [ 17/Dec/12 ]

The client seems to be re-sending the open/create because the response data is large, so reply_in_callback() registers it as truncated and resends. The solution I'm going to implement is to correctly handle the replay of the open/create for write/exec in mdt_mfd_open().

Comment by Nathaniel Clark [ 18/Dec/12 ]

http://review.whamcloud.com/4848

Comment by Nathaniel Clark [ 11/Jan/13 ]

The original commit is now just testing updates:

The following are cherry-picked commits that depend on 4848; 5002 is unchanged, 5003 needed a merge conflict resolved.
http://review.whamcloud.com/5002
http://review.whamcloud.com/5003

Comment by Nathaniel Clark [ 11/Mar/13 ]

Patches picked to branch
