Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.1.5
    • Affects Version/s: Lustre 2.1.3
    • Labels: None
    • Environment: server-2.1.3-1nas (centos6.3); client 2.1.3-1nas Suse11-1
    • 3
    • 5443

    Description

      1. ldd a.out
        /usr/bin/ldd: line 118: ./a.out: Text file busy

      See attached client and server debug traces.

      Attachments

        1. client.inode.dlmtrace.debug
          0.2 kB
        2. mds.inode.dlmtrace.debug
          765 kB
        3. debug.duringcompile.gz
          0.2 kB
        4. debug.duringldd.gz
          0.2 kB
        5. simple-2277.c
          1 kB

        Issue Links

          Activity

            [LU-2277] Text file busy
            utopiabound Nathaniel Clark added a comment - - edited

            Patches picked to branch


            utopiabound Nathaniel Clark added a comment -

            Original commit is now just testing updates.

            The following are cherry-picked commits that depend on 4848; 5002 is unchanged, and 5003 needed a merge conflict resolved:
            http://review.whamcloud.com/5002
            http://review.whamcloud.com/5003
            utopiabound Nathaniel Clark added a comment - http://review.whamcloud.com/4848

            utopiabound Nathaniel Clark added a comment -

            The client seems to be re-sending the open/create because the response data is large and reply_in_callback() registers it as truncated and resends. So the solution I'm going to implement is to correctly handle the replay of the open/create for write/exec in mdt_mfd_open().

            utopiabound Nathaniel Clark added a comment -

            Note on what goes over the wire:
            The items listed are from a wireshark dump of the traffic. All items are requests (client -> MDS), and each item is replied to before the next one is sent.

            For stripe sizes 50-53 (example is stripe 50):
            1) LDLM_ENQUEUE request [Concurrent Write][ intent : open create ] filename : a2
            2) MDS_CLOSE [REINT_SETATTR]
            3) LDLM_ENQUEUE request [Concurrent Read][ intent : lookup ] filename : 50
            4) LDLM_ENQUEUE request [Concurrent Read][ intent : open ][REINT_OPEN] filename : a2
            5) MDS_CLOSE [REINT_SETATTR]

            For stripe sizes 54-59 (example is stripe 54):
            1) LDLM_ENQUEUE request [Concurrent Write][ intent : open create ] filename : a2
            2) LDLM_ENQUEUE request [Concurrent Write][ intent : open create ] filename : a2
            3) MDS_CLOSE [REINT_SETATTR]
            4) LDLM_ENQUEUE request [Concurrent Read][ intent : lookup ] filename : 54
            5) LDLM_ENQUEUE request [Concurrent Read][ intent : open ] filename : a2
            6) LDLM_ENQUEUE request [Concurrent Read][ intent : open ] filename : a2
            7) MDS_CLOSE [REINT_SETATTR]

            utopiabound Nathaniel Clark added a comment -

            After more careful inspection of the logs, it appears that in the failing cases the second open is being processed twice.

            These are the log lines matching the grep expression '((mdt_close(|mdt_mfd_open).*leaving|incoming)', from logs of a double-open copy to a directory with the given stripe size:

            53
            (events.c:284:request_in_callback()) incoming req@ffff8800189bc850 x1421171244404975 msgsize 192
            (events.c:284:request_in_callback()) incoming req@ffff8800189bc050 x1421171249691684 msgsize 192
            (events.c:284:request_in_callback()) incoming req@ffff880014590800 x1421167733816518 msgsize 544
            (mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff880012109400 x1421167733816520 msgsize 360
            (mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff880015236050 x1421167733816521 msgsize 544
            (events.c:284:request_in_callback()) incoming req@ffff880012043000 x1421167733816522 msgsize 544
            (mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff8800152ca050 x1421167733816523 msgsize 360
            (mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff8800189bd850 x1421167733816525 msgsize 192
            54
            (events.c:284:request_in_callback()) incoming req@ffff8800189bd050 x1421167797775378 msgsize 192
            (events.c:284:request_in_callback()) incoming req@ffff880012109400 x1421167797775379 msgsize 192
            (events.c:284:request_in_callback()) incoming req@ffff8800189be850 x1421171244404976 msgsize 192
            (events.c:284:request_in_callback()) incoming req@ffff880014590800 x1421167733816584 msgsize 544
            (mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff8800148c9800 x1421167733816584 msgsize 544
            (events.c:284:request_in_callback()) incoming req@ffff88001458dc00 x1421167733816586 msgsize 360
            (mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff880015238850 x1421167733816587 msgsize 544
            (events.c:284:request_in_callback()) incoming req@ffff880014689800 x1421167733816588 msgsize 544
            (mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff880012109400 x1421167733816588 msgsize 544
            (mdt_open.c:721:mdt_mfd_open()) Process leaving (rc=0 : 0 : 0)
            (events.c:284:request_in_callback()) incoming req@ffff880014590800 x1421167733816589 msgsize 360
            (mdt_open.c:1645:mdt_close()) Process leaving (rc=0 : 0 : 0)

            Notice the repeated xid x1421167733816588 in the stripe-54 case: each occurrence has an accompanying mdt_mfd_open(), but the matching mdt_close() is missing — the close is skipped. This means a request is being processed twice.

            utopiabound Nathaniel Clark added a comment - - edited

            Attached is a C file which can recreate this bug in directories with a default stripe width greater than 53, by just doing a modified copy of an executable.

            What causes the executable to be in a bad state is the following order of events:
            1. Create file (O_RDWR or O_WRONLY)
            2. write to file
            3. close file (up to here, everything is fine – this is just a copy)
            4. open file (O_RDWR or O_WRONLY) - O_RDONLY does not cause this issue
            5. close file


            utopiabound Nathaniel Clark added a comment -

            Have a reproduction case:
            1 Client - CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6.x86_64
            gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
            1 MDS - CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64 (1 mds/mgs partition)
            1 OSS - CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6_lustre.gc46c389.x86_64 (60 osts on loop devices)

            All TCP interconnect
            All Server partitions created & mounted with standard autotest tools (auster -c PATH/TO/custom.sh sanity.sh --only MOUNT)

            In mounted FS:
            cd /mnt/lustre
            mkdir 58
            lfs setstripe -c 58 58
            cd 58
            cat << EOF > test.c
            #include <stdio.h>
            int
            main(int argc, char *argv[])
            {
                    return 0;
            }
            EOF
            gcc test.c
            ./a.out

            Expected Results:
            Should run without error. For directories with a stripe width less than 54, there is no error.

            Actual Results:
            For stripe widths of 54 and larger (at least up to 60) the following error results:
            ./a.out: Text file busy

            Other Test Results:

            • If the filesystem is remounted, then all a.out's will run correctly.
            • If any a.out is copied (either within or between directories, regardless of stripe width), it will run fine.
            • If a "bad" a.out is moved, it will still elicit the same error.

            With 2 Clients (both CentOS 6.3 - Lustre 2.1.3-2.6.32_279.2.1.el6.x86_64)
            1) First client generates a "bad" a.out
            2) Second client mounts FS
            3) 2nd client gets error running a.out
            4) First client unmounts FS
            5) 2nd Client no longer gets error running a.out

            WAG of cause:
            Client 1, when compiling test.c, somehow continues to hold a write-count reference (mot_writecount > 0) after compilation finishes. This reference is cleaned up when the client fully disconnects.


            mhanafi Mahmoud Hanafi added a comment -

            The Lustre filesystem is not re-exported via NFS. There are NFS mounts on the client for the home dir, etc.

            I was able to track this issue to the stripe width of the directory. If the stripe width is > 54 it will reproduce the error.

            How it was reproduced:
            mhanafi@pfe1:/nobackupp5/mhanafi/60> cat test.c++
            #include <stdio.h>
            int
            main(int argc, char *argv[])
            {
                    return 0;
            }

            mhanafi@pfe1:/nobackupp5/mhanafi/60> icpc -O test.c++
            mhanafi@pfe1:/nobackupp5/mhanafi/60> ldd a.out
            /usr/bin/ldd: line 118: ./a.out: Text file busy

            See attached debug logs.

            green Oleg Drokin added a comment -

            Is there any NFS re-export going on, by any chance?
            Was the file created via NFS in particular? Are the clients where you are able to execute the file on NFS, while the ones that fail are on Lustre?

            I also see that you have standard debug enabled. If you can still reproduce this and there is no NFS involved, please increase the debug level on your MDS as follows:

            cat /proc/sys/lnet/debug
            (remember the value it gives you)

            Try to minimize all other activity going on on the cluster, then:

            echo -1 >/proc/sys/lnet/debug ; echo -trace >/proc/sys/lnet/debug

            Do the reproducer. It is important to run the lctl command below as quickly as possible after the reproducer:
              lctl dk >/tmp/lustre/log

            echo "YOUR SAVED VALUE FROM ABOVE" >/proc/sys/lnet/debug

            Thanks


            People

              utopiabound Nathaniel Clark
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved: