Details

    • New Feature
    • Resolution: Fixed
    • Blocker
    • None
    • None
    • 7176

    Description

      We need to add details of how to use this new tool in the manual for 2.2

          Activity

            [LUDOC-36] Document mds-survey
            mdiep Minh Diep added a comment -

            merged with master

            pjones Peter Jones added a comment - Minh did this work under http://review.whamcloud.com/#change,2072

            cliffw Cliff White (Inactive) added a comment -

            From Minh:
            ERROR is often caused by running out of inodes on the MDT.
            SHORT often means there aren't enough files to generate enough data to collect within a 1-second interval.

            cliffw Cliff White (Inactive) added a comment -

            Minh, can you explain the SHORT messages?
            rhenwood Richard Henwood (Inactive) added a comment - Script is here: http://review.whamcloud.com/#change,1969 Related LU is here: http://jira.whamcloud.com/browse/LU-633

            rhenwood Richard Henwood (Inactive) added a comment -

            I often see SHORT and ERROR in the output of my mds-survey run. This needs to be explained in the doc too.

            # mds-survey
            Tue Feb  7 15:20:45 EST 2012 /usr/bin/mds-survey from wc0008
            mdt 1 file  100000 dir    4 thr    4 create 7039.07             ERROR lookup 191596.67             SHORT md_getattr 106189.70             SHORT setxattr 18632.91             ERROR destroy 10423.02             ERROR
            mdt 1 file  100000 dir    4 thr    8 create 6911.21             ERROR lookup 218940.93             SHORT md_getattr 130979.97             SHORT setxattr 4427.82             ERROR destroy 6362.05             ERROR
            mdt 1 file  100000 dir    4 thr   16 create 5213.21             ERROR lookup 189467.26             SHORT md_getattr 113743.37             ERROR setxattr 1720.42             ERROR destroy  706.99             ERROR
            mdt 1 file  100000 dir    4 thr   32 create
            mdiep Minh Diep added a comment -

            24.5 Testing MDS Performance (mds-survey)

            The mds-survey script tests the local metadata performance using the echo_client to drive different layers of the MDS stack: mdd, mdt, osd.

            It can be used with the following classes of operations:

            1. Open-create/mkdir/create
            2. Lookup/getattr/setxattr
            3. Delete/destroy
            4. Unlink/rmdir

            These operations will be run by a variable number of concurrent threads and will test with the number of
            directories specified by the user. The run can be executed such that all threads operate in a single directory
            (dir_count=1) or in private/unique directories (dir_count=x thrlo=x thrhi=x).
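
The two directory modes above can be sketched as environment settings. This is only an illustration: survey_cmd is a hypothetical helper that prints a command line instead of running it, and the value 4 is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of mds-survey): print, but do not run,
# an mds-survey command line for a given directory mode.
#   shared     -> all threads operate in a single directory
#   private N  -> N directories driven by exactly N threads
survey_cmd() {
    mode=$1
    n=${2:-1}
    case "$mode" in
        shared)  echo "dir_count=1 sh mds-survey" ;;
        private) echo "dir_count=$n thrlo=$n thrhi=$n sh mds-survey" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

survey_cmd shared
survey_cmd private 4
```

Printing the command rather than executing it keeps the sketch safe to try on a machine without Lustre installed.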

            The mdd, mdt, or osd instance is driven directly. The script automatically loads the obdecho module if required
            and creates an instance of echo_client.

            This script can also create OST objects when stripe_count is set to a value greater than zero.

            To perform a run:
            1. Start the Lustre MDT.
            The Lustre MDT should be mounted on the MDS node to be tested.
            2. Start the Lustre OSTs (optional, only required when testing with OST objects).
            The Lustre OSTs should be mounted on the OSS node(s).
            3. Run the mds-survey script as explained below.

            The script must be customized according to the components under test and
            where it should keep its working files. Customization variables are
            described as follows:

            thrlo          number of threads to start testing with; skipped if less than dir_count
            thrhi          maximum number of threads to test
            targets        MDT instance
            file_count     total number of files to test
            dir_count      total number of directories to test
            stripe_count   number of stripes on OST objects
            tests_str      test operations to run; must include at least "create" and "destroy"
            start_number   base number for each thread, to prevent name collisions
            layer          layer of the MDS stack to be tested
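
As a rough pre-flight check of the tests_str requirement, something like the following could be run before invoking the script. The function valid_tests_str is a hypothetical name, and the sketch assumes tests_str is a space-separated list of test names.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of mds-survey itself).
# ASSUMPTION: tests_str is a space-separated list of test names.
# Succeeds only if the list contains both "create" and "destroy".
valid_tests_str() {
    has_create=0
    has_destroy=0
    for t in $1; do
        case "$t" in
            create)  has_create=1 ;;
            destroy) has_destroy=1 ;;
        esac
    done
    [ "$has_create" -eq 1 ] && [ "$has_destroy" -eq 1 ]
}

tests_str="create lookup destroy"
if valid_tests_str "$tests_str"; then
    echo "tests_str ok: $tests_str"
else
    echo "tests_str must include create and destroy" >&2
fi
```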

            • Create a Lustre configuration using your normal methods

            a. Run without OST object creation:
            Set up the Lustre MDS without any OST mounted, then invoke the mds-survey script:
            $ thrhi=64 file_count=200000 sh mds-survey

            b. Run with OST object creation:
            Set up the Lustre MDS with at least one OST mounted, then invoke the mds-survey script with the
            stripe_count parameter:
            $ thrhi=64 file_count=200000 stripe_count=2 sh mds-survey

            Note: a specific MDT instance can be selected using the targets variable:
            $ targets=lustre-MDT0000 thrhi=64 file_count=200000 stripe_count=2 sh mds-survey

            Output files:

            When the script runs, it creates a number of working files and a pair of
            result files. All files start with the prefix given by ${rslt}.

            ${rslt}.summary       same as stdout
            ${rslt}.script_*      per-host test script files
            ${rslt}.detail_tmp*   per-mdt result files
            ${rslt}.detail        collected result files for post-mortem

            The script iterates over the given numbers of threads performing
            all the specified tests and checking that all test processes
            completed successfully.
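
The iteration over thread counts might look roughly like the sketch below. It assumes the count doubles at each step, which is only inferred from the sample run quoted in this issue (thr 4, 8, 16, 32); thread_counts is a hypothetical name, not a function of the script.

```shell
#!/bin/sh
# Sketch of the thread sweep.
# ASSUMPTION: the thread count doubles each step (inferred from sample
# output showing thr 4, 8, 16, 32). Counts below dir_count are skipped,
# as described for thrlo.
thread_counts() {
    thr=$1
    thrhi=$2
    dir_count=$3
    # Guard against a non-positive start, which would never terminate.
    [ "$thr" -ge 1 ] || return 1
    while [ "$thr" -le "$thrhi" ]; do
        if [ "$thr" -ge "$dir_count" ]; then
            echo "$thr"
        fi
        thr=$((thr * 2))
    done
}

# With thrlo=1, thrhi=32, dir_count=4 this prints 4, 8, 16, 32 (one per line).
thread_counts 1 32 4
```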

            Note that the script may not clean up properly if it is aborted or if it
            encounters an unrecoverable error. In this case, manual cleanup may be
            required, possibly including killing any running instances of 'lctl' (local
            or remote), removing echo_client instances created by the script and
            unloading obdecho.
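
As a starting point for manual cleanup, leftover echo_client devices can be picked out of a device listing. The sketch below reads listing text on stdin and assumes the column layout index, state, type, name, UUID, refcount; echo_client_names is a hypothetical helper and the sample listing is made up for illustration.

```shell
#!/bin/sh
# Print the names of echo_client devices found in device-list text on stdin.
# ASSUMPTION: columns are index, state, type, name, uuid, refcount.
echo_client_names() {
    awk '$3 == "echo_client" { print $4 }'
}

# Made-up sample listing, for illustration only.
sample='  0 UP mgs MGS MGS 5
  1 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 7
  2 UP echo_client lustre-MDT0000_ecc lustre-MDT0000_ecc_UUID 3'

printf '%s\n' "$sample" | echo_client_names
```

On a live node the input would come from `lctl dl`. Actually removing each device and unloading obdecho is deliberately left out here, since the exact steps depend on the state the aborted run left behind.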

            Script output:

            The summary file and stdout contain lines like...

            mdt 1 file 100000 dir 4 thr 4 create 5652.05 [ 999.01,46940.48] destroy 5797.79 [ 0.00,52951.55]

            mdt 1              is the total number of MDTs under test.
            file 100000        is the total number of files to operate on.
            dir 4              is the total number of directories to operate on.
            thr 4              is the total number of threads operating over all directories.
            create, destroy    are the test names. More tests will be displayed on the same line.
            5652.05            is the aggregate operation rate over all MDTs, measured by
                               dividing the total number of operations by the elapsed time.
            [999.01,46940.48]  are the minimum and maximum instantaneous operation rates seen on
                               any individual MDT.
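
For post-processing, the aggregate rate of a given test can be pulled out of a summary line with a short awk filter. This is a minimal sketch: rate_of is a hypothetical helper, and it simply takes the field immediately following the test name.

```shell
#!/bin/sh
# Print the aggregate rate reported for one test in a summary line.
# $1 = test name; the summary line is read from stdin.
# The rate is taken to be the field right after the test name.
rate_of() {
    awk -v t="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == t) { print $(i + 1); exit }
    }'
}

line='mdt 1 file 100000 dir 4 thr 4 create 5652.05 [ 999.01,46940.48] destroy 5797.79 [ 0.00,52951.55]'
echo "$line" | rate_of create    # prints 5652.05
echo "$line" | rate_of destroy   # prints 5797.79
```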


            zhiqi Zhiqi Tao (Inactive) added a comment -

            Hi Minh,

            After you write the initial content, I can help to test your instructions. It would be a good opportunity for me to learn how mds-survey works.

            Thanks,
            Zhiqi
            pjones Peter Jones added a comment -

            Ah yes. Good idea - Thanks Di!

            di.wang Di Wang added a comment -

            If the manual is only about how to use the mds-survey script, like obdfilter-survey, Minh might be the right person to write it, because he wrote the script. As far as I know, he has already written a README for this script. Thanks.


            People

              mdiep Minh Diep
              pjones Peter Jones