Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Blocker

    Description

      Many stripe count test

      The many stripe count functional test is intended to show that a DNE2 configuration can handle many MDTs in a single filesystem, and that a single directory can be striped over many MDTs. Because this is being tested in a virtualized AWS environment, performance will be measured, but neither performance scaling nor load testing is a primary goal of this test. It is rather a functional scaling test of the ability of the filesystem configuration and directory striping code to handle a large number of MDTs.

      1. Create a filesystem with 128 MDTs, 128 OSTs and at least 128 client mount points (multiple mounts per client)
      2. Create striped directories with stripe count N in 16, 32, 64, 96, 128:
                lfs setdirstripe -c N /mnt/lustre/testN
        

        Note: This command creates a striped directory across N MDTs.

                lfs setdirstripe -D -c N /mnt/lustre/testN
        

        Note: This command sets the default stripe count to N. All directories created within this directory will have this default stripe count applied.

      3. Run mdtest on all client mount points; each thread will create/stat/unlink at least 128k files in the striped test directory. Run this test under a striped directory with a default stripe count set, so that all subdirectories will themselves be striped directories (a combined command sketch follows this list):
                lfs setdirstripe -c N /mnt/lustre/testN
                lfs setdirstripe -D -c N /mnt/lustre/testN
        
      4. No errors will be observed, and balanced striping of files across MDTs will be observed.
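
        As a rough illustration of steps 2-4, here is a minimal sketch of the command sequence for a single stripe count (the N=32 value, the /mnt/lustre paths, and the mpirun/mdtest invocation are illustrative assumptions, not the exact commands used in this test):

                N=32
                # Create a directory striped over N MDTs
                lfs setdirstripe -c $N /mnt/lustre/test$N
                # Make N the default stripe count for subdirectories created inside it
                lfs setdirstripe -D -c $N /mnt/lustre/test$N
                # Confirm the stripe count and the MDT index of each stripe
                lfs getdirstripe /mnt/lustre/test$N
                # Hypothetical mdtest run: one MPI rank per client mount point,
                # 128k files per rank, files-only workload in the striped directory
                mpirun -np 128 mdtest -F -n 131072 -u -d /mnt/lustre/test$N/mdtest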

      Attachments

        1. 20150629-bench.log
          275 kB
        2. 20150629-results.json
          1 kB
        3. 20150701-bench96.log
          76 kB
        4. 20150701-results96.json
          0.3 kB

          Activity

            [LU-6737] many stripe testing of DNE2

            rhenwood Richard Henwood (Inactive) added a comment -

            Thanks for your help Robert - we've got the data we need.
            rread Robert Read added a comment -

            Results from the 96 stripe run.


            adilger Andreas Dilger added a comment -

            Robert, you are correct that the current DNE MDT allocation policy is not as balanced as the OST allocation policy. That is an enhancement for the future, including taking MDT space usage into account.

            It should be noted that the DNE allocation policy isn't necessarily to always start at MDT0, but rather (I believe by default) it will use the parent directory as the master (stripe 0) and round-robin from there, so if all of the directories are created off the filesystem root they will use MDT0 as a starting point. This can be changed via lfs mkdir -i <master_mdt_idx> -c N to explicitly start the stripe creation on a different MDT, but it isn't as good as an improved MDT allocation policy.
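
            As a hedged aside on the lfs mkdir -i option mentioned above, a minimal sketch of manually staggering the master MDT (directory names and indices are illustrative):

                # Hypothetical: start each test directory's stripe 0 on a different MDT
                lfs mkdir -i 0 -c 16 /mnt/lustre/t16a     # master stripe on MDT0000
                lfs mkdir -i 16 -c 16 /mnt/lustre/t16b    # master stripe on MDT0010
                lfs getdirstripe /mnt/lustre/t16a         # verify which MDTs hold the stripes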
            rread Robert Read added a comment - - edited

            Although this was not intended to be a performance test, I did notice that the stripe allocation policy for striped directories appears to be simplistic. As you can see, it appears to always allocate N sequential targets starting from MDT0. This means usage of MDTs will be very uneven unless all directories are widely striped.

            CE is designed to provision targets sequentially on each node, and with the current striped directory allocation scheme this results in the initial 16-MDT striped directory using a single MDS, rather than using all of them. In the interest of saving time, I changed the target allocation scheme specifically for this test so targets were staggered across the servers, and this balanced IO across all MDS instances for all test runs.

            rread Robert Read added a comment - - edited

            Log file and results summary for test run.

            Details

            • 8 MDS nodes, each with 16x MDT
            • 8 OSS nodes, each with 16x OST
            • 8 clients, each with 16 mount points
            • all nodes were m3.2xlarge instances
            • 4 test runs, each in a single shared 16, 32, 64, 128 striped directory
            • mdsrate --create, --stat, --unlink in each directory
            • 128k files per MDT for each run
            • 8 threads per MDT for each run
            rread Robert Read added a comment -

            My tools are ready, but I haven't had a chance to run the full test yet. I will try to get to this today.

            di.wang Di Wang added a comment -

            Robert, any update for the test? Thanks.

            rread Robert Read added a comment -

            "lfs getdirstripe <dir>" is only printing the stripe info of the one directory so the reason for the long pause was not obvious. I had to use strace to see it reading all the dirents after it prints the stripe info. Yes I'd agree it's a bug. I peaked at the code, and this behavior appears to be buried in the details of the llapi_semantic_traverse().

            Yes, test128/dir-0 has 128 stripes with 128k regular files in one directory. test/dir-0 also 128k regular files in one directory.

            I'll try with 128 unstriped subdirectories for comparison next time, but I suspect scanning that will still be quick.

            rread Robert Read added a comment - "lfs getdirstripe <dir>" is only printing the stripe info of the one directory so the reason for the long pause was not obvious. I had to use strace to see it reading all the dirents after it prints the stripe info. Yes I'd agree it's a bug. I peaked at the code, and this behavior appears to be buried in the details of the llapi_semantic_traverse(). Yes, test128/dir-0 has 128 stripes with 128k regular files in one directory. test/dir-0 also 128k regular files in one directory. I'll try with 128 unstriped subdirectories for comparison next time, but I suspect scanning that will still be quick.
            di.wang Di Wang added a comment -

            If you have enough OSTs (let's say >= 32), then use a single stripe, otherwise zero stripes.

            I assume /mnt/scratch/test128/dir-0 has 128 stripes? And all children (131073) under dir-0 are regular files? Strange, I did not expect lfs find under a striped directory to be so slow. IMHO, it should be similar to a non-striped directory. Something might be wrong, probably statahead. Could you please collect a client-side -1 debug log? Thanks

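
            For reference, one common way to capture the full (-1) client debug log being requested above (buffer size and paths are assumptions):

                lctl set_param debug=-1                           # enable all debug flags on the client
                lctl set_param debug_mb=1024                      # enlarge the debug buffer (size is an assumption)
                lctl clear                                        # discard old debug entries
                lfs find /mnt/scratch/test128/dir-0 > /dev/null   # reproduce the slow operation
                lctl dk > /tmp/client-debug.log                   # dump the debug log to a file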

            adilger Andreas Dilger added a comment -

            It seems like a bug for "lfs getdirstripe" to scan all the entries in the subdirectory, I think? That should require "-R" to scan subdirectories.

            As for "lfs find", I guess it is doing the readdir on all 128 directory shards, but it would be interesting to compare whether this is slower than e.g. "lfs find" on a directory with 128 subdirs holding an equal number of files (i.e. 1000/subdir).
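
            A rough sketch of the comparison suggested above (layout and file counts are assumptions):

                # Same total file count: one 128-stripe directory vs. 128 plain subdirectories
                time lfs find /mnt/scratch/test128/dir-0 -type f | wc -l
                time lfs find /mnt/scratch/flat128 -type f | wc -l    # flat128 holds 128 unstriped subdirs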

            People

              Assignee: rread Robert Read
              Reporter: rhenwood Richard Henwood (Inactive)
              Watchers: 5
