During the dedicated system test, there are several full-system and subsystem tests that we can run to help identify metadata performance issues, either in the system as a whole or in a particular subsystem.

For full-system performance, two tests are likely to be useful:

1) A scale-up mdtest where we run X threads on Y clients, with X and Y incremented in order to generate a performance curve.
2) Many separate (not MPI-coordinated) mdtests run on the system at once, to see whether combinations of different metadata operations cause more contention on one file system.

#1 is the more important test. #2, due to its random nature, could be difficult to really analyze.

For the subsystem tests, three tests should identify where along the network, lock manager, ldiskfs, disk chain the delays are coming from:

1) network -> lnet_selftest between client and MDT, both a bandwidth and a message rate test
2) locking/ldiskfs -> run mdtest directly on the MDT ldiskfs file system to remove the Lustre layer
3) ldiskfs/disk -> create a new logical volume on the volume group and run xdd IOPS tests

I think #2 is probably where to start, and then we move either up or down the chain depending on the results.

For stats collection, we should capture debug logs from the client and the MDT during a single-client mdtest. These can be analyzed later if necessary. During the mdtest runs, we can take timestamped snapshots of the MDT stats proc files, as well as the disk and LVM stats. If a run turns out to be particularly interesting, we can then go back to those snapshots and look for patterns. Capturing CPU load and memory usage would also be good.
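
The timestamped snapshots mentioned above could be collected with a small script along the following lines. This is only a sketch under stated assumptions, not a finished tool: the function names `snapshot` and `collect` are made up for illustration, and the MDT stats glob in `DEFAULT_SOURCES` is an assumption, since the location of those proc files differs between Lustre versions and should be checked on the MDS before use. `/proc/diskstats`, `/proc/loadavg` and `/proc/meminfo` are standard Linux files. Device-mapper volumes normally show up in `/proc/diskstats` too, which may be enough for the LVM side.

```python
import glob
import time
from pathlib import Path

# Assumed stat sources. The MDT glob is a guess that depends on the Lustre
# version in use; adjust it to wherever the MDT stats files actually live.
DEFAULT_SOURCES = [
    "/proc/fs/lustre/mdt/*/md_stats",  # assumption, verify on the MDS
    "/proc/diskstats",                 # per-block-device I/O counters
    "/proc/loadavg",                   # CPU load averages
    "/proc/meminfo",                   # memory usage
]


def snapshot(sources, out_dir, now=None):
    """Copy each file matching the glob patterns into a timestamped directory.

    Returns the list of files written. Files that cannot be read are skipped.
    """
    stamp = time.strftime("%Y%m%dT%H%M%S", time.localtime(now))
    dest = Path(out_dir) / stamp
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for pattern in sources:
        for path in sorted(glob.glob(pattern)):
            # Flatten the source path into a single file name.
            target = dest / path.strip("/").replace("/", "_")
            try:
                target.write_bytes(Path(path).read_bytes())
            except OSError:
                continue  # unreadable or vanished file: skip it
            saved.append(target)
    return saved


def collect(sources, out_dir, interval_s, count):
    """Take `count` snapshots, `interval_s` seconds apart.

    The directory name has one-second resolution, so intervals shorter
    than one second would overwrite earlier snapshots.
    """
    for i in range(count):
        snapshot(sources, out_dir)
        if i + 1 < count:
            time.sleep(interval_s)
```

Started on the MDS just before an mdtest run, a call such as `collect(DEFAULT_SOURCES, "/tmp/mdtest_stats", interval_s=5, count=60)` (the output path is hypothetical) would leave one directory per sample. Directories from an interesting run can then be diffed against each other afterwards.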