[LU-1695] Demonstrate MDS performance with increasing client load for SMP Affinity Created: 25/Jul/12 Updated: 28/Feb/18 Resolved: 28/Feb/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | John Carrier (Inactive) | Assignee: | Liang Zhen (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None |
| Environment: | Cray XE6 with Lustre 2.1.1 MDS/OSS |
| Severity: | 3 |
| Rank (Obsolete): | 2186 |
| Description |
|
Cray has found that, when MDS performance is plotted against the number of clients (or ranks) using either mdtest or metabench, the create/stat/unlink rates rise from 1 to 64 clients, peak there, and then decline as more clients are added to the test. Historically, no more than 64 clients were needed to show MDS performance saturation. The problem is that using more than 64 clients leads to a decline in performance rather than the plateau one would expect given the limitation of a single MDS.

The following data were gathered using metabench to measure create/stat/unlink rates for a fixed number of files spread over a growing number of clients. The Lustre servers ran Lustre 2.1.1 plus patches; the clients ran Lustre 1.8.6 on a Cray XE6. The data are for 1M files, but the degradation of create and unlink rates as the number of clients increases is consistent across a broad range of file counts. Furthermore, the degradation is greater when all files are in a single directory (as expected).

[Plots not preserved: individual directories; shared directory]

The DoD's HPCMOD office first reported this "behavior" to Sun and Cray several years ago, following a test they funded to compare Lustre and GPFS metadata performance. Over a small range of clients Lustre outperformed GPFS, but then, instead of reaching a plateau with increasing client load, Lustre MDS performance declined significantly (beyond 64 or 128 nodes, depending on the test run). At the time, Sun told Cray and its customer that making the MDS SMP-aware would resolve the problem.

As a result, we need to add a test of create/stat/unlink rates over a wide range of client counts to the qualification of the SMP affinity feature, and show results both before and after the SMP patches. If the patches have no effect, these results will still provide a baseline for comparison with future investigations. |
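The qualification test requested above turns on distinguishing a plateau (the expected single-MDS saturation) from a post-peak decline (the reported problem). The following is a minimal Python sketch of that check, assuming the results are available as (client count, ops/sec) pairs taken from mdtest or metabench output. The helper names `split_files` and `classify_scaling` and the 10% plateau tolerance are hypothetical choices for illustration; they are not part of either benchmark or of Lustre.

```python
from typing import List, Sequence, Tuple


def split_files(total_files: int, clients: int) -> List[int]:
    """Spread a fixed file count over `clients` ranks as evenly as possible
    (hypothetical helper; mirrors the "fixed number of files" test setup)."""
    base, extra = divmod(total_files, clients)
    # The first `extra` ranks each take one leftover file.
    return [base + 1 if i < extra else base for i in range(clients)]


def classify_scaling(points: Sequence[Tuple[int, float]], tolerance: float = 0.10) -> str:
    """Classify an ops/sec-vs-client-count curve by its behavior after the peak.

    points: (client_count, ops_per_second) pairs, e.g. create rates.
    tolerance: fractional drop below the peak still treated as a plateau
               (an assumed threshold, not a Lustre or benchmark setting).
    Returns "rising", "plateau", or "decline".
    """
    if not points:
        raise ValueError("need at least one measurement")
    ordered = sorted(points)  # sort by client count
    rates = [rate for _, rate in ordered]
    peak_idx = max(range(len(rates)), key=rates.__getitem__)
    peak = rates[peak_idx]
    if peak <= 0:
        raise ValueError("rates must be positive")
    if peak_idx == len(rates) - 1:
        return "rising"  # no larger client count was measured past the peak
    worst_after_peak = min(rates[peak_idx + 1:])
    drop = (peak - worst_after_peak) / peak
    return "decline" if drop > tolerance else "plateau"
```

Running `classify_scaling` separately on the create, stat, and unlink series, once with and once without the SMP patches, would give a before/after verdict per operation. Using the worst post-peak rate rather than the last point makes the check sensitive to a decline anywhere past the peak.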
| Comments |
| Comment by Bryon Neitzel (Inactive) [ 10/Aug/12 ] |
|
Liang will review the Toro test results from Aug 4-5 to check for degradation. He will also run the SMP scaling code on the Hyperion DAT cluster to test at larger scales. |