[LU-7958] Directory Operation Performance Degradation on DNE Created: 30/Mar/16  Updated: 07/Apr/16  Resolved: 07/Apr/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Giuseppe Di Natale (Inactive) Assignee: John Fuchs-Chesney (Inactive)
Resolution: Done Votes: 0
Labels: dne, llnl

Attachments: Catalyst-mdtest-3-25-2016.txt, Zwicky-mdtest-10-7-2015.txt
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We have been performing ongoing testing of DNE in Lustre 2.8. During testing, we noticed severe performance degradation when directory operations involve two or more MDSs. We are using mdtest to assess the performance of our setup. To vary the MDS count, we use directories that are striped across a varying number of MDSs (all with a starting index of 0). Our setup contains 4 MDSs, 4 OSSs, and an independent MGS. Each server is backed by ZFS and has a pool built from 4 solid-state drives. Below is a summary of the mdtest results. I've also attached the raw output (Catalyst-mdtest-3-25-2016.txt).

Directory creation:

MDS Count         Max         Min        Mean     Std Dev
1            5430.605    3665.080    4601.181     647.529
2             140.306     108.522     126.983      11.541
3             105.811      96.243     102.489       3.741
4             109.556     103.074     106.963       2.482

Directory removal:

MDS Count         Max         Min        Mean     Std Dev
1            2667.779    2126.894    2326.902     206.254
2             163.166      86.190     107.369      32.263
3              61.288      57.917      59.675       1.438
4              81.529      62.300      69.171       7.785

Directory stat:

MDS Count         Max         Min        Mean     Std Dev
1           53112.492   52023.002   52549.901     483.261
2           20169.405   20068.291   20118.116      44.012
3           18580.815   17621.591   18308.662     397.660
4           16904.242   16836.548   16863.872      24.812
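
To put the size of the drop in perspective, the mean rates from the three tables can be divided into the single-MDS mean. The following is a minimal Python sketch of that arithmetic. The numbers are copied from the tables; the variable and function names are illustrative only. mdtest normally reports these rates in operations per second, but the summary here does not state the unit, so the sketch only computes unitless ratios.

```python
# Mean rates copied from the mdtest summary tables, keyed by MDS count.
mean_rates = {
    "creation": {1: 4601.181, 2: 126.983, 3: 102.489, 4: 106.963},
    "removal": {1: 2326.902, 2: 107.369, 3: 59.675, 4: 69.171},
    "stat": {1: 52549.901, 2: 20118.116, 3: 18308.662, 4: 16863.872},
}


def slowdown_vs_single_mds(rates):
    """Return {mds_count: single-MDS mean / mean at that count} for counts > 1."""
    baseline = rates[1]
    return {count: baseline / rate for count, rate in rates.items() if count != 1}


slowdowns = {op: slowdown_vs_single_mds(rates) for op, rates in mean_rates.items()}

for op, factors in slowdowns.items():
    for count, factor in sorted(factors.items()):
        print(f"directory {op}, {count} MDSs: {factor:.1f}x slower than 1 MDS")
```

On these numbers, creation and removal slow down by more than an order of magnitude as soon as a second MDS is involved, while stat slows down by a much smaller factor.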

We also checked output from testing in October 2015. A similar trend is present, though the degradation did not appear to be as severe. For that test, we were running 4 MDSs each with 1 MDT, 2 OSSs each with 1 OST, and the clients ran the same Lustre version as the servers. The backend was ZFS. I have attached the logs from that test as well (Zwicky-mdtest-10-7-2015.txt).

Let us know if you need any other information or more testing.



 Comments   
Comment by Alex Zhuravlev [ 30/Mar/16 ]

It looks like many of the mkdirs were distributed? This isn't a target workload for DNE2. Such mkdirs aren't supposed to be very frequent, especially with ZFS.

Comment by Andreas Dilger [ 30/Mar/16 ]

The intended usage for DNE is not to be creating many thousands of remote or striped directories at one time, but rather to create a striped directory and then create large numbers of files or subdirectories inside the striped directory.

The creation of remote or striped directories themselves is relatively heavyweight because multiple MDTs and distributed recovery are involved, but regular create/unlink and mkdir/rmdir operations inside the remote or striped directory are independent of each other.
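
The usage pattern described above can be sketched as one striped-directory creation followed by ordinary mkdir calls inside it. The Python sketch below only builds and prints the commands; it does not run them. The mount point, directory names, and helper function are hypothetical, and it assumes `lfs mkdir -c <count> -i <index>` is available on the client to create the striped directory.

```python
import shlex


def plan_striped_workload(parent, stripe_count, n_subdirs):
    """Return argv lists: one striped mkdir, then ordinary mkdirs inside it."""
    # Single heavyweight, distributed operation: create the striped parent.
    commands = [["lfs", "mkdir", "-c", str(stripe_count), "-i", "0", parent]]
    # Regular mkdirs inside the striped parent; no explicit striping requested.
    for i in range(n_subdirs):
        commands.append(["mkdir", f"{parent}/subdir.{i}"])
    return commands


# Hypothetical mount point and directory name, for illustration only.
plan = plan_striped_workload("/mnt/lustre/striped_parent", stripe_count=4, n_subdirs=3)
for argv in plan:
    print(shlex.join(argv))
```

Only the first command in the plan asks for a striped directory. This assumes the parent carries no default directory striping that the subdirectories would inherit.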

Comment by Di Wang [ 31/Mar/16 ]

It looks like you set a default striped EA on the top directory, so all of its children are created as striped directories. You probably need to "fix" mdtest to do something like what Andreas describes in his comment.
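
One way to check for this situation is to ask for the default directory striping on the top-level test directory. The sketch below is a hedged illustration: it assumes the `lfs getdirstripe -D` option is available on the client, the directory path is hypothetical, and the output is returned unparsed because its exact format is not shown in this ticket.

```python
import shutil
import subprocess


def default_stripe_command(directory):
    """Build the argv that queries a directory's default striping."""
    return ["lfs", "getdirstripe", "-D", directory]


def show_default_stripe(directory):
    """Run the query if lfs is installed; return raw output, else None."""
    if shutil.which("lfs") is None:
        return None
    result = subprocess.run(
        default_stripe_command(directory),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Hypothetical top-level directory used by the benchmark run.
cmd = default_stripe_command("/mnt/lustre/mdtest_top")
print(" ".join(cmd))
```

If the top directory reports a default stripe count greater than one, every directory the benchmark creates beneath it would be a striped directory, which is the case described in the comment above.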

Comment by John Fuchs-Chesney (Inactive) [ 04/Apr/16 ]

Hello Giuseppe,

Do you have what you need from this ticket? Or is there more work you would like us to do?

Thanks,
~ jfc.

Comment by John Fuchs-Chesney (Inactive) [ 07/Apr/16 ]

Hello Giuseppe,

We are marking this as resolved/done, having provided information back to you.

If you feel this ticket needs more work from us, please let us know.

Thanks,
~ jfc.
