[LU-10823] max_create_count triggering uneven distribution across OSTs Created: 16/Mar/18  Updated: 20/Jul/18  Resolved: 20/Jul/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.3
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jesse Stroik Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

CentOS 7


Issue Links:
Related
is related to LU-11115 OST selection algorithm broken with m... Resolved
Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

I set max_create_count=0 on two OSTs to facilitate migrating data off of them so that I could destroy and recreate their zpools. Each OST has a corrupted spacemap.
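For reference, this is roughly how the creation cap was applied; a minimal sketch run on the MDS, assuming the usual osp device naming for this file system (iliad, MDT0000):

$ lctl set_param osp.iliad-OST0018-osc-MDT0000.max_create_count=0   # OST:24
$ lctl set_param osp.iliad-OST0019-osc-MDT0000.max_create_count=0   # OST:25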

 

The OSTs in question:

 

iliad-OST0018_UUID   56246673664 21707936640 34538672768  39% /iliad[OST:24]
iliad-OST0019_UUID   56246646528 21155081600 35091486464  38% /iliad[OST:25]

During the lfs_migrate run I found that the very next OST in the list, OST:26 below, was getting a disproportionate number of new files. I set max_create_count=0 for that OST as well, then ran into the same issue on OST:27, and then on OST:28 as soon as I set max_create_count=0 on OST:27.

 

The file system was previously well balanced and is about 86% used.

 

iliad-OST0013_UUID         52.4T 46.2T 6.1T  88% /iliad[OST:19]
iliad-OST0014_UUID         52.4T 46.2T 6.2T  88% /iliad[OST:20]
iliad-OST0015_UUID         52.4T 46.5T 5.9T  89% /iliad[OST:21]
iliad-OST0016_UUID         52.4T 46.3T 6.1T  88% /iliad[OST:22]
iliad-OST0017_UUID         52.4T 45.1T 7.3T  86% /iliad[OST:23]
iliad-OST0018_UUID         52.4T 20.1T 32.3T  38% /iliad[OST:24]
iliad-OST0019_UUID         52.4T 19.6T 32.7T  37% /iliad[OST:25]
iliad-OST001a_UUID         52.4T 49.1T 3.3T  94% /iliad[OST:26]
iliad-OST001b_UUID         52.4T 51.6T 841.1G  98% /iliad[OST:27]
iliad-OST001c_UUID         52.4T 47.6T 4.8T  91% /iliad[OST:28]

 

Last week I upgraded the metadata/messaging and object store servers from CentOS 6 running Lustre 2.7 to CentOS 7 running Lustre 2.10.3 with ZFS 0.7.5. The MDT is still ldiskfs.

 

I ran a couple of quick tests adjusting the balancing parameters, which may help diagnose the issue.

 

While actively migrating files from OSTs 24 and 25 to the rest of the file system, I tested setting qos_prio_free=95.
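A sketch of the tuning command as run on the MDS, assuming the parameter lives under lod.* on 2.10 (older releases exposed it under lov.*):

$ lctl set_param lod.*.qos_prio_free=95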

 

$ date; lfs df /iliad  | egrep "OST:(12|28)"                                                                                                        
Fri Mar 16 19:10:10 UTC 2018
iliad-OST000c_UUID   56246866176 49690633216  6556138368 88% /iliad[OST:12]
iliad-OST001c_UUID   56246898304 51048918400  5191627264 91% /iliad[OST:28]

$ date; lfs df /iliad  | egrep "OST:(12|28)"
Fri Mar 16 19:18:11 UTC 2018
iliad-OST000c_UUID   56246865920 49691904768  6554891648 88% /iliad[OST:12]
iliad-OST001c_UUID   56246902912 51064366592  5182073984 91% /iliad[OST:28]

Change for OST12: (49691904768-49690633216) / 1024 = 1241.75

Change for OST28: (51064366592-51048918400) / 1024 = 15086.125

 

OST28 appears to be receiving more than ten times the data you'd expect, which is even worse given that OST28 is already fuller than most. I then set qos_prio_free back to the default and set qos_threshold_rr to 100 to try to force straight round-robin balancing.
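Roughly the commands used on the MDS; a sketch assuming the lod.* parameter paths, and assuming 91 is the default qos_prio_free value (check the value saved before the change):

$ lctl set_param lod.*.qos_prio_free=91      # assumed default
$ lctl set_param lod.*.qos_threshold_rr=100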

 

$ date; lfs df /iliad  | egrep "OST:(12|28)"
Fri Mar 16 19:18:25 UTC 2018
iliad-OST000c_UUID   56246865920 49691904768  6554891648 88% /iliad[OST:12]
iliad-OST001c_UUID   56246902912 51064366592  5182106624 91% /iliad[OST:28]

$ date; lfs df /iliad  | egrep "OST:(12|28)"
Fri Mar 16 19:32:28 UTC 2018
iliad-OST000c_UUID   56246865280 49696041856  6550753920 88% /iliad[OST:12]
iliad-OST001c_UUID   56246904448 51070885248  5175770368 91% /iliad[OST:28]

 

(49696041856-49691904768) / 1024 = 4040.125

(51070885248-51064366592) / 1024 = 6365.875

 

That looks consistent with blind round-robin allocation for newly written files.

 

Perhaps max_create_count isn't taken into account by the balancing algorithm, so new files simply go to the next available OST in order. In that case, would I be better off deactivating the OSTs?
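For comparison, deactivating an OST on the MDS would look roughly like the sketch below (not what was run here). As far as I understand, a deactivated OST also stops processing object destroys from the MDS, which is why max_create_count=0 is usually preferred while draining:

$ lctl --device iliad-OST0018-osc-MDT0000 deactivate
# or, equivalently:
$ lctl set_param osp.iliad-OST0018-osc-MDT0000.active=0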



 Comments   
Comment by Taizeng Wu [ 25/Apr/18 ]

We have encountered the same problem.

Environment:

  • OS: RHEL 7.4
  • Lustre: 2.10.2
  • ZFS: 0.7.3 (OST used ZFS)

We set max_create_count=0 on some OSTs (40-49, 60-69) earlier. Yesterday we found OST50 under heavy read load, which caused slow Lustre access for some users; running `lfs df -i` showed that OST50's usage was much higher than the other OSTs'.

I ran a test that touched about 160 files and found that OST50 was allocated roughly 10 times more of them than the others. Then I set qos_threshold_rr to 50 and ran the test again, and allocation appeared to go back to round-robin for newly written files.
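A minimal sketch of that kind of test, assuming single-striped files so that the starting OST index reported by `lfs getstripe -i` identifies where each file was allocated (the directory name is only an example):

$ mkdir /Share/balance_test
$ for i in $(seq 1 160); do touch /Share/balance_test/f$i; done
$ for f in /Share/balance_test/f*; do lfs getstripe -i "$f"; done | sort -n | uniq -c

This prints a count of new files per OST index, which makes a skew toward a single OST easy to spot.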

lfs df -h

UUID bytes Used Available Use% Mounted on
Share-MDT0000_UUID 1.6T 5.7G 1.5T 0% /Share[MDT:0]
Share-OST0000_UUID 58.6T 9.2T 49.4T 16% /Share[OST:0]
Share-OST0001_UUID 58.6T 9.3T 49.3T 16% /Share[OST:1]
Share-OST0002_UUID 58.6T 9.3T 49.3T 16% /Share[OST:2]
Share-OST0003_UUID 58.6T 9.3T 49.3T 16% /Share[OST:3]
Share-OST0004_UUID 58.6T 9.2T 49.4T 16% /Share[OST:4]
Share-OST0005_UUID 58.6T 9.3T 49.3T 16% /Share[OST:5]
Share-OST0006_UUID 58.6T 9.4T 49.1T 16% /Share[OST:6]
Share-OST0007_UUID 58.6T 9.3T 49.3T 16% /Share[OST:7]
Share-OST0008_UUID 58.6T 9.3T 49.3T 16% /Share[OST:8]
Share-OST0009_UUID 58.6T 9.3T 49.2T 16% /Share[OST:9]
Share-OST000a_UUID 58.6T 9.1T 49.4T 16% /Share[OST:10]
Share-OST000b_UUID 58.6T 9.1T 49.4T 16% /Share[OST:11]
Share-OST000c_UUID 58.6T 9.2T 49.3T 16% /Share[OST:12]
Share-OST000d_UUID 58.6T 9.2T 49.4T 16% /Share[OST:13]
Share-OST000e_UUID 58.6T 9.4T 49.2T 16% /Share[OST:14]
Share-OST000f_UUID 58.6T 9.3T 49.3T 16% /Share[OST:15]
Share-OST0010_UUID 58.6T 9.4T 49.1T 16% /Share[OST:16]
Share-OST0011_UUID 58.6T 9.3T 49.2T 16% /Share[OST:17]
Share-OST0012_UUID 58.6T 9.4T 49.2T 16% /Share[OST:18]
Share-OST0013_UUID 58.6T 9.6T 49.0T 16% /Share[OST:19]
Share-OST0014_UUID 58.6T 9.5T 49.1T 16% /Share[OST:20]
Share-OST0015_UUID 58.6T 9.4T 49.2T 16% /Share[OST:21]
Share-OST0016_UUID 58.6T 9.2T 49.4T 16% /Share[OST:22]
Share-OST0017_UUID 58.6T 9.4T 49.2T 16% /Share[OST:23]
Share-OST0018_UUID 58.6T 9.3T 49.3T 16% /Share[OST:24]
Share-OST0019_UUID 58.6T 9.1T 49.5T 16% /Share[OST:25]
Share-OST001a_UUID 58.6T 9.4T 49.2T 16% /Share[OST:26]
Share-OST001b_UUID 58.6T 9.5T 49.1T 16% /Share[OST:27]
Share-OST001c_UUID 58.6T 9.4T 49.2T 16% /Share[OST:28]
Share-OST001d_UUID 58.6T 9.2T 49.4T 16% /Share[OST:29]
Share-OST001e_UUID 58.6T 9.1T 49.5T 16% /Share[OST:30]
Share-OST001f_UUID 58.6T 9.3T 49.3T 16% /Share[OST:31]
Share-OST0020_UUID 58.6T 9.2T 49.4T 16% /Share[OST:32]
Share-OST0021_UUID 58.6T 9.4T 49.2T 16% /Share[OST:33]
Share-OST0022_UUID 58.6T 9.3T 49.3T 16% /Share[OST:34]
Share-OST0023_UUID 58.6T 9.5T 49.1T 16% /Share[OST:35]
Share-OST0024_UUID 58.6T 9.3T 49.3T 16% /Share[OST:36]
Share-OST0025_UUID 58.6T 9.2T 49.3T 16% /Share[OST:37]
Share-OST0026_UUID 58.6T 9.5T 49.1T 16% /Share[OST:38]
Share-OST0027_UUID 58.6T 9.3T 49.2T 16% /Share[OST:39]
Share-OST0028_UUID 58.6T 206.4G 58.4T 0% /Share[OST:40]
Share-OST0029_UUID 58.6T 30.3G 58.5T 0% /Share[OST:41]
Share-OST002a_UUID 58.6T 29.0G 58.5T 0% /Share[OST:42]
Share-OST002b_UUID 58.6T 32.4G 58.5T 0% /Share[OST:43]
Share-OST002c_UUID 58.6T 29.9G 58.5T 0% /Share[OST:44]
Share-OST002d_UUID 58.6T 29.7G 58.5T 0% /Share[OST:45]
Share-OST002e_UUID 58.6T 30.6G 58.5T 0% /Share[OST:46]
Share-OST002f_UUID 58.6T 29.9G 58.5T 0% /Share[OST:47]
Share-OST0030_UUID 58.6T 31.9G 58.5T 0% /Share[OST:48]
Share-OST0031_UUID 58.6T 32.6G 58.5T 0% /Share[OST:49]
Share-OST0032_UUID 58.6T 23.5T 35.1T 40% /Share[OST:50]
Share-OST0033_UUID 58.6T 9.5T 49.1T 16% /Share[OST:51]
Share-OST0034_UUID 58.6T 9.3T 49.2T 16% /Share[OST:52]
Share-OST0035_UUID 58.6T 9.3T 49.2T 16% /Share[OST:53]
Share-OST0036_UUID 58.6T 9.2T 49.3T 16% /Share[OST:54]
Share-OST0037_UUID 58.6T 9.3T 49.3T 16% /Share[OST:55]
Share-OST0038_UUID 58.6T 9.3T 49.3T 16% /Share[OST:56]
Share-OST0039_UUID 58.6T 9.6T 49.0T 16% /Share[OST:57]
Share-OST003a_UUID 58.6T 9.1T 49.5T 16% /Share[OST:58]
Share-OST003b_UUID 58.6T 9.1T 49.5T 16% /Share[OST:59]
Share-OST003c_UUID 58.6T 31.9G 58.5T 0% /Share[OST:60]
Share-OST003d_UUID 58.6T 30.8G 58.5T 0% /Share[OST:61]
Share-OST003e_UUID 58.6T 31.0G 58.5T 0% /Share[OST:62]
Share-OST003f_UUID 58.6T 34.9G 58.5T 0% /Share[OST:63]
Share-OST0040_UUID 58.6T 31.6G 58.5T 0% /Share[OST:64]
Share-OST0041_UUID 58.6T 30.9G 58.5T 0% /Share[OST:65]
Share-OST0042_UUID 58.6T 75.6G 58.5T 0% /Share[OST:66]
Share-OST0043_UUID 58.6T 45.9G 58.5T 0% /Share[OST:67]
Share-OST0044_UUID 58.6T 33.2G 58.5T 0% /Share[OST:68]
Share-OST0045_UUID 58.6T 33.7G 58.5T 0% /Share[OST:69]
Share-OST0046_UUID 58.6T 9.1T 49.4T 16% /Share[OST:70]
Share-OST0047_UUID 58.6T 9.3T 49.3T 16% /Share[OST:71]
Share-OST0048_UUID 58.6T 9.3T 49.2T 16% /Share[OST:72]
Share-OST0049_UUID 58.6T 9.3T 49.2T 16% /Share[OST:73]
Share-OST004a_UUID 58.6T 9.1T 49.5T 16% /Share[OST:74]
Share-OST004b_UUID 58.6T 9.3T 49.3T 16% /Share[OST:75]
Share-OST004c_UUID 58.6T 9.2T 49.3T 16% /Share[OST:76]
Share-OST004d_UUID 58.6T 9.5T 49.1T 16% /Share[OST:77]
Share-OST004e_UUID 58.6T 9.4T 49.2T 16% /Share[OST:78]
Share-OST004f_UUID 58.6T 9.3T 49.3T 16% /Share[OST:79]

filesystem_summary: 4.6P 573.4T 4.0P 12% /Share

 

Comment by Andreas Dilger [ 20/Jul/18 ]

Closing this as a duplicate of LU-11115, which has a patch to address this issue.
