Lustre / LU-10823

max_create_count triggering uneven distribution across OSTs

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.3
    • Labels: None
    • Environment: CentOS 7
    • Severity: 2
    • Rank (Obsolete): 9223372036854775807

    Description

      I set max_create_count=0 on two OSTs in a file system to facilitate migrating data off of them so that I could destroy and recreate the zpools; each OST has a corrupted spacemap.

       

      The OSTs in question:

       

      iliad-OST0018_UUID   56246673664 21707936640 34538672768  39% /iliad[OST:24]
      iliad-OST0019_UUID   56246646528 21155081600 35091486464  38% /iliad[OST:25]

      While running lfs_migrate, I found that the very next OST in the list, OST:26 below, was getting a disproportionate number of files assigned to it. I set max_create_count=0 for that OST as well, then ran into the same issue on OST:27, and again on OST:28 as soon as I set max_create_count=0 on OST:27.
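      For reference, a minimal sketch of the drain procedure described above, assuming Lustre 2.10 with a single MDT; the parameter paths and target names below are examples for this filesystem:

      # on the MDS: stop new object allocation on the OSTs being drained
      lctl set_param osp.iliad-OST0018-osc-MDT0000.max_create_count=0
      lctl set_param osp.iliad-OST0019-osc-MDT0000.max_create_count=0

      # on a client: move existing files off a drained OST (repeat per OST)
      lfs find /iliad --ost iliad-OST0018_UUID -type f | lfs_migrate -y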

       

      The file system was previously well balanced and is about 86% used.

       

      iliad-OST0013_UUID         52.4T 46.2T 6.1T  88% /iliad[OST:19]
      iliad-OST0014_UUID         52.4T 46.2T 6.2T  88% /iliad[OST:20]
      iliad-OST0015_UUID         52.4T 46.5T 5.9T  89% /iliad[OST:21]
      iliad-OST0016_UUID         52.4T 46.3T 6.1T  88% /iliad[OST:22]
      iliad-OST0017_UUID         52.4T 45.1T 7.3T  86% /iliad[OST:23]
      iliad-OST0018_UUID         52.4T 20.1T 32.3T  38% /iliad[OST:24]
      iliad-OST0019_UUID         52.4T 19.6T 32.7T  37% /iliad[OST:25]
      iliad-OST001a_UUID         52.4T 49.1T 3.3T  94% /iliad[OST:26]
      iliad-OST001b_UUID         52.4T 51.6T 841.1G  98% /iliad[OST:27]
      iliad-OST001c_UUID         52.4T 47.6T 4.8T  91% /iliad[OST:28]

       

      Last week I upgraded the metadata/messaging and object store servers from CentOS 6 running Lustre 2.7 to CentOS 7 running Lustre 2.10.3 with ZFS 0.7.5. The MDT is still ldiskfs.

       

      I ran a couple of quick tests adjusting the balancing parameters, which may help diagnose the issue.

       

      While actively migrating files from OSTs 24 and 25 to the rest of the file system, I tested setting qos_prio_free=95.
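      For context, setting that parameter on the MDS typically looks like the following; the lod path below is an assumption based on the 2.10 parameter namespace (older releases expose the same tunable under lov.*):

      # on the MDS: give free space more weight in the QOS allocator
      lctl set_param lod.iliad-MDT0000-mdtlov.qos_prio_free=95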

       

      $ date; lfs df /iliad  | egrep "OST:(12|28)"                                                                                                        
      Fri Mar 16 19:10:10 UTC 2018
      iliad-OST000c_UUID   56246866176 49690633216  6556138368 88% /iliad[OST:12]
      iliad-OST001c_UUID   56246898304 51048918400  5191627264 91% /iliad[OST:28]

      $ date; lfs df /iliad  | egrep "OST:(12|28)"
      Fri Mar 16 19:18:11 UTC 2018
      iliad-OST000c_UUID   56246865920 49691904768  6554891648 88% /iliad[OST:12]
      iliad-OST001c_UUID   56246902912 51064366592  5182073984 91% /iliad[OST:28]

      Change for OST12: (49691904768 - 49690633216) / 1024 = 1241.75 MB

      Change for OST28: (51064366592 - 51048918400) / 1024 = 15086.125 MB

       

      It looks like OST28 is being allocated more than ten times what you'd expect, which is worse still given that it is already fuller than most. I then set qos_prio_free back to the default and set qos_threshold_rr to 100 to attempt to get straight round-robin balancing.
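      That change would look roughly like this on the MDS, again assuming the 2.10 lod parameter path (91% is the documented default for qos_prio_free):

      # revert the free-space weighting to its default and force round-robin
      lctl set_param lod.iliad-MDT0000-mdtlov.qos_prio_free=91
      lctl set_param lod.iliad-MDT0000-mdtlov.qos_threshold_rr=100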

       

      $ date; lfs df /iliad  | egrep "OST:(12|28)"
      Fri Mar 16 19:18:25 UTC 2018
      iliad-OST000c_UUID   56246865920 49691904768  6554891648 88% /iliad[OST:12]
      iliad-OST001c_UUID   56246902912 51064366592  5182106624 91% /iliad[OST:28]

      $ date; lfs df /iliad  | egrep "OST:(12|28)"
      Fri Mar 16 19:32:28 UTC 2018
      iliad-OST000c_UUID   56246865280 49696041856  6550753920 88% /iliad[OST:12]
      iliad-OST001c_UUID   56246904448 51070885248  5175770368 91% /iliad[OST:28]

       

      Change for OST12: (49696041856 - 49691904768) / 1024 = 4040.125 MB

      Change for OST28: (51070885248 - 51064366592) / 1024 = 6365.875 MB
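      The same deltas can be computed across every OST from two saved snapshots of the lfs df output (which reports KB); a rough sketch, with hypothetical snapshot file names:

      lfs df /iliad > before.txt; sleep 600; lfs df /iliad > after.txt
      # join the "Used" column from both snapshots by OST UUID and print the change in MB
      join -j1 <(awk '/OST:/ {print $1, $3}' before.txt | sort) \
               <(awk '/OST:/ {print $1, $3}' after.txt | sort) |
          awk '{printf "%s %.2f MB\n", $1, ($3 - $2) / 1024}'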

       

      That seems to be in the realm of blind round-robin allocation for newly written files.

       

      Perhaps max_create_count isn't taken into account by the balancing algorithm, and files simply go to the next available OST in order. In that case, would I be better off deactivating the OSTs?
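      For comparison, deactivating an OST on the MDS would look roughly like the following (device names are examples for this filesystem). Note that the Lustre manual's OST-removal procedure recommends max_create_count=0 over deactivation, since a deactivated OSP also prevents the MDS from destroying objects on that OST, so space is not released as files are migrated away:

      # temporary (until the MDS restarts): stop allocations to this OST
      lctl set_param osp.iliad-OST0018-osc-MDT0000.active=0

      # permanent deactivation, set on the MGS
      lctl conf_param iliad-OST0018.osc.active=0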

    Attachments

    Issue Links

    Activity

            [LU-10823] max_create_count triggering uneven distribution across OSTs

            adilger Andreas Dilger added a comment - Closing this as a duplicate of LU-11115, which has a patch to address this issue.

            wutaizeng Taizeng Wu (Inactive) added a comment - We have encountered the same problem.

            Environment:

            • OS: RHEL 7.4
            • Lustre: 2.10.2
            • ZFS: 0.7.3 (the OSTs use ZFS)

            We had set max_create_count=0 on some OSTs (40-49, 60-69) earlier. Yesterday we found OST50 under heavy read load, which caused slow Lustre access for some users; running `lfs df -i` showed that OST50's usage was much higher than the other OSTs'.

            I ran a test that touched about 160 files and found that OST50 was allocated roughly 10 times more objects than the others. I then set qos_threshold_rr to 50 and ran the test again; allocation appears to have returned to round-robin for newly written files.
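            A minimal sketch of that kind of allocation test (directory and file names are hypothetical): create a batch of files, then tally which OST index each file's first stripe landed on.

            mkdir -p /Share/alloc_test && cd /Share/alloc_test
            for i in $(seq 1 160); do touch "f$i"; done
            # print the OST index of each file's first stripe and count per index
            for f in f*; do lfs getstripe -i "$f"; done | sort -n | uniq -c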

            lfs df -h

            UUID bytes Used Available Use% Mounted on
            Share-MDT0000_UUID 1.6T 5.7G 1.5T 0% /Share[MDT:0]
            Share-OST0000_UUID 58.6T 9.2T 49.4T 16% /Share[OST:0]
            Share-OST0001_UUID 58.6T 9.3T 49.3T 16% /Share[OST:1]
            Share-OST0002_UUID 58.6T 9.3T 49.3T 16% /Share[OST:2]
            Share-OST0003_UUID 58.6T 9.3T 49.3T 16% /Share[OST:3]
            Share-OST0004_UUID 58.6T 9.2T 49.4T 16% /Share[OST:4]
            Share-OST0005_UUID 58.6T 9.3T 49.3T 16% /Share[OST:5]
            Share-OST0006_UUID 58.6T 9.4T 49.1T 16% /Share[OST:6]
            Share-OST0007_UUID 58.6T 9.3T 49.3T 16% /Share[OST:7]
            Share-OST0008_UUID 58.6T 9.3T 49.3T 16% /Share[OST:8]
            Share-OST0009_UUID 58.6T 9.3T 49.2T 16% /Share[OST:9]
            Share-OST000a_UUID 58.6T 9.1T 49.4T 16% /Share[OST:10]
            Share-OST000b_UUID 58.6T 9.1T 49.4T 16% /Share[OST:11]
            Share-OST000c_UUID 58.6T 9.2T 49.3T 16% /Share[OST:12]
            Share-OST000d_UUID 58.6T 9.2T 49.4T 16% /Share[OST:13]
            Share-OST000e_UUID 58.6T 9.4T 49.2T 16% /Share[OST:14]
            Share-OST000f_UUID 58.6T 9.3T 49.3T 16% /Share[OST:15]
            Share-OST0010_UUID 58.6T 9.4T 49.1T 16% /Share[OST:16]
            Share-OST0011_UUID 58.6T 9.3T 49.2T 16% /Share[OST:17]
            Share-OST0012_UUID 58.6T 9.4T 49.2T 16% /Share[OST:18]
            Share-OST0013_UUID 58.6T 9.6T 49.0T 16% /Share[OST:19]
            Share-OST0014_UUID 58.6T 9.5T 49.1T 16% /Share[OST:20]
            Share-OST0015_UUID 58.6T 9.4T 49.2T 16% /Share[OST:21]
            Share-OST0016_UUID 58.6T 9.2T 49.4T 16% /Share[OST:22]
            Share-OST0017_UUID 58.6T 9.4T 49.2T 16% /Share[OST:23]
            Share-OST0018_UUID 58.6T 9.3T 49.3T 16% /Share[OST:24]
            Share-OST0019_UUID 58.6T 9.1T 49.5T 16% /Share[OST:25]
            Share-OST001a_UUID 58.6T 9.4T 49.2T 16% /Share[OST:26]
            Share-OST001b_UUID 58.6T 9.5T 49.1T 16% /Share[OST:27]
            Share-OST001c_UUID 58.6T 9.4T 49.2T 16% /Share[OST:28]
            Share-OST001d_UUID 58.6T 9.2T 49.4T 16% /Share[OST:29]
            Share-OST001e_UUID 58.6T 9.1T 49.5T 16% /Share[OST:30]
            Share-OST001f_UUID 58.6T 9.3T 49.3T 16% /Share[OST:31]
            Share-OST0020_UUID 58.6T 9.2T 49.4T 16% /Share[OST:32]
            Share-OST0021_UUID 58.6T 9.4T 49.2T 16% /Share[OST:33]
            Share-OST0022_UUID 58.6T 9.3T 49.3T 16% /Share[OST:34]
            Share-OST0023_UUID 58.6T 9.5T 49.1T 16% /Share[OST:35]
            Share-OST0024_UUID 58.6T 9.3T 49.3T 16% /Share[OST:36]
            Share-OST0025_UUID 58.6T 9.2T 49.3T 16% /Share[OST:37]
            Share-OST0026_UUID 58.6T 9.5T 49.1T 16% /Share[OST:38]
            Share-OST0027_UUID 58.6T 9.3T 49.2T 16% /Share[OST:39]
            Share-OST0028_UUID 58.6T 206.4G 58.4T 0% /Share[OST:40]
            Share-OST0029_UUID 58.6T 30.3G 58.5T 0% /Share[OST:41]
            Share-OST002a_UUID 58.6T 29.0G 58.5T 0% /Share[OST:42]
            Share-OST002b_UUID 58.6T 32.4G 58.5T 0% /Share[OST:43]
            Share-OST002c_UUID 58.6T 29.9G 58.5T 0% /Share[OST:44]
            Share-OST002d_UUID 58.6T 29.7G 58.5T 0% /Share[OST:45]
            Share-OST002e_UUID 58.6T 30.6G 58.5T 0% /Share[OST:46]
            Share-OST002f_UUID 58.6T 29.9G 58.5T 0% /Share[OST:47]
            Share-OST0030_UUID 58.6T 31.9G 58.5T 0% /Share[OST:48]
            Share-OST0031_UUID 58.6T 32.6G 58.5T 0% /Share[OST:49]
            Share-OST0032_UUID 58.6T 23.5T 35.1T 40% /Share[OST:50]
            Share-OST0033_UUID 58.6T 9.5T 49.1T 16% /Share[OST:51]
            Share-OST0034_UUID 58.6T 9.3T 49.2T 16% /Share[OST:52]
            Share-OST0035_UUID 58.6T 9.3T 49.2T 16% /Share[OST:53]
            Share-OST0036_UUID 58.6T 9.2T 49.3T 16% /Share[OST:54]
            Share-OST0037_UUID 58.6T 9.3T 49.3T 16% /Share[OST:55]
            Share-OST0038_UUID 58.6T 9.3T 49.3T 16% /Share[OST:56]
            Share-OST0039_UUID 58.6T 9.6T 49.0T 16% /Share[OST:57]
            Share-OST003a_UUID 58.6T 9.1T 49.5T 16% /Share[OST:58]
            Share-OST003b_UUID 58.6T 9.1T 49.5T 16% /Share[OST:59]
            Share-OST003c_UUID 58.6T 31.9G 58.5T 0% /Share[OST:60]
            Share-OST003d_UUID 58.6T 30.8G 58.5T 0% /Share[OST:61]
            Share-OST003e_UUID 58.6T 31.0G 58.5T 0% /Share[OST:62]
            Share-OST003f_UUID 58.6T 34.9G 58.5T 0% /Share[OST:63]
            Share-OST0040_UUID 58.6T 31.6G 58.5T 0% /Share[OST:64]
            Share-OST0041_UUID 58.6T 30.9G 58.5T 0% /Share[OST:65]
            Share-OST0042_UUID 58.6T 75.6G 58.5T 0% /Share[OST:66]
            Share-OST0043_UUID 58.6T 45.9G 58.5T 0% /Share[OST:67]
            Share-OST0044_UUID 58.6T 33.2G 58.5T 0% /Share[OST:68]
            Share-OST0045_UUID 58.6T 33.7G 58.5T 0% /Share[OST:69]
            Share-OST0046_UUID 58.6T 9.1T 49.4T 16% /Share[OST:70]
            Share-OST0047_UUID 58.6T 9.3T 49.3T 16% /Share[OST:71]
            Share-OST0048_UUID 58.6T 9.3T 49.2T 16% /Share[OST:72]
            Share-OST0049_UUID 58.6T 9.3T 49.2T 16% /Share[OST:73]
            Share-OST004a_UUID 58.6T 9.1T 49.5T 16% /Share[OST:74]
            Share-OST004b_UUID 58.6T 9.3T 49.3T 16% /Share[OST:75]
            Share-OST004c_UUID 58.6T 9.2T 49.3T 16% /Share[OST:76]
            Share-OST004d_UUID 58.6T 9.5T 49.1T 16% /Share[OST:77]
            Share-OST004e_UUID 58.6T 9.4T 49.2T 16% /Share[OST:78]
            Share-OST004f_UUID 58.6T 9.3T 49.3T 16% /Share[OST:79]

            filesystem_summary: 4.6P 573.4T 4.0P 12% /Share

             


            People

              Assignee: wc-triage WC Triage
              Reporter: jstroik Jesse Stroik
              Votes: 0
              Watchers: 4
