
LU-29: obdfilter-survey doesn't work well if cpu_cores (w/ HyperThreading) > 16

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 1.8.6
    • Fix Version/s: Lustre 1.8.6
    • None
    • Severity: 3
    • Bugzilla Bug: 22980
    • 8550

    Description

      It seems obdfilter-survey is not working well on a 12-core system (the OSS sees 24 CPUs when hyperthreading is on).
      Here are quick results on 12, 6 and 8 cores on the same OSSs. For the 6- and 8-core runs, I turned CPUs off with "echo 0 > /sys/devices/system/cpu/cpuX/online" on the 12-core system (X5670, Westmere, 6 cores x 2 sockets).
      Testing with 16 or fewer CPU cores seems to be no problem, but with 24 cores it does not work well.
      This has been discussed in bug 22980, but there is still no solution for running obdfilter-survey on a current Westmere box.

      #TEST-1 4xOSSs, 56 OSTs (14 OSTs per OSS), 12 physical cores per OSS (24 logical CPUs with HT)
      ost 56 sz 469762048K rsz 1024K obj 56 thr 56 write 3323.91 [ 39.96, 71.93] read 5967.91 [ 94.91, 127.93]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 5807.10 [ 72.93, 120.77] read 6182.79 [ 96.91, 140.86]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 6377.41 [ 75.93, 176.83] read 6193.18 [ 81.98, 139.86]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 448 write 6279.64 [ 69.93, 185.83] read 6162.43 [ 77.88, 162.86]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 6114.28 [ 9.99, 226.79] read 6017.08 [ 14.98, 220.80]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 1792 write 6078.08 [ 8.99, 285.73] read 5923.64 [ 16.98, 161.85]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 3584 write 6168.36 [ 76.92, 250.75] read 5828.33 [ 85.95, 174.77]

      #TEST-2 4xOSSs, 56 OSTs (14 OSTs per OSS), 6 physical cores per OSS (12 logical CPUs with HT; all CPUs on physical socket 1 turned off)
      ost 56 sz 469762048K rsz 1024K obj 56 thr 56 write 3677.43 [ 36.97, 75.93] read 8355.91 [ 137.87, 168.85]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 7045.25 [ 89.92, 141.87] read 10672.33 [ 153.87, 212.80]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 9909.58 [ 116.88, 217.78] read 10235.82 [ 140.87, 203.83]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 448 write 9796.21 [ 106.90, 214.80] read 10803.78 [ 142.87, 348.93]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 9377.85 [ 54.95, 265.75] read 10700.27 [ 126.76, 279.74]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 1792 write 9257.48 [ 0.00, 384.63] read 10726.18 [ 121.87, 291.74]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 3584 write 9162.01 [ 0.00, 242.78] read 10627.94 [ 115.89, 271.74]

      #TEST-3 4xOSSs, 56 OSTs (14 OSTs per OSS), 8 physical cores per OSS (16 logical CPUs with HT; core_id {2, 10} turned off on both sockets)
      ost 56 sz 469762048K rsz 1024K obj 56 thr 56 write 3614.92 [ 43.96, 75.93] read 7919.40 [ 122.88, 169.84]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 6703.91 [ 71.94, 135.87] read 9899.53 [ 156.87, 201.81]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 9901.78 [ 123.88, 233.78] read 10401.05 [ 151.85, 202.81]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 448 write 9721.29 [ 115.89, 212.80] read 10812.26 [ 151.86, 241.54]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 9330.51 [ 94.91, 257.50] read 10672.22 [ 112.90, 342.66]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 1792 write 9053.42 [ 22.98, 263.75] read 10657.08 [ 95.91, 286.73]
      ost 56 sz 469762048K rsz 1024K obj 56 thr 3584 write 9081.75 [ 45.96, 239.57] read 10562.43 [ 78.93, 270.75]

      Attachments

        Activity


          ihara Shuichi Ihara (Inactive) added a comment -

          Niu, I'm investigating a test infrastructure on VMs (KVM: Kernel-based Virtual Machine). Once I apply your patch and run obdfilter-survey on a VM, the performance gets worse. Without the patch, I get reasonable numbers even on VMs. So the patch seems to have some impact when obdfilter-survey is run on a VM.

          I will post the results and more information (including oprofile data from the VM) in a couple of days.

          Ihara


          niu Niu Yawei (Inactive) added a comment -

          Yes, I meant 22980, thanks Peter.

          Andreas, HAVE_UNLOCKED_IOCTL is defined for kernels that have the 'unlocked_ioctl' method.


          adilger Andreas Dilger added a comment -

          I don't think there are any ioctls that depend on the BKL, but I haven't looked through them closely. In particular, I'm not sure whether there is proper serialization around the configuration ioctls or not.

          That said, since the configuration is almost always done by mount/unmount and not by the old lctl commands, I don't think this will be a serious risk, so I think it makes sense to move the Lustre ioctl handling over to ->unlocked_ioctl(). That should be done only for kernels which support the ->unlocked_ioctl() method, which means a configure check is needed to set HAVE_UNLOCKED_IOCTL if that method is present in struct file_operations.
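          A minimal sketch of that pattern, for reference only (this is not the actual Lustre patch, and the my_dev_* names are hypothetical): register ->unlocked_ioctl() when HAVE_UNLOCKED_IOCTL is defined, fall back to the BKL-protected ->ioctl() otherwise, and use a private mutex for any commands that still need serialization.

          /* Hypothetical sketch, not the actual Lustre code. */
          #include <linux/fs.h>
          #include <linux/module.h>
          #include <linux/mutex.h>

          /* private serialization for commands that need it, instead of the BKL */
          static DEFINE_MUTEX(my_dev_ioctl_mutex);

          static long my_dev_do_ioctl(struct file *file, unsigned int cmd,
                                      unsigned long arg)
          {
                  long rc = 0;

                  mutex_lock(&my_dev_ioctl_mutex);
                  /* ... dispatch on cmd here ... */
                  mutex_unlock(&my_dev_ioctl_mutex);
                  return rc;
          }

          #ifdef HAVE_UNLOCKED_IOCTL
          /* new interface: called without the BKL */
          static long my_dev_unlocked_ioctl(struct file *file, unsigned int cmd,
                                            unsigned long arg)
          {
                  return my_dev_do_ioctl(file, cmd, arg);
          }
          #else
          /* old interface: the kernel takes the BKL around this call */
          static int my_dev_ioctl(struct inode *inode, struct file *file,
                                  unsigned int cmd, unsigned long arg)
          {
                  return my_dev_do_ioctl(file, cmd, arg);
          }
          #endif

          static const struct file_operations my_dev_fops = {
                  .owner          = THIS_MODULE,
          #ifdef HAVE_UNLOCKED_IOCTL
                  .unlocked_ioctl = my_dev_unlocked_ioctl,
          #else
                  .ioctl          = my_dev_ioctl,
          #endif
          };

          With ->unlocked_ioctl() the kernel no longer takes the BKL on entry, so concurrent ioctls from many threads only contend on whatever locks the handler itself takes.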

          pjones Peter Jones added a comment -

          As per Andreas, you probably mean bz 22980, rather than 22890. Yes please, can you attach your patch to the bz - thanks!


          niu Niu Yawei (Inactive) added a comment -

          Thanks for the good news, Ihara. It looks like the patch works as we expected.

          Hi, Andreas

          The user-space semaphore used to protect the shmem is another contention source; however, it does not look as severe as the BKL taken for each ioctl. Should we post the patch to bug 22890 to see if it resolves the problem?

          BTW, I thought there aren't any Lustre ioctls that depend on the BKL, so it should be safe to introduce 'unlocked_ioctl'. Could you confirm?
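          For illustration only, a userspace sketch of the semaphore-on-shmem contention pattern being discussed (this is not the actual lctl code; all names here are made up): every worker thread funnels its statistics updates through one process-shared semaphore guarding a shared-memory block, so with 24 or more hardware threads the semaphore itself can become a bottleneck. Build with "cc -pthread"; error checking is omitted for brevity.

          /* Illustrative only: many threads serialized by one semaphore in shared memory. */
          #include <pthread.h>
          #include <semaphore.h>
          #include <stdio.h>
          #include <sys/mman.h>

          struct shared_stats {
                  sem_t lock;             /* single lock shared by all threads */
                  unsigned long ops;      /* e.g. completed I/O count */
          };

          static struct shared_stats *stats;

          static void *worker(void *arg)
          {
                  int i;

                  for (i = 0; i < 100000; i++) {
                          sem_wait(&stats->lock);   /* every thread funnels through here */
                          stats->ops++;
                          sem_post(&stats->lock);
                  }
                  return NULL;
          }

          int main(void)
          {
                  pthread_t tid[24];
                  int i;

                  /* anonymous shared mapping stands in for a SysV shmem segment */
                  stats = mmap(NULL, sizeof(*stats), PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
                  sem_init(&stats->lock, 1, 1);     /* pshared = 1 */

                  for (i = 0; i < 24; i++)
                          pthread_create(&tid[i], NULL, worker, NULL);
                  for (i = 0; i < 24; i++)
                          pthread_join(tid[i], NULL);

                  printf("ops = %lu\n", stats->ops);
                  return 0;
          }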

          ihara Shuichi Ihara (Inactive) added a comment - - edited

          Niu, sorry, it looks like there was something bad on the storage side when I ran the benchmark yesterday. Once I fixed the storage, I tried obdfilter-survey with your patches applied. The patches seem to fix the problem on the 24-core system, and the numbers are close to those with HT=off. Here are the results on 12 cores (HT=off) and 24 cores (HT=on).

          # 12 cores (HT=off), 4 OSSs, 56 OSTs (14OSTs per OSS)
          ost 56 sz 469762048K rsz 1024K obj   56 thr   56 write 3546.88 [  37.96,  70.86] read 7633.11 [ 124.88, 156.85] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  112 write 6420.31 [  91.91, 130.75] read 10121.79 [ 159.70, 202.60] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  224 write 9576.76 [ 125.84, 216.80] read 10444.91 [ 167.84, 216.79] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  448 write 10264.63 [  98.95, 207.61] read 10972.26 [ 150.68, 232.78] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  896 write 9842.69 [  91.91, 305.69] read 10896.16 [ 121.89, 330.57] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr 1792 write 9613.51 [  28.96, 251.70] read 10792.37 [ 123.88, 277.50] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr 3584 write 9597.46 [   0.00, 253.78] read 10698.87 [ 118.89, 271.75] 
          
          # 24 cores (HT=on), 4 OSSs, 56 OSTs (14OSTs per OSS)
          ost 56 sz 469762048K rsz 1024K obj   56 thr   56 write 3345.48 [  42.96,  66.94] read 6981.70 [ 102.91, 153.86] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  112 write 6327.40 [  88.92, 128.89] read 9826.28 [ 156.85, 208.80] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  224 write 9792.45 [ 139.87, 218.77] read 10409.23 [ 173.84, 303.70] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  448 write 10262.20 [ 106.90, 235.78] read 10903.93 [ 157.86, 253.79] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr  896 write 9905.94 [  98.91, 233.78] read 10829.35 [ 127.88, 266.75] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr 1792 write 9656.78 [   6.99, 251.79] read 10761.36 [ 115.89, 333.68] 
          ost 56 sz 469762048K rsz 1024K obj   56 thr 3584 write 9596.28 [   0.00, 261.76] read 10742.13 [ 119.89, 324.68] 
          
          

          adilger Andreas Dilger added a comment -

          See also https://bugzilla.lustre.org/show_bug.cgi?id=22980#c18 for a similar issue. I suspect that the performance bottleneck may be in userspace, but we can only find out with some oprofile and/or lockmeter data.

          niu Niu Yawei (Inactive) added a comment - - edited

          Thank you, Ihara.

          Could you run a full test and post all the output (like what you did in the first comment) to see if there are any differences?

          I suspect some other contention is dragging down the performance. Could you use oprofile to collect some data while running the test?

          BTW, what's the kernel version?


          ihara Shuichi Ihara (Inactive) added a comment -

          Adjusted patch for 1.8.x.


          ihara Shuichi Ihara (Inactive) added a comment -

          BTW, I've been testing this on lustre-1.8.4, so I made some code adjustments to http://review.whamcloud.com/163 for lustre-1.8.

          Ihara


          ihara Shuichi Ihara (Inactive) added a comment -

          Niu,

          I just tested your latest patch, but the obdfilter-survey results are still low on 24 cores. Here are the results.

          12 cores (HT=disabled)
          ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 9871.99 [ 85.75, 229.56] read 10802.02 [ 125.88, 309.74]

          24 cores (HT=enabled)
          ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 6076.08 [ 21.98, 557.93] read 5614.03 [ 12.98, 748.07]


          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: ihara Shuichi Ihara (Inactive)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: