Details
Bug | Resolution: Fixed | Minor | Lustre 1.8.6 | None | 3 | 22,980 | 8550
Description
It seems obdfilter-survey does not perform well on a 12-core system (the OSS shows 24 CPUs when Hyper-Threading is on).
Here are quick results with 12, 6, and 8 cores on the same OSSs. For the 6- and 8-core runs, I turned CPUs off with "echo 0 > /sys/devices/system/cpu/cpuX/online" on the 12-core system (X5670, Westmere, 6 cores x 2 sockets).
Testing with 16 or fewer CPUs shows no problem, but with 24 CPUs throughput is clearly lower.
This has been discussed in bug 22980, but there is still no solution for running obdfilter-survey on current Westmere boxes.
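The CPU offlining step described above can be scripted. The following is a minimal sketch, not the reporter's actual procedure: the helper name `set_cpus_online` and the CPU numbers in the usage note are hypothetical, and the ticket does not say which logical CPU numbers were taken offline. It defaults to a dry run that only lists the writes it would perform.

```python
SYSFS_CPU = "/sys/devices/system/cpu"


def set_cpus_online(cpus, online, root=SYSFS_CPU, dry_run=True):
    """Plan (or perform) writes of 0/1 to each CPU's sysfs 'online' file.

    Returns a list of (path, value) pairs. With dry_run=False the values
    are actually written, which requires root privileges.
    """
    value = "1" if online else "0"
    actions = []
    for cpu in cpus:
        path = f"{root}/cpu{cpu}/online"
        actions.append((path, value))
        if not dry_run:
            # Same effect as: echo 0 > /sys/devices/system/cpu/cpuX/online
            with open(path, "w") as f:
                f.write(value)
    return actions
```

For example, `set_cpus_online([12, 13], online=False)` returns the two planned writes without touching the system. Which logical CPU numbers map to a given socket or core has to be read from the machine itself (for instance from /proc/cpuinfo).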
#TEST-1 4xOSSs, 56OSTs(14 OSTs per OSS), 12 cores (24 logical CPUs)
ost 56 sz 469762048K rsz 1024K obj 56 thr 56 write 3323.91 [ 39.96, 71.93] read 5967.91 [ 94.91, 127.93]
ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 5807.10 [ 72.93, 120.77] read 6182.79 [ 96.91, 140.86]
ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 6377.41 [ 75.93, 176.83] read 6193.18 [ 81.98, 139.86]
ost 56 sz 469762048K rsz 1024K obj 56 thr 448 write 6279.64 [ 69.93, 185.83] read 6162.43 [ 77.88, 162.86]
ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 6114.28 [ 9.99, 226.79] read 6017.08 [ 14.98, 220.80]
ost 56 sz 469762048K rsz 1024K obj 56 thr 1792 write 6078.08 [ 8.99, 285.73] read 5923.64 [ 16.98, 161.85]
ost 56 sz 469762048K rsz 1024K obj 56 thr 3584 write 6168.36 [ 76.92, 250.75] read 5828.33 [ 85.95, 174.77]
#TEST-2 4xOSSs, 56OSTs(14 OSTs per OSS), 6 cores (12 logical CPUs; all CPUs on physical cpu_id=1 are turned off)
ost 56 sz 469762048K rsz 1024K obj 56 thr 56 write 3677.43 [ 36.97, 75.93] read 8355.91 [ 137.87, 168.85]
ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 7045.25 [ 89.92, 141.87] read 10672.33 [ 153.87, 212.80]
ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 9909.58 [ 116.88, 217.78] read 10235.82 [ 140.87, 203.83]
ost 56 sz 469762048K rsz 1024K obj 56 thr 448 write 9796.21 [ 106.90, 214.80] read 10803.78 [ 142.87, 348.93]
ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 9377.85 [ 54.95, 265.75] read 10700.27 [ 126.76, 279.74]
ost 56 sz 469762048K rsz 1024K obj 56 thr 1792 write 9257.48 [ 0.00, 384.63] read 10726.18 [ 121.87, 291.74]
ost 56 sz 469762048K rsz 1024K obj 56 thr 3584 write 9162.01 [ 0.00, 242.78] read 10627.94 [ 115.89, 271.74]
#TEST-3 4xOSSs, 56OSTs(14 OSTs per OSS), 8 cores (16 logical CPUs; core_id={2, 10} on both sockets are turned off)
ost 56 sz 469762048K rsz 1024K obj 56 thr 56 write 3614.92 [ 43.96, 75.93] read 7919.40 [ 122.88, 169.84]
ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 6703.91 [ 71.94, 135.87] read 9899.53 [ 156.87, 201.81]
ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 9901.78 [ 123.88, 233.78] read 10401.05 [ 151.85, 202.81]
ost 56 sz 469762048K rsz 1024K obj 56 thr 448 write 9721.29 [ 115.89, 212.80] read 10812.26 [ 151.86, 241.54]
ost 56 sz 469762048K rsz 1024K obj 56 thr 896 write 9330.51 [ 94.91, 257.50] read 10672.22 [ 112.90, 342.66]
ost 56 sz 469762048K rsz 1024K obj 56 thr 1792 write 9053.42 [ 22.98, 263.75] read 10657.08 [ 95.91, 286.73]
ost 56 sz 469762048K rsz 1024K obj 56 thr 3584 write 9081.75 [ 45.96, 239.57] read 10562.43 [ 78.93, 270.75]
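To compare the runs above more easily, the result lines can be parsed and reduced to a peak value per test. This is a rough sketch that assumes the line layout shown in this ticket (aggregate write/read figures, each followed by a [min, max] pair). The names `parse_line` and `peak` are illustrative and not part of obdfilter-survey, and the sample lists contain only two lines per test, copied from the results above.

```python
import re

# Matches one summary line in the layout pasted in this ticket.
LINE_RE = re.compile(
    r"ost\s+(?P<ost>\d+)\s+sz\s+(?P<sz>\d+)K\s+rsz\s+(?P<rsz>\d+)K\s+"
    r"obj\s+(?P<obj>\d+)\s+thr\s+(?P<thr>\d+)\s+"
    r"write\s+(?P<write>[\d.]+)\s+\[\s*(?P<wmin>[\d.]+),\s*(?P<wmax>[\d.]+)\]\s+"
    r"read\s+(?P<read>[\d.]+)\s+\[\s*(?P<rmin>[\d.]+),\s*(?P<rmax>[\d.]+)\]"
)


def parse_line(line):
    """Return thread count and aggregate write/read figures, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "thr": int(m["thr"]),
        "write": float(m["write"]),
        "read": float(m["read"]),
    }


def peak(lines, key):
    """Return (thread count, value) of the best row for 'write' or 'read'."""
    rows = [r for r in map(parse_line, lines) if r is not None]
    best = max(rows, key=lambda r: r[key])
    return best["thr"], best[key]


# Two lines from TEST-1 (24 logical CPUs) and two from TEST-2 (12 logical CPUs).
TEST1_LINES = [
    "ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 5807.10 [ 72.93, 120.77] read 6182.79 [ 96.91, 140.86]",
    "ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 6377.41 [ 75.93, 176.83] read 6193.18 [ 81.98, 139.86]",
]
TEST2_LINES = [
    "ost 56 sz 469762048K rsz 1024K obj 56 thr 112 write 7045.25 [ 89.92, 141.87] read 10672.33 [ 153.87, 212.80]",
    "ost 56 sz 469762048K rsz 1024K obj 56 thr 224 write 9909.58 [ 116.88, 217.78] read 10235.82 [ 140.87, 203.83]",
]
```

Calling `peak(TEST1_LINES, "write")` and `peak(TEST2_LINES, "write")` on these samples picks the thr 224 row in both cases, which makes the gap between the 24-CPU and 12-CPU runs easy to see side by side.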
I don't think there are any ioctls that depend on the BKL (Big Kernel Lock), but I haven't looked through them closely. In particular, I'm not sure whether there is proper serialization around the configuration ioctls.
That said, since the configuration is almost always done by mount/unmount and not by the old lctl commands, I don't think this will be a serious risk, so I think it makes sense to move the Lustre ioctl handling over to ->unlocked_ioctl(). That should be done only for kernels which support the ->unlocked_ioctl() method, which means a configure check is needed to set HAVE_UNLOCKED_IOCTL if that method is present in struct file_operations.
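The decision that such a configure check makes can be illustrated with a small sketch. This is only a text heuristic under stated assumptions: a real autoconf-style check would normally compile a test program against the kernel headers, and the function names and the simplified header excerpts below are hypothetical stand-ins, not Lustre's actual check or the full kernel struct.

```python
import re


def has_unlocked_ioctl(header_text):
    """Heuristic: does struct file_operations declare unlocked_ioctl?"""
    body = re.search(
        r"struct\s+file_operations\s*\{(.*?)\n\};", header_text, re.S
    )
    if body is None:
        return False
    member = re.search(r"\(\s*\*\s*unlocked_ioctl\s*\)", body.group(1))
    return member is not None


def config_define(header_text):
    """Return the config.h line a configure step would emit."""
    if has_unlocked_ioctl(header_text):
        return "#define HAVE_UNLOCKED_IOCTL 1"
    return "/* #undef HAVE_UNLOCKED_IOCTL */"


# Simplified excerpts for illustration only, not complete kernel headers.
HEADER_WITH = """struct file_operations {
	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
};
"""
HEADER_WITHOUT = """struct file_operations {
	int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
};
"""
```

With HAVE_UNLOCKED_IOCTL defined, the C code can register its handler through ->unlocked_ioctl() and fall back to the older ->ioctl() entry point otherwise.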