Details
Bug
Resolution: Fixed
Major
Description
On 2.12, we observe an obdfilter-survey rewrite throughput of 30 GiB/s, while on 2.15 we see only 25 GiB/s.
Both perf captures and ftrace data (with the help of git blame) have led me down the path to a root cause:
commit d0337cab8e845efcdbfb9e26e573feb18f28e303
Author: Mr NeilBrown <neilb@suse.de>
Date:   Wed Dec 9 13:00:16 2020 +1100

    LU-14195 osd: don't use set_fs() for ->fiemap() calls.

    ->fiemap() only accesses kernel-space data, so does not need,
    and never has needed, set_fs() calls.
    In Linux 5.10, these calls are deprecated.
    So remove the unnecessary code.
If I revert the above commit for LU-14195, the obdfilter-survey rewrite throughput is immediately restored. With the revert, performance is actually slightly better than on 2.12, at 31 GiB/s.
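To put the three figures side by side, here is a small sketch of the relative changes. It uses only the throughput numbers quoted in this description; the variable and function names are illustrative.

```python
# Rewrite throughput figures quoted in this ticket (GiB/s).
GIB_S_2_12 = 30.0      # Lustre 2.12
GIB_S_2_15 = 25.0      # Lustre 2.15 with the LU-14195 commit
GIB_S_REVERTED = 31.0  # Lustre 2.15 with the commit reverted


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


regression = pct_change(GIB_S_2_12, GIB_S_2_15)    # 2.12 -> 2.15
recovery = pct_change(GIB_S_2_15, GIB_S_REVERTED)  # 2.15 -> 2.15 + revert

print(f"2.12 -> 2.15:        {regression:+.1f}%")
print(f"2.15 -> with revert: {recovery:+.1f}%")
```

By this arithmetic the regression is roughly a sixth of the 2.12 throughput, and the revert recovers about a quarter relative to the regressed 2.15 figure.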
Note - Example obdfilter-survey command:
tests_str="write rewrite" nobjlo=1 nobjhi=1 thrlo=1024 thrhi=1024 rszlo=4096 rszhi=4096 size=524288 obdfilter-survey
–
I spoke with Alexander Boyko about this exact issue and the impact it will have once we move to a post-5.10 Linux kernel:
Petros Koutoupis 8:06 AM
Hello. I do have a question though. Does it make sense to you how d0337cab8 would impact the obdfilter-survey rewrites (by that much too)?
Alexander Boyko 8:18 AM
yeap I had a discussion with team. obdfilter executes from ioctl and does not have KERNEL_DS
Petros Koutoupis 8:18 AM
I ask only because in 5.10 set_fs is deprecated
ok
Alexander Boyko 8:20 AM
so without a revert it fall to copy_from_user, it returns EFAULT and fiemap handle this as worst case
this should not affect general Lustre IO
More comments from Alexander:
Unfortunately the LU-14195 patch has a more serious impact on the whole of Lustre, not only obdfilter-survey. Yesterday I discussed it with Andrew Perepechko; the current idea is that
ldiskfs_fiemap..()->..->copy_to_user()
returns EFAULT. EFAULT leads to the unmapped-block logic, so Lustre calls the block allocator, calculates grants, and calls the quota logic for an overwrite. All Lustre ioctl requests are affected. There is no simple way to fix this on new kernels, since all the parts that used set_fs() were changed to ITER_KVEC. But using set_fs() for fiemap is a Lustre-only trick, and the internal fiemap logic does not support iterators. I think LU-14195 should be reverted until the 5.x kernel.
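The control flow described above can be sketched as a toy model. This is not Lustre or kernel code: every function and constant name below is hypothetical, and the model only mirrors the sequence stated in the comments (no KERNEL_DS on the ioctl path, the copy fails with EFAULT, the block is treated as unmapped, and the overwrite takes the allocation path).

```python
import errno

# Hypothetical stand-ins for the kernel address-limit states.
USER_DS = "USER_DS"
KERNEL_DS = "KERNEL_DS"


def copy_to_user(addr_limit: str, dest_is_kernel_buffer: bool) -> int:
    """Toy model: copying into a kernel buffer fails unless KERNEL_DS is set."""
    if dest_is_kernel_buffer and addr_limit != KERNEL_DS:
        return -errno.EFAULT
    return 0


def fiemap_reports_mapped(addr_limit: str) -> bool:
    """Toy fiemap call: the caller passes a kernel buffer for the extents."""
    rc = copy_to_user(addr_limit, dest_is_kernel_buffer=True)
    # As described in the comments, EFAULT is handled as the worst case:
    # the block is treated as unmapped.
    return rc == 0


def rewrite_steps(addr_limit: str) -> list[str]:
    """Steps a rewrite takes, depending on whether fiemap sees a mapped block."""
    if fiemap_reports_mapped(addr_limit):
        return ["overwrite in place"]
    return ["block allocator", "grant calculation", "quota logic", "write"]


# With set_fs(KERNEL_DS) around the call (the behaviour before LU-14195):
print(rewrite_steps(KERNEL_DS))
# From the ioctl path without KERNEL_DS (the behaviour after LU-14195):
print(rewrite_steps(USER_DS))
```

In this model the extra allocator, grant, and quota steps on every overwrite stand in for the cost that shows up as the obdfilter-survey rewrite regression.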