[LU-9413] stat->st_blksize and glibc buffering Created: 27/Apr/17 Updated: 24/May/17 Resolved: 24/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrew Perepechko | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Epic/Theme: | Performance |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The issue is described in detail in https://bugzilla.lustre.org/show_bug.cgi?id=12739. In short, for an open file Lustre returns st_blksize = 4 MiB, and glibc allocates a stdio buffer of that size. Short random reads then trigger 4 MiB BRWs, which ruins performance compared with other distributed file systems. It is often not possible or not practical to patch the program itself. The original ticket asserts that this is a glibc bug and should be fixed in glibc. It was fixed in glibc 2.25 (https://sourceware.org/bugzilla/show_bug.cgi?id=4099#c10), but that glibc version won't be used in Linux distros with Lustre support any time soon. How about adding a temporary workaround for this issue, similar to the one proposed by Aurélien Degrémont in bz #12739? |
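To illustrate the mechanism described above, here is a minimal sketch that counts how many bytes a buffered reader pulls from the underlying file when an application issues short reads at distant offsets. It is not the glibc code path: Python's `io.BufferedReader` stands in for glibc stdio, and the helper names (`CountingFileIO`, `bytes_fetched_for_short_reads`, `make_sparse_file`) are hypothetical, introduced only for this example. The 4 MiB figure is the st_blksize value from this ticket; the 4 KiB comparison size is an assumption.

```python
import io
import os
import tempfile

MIB = 1024 * 1024


class CountingFileIO(io.FileIO):
    """Raw file that records how many bytes the buffered layer pulls from it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.bytes_fetched = 0

    def readinto(self, buf):
        n = super().readinto(buf)
        self.bytes_fetched += n or 0
        return n


def bytes_fetched_for_short_reads(path, buffer_size, offsets, request_size=100):
    """Issue one short read per offset through a buffer of buffer_size bytes
    and return the number of bytes actually read from the raw file."""
    raw = CountingFileIO(path, "rb")
    with io.BufferedReader(raw, buffer_size=buffer_size) as reader:
        for offset in offsets:
            reader.seek(offset)
            reader.read(request_size)
        return raw.bytes_fetched


def make_sparse_file(size):
    """Create a temporary file of the given size and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(size)
    return path


if __name__ == "__main__":
    path = make_sparse_file(16 * MIB)
    try:
        # Offsets far enough apart that each read misses the current buffer.
        offsets = [0, 8 * MIB] * 8
        requested = 100 * len(offsets)
        for size in (4096, 4 * MIB):
            fetched = bytes_fetched_for_short_reads(path, size, offsets)
            print(f"buffer={size} B: fetched {fetched} B for {requested} B requested")
    finally:
        os.unlink(path)
```

With the larger buffer, each 100-byte request refills the whole buffer, which is the amplification the description attributes to st_blksize = 4 MiB.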
| Comments |
| Comment by Gerrit Updater [ 27/Apr/17 ] |
|
Andrew Perepechko (andrew.perepechko@seagate.com) uploaded a new patch: https://review.whamcloud.com/26869 |
| Comment by Andreas Dilger [ 18/May/17 ] |
|
I'm glad you used a config param instead of a module parameter in the updated patch. However, as Aurelien said in Bugzilla, this parameter causes problems on the nodes that it is enabled on. Have you done any benchmarks with this enabled? I think iozone has a buffered I/O mode that covers a range of block sizes; could you run it with this feature enabled and disabled? As I also wrote in that patch:
That said, I think the Connectathon issue could be fixed in several ways (e.g. set the st_blksize based on the filesystem default stripe size rather than an arbitrary constant, or similar). Another option would be to use lfs ladvise to allow setting the blocksize on the file (in memory on the client inode, or possibly also on disk for ZFS).
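The kind of comparison asked for above could be roughed out as follows. This is only a sketch, not iozone and not a substitute for it: it times buffered reads over a range of record sizes for several buffer sizes, in both sequential and scattered access patterns, so that any cost of a smaller buffer for large sequential I/O shows up next to the gain for short random reads. The function names (`time_buffered_reads`, `sweep`), the sizes, and the offset pattern are all assumptions for illustration. The numbers are only meaningful when `path` points at a file on a Lustre client mount with a cold cache; on a local file they mostly measure the page cache.

```python
import os
import time


def time_buffered_reads(path, buffer_size, record_size, random_access, max_records=256):
    """Return the seconds spent reading records of record_size bytes
    through a buffer of buffer_size bytes."""
    file_size = os.path.getsize(path)
    count = max(1, min(max_records, file_size // record_size))
    span = max(1, file_size - record_size)
    start = time.perf_counter()
    with open(path, "rb", buffering=buffer_size) as f:
        for i in range(count):
            if random_access:
                # Deterministic scattered offsets, so runs are repeatable.
                f.seek((i * 2654435761) % span)
            f.read(record_size)
    return time.perf_counter() - start


def sweep(path, buffer_sizes, record_sizes):
    """Time every combination of buffer size, record size and access pattern."""
    results = {}
    for buffer_size in buffer_sizes:
        for record_size in record_sizes:
            for random_access in (False, True):
                key = (buffer_size, record_size, random_access)
                results[key] = time_buffered_reads(
                    path, buffer_size, record_size, random_access
                )
    return results


def report(results):
    """Print one line per measured combination."""
    for (buffer_size, record_size, random_access), seconds in sorted(results.items()):
        mode = "random" if random_access else "sequential"
        print(f"buffer={buffer_size:>8} record={record_size:>8} {mode:<10} {seconds:.6f} s")
```

A run might look like `report(sweep("/path/to/file", [4096, 65536, 4 * 1024 * 1024], [512, 4096, 1024 * 1024]))`, where the path is a placeholder for a test file on the client under test.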
| Comment by Gerrit Updater [ 24/May/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26869/ |
| Comment by Peter Jones [ 24/May/17 ] |
|
Landed for 2.10 |