[LU-9413] stat->st_blksize and glibc buffering Created: 27/Apr/17  Updated: 24/May/17  Resolved: 24/May/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Major
Reporter: Andrew Perepechko Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch

Epic/Theme: Performance
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The issue has a detailed description in https://bugzilla.lustre.org/show_bug.cgi?id=12739

In short, for an open file Lustre returns st_blksize = 4 MiB, and glibc allocates a stdio buffer of that size. Short random reads then trigger 4 MiB bulk read/write RPCs (BRWs), which ruins performance compared with other distributed filesystems.
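To make the cost concrete, here is a minimal sketch of the arithmetic, assuming every short read refills the whole stdio buffer. It is written in Python for brevity, not in the C/glibc code the ticket is about; the helper name read_amplification and the 4 KiB request size are illustrative assumptions, not values taken from the ticket. os.stat() exposes st_blksize on Unix-like systems only.

```python
import os
import tempfile

MIB = 1024 * 1024


def read_amplification(buffer_size: int, request_size: int) -> float:
    """Bytes fetched per byte requested when each short read refills a full buffer."""
    return max(buffer_size, request_size) / request_size


# Inspect the preferred I/O size the local filesystem reports for a file.
# On a Lustre 2.9 client the ticket reports 4 MiB here; other filesystems differ.
with tempfile.NamedTemporaryFile() as f:
    reported = os.stat(f.name).st_blksize
    print(f"st_blksize reported for {f.name}: {reported} bytes")

# Worst case described in the ticket: a 4 MiB buffer serving small random reads
# (4 KiB is a hypothetical request size chosen for illustration).
print(read_amplification(4 * MIB, 4096))  # 1024.0
```

Running the stat part on a Lustre mount and on a local filesystem is a quick way to see which st_blksize an unmodified application will pick up.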

It is often not possible or not practical to patch the program itself.

The original ticket contains the assertion that it's a bug in glibc and should be fixed in glibc. While it was fixed in glibc 2.25 (https://sourceware.org/bugzilla/show_bug.cgi?id=4099#c10), this glibc version won't be used in Linux distros with Lustre support any time soon.

How about adding a temporary workaround for this issue similar to the one proposed by Aurélien Degrémont in bz #12739?



 Comments   
Comment by Gerrit Updater [ 27/Apr/17 ]

Andrew Perepechko (andrew.perepechko@seagate.com) uploaded a new patch: https://review.whamcloud.com/26869
Subject: LU-9413 llite: llite.stat_blocksize param for fixed st_blksize
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 16964b5af530b59a3e7da73d62f3a8b1e0d8a4d0

Comment by Andreas Dilger [ 18/May/17 ]

I'm glad you used a config param instead of a module parameter in the updated patch. However, as Aurelien said in Bugzilla, this parameter causes problems on the nodes where it is enabled. Have you done any benchmarks with it enabled? I think iozone has a buffered I/O mode that covers a range of block sizes; could you run it with this feature enabled and disabled?
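As a rough stand-in for such a comparison (not iozone, and not a measurement of Lustre itself), the sketch below counts how many bytes a buffered reader actually pulls from the OS for the same small random reads at two buffer sizes. It uses Python's io.BufferedReader rather than glibc stdio, so it only illustrates the buffering effect; CountingRaw, bytes_fetched_for_random_reads, the 512-byte request size, and the 8 MiB test file are all assumptions made for the example.

```python
import io
import os
import random
import tempfile

MIB = 1024 * 1024


class CountingRaw(io.FileIO):
    """Raw file object that counts the bytes actually read from the OS."""

    bytes_fetched = 0

    def readinto(self, buf):
        n = super().readinto(buf)
        self.bytes_fetched += n or 0
        return n


def bytes_fetched_for_random_reads(path, buffer_size, reads=64, request=512, seed=0):
    """Issue small random reads through a buffer and return the bytes fetched."""
    rng = random.Random(seed)
    raw = CountingRaw(path, "r")
    with io.BufferedReader(raw, buffer_size=buffer_size) as f:
        for _ in range(reads):
            # Offsets stay in the first 4 MiB of the 8 MiB test file.
            f.seek(rng.randrange(0, 4 * MIB))
            f.read(request)
    return raw.bytes_fetched


# Build an 8 MiB scratch file, then compare a 4 KiB buffer with a 4 MiB one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (8 * MIB))
    path = tmp.name

small = bytes_fetched_for_random_reads(path, 4096)
large = bytes_fetched_for_random_reads(path, 4 * MIB)
os.unlink(path)

print(f"4 KiB buffer fetched {small} bytes, 4 MiB buffer fetched {large} bytes")
```

The application asks for the same 64 x 512 bytes in both runs; only the volume moved underneath changes. On a networked filesystem that extra volume is what becomes large RPCs.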

As I also wrote in that patch:

In Lustre 1.4 it was possible to change the stat() blocksize of a file by changing the file
striping. The returned value was stripe_count * stripe_size, or 2MB, whichever was less.
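
The Lustre 1.4 rule quoted above can be written out as a small sketch. The function name lustre_1_4_blksize is hypothetical, and the example stripe values are illustrative; only the min(stripe_count * stripe_size, 2 MiB) rule comes from the comment.

```python
MIB = 1024 * 1024


def lustre_1_4_blksize(stripe_count: int, stripe_size: int) -> int:
    """st_blksize as described for Lustre 1.4: stripe width, capped at 2 MiB."""
    return min(stripe_count * stripe_size, 2 * MIB)


# One 1 MiB stripe stays below the cap; four 1 MiB stripes hit it.
print(lustre_1_4_blksize(1, 1 * MIB))  # 1048576
print(lustre_1_4_blksize(4, 1 * MIB))  # 2097152
```

Under that rule, restriping a file was enough to change the buffer size an unmodified application would allocate.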

This was removed because NFS Connectathon failed a sanity test: st_blksize changed when a file was created with mknod() (reporting an artificial st_blksize, since the file had no layout) and then opened, at which point it inherited a layout from the parent.

That said, I think the Connectathon issue could be fixed in several ways (e.g. set the st_blksize based on the filesystem default stripe size rather than an arbitrary constant, or similar).

Another option would be to use lfs ladvise to allow setting the blocksize on the file (in memory on the client inode, or possibly also on disk for ZFS LU-8951). This would allow tuning the st_blksize value in a script before running the application, but it adds some complexity, since it has to be run on every client each time the application works on a new file. Conceivably, st_blksize could be stored persistently in an xattr on the file (inherited from the parent?), but this is considerably more work.

Comment by Gerrit Updater [ 24/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26869/
Subject: LU-9413 llite: llite.stat_blocksize param for fixed st_blksize
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f07576d3f82b50d74a858ccd60f7bdd0977a85a4

Comment by Peter Jones [ 24/May/17 ]

Landed for 2.10
