
stat->st_blksize and glibc buffering

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.9.0

    Description

      The issue is described in detail in https://bugzilla.lustre.org/show_bug.cgi?id=12739

      In short, Lustre reports st_blksize = 4 MiB for an open file, and glibc allocates a stdio buffer of that size. Each short random read then triggers a 4 MiB BRW (bulk read/write), which ruins performance compared to other distributed filesystems.
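
      For a program that can be modified, the effect can be observed and sidestepped by checking the st_blksize the filesystem reports and giving the stream its own, smaller buffer with setvbuf(). The C sketch below is only an illustration, not the workaround requested in this ticket; the helper names reported_blksize and open_with_buffer are made up for the example. It passes an explicit buffer because glibc ignores the size argument of setvbuf() when the buffer pointer is NULL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fstat() */
#endif

#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: return the preferred I/O size that the filesystem
 * reports for an open stream, or -1 on error. Per the ticket, Lustre
 * reports 4 MiB here, and glibc sizes its stdio buffer from this value. */
long reported_blksize(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_blksize;
}

/* Hypothetical helper: open a file for reading with a caller-supplied
 * stdio buffer instead of the one glibc would derive from st_blksize.
 * The caller owns buf and must keep it alive until fclose().
 * Returns NULL on failure. */
FILE *open_with_buffer(const char *path, char *buf, size_t bufsize)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return NULL;

    /* setvbuf() must be called before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, bufsize) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

      A caller would declare something like `static char buf[4096];`, call `open_with_buffer(path, buf, sizeof buf)`, and keep buf valid until the stream is closed. Short random reads through that stream should then refill at most 4 KiB at a time rather than the reported block size.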

      It is often not possible or not practical to patch the program itself.

      The original ticket contains the assertion that it's a bug in glibc and should be fixed in glibc. While it was fixed in glibc 2.25 (https://sourceware.org/bugzilla/show_bug.cgi?id=4099#c10), this glibc version won't be used in Linux distros with Lustre support any time soon.

      How about adding a temporary workaround for this issue similar to the one proposed by Aurélien Degrémont in bz #12739?


        Activity

          [LU-9413] stat->st_blksize and glibc buffering
          pjones Peter Jones added a comment -

          Landed for 2.10


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26869/
          Subject: LU-9413 llite: llite.stat_blocksize param for fixed st_blksize
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: f07576d3f82b50d74a858ccd60f7bdd0977a85a4

          adilger Andreas Dilger added a comment -

          I'm glad you used a config param instead of a module parameter in the updated patch. However, as Aurelien said in Bugzilla, this parameter causes problems on the nodes that it is enabled on. Have you done any benchmarks with this enabled? I think iozone has a buffered I/O mode with a range of block sizes; could you compare results with this feature enabled and disabled?

          As I also wrote in that patch:

          In Lustre 1.4 it was possible to change the stat() blocksize of a file by changing the file striping. The returned value was stripe_count * stripe_size, or 2MB, whichever was less.

          This was removed because NFS Connectathon failed a sanity test: st_blksize changed if a file was created with mknod() (using an artificial st_blksize, since the file has no layout) and then opened, at which point it inherited a layout from the parent.

          That said, I think the Connectathon issue could be fixed in several ways (e.g. set the st_blksize based on the filesystem default stripe size, rather than an arbitrary constant, or similar).

          Another option would be to use lfs ladvise to allow setting the blocksize on the file (in memory on the client inode, or possibly also on disk for ZFS, see LU-8951). This would allow tuning the st_blksize value in a script before running the application, but it adds some complexity, since it has to be run on the file on every client each time the application works on a new file. Conceivably, st_blksize could be stored persistently in an xattr on the file (inherited from the parent?), but this is considerably more work.
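
          The Lustre 1.4 rule quoted in the comment above can be written down as a small C function. This is a sketch of the arithmetic only, not code taken from Lustre; the name legacy_st_blksize is hypothetical, and reading "2MB" as 2 MiB (2097152 bytes) is an assumption.

```c
#include <stdint.h>

/* Assumption: the "2MB" cap in the comment means 2 MiB. */
#define LEGACY_BLKSIZE_CAP UINT64_C(2097152)

/* Hypothetical helper: st_blksize as described for Lustre 1.4, i.e.
 * stripe_count * stripe_size or the cap, whichever is less. */
uint64_t legacy_st_blksize(uint32_t stripe_count, uint64_t stripe_size)
{
    /* Compare against cap / stripe_count first so that the
     * multiplication below can never overflow. */
    if (stripe_count != 0 && stripe_size > LEGACY_BLKSIZE_CAP / stripe_count)
        return LEGACY_BLKSIZE_CAP;

    return (uint64_t)stripe_count * stripe_size;
}
```

          Under these assumptions, a file with one 1 MiB stripe would report 1 MiB, while a file with four 1 MiB stripes would hit the cap and report 2 MiB.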

          gerrit Gerrit Updater added a comment -

          Andrew Perepechko (andrew.perepechko@seagate.com) uploaded a new patch: https://review.whamcloud.com/26869
          Subject: LU-9413 llite: llite.stat_blocksize param for fixed st_blksize
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 16964b5af530b59a3e7da73d62f3a8b1e0d8a4d0

          People

            Assignee: wc-triage WC Triage
            Reporter: panda Andrew Perepechko