Lustre Documentation / LUDOC-160

Add more info about metrics - offset (rpc_stats) and extents (extents_stats).


Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • None
    • None
    • 8744

    Description

      On 2013-06-19, at 9:21, "Lee, Brett" <brett.lee@intel.com> wrote:
      > I understand the metric "offset" in the client rpc_stats file to indicate the delta between where the client last read/wrote. Per a recent thread in hpdd-discuss it seems this metric differs between 1.x and 2.x systems, and I am wondering what could have changed?

      To be honest, I don't know much of the details here. It might be that "offset" is relative to the previous read/write, as you propose, or it could be the offset within the 1MB RPC. I would suspect the latter, because this gives a bounded range of values, and it is the only critical metric for the RPC stack. Doing 1MB-sized and 1MB-aligned write RPCs (i.e. "offset = 0" per my definition) but with random file offsets should give nearly the same performance as 1MB sequential writes. In contrast, doing smaller RPCs would cause read-modify-write cycles on the underlying RAID device and should be avoided.
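      To make the two candidate definitions concrete, here is a minimal sketch; the names (RPC_SIZE, offset_delta, offset_in_rpc) are illustrative only and are not taken from the Lustre source:

        #include <stdio.h>

        #define RPC_SIZE (1ULL << 20)  /* common 1MB Lustre RPC size */

        /* Interpretation 1: distance between this I/O and the previous one. */
        static unsigned long long offset_delta(unsigned long long prev_end,
                                               unsigned long long cur_start)
        {
                return cur_start > prev_end ? cur_start - prev_end
                                            : prev_end - cur_start;
        }

        /* Interpretation 2: offset of the I/O within its 1MB RPC window. */
        static unsigned long long offset_in_rpc(unsigned long long cur_start)
        {
                return cur_start % RPC_SIZE;
        }

        int main(void)
        {
                /* A 1MB-aligned write at a "random" file offset (7MB): */
                unsigned long long start = 7 * RPC_SIZE;

                /* Interpretation 2 reports 0 (a full-sized, aligned RPC),
                 * no matter how far this write is from the previous one,
                 * which matches the performance argument above. */
                printf("offset within RPC: %llu\n", offset_in_rpc(start));
                printf("delta from prev end at 1MB: %llu\n",
                       offset_delta(1 * RPC_SIZE, start));
                return 0;
        }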

      > I am familiar with the block pointers and levels of indirection within an inode, but have recently read that ext4 does not use block pointers but instead uses extents. Can you confirm whether my understanding is correct,

      Yes, it is correct. Ext4, and ldiskfs before it, use extents instead of block pointers to reference the blocks allocated to a file. This can be up to 10000x more efficient on disk and in memory.
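      For reference, here is a sketch of the on-disk extent record, mirroring struct ext4_extent from the kernel's fs/ext4/ext4_extents.h (the kernel uses little-endian __le types where plain fixed-width integers appear here), along with the arithmetic behind the "up to 10000x" figure:

        #include <stdint.h>
        #include <stdio.h>

        /* Mirrors struct ext4_extent: 12 bytes describing one
         * contiguous run of blocks. */
        struct ext4_extent_sketch {
                uint32_t ee_block;    /* first logical block covered */
                uint16_t ee_len;      /* blocks covered (up to 32768) */
                uint16_t ee_start_hi; /* high 16 bits of physical start */
                uint32_t ee_start_lo; /* low 32 bits of physical start */
        };

        int main(void)
        {
                /* One 12-byte extent can cover 32768 contiguous 4KB
                 * blocks (128MB); classic 4-byte block pointers need
                 * one pointer per block for the same range. */
                unsigned long blocks = 32768;
                unsigned long ptr_bytes = blocks * 4;

                printf("extent record:  %zu bytes\n",
                       sizeof(struct ext4_extent_sketch));
                printf("block pointers: %lu bytes (~%lux larger)\n",
                       ptr_bytes,
                       (unsigned long)(ptr_bytes /
                                       sizeof(struct ext4_extent_sketch)));
                return 0;
        }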

      > and if possible, do you have any "pointers" ('scuse the pun) to where I might learn more about extents.

      There is a paper about ext4 features, authored by Alex and myself, that was presented at the Ottawa Linux Symposium. I think it is referenced from the ext4 wiki page (sorry, I don't have the URL right now; Google is your friend).

      > My end goal is to better understand how, in the /proc/.../extents_stats file, small extents translate to small IO - on the surface it's obvious (small == small) but at a lower (inode) level I can't explain it.

      Smaller extents mean that there are fewer contiguous blocks that can be read without a seek. Once the extent is 2MB or larger, the single seek per IO is no longer the dominant factor (consider 2MB per seek * 150 seeks/second = 300MB/s, which approaches the bandwidth limit of the underlying storage). The ext4 code tries to allocate extents of at least 8MB when possible, but this is not always possible if the free space is fragmented. If the allocated extents are too small (< 1MB, because a larger one could not be found at allocation time), then there will be significant seek overhead during IO.
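      The seek-bound estimate above is easy to tabulate; here is a small model, assuming the 150 seeks/second single-disk figure from the text and one seek per extent:

        #include <stdio.h>

        /* Throughput bound when every extent costs one seek:
         * bandwidth <= extent_size * seeks_per_second. */
        int main(void)
        {
                const double seeks_per_sec = 150.0;  /* assumed seek rate */
                const double extent_mb[] = { 0.25, 1.0, 2.0, 8.0 };
                int n = (int)(sizeof(extent_mb) / sizeof(extent_mb[0]));

                for (int i = 0; i < n; i++)
                        printf("%5.2fMB extents -> at most %6.1f MB/s\n",
                               extent_mb[i], extent_mb[i] * seeks_per_sec);
                return 0;
        }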

      So, the small extents are really just a symptom of the fragmented free space, and not the root cause. The same performance problem would have happened with block allocation (assuming the same free blocks at allocation time), but it wouldn't have been as easy to find.

      A useful feature that we developed (and which is now in widespread use) is the File Extent Map (FIEMAP) ioctl. It allows efficiently extracting the on-disk layout of a file in a filesystem-agnostic format and displaying the fragmentation/sparseness of the file using the "filefrag" utility. Lustre can do this across the network as well, though it needs the Lustre version of e2fsprogs installed on the client for multi-striped files.
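      Here is a minimal sketch of calling FIEMAP directly, printing each extent much as "filefrag -v" does. It uses the standard Linux interface from <linux/fiemap.h>; error handling and the loop needed for files with more than 32 extents are abbreviated:

        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/ioctl.h>
        #include <unistd.h>
        #include <linux/fs.h>
        #include <linux/fiemap.h>

        int main(int argc, char **argv)
        {
                if (argc != 2) {
                        fprintf(stderr, "usage: %s <file>\n", argv[0]);
                        return 1;
                }

                int fd = open(argv[1], O_RDONLY);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }

                /* Room for 32 extents in one call; a real tool would
                 * loop until it sees FIEMAP_EXTENT_LAST. */
                unsigned int count = 32;
                struct fiemap *fm = calloc(1, sizeof(*fm) +
                                           count * sizeof(struct fiemap_extent));
                if (fm == NULL)
                        return 1;

                fm->fm_start = 0;
                fm->fm_length = FIEMAP_MAX_OFFSET;  /* map the whole file */
                fm->fm_flags = FIEMAP_FLAG_SYNC;    /* flush delalloc first */
                fm->fm_extent_count = count;

                if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                        perror("FS_IOC_FIEMAP");
                        return 1;
                }

                for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
                        struct fiemap_extent *fe = &fm->fm_extents[i];

                        printf("extent %u: logical %llu physical %llu len %llu\n",
                               i, (unsigned long long)fe->fe_logical,
                               (unsigned long long)fe->fe_physical,
                               (unsigned long long)fe->fe_length);
                }

                free(fm);
                close(fd);
                return 0;
        }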
