Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.14.0

    Description

      Add statistics on the client to measure and report latency of operations.

      The current llite.*.stats file reports only operation counts, not the time taken by each operation. It should be simple to add min/max/sum/sumsq statistics for the latency of each operation, including new read and write entries for latency alongside the existing read_bytes and write_bytes counters, which should be kept. It might make sense to report sync writes separately as write_sync (those with file->f_flags & (O_DIRECT | O_SYNC) set), since their latency profile will be quite different from that of cached writes.
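As a rough illustration of the min/max/sum/sumsq bookkeeping described above, here is a minimal userspace sketch in C. The names `lat_stats`, `lat_stats_add`, `lat_stats_mean`, and `lat_stats_variance` are hypothetical and are not part of the Lustre lprocfs code; the sketch only shows that keeping these four values plus a sample count is enough to derive the mean and variance later.

```c
#include <stdint.h>

/* Hypothetical per-operation latency counter; not the Lustre lprocfs type. */
struct lat_stats {
	uint64_t count;	/* number of samples */
	uint64_t min;	/* smallest sample seen */
	uint64_t max;	/* largest sample seen */
	uint64_t sum;	/* sum of samples */
	uint64_t sumsq;	/* sum of squared samples */
};

/* Fold one latency sample into the counter. */
static void lat_stats_add(struct lat_stats *s, uint64_t sample)
{
	if (s->count == 0 || sample < s->min)
		s->min = sample;
	if (sample > s->max)
		s->max = sample;
	s->count++;
	s->sum += sample;
	s->sumsq += sample * sample;
}

/* Mean latency, or 0 if no samples were recorded. */
static double lat_stats_mean(const struct lat_stats *s)
{
	return s->count ? (double)s->sum / (double)s->count : 0.0;
}

/* Sample variance derived from count, sum, and sumsq only. */
static double lat_stats_variance(const struct lat_stats *s)
{
	double mean;

	if (s->count < 2)
		return 0.0;
	mean = lat_stats_mean(s);
	return ((double)s->sumsq - (double)s->count * mean * mean) /
	       (double)(s->count - 1);
}
```

Starting from a zero-initialized `struct lat_stats`, each completed operation would call `lat_stats_add()` once; the standard deviation is the square root of the value returned by `lat_stats_variance()`.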


          Activity

            [LU-12631] Report latency of client operations
            pjones Peter Jones added a comment -

            Landed for 2.14


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36078/
            Subject: LU-12631 llite: report latency for filesystem ops
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: ea58c4cfb0fc255befbbb7754bd4ed71704a2a2c

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36078
            Subject: LU-12631 llite: report latency for filesystem ops
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c2acdf859997d91efbedfbfc80ece00323ee636e

            pjones Peter Jones added a comment -

            Jian

            Could you please assist with this?

            Thanks

            Peter


            adilger Andreas Dilger added a comment -

            The place to do this is llite_opcode_table: add LPROCFS_CNTR_AVGMINMAX to all of the fields there, and maybe LPROCFS_CNTR_STDDEV to the main ones that correspond to actual userspace VFS operations, not necessarily the "internal" stats like LPROC_LL_ALLOC_INODE, LPROC_LL_GETXATTR_HITS, and LPROC_LL_INODE_PERM. The LPROCFS_TYPE_REGS type can be changed to LPROCFS_TYPE_USEC for the "usec" units since it isn't used anywhere.

            We need to record the start and end time for each operation using ktime_get() and convert the times to usec units only when printed.

            People

              adilger Andreas Dilger
              adilger Andreas Dilger