[LU-7195] Allow for static string content for jobstats jobid_var

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.8.0
    • Affects Versions: Lustre 2.7.0, Lustre 2.5.3
    • Environment: RHEL 6.6

    Description

      We've been benchmarking I/O performance (mainly metadata operations) with job stats enabled. There's potential for a performance impact when using the environment variable setup. This performance degradation appears to be associated with the environment variable lookup. The impact when using the special procname_uid setting is negligible.

      To counter this, we would like to see the ability to support a static string that's not evaluated as an environment variable, but is simply passed along with the RPC.

      I would like to propose using a prefix to the jobid_var variable to indicate that it should be passed, not evaluated. I think it would make sense to use a symbol like @ for this prefix. I'm basing this on my assumption that environment variables compliant with IEEE Std 1003.1-2001 will not contain the at-sign. This would allow administrators to set this statically at job start, or to use the client's hostname, etc., without the overhead of the environment lookup. It also lets us take this out of the user's control without resorting to read-only variables in their environments.

      Examples of use:

      Associating traffic per-host: lctl set_param jobid_var="@$(hostname)"

      Associating traffic with a specific string: lctl set_param jobid_var="@benchmarking"

      From my understanding, it looks like this would be a pretty straightforward change to the obd class, within the lustre_get_jobid function. I have a potential patch I can push to master if this is a behavior we want supported.
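
A minimal userspace sketch of the prefix handling proposed above, for illustration only. It is not the Lustre implementation and not necessarily how the merged patch works. The function name sketch_get_jobid, the JOBID_SIZE value, and the "name.uid" format used for procname_uid are assumptions made for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define JOBID_SIZE 32 /* assumed buffer size, chosen for this sketch only */

/*
 * Hypothetical stand-in for the jobid lookup described in the ticket.
 * Fills 'jobid' (JOBID_SIZE bytes) based on the jobid_var setting.
 * Returns 0 on success, -1 if the environment variable is not set.
 */
static int sketch_get_jobid(const char *jobid_var, const char *procname,
                            unsigned int uid, char *jobid)
{
    const char *val;

    jobid[0] = '\0';

    /* Proposed behavior: an '@' prefix means "pass the rest verbatim",
     * so no environment lookup is needed. */
    if (jobid_var[0] == '@') {
        snprintf(jobid, JOBID_SIZE, "%s", jobid_var + 1);
        return 0;
    }

    /* Special setting mentioned in the ticket; the exact output format
     * used here is an assumption. */
    if (strcmp(jobid_var, "procname_uid") == 0) {
        snprintf(jobid, JOBID_SIZE, "%s.%u", procname, uid);
        return 0;
    }

    /* Otherwise treat jobid_var as an environment variable name. This is
     * the lookup the ticket identifies as the expensive path. */
    val = getenv(jobid_var);
    if (val == NULL)
        return -1;

    snprintf(jobid, JOBID_SIZE, "%s", val);
    return 0;
}
```

With this shape, jobid_var="@benchmarking" yields the jobid "benchmarking" without touching the process environment, and snprintf truncates anything longer than the buffer.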

      Thanks!

      Jesse

          Activity

            simmonsja James A Simmons added a comment -

            Please create a new ticket.

            gerrit Gerrit Updater added a comment -

            Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/25208
            Subject: LU-7195 jobstats: Create a pid-based hash for jobid values
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c9eb53d6b65325f4b3715e56d59947b07c8d8fe1

            yujian Jian Yu added a comment -

            I created LUDOC-310 to track the Lustre manual change.

            pjones Peter Jones added a comment -

            Landed for 2.8

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16598/
            Subject: LU-7195 jobstats: Allow setting static content for jobid_var
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: fed02bd85eae0e27b682a58c1e466dfbf1f97196

            adilger Andreas Dilger added a comment -

            Slightly updated version of patch to cache jobid in vvp_env. This still needs to be updated to copy the jobid into md_op_data to pass down to the MDC layer.

            adilger Andreas Dilger added a comment -

            Jesse, thanks for the additional information. The results definitely make more sense in this regard.

            I guess there isn't much surprise that there is some overhead for metadata-heavy workloads since the jobid value is cached in the inode, but with a file create workload the inodes are never re-used. I don't know if there is anything that could be done to improve performance for a per-task jobid, since the jobid is already kept in the process task struct in the kernel, just in an inefficient-to-access ASCII string format. There isn't any spare space in the task struct for keeping extra data, although it might be possible to cache the jobid in the process "env".

            There was a patch to do this posted on LKML at one point (https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg528724.html), but the whole jobid functionality was ripped out of upstream so it never landed. It might be worthwhile to see if it could be revived and the lu_env cached jobid could also be used to populate lli_jobid and then md_op_data to pass the jobid down to the MDC code for storing in pb_jobid before it gets down to ptlrpc_set_add_req(), similar to how it happens in the IO path.
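
The caching flow suggested in the comment above can be sketched roughly as follows. This is a simplified userspace illustration, not Lustre code: every struct and function name here (sk_env, sk_inode, sk_op_data, sk_req_body, sk_slow_lookup, sk_prepare_md_request) is a hypothetical stand-in for the lu_env cache, lli_jobid, md_op_data, and pb_jobid mentioned in the comment, and the fixed "job42" value only simulates the expensive lookup.

```c
#include <stdio.h>
#include <string.h>

#define SK_JOBID_SIZE 32 /* assumed size for this sketch */

/* Hypothetical stand-ins for the structures named in the comment. */
struct sk_env      { char jobid[SK_JOBID_SIZE]; int valid; }; /* per-thread env cache */
struct sk_inode    { char jobid[SK_JOBID_SIZE]; };            /* plays the role of lli_jobid */
struct sk_op_data  { char jobid[SK_JOBID_SIZE]; };            /* plays the role of md_op_data */
struct sk_req_body { char jobid[SK_JOBID_SIZE]; };            /* plays the role of pb_jobid */

static int sk_lookup_count; /* counts how often the slow path runs */

/* Simulates the expensive environment scan with a fixed value. */
static void sk_slow_lookup(char *out)
{
    sk_lookup_count++;
    snprintf(out, SK_JOBID_SIZE, "%s", "job42");
}

/* Return the cached jobid, running the slow lookup only on first use. */
static const char *sk_env_jobid(struct sk_env *env)
{
    if (!env->valid) {
        sk_slow_lookup(env->jobid);
        env->valid = 1;
    }
    return env->jobid;
}

/* Copy the cached jobid down through each layer of a metadata request. */
static void sk_prepare_md_request(struct sk_env *env, struct sk_inode *inode,
                                  struct sk_op_data *op,
                                  struct sk_req_body *body)
{
    const char *jobid = sk_env_jobid(env);

    snprintf(inode->jobid, SK_JOBID_SIZE, "%s", jobid);
    snprintf(op->jobid, SK_JOBID_SIZE, "%s", jobid);
    snprintf(body->jobid, SK_JOBID_SIZE, "%s", jobid);
}
```

The point of the sketch is that a file-create workload, which never re-uses inodes, still pays for the slow lookup only once per cached env rather than once per new inode.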

            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: hanleyja Jesse Hanley
              Votes: 0
              Watchers: 11