LU-7195: Allow for static string content for jobstats jobid_var

Details

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.5.3
    • Environment: RHEL 6.6

    Description

We've been benchmarking I/O performance (mainly metadata operations) with job stats enabled. There is a noticeable performance impact when jobid_var is set to an environment variable, and the degradation appears to be associated with the environment-variable lookup itself. The impact when using the special procname_uid setting is negligible.

To counter this, we would like the ability to use a static string that is not evaluated as an environment variable, but is simply passed along with the RPC.

I would like to propose a prefix on the jobid_var value to indicate that it should be passed as-is, not evaluated. A symbol such as @ would make sense for this prefix, on the assumption that environment variable names compliant with IEEE Std 1003.1-2001 will not contain an at-sign. This would allow administrators to set the value statically at job start, from the client's hostname, etc., without the overhead of the environment lookup. It also lets us take this out of the user's control without resorting to read-only variables in their environments.

      Examples of use:

      Associating traffic per-host: lctl set_param jobid_var="@$(hostname)"

      Associating traffic with a specific string: lctl set_param jobid_var="@benchmarking"

From my understanding, this would be a fairly straightforward change to obdclass, within the lustre_get_jobid() function. I have a potential patch I can push to master if this is behavior we want to support.
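
Roughly, the check could look like the following minimal sketch (obd_jobid_var and LUSTRE_JOBID_SIZE are assumed from the surrounding obdclass code; this is illustrative only, not the patch that eventually landed):

      /* Illustrative sketch of the proposed short-circuit at the top of
       * lustre_get_jobid() in obdclass. obd_jobid_var and LUSTRE_JOBID_SIZE
       * are assumed from the surrounding code. */
      int lustre_get_jobid(char *jobid)
      {
              memset(jobid, 0, LUSTRE_JOBID_SIZE);

              /* "@string": pass the literal string along with the RPC,
               * skipping the environment-variable lookup entirely */
              if (obd_jobid_var[0] == '@') {
                      strlcpy(jobid, obd_jobid_var + 1, LUSTRE_JOBID_SIZE);
                      return 0;
              }

              /* ... fall through to the existing handling: disable,
               * procname_uid, and the environment-variable lookup ... */
              return 0;
      }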

      Thanks!

      Jesse

    Activity

      simmonsja James A Simmons added a comment -

      Please create a new ticket.

      gerrit Gerrit Updater added a comment -

      Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/25208
      Subject: LU-7195 jobstats: Create a pid-based hash for jobid values
      Project: fs/lustre-release
      Branch: master
      Current Patch Set: 1
      Commit: c9eb53d6b65325f4b3715e56d59947b07c8d8fe1

      yujian Jian Yu added a comment -

      I created LUDOC-310 to track the Lustre manual change.

      pjones Peter Jones added a comment -

      Landed for 2.8

      gerrit Gerrit Updater added a comment -

      Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16598/
      Subject: LU-7195 jobstats: Allow setting static content for jobid_var
      Project: fs/lustre-release
      Branch: master
      Commit: fed02bd85eae0e27b682a58c1e466dfbf1f97196

      adilger Andreas Dilger added a comment -

      Slightly updated version of the patch to cache the jobid in vvp_env. This still needs to be updated to copy the jobid into md_op_data to pass down to the MDC layer.

      adilger Andreas Dilger added a comment -

      Jesse, thanks for the additional information. The results definitely make more sense in this regard.

      I guess it isn't much of a surprise that there is some overhead for metadata-heavy workloads: the jobid value is cached in the inode, but with a file-create workload the inodes are never re-used, so that cache never helps. I don't know if there is anything that could be done to improve performance for a per-task jobid, since the jobid is already kept in the process task struct in the kernel, just in an inefficient-to-access ASCII string format. There isn't any spare space in the task struct for keeping extra data, although it might be possible to cache the jobid in the process "env".

      There was a patch to do this posted on LKML at one point (https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg528724.html), but the whole jobid functionality was ripped out of upstream, so it never landed. It might be worthwhile to see if it could be revived; the lu_env-cached jobid could also be used to populate lli_jobid and then md_op_data to pass the jobid down to the MDC code for storing in pb_jobid before it gets down to ptlrpc_set_add_req(), similar to how it happens in the I/O path.
      hanleyja Jesse Hanley added a comment -

      Hey Andreas,

      These were actually from some runs I did. Yes, your assumption is right - this is from metadata-heavy jobs. From my IOR runs I didn't see any noticeable impact. I was comparing run times of mdtest. Here are the parameters I used on a 2.7 client:

      Shared directory: mpirun -n 8 -N 8 mdtest -n 131072 -d output/run -F -C -T -r -N 8
      Unique directory: mpirun -n 8 -N 8 mdtest -n 131072 -d output/run -F -C -T -r -N 8 -u
      Shared file: mpirun -n 8 -N 8 mdtest -S -C -T -r -n 1 -d output/run -F

      I was benchmarking the overhead since we do have some metadata-heavy jobs. I did about a dozen runs like this with jobid_var set to disable, to an environment variable, and to procname_uid. In the environment-variable case, I tested with the target variable both undefined and defined.

      There was very little detectable overhead when using procname_uid, which I expected since it's a pretty easy lookup. When set to an environment variable, it was about a 5% hit, with worse behavior for file creations in a shared directory (in the 7% to 9% range).

      Does this help?

      adilger Andreas Dilger added a comment -

      Since lots of users are already using job stats, it also makes sense to improve the performance of the existing code. When Oleg's patch to remove the environment variable access was going upstream, Peng Tao and I also implemented a cache mechanism for the jobid so that it didn't need to access the environment very often; see the sketch after this comment. I'll have to see if I can find a version of that patch.

      The other concern is that some sites run multiple different jobs on the same nodes, so having a single global jobid assigned to the node will not work for them.

      James, it would be good to know what you were testing that hit this performance loss, since I thought we tested it ourselves and didn't see anything close to that. I wonder if something has changed in newer kernels that would make it so much worse? It might be that this only shows up for metadata-heavy jobs, and not I/O jobs. Maybe the other difference is how many environment variables are set, since this could affect the parsing time significantly.
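
      For illustration, here is a minimal userspace sketch of such a pid-keyed jobid cache. All names and sizes are invented for illustration and do not reflect the actual patches, which would also need locking and eviction:

      /* Hypothetical pid-keyed jobid cache: do the expensive environment
       * lookup once per task, then serve repeat requests from the cache. */
      #include <string.h>
      #include <sys/types.h>

      #define JOBID_SIZE    32
      #define CACHE_BUCKETS 128

      struct jobid_entry {
              pid_t pid;                 /* 0 marks an empty slot */
              char  jobid[JOBID_SIZE];
      };

      static struct jobid_entry jobid_cache[CACHE_BUCKETS];

      /* Return the cached jobid for pid, or NULL on a miss; on a miss the
       * caller does the environment lookup and calls jobid_cache_store(). */
      static const char *jobid_cache_lookup(pid_t pid)
      {
              struct jobid_entry *e = &jobid_cache[pid % CACHE_BUCKETS];

              return e->pid == pid ? e->jobid : NULL;
      }

      static void jobid_cache_store(pid_t pid, const char *jobid)
      {
              struct jobid_entry *e = &jobid_cache[pid % CACHE_BUCKETS];

              e->pid = pid;
              strncpy(e->jobid, jobid, JOBID_SIZE - 1);
              e->jobid[JOBID_SIZE - 1] = '\0';
      }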

      gerrit Gerrit Updater added a comment -

      James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/16598
      Subject: LU-7195 lprocfs: Replace jobid acquiring with per node setting
      Project: fs/lustre-release
      Branch: master
      Current Patch Set: 1
      Commit: 9d40df475908e361de9ace7c5a5c25a207f16e2f

    People

      Assignee: niu Niu Yawei (Inactive)
      Reporter: hanleyja Jesse Hanley
      Votes: 0
      Watchers: 11
