[LU-7195] Allow for static string content for jobstats jobid_var Created: 22/Sep/15  Updated: 02/Feb/17  Resolved: 25/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.5.3
Fix Version/s: Lustre 2.8.0

Type: Improvement Priority: Minor
Reporter: Jesse Hanley Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: patch
Environment:

RHEL 6.6


Attachments: File jobid_env.patch    
Issue Links:
Related
is related to LUDOC-310 jobstats: Allow setting static content for jobid_var Resolved

 Description   

We've been benchmarking I/O performance (mainly metadata operations) with job stats enabled. There's potential for a performance impact when using the environment variable setup. This performance degradation appears to be associated with the environment variable lookup. The impact when using the special procname_uid setting is negligible.

To counter this, we would like the ability to use a static string that is not evaluated as an environment variable, but is simply passed along with the RPC.

I would like to propose a prefix on the jobid_var value to indicate that it should be passed through verbatim, not evaluated. I think it would make sense to use a symbol like @ for this prefix, based on my assumption that environment variable names compliant with IEEE Std 1003.1-2001 will not contain the at-sign. This would allow administrators to set the jobid statically at job start, derive it from the client's hostname, etc., without the overhead of the environment lookup. It also lets us take the value out of the user's control without resorting to read-only variables in their environments.

Examples of use:

Associating traffic per-host: lctl set_param jobid_var="@$(hostname)"

Associating traffic with a specific string: lctl set_param jobid_var="@benchmarking"

From my understanding, this would be a pretty straightforward change to the obd class, within the lustre_get_jobid() function. I have a potential patch I can push to master if this is a behavior we want supported.
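To illustrate the proposed semantics, here is a minimal userspace sketch (not the attached patch; get_jobid() and JOBID_SIZE are illustrative stand-ins for the kernel's lustre_get_jobid() and LUSTRE_JOBID_SIZE):

    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for LUSTRE_JOBID_SIZE (an assumption for illustration). */
    #define JOBID_SIZE 32

    /* Sketch of the proposed rule: a leading '@' means "use the rest of
     * the string verbatim"; anything else is the name of an environment
     * variable to look up. */
    static int get_jobid(const char *jobid_var, char *jobid, size_t len)
    {
            const char *val;

            if (jobid_var[0] == '@') {
                    val = jobid_var + 1;     /* static string, no env lookup */
            } else {
                    val = getenv(jobid_var); /* existing behavior */
                    if (val == NULL)
                            return -1;
            }
            snprintf(jobid, len, "%s", val);
            return 0;
    }

    int main(void)
    {
            char jobid[JOBID_SIZE];

            if (get_jobid("@benchmarking", jobid, sizeof(jobid)) == 0)
                    printf("jobid = %s\n", jobid); /* prints "benchmarking" */
            return 0;
    }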

Thanks!

Jesse



 Comments   
Comment by James A Simmons [ 22/Sep/15 ]

As a note, we are seeing a 9% performance loss for each job due to job stats reading the environment variables.

Comment by Oleg Drokin [ 22/Sep/15 ]

The upstream kernel client has a different mechanism, where every node has a node-wide jobid setting that is set in the job prologue.
I've been meaning to port this to master but had no time. The upstream kernel commit is 76133e66b1417a73c0950d0716219d09ee21d595.

This is a limited solution anyway, because it is only a single setting for the entire node, so it won't work if multiple jobs are running. The solution to that is likely to run every job in its own cgroup and have a per-cgroup setting, still enabled via a job prologue, I imagine.
Anyway, even the current upstream patch does what you want, so to maintain better interoperability I imagine we should port it instead, and perhaps also patch the tools accordingly so they know about the new layout.
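For illustration, with that scheme a job prologue would write the jobid once for the whole node, e.g. echo "$PBS_JOBID" > /sys/fs/lustre/jobid_name (the sysfs location here is an assumption based on the upstream patch, and the scheduler variable is just an example).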

Comment by James A Simmons [ 22/Sep/15 ]

Yep. This looks like the solution that is needed. Will port it.

Comment by Gerrit Updater [ 22/Sep/15 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: http://review.whamcloud.com/16598
Subject: LU-7195 lprocfs: Replace jobid acquiring with per node setting
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9d40df475908e361de9ace7c5a5c25a207f16e2f

Comment by Andreas Dilger [ 27/Sep/15 ]

Since lots of users are already using job stats, it also makes sense to improve the performance of the existing code. When Oleg's patch to remove the environment variable access was going upstream, Peng Tao and I also implemented a cache mechanism for the jobid so that it didn't need to access the environment on every request. I'll have to see if I can find a version of that patch.

The other concern is that some sites run with multiple different jobs on the same nodes, so having a single global jobid assigned to the node will not work for them.

James, it would be good to know what you were testing when you hit this performance loss, since I thought we tested it ourselves and didn't see anything close to that. I wonder if something has changed in newer kernels that would make it so much worse? It might be that this only shows up for metadata-heavy jobs, and not for I/O jobs. Maybe the other difference is how many environment variables are set, since that could affect the parsing time significantly.

Comment by Jesse Hanley [ 29/Sep/15 ]

Hey Andreas,

These were actually from some runs I did. Yes, your assumption is right - this is from metadata-heavy jobs. From my IOR runs I didn't see any noticeable impact. I was comparing run times of mdtest. Here are the parameters I used on a 2.7 client:

Shared directory: mpirun -n 8 -N 8 mdtest -n 131072 -d output/run -F -C -T -r -N 8
Unique directory: mpirun -n 8 -N 8 mdtest -n 131072 -d output/run -F -C -T -r -N 8 -u
Shared file: mpirun -n 8 -N 8 mdtest -S -C -T -r -n 1 -d output/run -F

I was benchmarking the overhead since we do have some metadata heavy jobs. I did about a dozen runs like this with jobid_var set to disable, an environment variable, and procname_uid. In the case of the environment variable, I tested with the target variable both undefined and defined when performing the runs.

There was very little detectable overhead when using procname_uid, which I expected since it's a pretty easy lookup. When set to an environment variable, it was about a 5% hit, with worse behavior for file creations in a shared directory (in the 7% to 9% range).

Does this help?

Comment by Andreas Dilger [ 30/Sep/15 ]

Jesse, thanks for the additional information. The results definitely make more sense in this regard.

I guess there isn't much surprise that there is some overhead for metadata-heavy workloads: the jobid value is cached in the inode, but with a file-create workload the inodes are never re-used, so the cache never gets a hit. I don't know if there is anything that could be done to improve performance for a per-task jobid, since the jobid is already kept in the process task struct in the kernel, just in an inefficient-to-access ASCII string format. There isn't any spare space in the task struct for keeping extra data, although it might be possible to cache the jobid in the process "env".

There was a patch to do this posted on LKML at one point (https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg528724.html), but the whole jobid functionality was ripped out of upstream, so it never landed. It might be worthwhile to see if it could be revived, so the lu_env-cached jobid could also be used to populate lli_jobid and then md_op_data, to pass the jobid down to the MDC code for storing in pb_jobid before it gets to ptlrpc_set_add_req(), similar to how it happens in the I/O path.
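As a rough illustration of the caching idea, here is a minimal userspace sketch (the names and the TTL refresh policy are assumptions for illustration, not what the actual patch did; in the kernel the cache would live in the lu_env/vvp_env rather than in a static variable):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define JOBID_SIZE 32  /* assumption: mirrors LUSTRE_JOBID_SIZE */
    #define CACHE_TTL  30  /* assumption: seconds between env lookups */

    /* Refresh the jobid from the environment at most once per CACHE_TTL
     * seconds, so repeated requests reuse the cached copy instead of
     * walking the environment every time. */
    static const char *cached_jobid(const char *jobid_var)
    {
            static char jobid[JOBID_SIZE];
            static time_t expires;
            const char *val;
            time_t now = time(NULL);

            if (now < expires)
                    return jobid;           /* cache hit: no env walk */

            val = getenv(jobid_var);
            snprintf(jobid, sizeof(jobid), "%s", val ? val : "");
            expires = now + CACHE_TTL;
            return jobid;
    }

    int main(void)
    {
            setenv("SLURM_JOB_ID", "12345", 1);
            printf("jobid = %s\n", cached_jobid("SLURM_JOB_ID"));
            return 0;
    }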

Comment by Andreas Dilger [ 30/Sep/15 ]

Slightly updated version of the patch to cache the jobid in vvp_env. This still needs to be updated to copy the jobid into md_op_data to pass it down to the MDC layer.

Comment by Gerrit Updater [ 24/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16598/
Subject: LU-7195 jobstats: Allow setting static content for jobid_var
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fed02bd85eae0e27b682a58c1e466dfbf1f97196

Comment by Peter Jones [ 25/Oct/15 ]

Landed for 2.8
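As a usage note, the landed interface sets a per-node static jobid via two tunables, e.g. lctl set_param jobid_var=nodelocal followed by lctl set_param jobid_name=benchmarking (the name here is just an example value).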

Comment by Jian Yu [ 26/Oct/15 ]

I created LUDOC-310 to track the Lustre manual change.

Comment by Gerrit Updater [ 02/Feb/17 ]

Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/25208
Subject: LU-7195 jobstats: Create a pid-based hash for jobid values
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c9eb53d6b65325f4b3715e56d59947b07c8d8fe1

Comment by James A Simmons [ 02/Feb/17 ]

Please create a new ticket.
