<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:06:48 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7195] Allow for static string content for jobstats jobid_var</title>
                <link>https://jira.whamcloud.com/browse/LU-7195</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We&apos;ve been benchmarking I/O performance (mainly metadata operations) with job stats enabled.  There&apos;s potential for a performance impact when using the environment variable setup.  This performance degradation appears to be associated with the environment variable lookup.  The impact when using the special &lt;tt&gt;procname_uid&lt;/tt&gt; setting is negligible.  &lt;/p&gt;

&lt;p&gt;To counter this, we would like to see the ability to support a static string that&apos;s not evaluated as an environment variable, but is simply passed along with the rpc. &lt;/p&gt;

&lt;p&gt;I would like to propose using a prefix to the &lt;tt&gt;jobid_var&lt;/tt&gt; variable to indicate that it should be passed, not evaluated.  I think it would make sense to use a symbol like &lt;tt&gt;@&lt;/tt&gt; for this prefix.  I&apos;m basing this on my assumption that environment variables compliant with &lt;em&gt;IEEE Std 1003.1-2001&lt;/em&gt; will not contain the at-sign.  This would allow administrators to statically set this at a job start, or using the client&apos;s hostname, etc, without the overhead of the environment lookup.  This also allows us the ability to take this out of the user&apos;s control without resorting to read-only variables in their environments. &lt;/p&gt;



&lt;p&gt;Examples of use:&lt;/p&gt;

&lt;p&gt;Associating traffic per-host: &lt;tt&gt;lctl set_param jobid_var=&quot;@$(hostname)&quot;&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;Associating traffic with a specific string: &lt;tt&gt;lctl set_param jobid_var=&quot;@benchmarking&quot;&lt;/tt&gt;&lt;/p&gt;
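
&lt;p&gt;A minimal sketch of the prefix rule proposed above, written in Python purely for illustration; the real change would be C code in the client, and this is not an actual patch. The names &lt;tt&gt;resolve_jobid&lt;/tt&gt;, &lt;tt&gt;STATIC_PREFIX&lt;/tt&gt; and the &lt;tt&gt;environ&lt;/tt&gt; parameter are hypothetical, and the special values &lt;tt&gt;disable&lt;/tt&gt; and &lt;tt&gt;procname_uid&lt;/tt&gt; are deliberately left out to keep the example short.&lt;/p&gt;

```python
import os

# Hypothetical marker proposed in this ticket: a leading at-sign means
# "pass the rest of the string along as-is, do not look it up".
STATIC_PREFIX = "@"


def resolve_jobid(jobid_var, environ=None):
    """Return the job ID string for a given jobid_var setting (illustrative only)."""
    env = os.environ if environ is None else environ
    if jobid_var.startswith(STATIC_PREFIX):
        # Static string: strip the prefix and skip the environment lookup.
        return jobid_var[len(STATIC_PREFIX):]
    # Otherwise treat the setting as the name of an environment variable;
    # an unset variable yields an empty job ID in this sketch.
    return env.get(jobid_var, "")
```

&lt;p&gt;With this rule, a setting of &lt;tt&gt;@benchmarking&lt;/tt&gt; would resolve to the literal string &lt;tt&gt;benchmarking&lt;/tt&gt;, while a setting without the prefix (for example a scheduler variable name, assumed here only as an example) would still go through the environment lookup.&lt;/p&gt;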


&lt;p&gt;From my understanding, it looks like this would be a pretty straightforward change to the obd class, within the &lt;tt&gt;lustre_get_jobid&lt;/tt&gt; function.  I have a potential patch I can push to master if this is a behavior we want supported.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;br/&gt;
&amp;#8211;&lt;br/&gt;
Jesse&lt;/p&gt;
</description>
                <environment>RHEL 6.6</environment>
        <key id="32262">LU-7195</key>
            <summary>Allow for static string content for jobstats jobid_var</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="hanleyja">Jesse Hanley</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Tue, 22 Sep 2015 14:09:30 +0000</created>
                <updated>Thu, 2 Feb 2017 16:56:24 +0000</updated>
                            <resolved>Sun, 25 Oct 2015 12:46:32 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                    <version>Lustre 2.5.3</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="128103" author="simmonsja" created="Tue, 22 Sep 2015 16:46:09 +0000"  >&lt;p&gt;As a note, we are seeing a 9% performance loss for each job due to job stats reading the environment variables.&lt;/p&gt;</comment>
                            <comment id="128112" author="green" created="Tue, 22 Sep 2015 17:27:46 +0000"  >&lt;p&gt;Upstream kernel client has a different mechanism where every node has a setting for node-wide jobid to be set in prologue.&lt;br/&gt;
I&apos;ve been meaning to port this to master but had no time. Upstream kernel commit is 76133e66b1417a73c0950d0716219d09ee21d595&lt;/p&gt;

&lt;p&gt;This is a limited solution anyway because it makes only a single setting for entire node so if multiple jobs are running - it won&apos;t work.&lt;br/&gt;
The solution to that is likely to implement every job as its own cgroup and have a per-cgroup setting still enabled via a job prologue, I imagine.&lt;br/&gt;
Anyway even the current upstream patch does what you want, so I imagine to maintain better interoperability we should be porting it instead and perhaps also patch the tools accordingly to know of the new layout.&lt;/p&gt;</comment>
                            <comment id="128113" author="simmonsja" created="Tue, 22 Sep 2015 17:32:03 +0000"  >&lt;p&gt;Yep. This looks like the solution that is needed. Will port it.&lt;/p&gt;</comment>
                            <comment id="128126" author="gerrit" created="Tue, 22 Sep 2015 18:13:45 +0000"  >&lt;p&gt;James Simmons (uja.ornl@yahoo.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16598&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16598&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7195&quot; title=&quot;Allow for static string content for jobstats jobid_var&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7195&quot;&gt;&lt;del&gt;LU-7195&lt;/del&gt;&lt;/a&gt; lprocfs: Replace jobid acquiring with per node setting&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 9d40df475908e361de9ace7c5a5c25a207f16e2f&lt;/p&gt;</comment>
                            <comment id="128581" author="adilger" created="Sun, 27 Sep 2015 08:39:07 +0000"  >&lt;p&gt;Since lots of users are already using job stats, it also makes sense to improve the performance of the existing code. When Oleg&apos;s patch to remove the environment variable access was going upstream, Peng Tao and I also implemented a cache mechanism for the jobid so that it didn&apos;t need to access the environment very much. I&apos;ll have to see if I can find a version of that patch. &lt;/p&gt;

&lt;p&gt;The other concern is that some sites run with multiple different jobs on the same nodes, so having a single global jobid assigned to the node will not work for them. &lt;/p&gt;

&lt;p&gt;James, it would be good to know what you were testing that hit this performance loss, since I thought we tested it ourselves and didn&apos;t see anything close to that. I wonder if something has changed in newer kernels that would make it so much worse?  It might be that this only shows up for metadata-heavy jobs, and not IO jobs?  Maybe the other difference is how many environment variables are set, since this could affect the parsing time significantly. &lt;/p&gt;</comment>
                            <comment id="128758" author="hanleyja" created="Tue, 29 Sep 2015 17:00:12 +0000"  >&lt;p&gt;Hey Andreas,&lt;/p&gt;

&lt;p&gt;These were actually from some runs I did.  Yes, your assumption is right - this is from metadata-heavy jobs.  From my IOR runs I didn&apos;t see any noticeable impact.  I was comparing run times of mdtest.  Here are the parameters I used on a 2.7 client:&lt;/p&gt;

&lt;p&gt;Shared directory: mpirun -n 8 -N 8 mdtest -n 131072 -d output/run -F -C -T -r -N 8&lt;br/&gt;
Unique directory: mpirun -n 8 -N 8 mdtest -n 131072 -d output/run -F -C -T -r -N 8 -u&lt;br/&gt;
Shared file: mpirun -n 8 -N 8 mdtest -S -C -T -r -n 1 -d output/run -F&lt;/p&gt;


&lt;p&gt;I was benchmarking the overhead since we do have some metadata heavy jobs.  I did about a dozen runs like this with jobid_var set to disable, an environment variable, and procname_uid.  In the case of the environment variable, I tested with the target variable both undefined and defined when performing the runs.&lt;/p&gt;

&lt;p&gt;There was very little detectable overhead when using procname_uid, which I expected since it&apos;s a pretty easy lookup.  When set to an environment variable, it was about a 5% hit, with worse behavior for file creations using a shared directory (the 7% to 9% range).&lt;/p&gt;

&lt;p&gt;Does this help?&lt;/p&gt;
</comment>
                            <comment id="128864" author="adilger" created="Wed, 30 Sep 2015 08:33:11 +0000"  >&lt;p&gt;Jesse, thanks for the additional information.  The results definitely make more sense in this regard.&lt;/p&gt;

&lt;p&gt;I guess there isn&apos;t much surprise that there is some overhead for metadata-heavy workloads since the jobid value is cached in the inode, but with a file create workload the inodes are never re-used.  I don&apos;t know if there is anything that could be done to improve performance for a per-task jobid, since the jobid is already kept in the process task struct in the kernel, just in an inefficient-to-access ASCII string format.  There isn&apos;t any spare space in the task struct for keeping extra data, although it might be possible to cache the jobid in the process &quot;env&quot;.&lt;/p&gt;

&lt;p&gt;There was a patch to do this posted on LKML at one point (&lt;a href=&quot;https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg528724.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg528724.html&lt;/a&gt;), but the whole jobid functionality was ripped out of upstream so it never landed.  It might be worthwhile to see if it could be revived and the &lt;tt&gt;lu_env&lt;/tt&gt; cached jobid could also be used to populate &lt;tt&gt;lli_jobid&lt;/tt&gt; and then &lt;tt&gt;md_op_data&lt;/tt&gt; to pass the jobid down to the MDC code for storing in &lt;tt&gt;pb_jobid&lt;/tt&gt; before it gets down to &lt;tt&gt;ptlrpc_set_add_req()&lt;/tt&gt;, similar to how it happens in the IO path.&lt;/p&gt;
</comment>
                            <comment id="128865" author="adilger" created="Wed, 30 Sep 2015 08:35:19 +0000"  >&lt;p&gt;Slightly updated version of patch to cache jobid in vvp_env.  This still needs to be updated to copy the jobid into md_op_data to pass down to the MDC layer.&lt;/p&gt;</comment>
                            <comment id="131454" author="gerrit" created="Sat, 24 Oct 2015 00:35:24 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/16598/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16598/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7195&quot; title=&quot;Allow for static string content for jobstats jobid_var&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7195&quot;&gt;&lt;del&gt;LU-7195&lt;/del&gt;&lt;/a&gt; jobstats: Allow setting static content for jobid_var&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: fed02bd85eae0e27b682a58c1e466dfbf1f97196&lt;/p&gt;</comment>
                            <comment id="131479" author="pjones" created="Sun, 25 Oct 2015 12:46:32 +0000"  >&lt;p&gt;Landed for 2.8&lt;/p&gt;</comment>
                            <comment id="131616" author="yujian" created="Mon, 26 Oct 2015 22:32:58 +0000"  >&lt;p&gt;I created &lt;a href=&quot;https://jira.whamcloud.com/browse/LUDOC-310&quot; title=&quot;jobstats: Allow setting static content for jobid_var&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LUDOC-310&quot;&gt;&lt;del&gt;LUDOC-310&lt;/del&gt;&lt;/a&gt; to track the Lustre manual change.&lt;/p&gt;</comment>
                            <comment id="183060" author="gerrit" created="Thu, 2 Feb 2017 15:16:17 +0000"  >&lt;p&gt;Ben Evans (bevans@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/25208&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/25208&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7195&quot; title=&quot;Allow for static string content for jobstats jobid_var&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7195&quot;&gt;&lt;del&gt;LU-7195&lt;/del&gt;&lt;/a&gt; jobstats: Create a pid-based hash for jobid values&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: c9eb53d6b65325f4b3715e56d59947b07c8d8fe1&lt;/p&gt;</comment>
                            <comment id="183081" author="simmonsja" created="Thu, 2 Feb 2017 16:56:24 +0000"  >&lt;p&gt;Please create a new ticket.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="32836">LUDOC-310</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="19023" name="jobid_env.patch" size="2412" author="adilger" created="Wed, 30 Sep 2015 08:35:19 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxoc7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>