<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:28:54 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9749] Reduce overhead for ll_do_fast_read</title>
                <link>https://jira.whamcloud.com/browse/LU-9749</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;For small reads (1k and below), creating a cl_env can account for over 90% of the time spent in ll_do_fast_read.&lt;/p&gt;

&lt;p&gt;ll_do_fast_read doesn&apos;t really need a cl_env; it only uses it for some debug printing. Skipping cl_env setup in ll_do_fast_read improves read rates for already cached data by ~20x for 1-byte reads, ~10x for 1k reads, and ~5x for 4k reads.&lt;/p&gt;

&lt;p&gt;Patch coming shortly.&lt;/p&gt;</description>
                <environment></environment>
        <key id="47157">LU-9749</key>
            <summary>Reduce overhead for ll_do_fast_read</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="paf">Patrick Farrell</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Fri, 7 Jul 2017 19:21:51 +0000</created>
                <updated>Sat, 22 Jul 2017 03:59:26 +0000</updated>
                            <resolved>Sat, 22 Jul 2017 03:59:26 +0000</resolved>
                                                    <fixVersion>Lustre 2.11.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="201407" author="gerrit" created="Fri, 7 Jul 2017 21:01:31 +0000"  >&lt;p&gt;Patrick Farrell (paf@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/27970&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27970&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9749&quot; title=&quot;Reduce overhead for ll_do_fast_read&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9749&quot;&gt;&lt;del&gt;LU-9749&lt;/del&gt;&lt;/a&gt; llite: Reduce overhead for ll_do_fast_read&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f5fd490e35662395f59111358ad4eb09c5d9f3c5&lt;/p&gt;</comment>
                            <comment id="202379" author="paf" created="Mon, 17 Jul 2017 18:48:13 +0000"  >&lt;p&gt;4 threads of dd, as described in the commit message.&lt;/p&gt;

&lt;p&gt;This is current master (with the lc_version fix to avoid refilling cached envs every time):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Samples: 132K of event &apos;cpu-clock&apos;, Event count (approx.): 33152500000                  
  Children      Self  Command       Shared Object       Symbol
-   82.92%     0.09%  dd            [kernel.kallsyms]   [k] system_call_fastpath
   - 82.82% system_call_fastpath
      - 73.17% sys_read
         - 71.71% vfs_read
            - 68.77% ll_file_read
               - 42.16% ll_file_aio_read
                  - 15.10% cl_env_put
                       4.70% _raw_read_lock
                  - 10.15% cl_env_get
                       3.58% _raw_read_lock
                       0.80% __list_del_entry
                  - 6.65% generic_file_aio_read
                     + 2.11% __find_get_page
                       0.92% put_page
                       0.70% touch_atime
                       0.52% file_read_actor
                  - 3.28% ll_cl_add
                       1.60% _raw_write_lock
                  - 1.87% ll_cl_remove
                       0.67% _raw_write_lock
                  + 0.90% ll_stats_ops_tally
               - 11.90% cl_env_put
                    3.00% _raw_read_lock
                    0.51% lu_context_exit
               - 11.71% cl_env_get
                    4.06% _raw_read_lock
                    1.34% __list_del_entry
            + 1.43% rw_verify_area
           0.63% fget_light
      + 8.79% sys_write
        0.75% fget_light
+   80.65%     1.31%  dd            libc-2.17.so        [.] __GI___libc_read
+   73.88%     1.32%  dd            [kernel.kallsyms]   [k] sys_read
+   71.78%     0.72%  dd            [kernel.kallsyms]   [k] vfs_read
+   68.82%     1.48%  dd            [kernel.kallsyms]   [k] ll_file_read
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
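
&lt;p&gt;As a rough cross-check of the profile above, the sketch below sums the four cl_env_get and cl_env_put entries under ll_file_read and expresses them as a share of ll_file_read. The percentages are copied from the report; the variable and function names are illustrative only and are not part of Lustre.&lt;/p&gt;

```python
# Children percentages copied from the baseline perf report above.
# All values are percentages of total samples.
LL_FILE_READ = 68.77
CL_ENV_SAMPLES = {
    "cl_env_put (under ll_file_aio_read)": 15.10,
    "cl_env_get (under ll_file_aio_read)": 10.15,
    "cl_env_put (under ll_file_read)": 11.90,
    "cl_env_get (under ll_file_read)": 11.71,
}


def cl_env_share(samples, parent):
    """Return (summed percent of all samples, fraction of the parent entry)."""
    total = sum(samples.values())
    return total, total / parent


total, share = cl_env_share(CL_ENV_SAMPLES, LL_FILE_READ)
print(f"cl_env get/put: {total:.2f}% of all samples, {share:.0%} of ll_file_read")
```

&lt;p&gt;With these numbers, cl_env handling comes to 48.86% of all samples, or roughly 71% of the time attributed to ll_file_read in this run.&lt;/p&gt;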


&lt;p&gt;Notice how much of the time goes to cl_env_get and cl_env_put.&lt;/p&gt;</comment>
                            <comment id="202380" author="paf" created="Mon, 17 Jul 2017 18:49:41 +0000"  >&lt;p&gt;Here&apos;s the profile with the current patch:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Samples: 125K of event &apos;cpu-clock&apos;, Event count (approx.): 31399500000                  
  Children      Self  Command          Shared Object       Symbol
-   60.22%     0.24%  dd               [kernel.kallsyms]   [k] system_call_fastpath
   - 59.98% system_call_fastpath
      - 40.61% sys_read
         - 37.48% vfs_read
            - 29.90% ll_file_read
               - 25.80% ll_file_aio_read
                  - 16.66% generic_file_aio_read
                     + 7.20% __find_get_page
                       2.13% put_page
                       1.38% touch_atime
                       1.07% file_read_actor
                       0.61% copy_user_generic_unrolled
                  + 2.68% ll_stats_ops_tally
                    0.75% ll_file_get_iov_count
                    0.53% iov_iter_advance
            + 3.01% rw_verify_area
              1.70% fsnotify
           1.27% fget_light
      + 17.66% sys_write
        1.56% fget_light
+   57.28%     2.79%  dd               libc-2.17.so        [.] __GI___libc_read
+   41.80%     2.66%  dd               [kernel.kallsyms]   [k] sys_read
+   37.55%     2.16%  dd               [kernel.kallsyms]   [k] vfs_read
+   34.48%     3.03%  dd               libc-2.17.so        [.] __GI___libc_write
+   29.98%     3.43%  dd               [kernel.kallsyms]   [k] ll_file_read
+   26.02%     4.53%  dd               [kernel.kallsyms]   [k] ll_file_aio_read
+   19.25%     2.96%  dd               [kernel.kallsyms]   [k] sys_write
+   16.86%     3.66%  dd               [kernel.kallsyms]   [k] generic_file_aio_read
+   14.38%     1.60%  dd               [kernel.kallsyms]   [k] vfs_write
+   14.20%     6.25%  dd               [kernel.kallsyms]   [k] fsnotify
+    8.68%     0.38%  dd               [kernel.kallsyms]   [k] sysret_audit
+    7.90%     7.90%  dd               [kernel.kallsyms]   [k] system_call_after_swapgs
+    7.73%     4.76%  dd               [kernel.kallsyms]   [k] __audit_syscall_exit
+    7.27%     4.35%  dd               [kernel.kallsyms]   [k] __find_get_page
+    4.68%     0.52%  dd               [kernel.kallsyms]   [k] auditsys
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Much better.&lt;/p&gt;</comment>
                            <comment id="202381" author="paf" created="Mon, 17 Jul 2017 18:50:35 +0000"  >&lt;p&gt;And here&apos;s the version suggested by Jinshan, which uses cl_env_percpu_get. Timings are identical to the above, and the code is a little simpler:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Samples: 176K of event &apos;cpu-clock&apos;, Event count (approx.): 44168250000                  
  Children      Self  Command          Shared Object        Symbol
-   60.22%     0.25%  dd               [kernel.kallsyms]    [k] system_call_fastpath
   - 59.97% system_call_fastpath
      - 40.04% sys_read
         - 36.98% vfs_read
            - 29.83% ll_file_read
               - 25.86% ll_file_aio_read
                  - 17.08% generic_file_aio_read
                     + 8.25% __find_get_page
                       2.18% put_page
                       1.33% touch_atime
                       1.02% file_read_actor
                  - 2.41% ll_stats_ops_tally
                       1.72% lprocfs_counter_add
                    0.77% ll_file_get_iov_count
                    0.52% iov_iter_advance
            + 2.90% rw_verify_area
              1.49% fsnotify
           1.21% fget_light
      + 18.21% sys_write
        1.58% fget_light
+   56.70%     2.80%  dd               libc-2.17.so         [.] __GI___libc_read
+   41.41%     2.86%  dd               [kernel.kallsyms]    [k] sys_read
+   37.04%     2.07%  dd               [kernel.kallsyms]    [k] vfs_read
+   34.36%     2.84%  dd               libc-2.17.so         [.] __GI___libc_write
+   29.90%     3.28%  dd               [kernel.kallsyms]    [k] ll_file_read
+   26.09%     4.50%  dd               [kernel.kallsyms]    [k] ll_file_aio_read
+   19.95%     3.06%  dd               [kernel.kallsyms]    [k] sys_write
+   17.33%     3.24%  dd               [kernel.kallsyms]    [k] generic_file_aio_read
+   15.20%     1.91%  dd               [kernel.kallsyms]    [k] vfs_write
+   13.48%     5.68%  dd               [kernel.kallsyms]    [k] fsnotify
+    8.65%     0.43%  dd               [kernel.kallsyms]    [k] sysret_audit
+    8.32%     5.64%  dd               [kernel.kallsyms]    [k] __find_get_page
+    7.69%     4.83%  dd               [kernel.kallsyms]    [k] __audit_syscall_exit
+    7.65%     7.65%  dd               [kernel.kallsyms]    [k] system_call_after_swapg
+    5.21%     2.32%  dd               [kernel.kallsyms]    [k] rw_verify_area
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="203173" author="gerrit" created="Sat, 22 Jul 2017 02:54:59 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/27970/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/27970/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9749&quot; title=&quot;Reduce overhead for ll_do_fast_read&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9749&quot;&gt;&lt;del&gt;LU-9749&lt;/del&gt;&lt;/a&gt; llite: Reduce overhead for ll_do_fast_read&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: c084c6215851d238d14b0d414374b6b55c91f525&lt;/p&gt;</comment>
                            <comment id="203180" author="mdiep" created="Sat, 22 Jul 2017 03:59:26 +0000"  >&lt;p&gt;l&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzga7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>