<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:26:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16365] cached &apos;ls -l&apos; is slow</title>
                <link>https://jira.whamcloud.com/browse/LU-16365</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While testing &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14139&quot; title=&quot; batched statahead processing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14139&quot;&gt;&lt;del&gt;LU-14139&lt;/del&gt;&lt;/a&gt;, I observed unexpected performance behavior.&lt;br/&gt;
Here is the test workload:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# echo 3 &amp;gt; /proc/sys/vm/drop_caches
# time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ 
# time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In theory, once the 1st &apos;ls -l&apos; finishes, the client keeps the data, metadata, and locks in its cache, so the 2nd &apos;ls -l&apos; should be served entirely from that cache.&lt;br/&gt;
One would expect the 2nd &apos;ls -l&apos; to be significantly faster than the 1st, but it is only marginally faster.&lt;/p&gt;

&lt;p&gt;Here are the &apos;ls -l&apos; results for 1M files in a single directory.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@ec01 ~]# clush -w ec01,ai400x2-1-vm[1-4] &quot;echo 3 &amp;gt; /proc/sys/vm/drop_caches&quot;
[sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null

real	0m27.385s
user	0m8.994s
sys	0m13.131s

[sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null

real	0m25.309s
user	0m8.937s
sys	0m16.327s
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Almost no RPCs go out during the 2nd &apos;ls -l&apos; below: only 16 LNET messages, against 1.1M LNET messages during the 1st &apos;ls -l&apos;. Yet the elapsed time is almost the same, so most of the cost is in &apos;ls&apos; itself and on the Lustre client side.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@ec01 ~]# clush -w ai400x2-1-vm[1-4],ec01 &quot; echo 3 &amp;gt; /proc/sys/vm/drop_caches &quot;
[root@ec01 ~]# lnetctl net show -v| grep _count; time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null; lnetctl net show -v | grep _count
              send_count: 0
              recv_count: 0
              drop_count: 0
              send_count: 65363661
              recv_count: 62095891
              drop_count: 1

real	0m26.145s
user	0m9.070s
sys	0m13.552s
              send_count: 0
              recv_count: 0
              drop_count: 0
              send_count: 66482277
              recv_count: 63233245
              drop_count: 1
[root@ec01 ~]# lnetctl net show -v| grep _count; time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null; lnetctl net show -v | grep _count
              send_count: 0
              recv_count: 0
              drop_count: 0
              send_count: 66482277
              recv_count: 63233245
              drop_count: 1

real	0m25.569s
user	0m8.987s
sys	0m16.537s
              send_count: 0
              recv_count: 0
              drop_count: 0
              send_count: 66482293
              recv_count: 63233261
              drop_count: 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here is the same test for 1M files on a local-disk ext4 filesystem and on /dev/shm on the client.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@ec01 ~]# echo 3 &amp;gt; /proc/sys/vm/drop_caches
[sihara@ec01 ~]$ time ls -l /tmp/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/  &amp;gt; /dev/null

real	0m16.999s
user	0m8.956s
sys	0m5.855s
[sihara@ec01 ~]$ time ls -l /tmp/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/  &amp;gt; /dev/null

real	0m11.832s
user	0m8.765s
sys	0m3.051s

[root@ec01 ~]# echo 3 &amp;gt; /proc/sys/vm/drop_caches
[sihara@ec01 ~]$ time ls -l /dev/shm/testdir/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null

real	0m8.296s
user	0m5.465s
sys	0m2.813s
[sihara@ec01 ~]$ time ls -l /dev/shm/testdir/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null

real	0m8.273s
user	0m5.414s
sys	0m2.847s
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Lustre should be able to match ext4 and tmpfs performance when everything is already cached, shouldn&apos;t it?&lt;/p&gt;</description>
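The workload above can be sketched as a self-contained reproduction (hypothetical file count and a throwaway local directory; on a real Lustre client the caches would be dropped as root between runs):

```python
# Sketch of the cold-vs-warm 'ls -l' comparison (not the original test):
# create a throwaway directory with many files, then list it twice.
import os
import shutil
import subprocess
import tempfile
import time

testdir = tempfile.mkdtemp()
for i in range(1000):
    open(os.path.join(testdir, f"file.{i:04d}"), "w").close()

# On a real Lustre client and its servers, drop caches first (as root):
#   clush -w client,servers "echo 3 | tee /proc/sys/vm/drop_caches"

for run in ("1st (cold)", "2nd (cached)"):
    t0 = time.perf_counter()
    subprocess.run(["ls", "-l", testdir], stdout=subprocess.DEVNULL, check=True)
    print(run, "run:", round(time.perf_counter() - t0, 3), "s")

count = len(os.listdir(testdir))
shutil.rmtree(testdir)
```

On Lustre the interesting comparison is how little the second timing improves despite the client cache being warm.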
                <environment></environment>
        <key id="73480">LU-16365</key>
            <summary>cached &apos;ls -l&apos; is slow</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="sihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Sat, 3 Dec 2022 10:45:59 +0000</created>
                <updated>Wed, 31 Jan 2024 22:14:01 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="355014" author="sihara" created="Sat, 3 Dec 2022 13:32:28 +0000"  >&lt;p&gt;In order to simplify cases, the number of files is reduced. 100K x 47001 byte files.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@ec01 ~]# lctl set_param llite.*.stats=clear
[root@ec01 ~]#  clush -w ec01,ai400x2-1-vm[1-4] &quot;echo 3 &amp;gt; /proc/sys/vm/drop_caches&quot;
[sihara@ec01 ~]$ strace -c ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.27    1.180271          11    100001           statx
 31.71    0.556354           5    100002    100002 getxattr
  0.95    0.016659         134       124           getdents64

[root@ec01 ~]# lctl get_param llite.*.stats
llite.exafs-ffff9b96b8157800.stats=
snapshot_time             508528.306272142 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              508528.306272142 secs.nsecs
open                      1 samples [usecs] 163 163 163 26569
close                     1 samples [usecs] 107 107 107 11449
readdir                   124 samples [usecs] 0 14912 31175 224577441
getattr                   100002 samples [usecs] 2 565 432562 2313162
getxattr                  1 samples [usecs] 88 88 88 7744
inode_permission          1000021 samples [usecs] 0 314 195050 299488
opencount                 1 samples [reqs] 1 1 1 1

[root@ec01 ~]# lctl set_param llite.*.stats=clear
[sihara@ec01 ~]$ strace -c ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.73    1.142941          11    100001           statx
 33.22    0.577662           5    100002    100002 getxattr
  0.93    0.016237         130       124           getdents64

[root@ec01 ~]# lctl get_param llite.*.stats
llite.exafs-ffff9b96b8157800.stats=
snapshot_time             508578.202659913 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              508578.202659913 secs.nsecs
open                      1 samples [usecs] 177 177 177 31329
close                     1 samples [usecs] 124 124 124 15376
readdir                   124 samples [usecs] 0 142 16484 2223244
getattr                   100002 samples [usecs] 4 21 530944 2862706
getxattr                  1 samples [usecs] 2 2 2 4
getxattr_hits             1 samples [reqs]
inode_permission          1000021 samples [usecs] 0 13 191123 193591
opencount                 1 samples [reqs] 2 2 2 4
openclosetime             1 samples [usecs] 44568082 44568082 44568082 1986313933158724
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In the 1st &apos;ls -l&apos;, the 100K statx() calls cost 1.18sec in total, of which Lustre spent 0.43sec in getattr() and 0.19sec in inode_permission().&lt;br/&gt;
In the 2nd &apos;ls -l&apos;, the 100K getattr() calls inside Lustre take longer (0.53sec) than in the 1st run.&#160; I repeated the test several times, and the 2nd run is always slower.&lt;br/&gt;
In every case, cached or not, about half of the total time is spent inside Lustre.&lt;/p&gt;</comment>
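As a cross-check on those numbers, the per-call averages fall straight out of the llite stats lines (a small sketch; the field layout is name, count, "samples", unit, min, max, sum, sumsq):

```python
# Derive average per-call latency from an llite stats line, whose
# fields are: name, count, "samples", "[usecs]", min, max, sum, sumsq.
def avg_usecs(line):
    parts = line.split()
    count, total = int(parts[1]), int(parts[6])
    return total / count

# getattr lines from the 1st (uncached) and 2nd (cached) runs above:
first  = avg_usecs("getattr 100002 samples [usecs] 2 565 432562 2313162")
second = avg_usecs("getattr 100002 samples [usecs] 4 21 530944 2862706")
print(round(first, 2), round(second, 2))  # roughly 4.33 vs 5.31 usec/call
```

The cached run averages about 1 usec more per getattr() call, matching the observation that the 2nd run is consistently slower.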
                            <comment id="355032" author="sihara" created="Sat, 3 Dec 2022 23:42:31 +0000"  >&lt;p&gt;same test, but 2M x 47001 files in the single directory. it can clarify the problem.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null

real	1m0.494s
user	0m19.378s
sys	0m28.927s

llite.exafs-ffff9b8b77585000.stats=
snapshot_time             545315.042991581 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              545315.042991581 secs.nsecs
open                      1 samples [usecs] 206 206 206 42436
close                     1 samples [usecs] 141 141 141 19881
readdir                   2502 samples [usecs] 0 40943 518532 2304508620
getattr                   2048002 samples [usecs] 2 22144 12111659 9279727979
getxattr                  1 samples [usecs] 90 90 90 8100
inode_permission          20480021 samples [usecs] 0 206 5211214 8280488
opencount                 1 samples [reqs] 1 1 1 1

[sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null

real	1m6.651s
user	0m19.172s
sys	0m47.345s

llite.exafs-ffff9b8b77585000.stats=
snapshot_time             545428.911309921 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              545428.911309921 secs.nsecs
open                      1 samples [usecs] 8493 8493 8493 72131049
close                     1 samples [usecs] 181 181 181 32761
readdir                   2502 samples [usecs] 0 164 377300 56936396
getattr                   2048002 samples [usecs] 4 52 31198624 564334832
getxattr                  1 samples [usecs] 2 2 2 4
getxattr_hits             1 samples [reqs]
inode_permission          20480021 samples [usecs] 0 24 4861557 6459227
opencount                 1 samples [reqs] 2 2 2 4
openclosetime             1 samples [usecs] 55693604 55693604 55693604 3101777526508816


[sihara@ec01 ~]$ time ls -l /exafs/testdir/mdtest.out/test-dir.0-0/mdtest_tree.0/ &amp;gt; /dev/null

real	1m6.656s
user	0m19.146s
sys	0m47.384s

llite.exafs-ffff9b8b77585000.stats=
snapshot_time             545596.209353102 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              545596.209353102 secs.nsecs
open                      1 samples [usecs] 233 233 233 54289
close                     1 samples [usecs] 188 188 188 35344
readdir                   2502 samples [usecs] 0 160 377048 56861048
getattr                   2048002 samples [usecs] 4 50 31185467 563959387
getxattr                  1 samples [usecs] 2 2 2 4
getxattr_hits             1 samples [reqs]
inode_permission          20480021 samples [usecs] 0 25 4879991 6515725
opencount                 1 samples [reqs] 4 4 4 16
openclosetime             1 samples [usecs] 24859044 24859044 24859044 617972068593936
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The cached runs (2nd and 3rd &apos;ls -l&apos;) were slower than the non-cached run (1st &apos;ls -l&apos;): the client spent a total of 31sec in getattr(), against 12sec in the non-cached case.&lt;/p&gt;</comment>
                            <comment id="355194" author="green" created="Mon, 5 Dec 2022 23:36:39 +0000"  >&lt;p&gt;&#160;seconds of user time is strange, is ls sorting things in RAM? Can you do ls -U and see what sort of a difference it makes. Also when everythign is cached there&apos;s still huge cpu time use in Lustre case compared to ext4, which given there are no RPC means it&apos;s somewhere inside lustre somewhere? And it doe not change much between cached/uncached case even, in fact cached takes more system time (13&amp;gt;16)&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;ext4 on the other hand is dropping system time 6&amp;gt;3 seconds (user time remains the same)&lt;/p&gt;</comment>
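The sorting hypothesis is easy to sanity-check outside of ls (a rough sketch in plain Python rather than coreutils; ls -U skips the sorted pass entirely):

```python
import random
import time

# Rough illustration (not ls itself): 'ls -l' sorts every name before
# printing, so user time grows with the file count no matter how fast
# the filesystem is; 'ls -U' prints in directory order, skipping the sort.
names = [f"file.{i:07d}" for i in range(1_000_000)]
random.shuffle(names)

t0 = time.perf_counter()
unsorted_pass = list(names)        # roughly what 'ls -U' does
t1 = time.perf_counter()
sorted_pass = sorted(names)        # the extra work 'ls -l' must do
t2 = time.perf_counter()

print(f"copy: {t1 - t0:.2f}s  sort: {t2 - t1:.2f}s")
```

The sort cost is identical for Lustre, ext4, and tmpfs, which is consistent with the constant ~19s of user time in the 2M-file runs.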
                            <comment id="355201" author="adilger" created="Tue, 6 Dec 2022 00:09:11 +0000"  >&lt;p&gt;The 19s of user time is caused by &quot;&lt;tt&gt;ls -l&lt;/tt&gt;&quot; sorting the files before printing them, and that remains constant between calls (cached/uncached) and between filesystems (lustre/ext4), so we can&apos;t do anything about that.  It looks like almost all of the system time is used by the &lt;tt&gt;getattr&lt;/tt&gt; and &lt;tt&gt;inode_permission&lt;/tt&gt; calls:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sys	0m28.927s  (17.32s in getattr + inode_permission)
getattr                   2048002 samples [usecs] 2 22144 12111659 9279727979
inode_permission          20480021 samples [usecs] 0 206 5211214 8280488

sys	0m47.345s (36.06s in getattr + inode_permission)
getattr                   2048002 samples [usecs] 4 52 31198624 564334832
inode_permission          20480021 samples [usecs] 0 24 4861557 6459227

sys	0m47.384s (36.07s in getattr + inode_permission)
getattr                   2048002 samples [usecs] 4 50 31185467 563959387
inode_permission          20480021 samples [usecs] 0 25 4879991 6515725
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="355203" author="adilger" created="Tue, 6 Dec 2022 00:21:57 +0000"  >&lt;p&gt;According to Yingjin&apos;s comments in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14139&quot; title=&quot; batched statahead processing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14139&quot;&gt;&lt;del&gt;LU-14139&lt;/del&gt;&lt;/a&gt; there is quite a bit of overhead in the LDLM hash code:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;From the attaching FG, lots of time costs are:&lt;br/&gt;
&lt;tt&gt;__ldlm_handle2lock() -&amp;gt; class_handle2object()&lt;/tt&gt;&lt;br/&gt;
&lt;tt&gt;ldlm_resource_get() -&amp;gt; cfs_hash_bd_lookup_intent() -&amp;gt; ldlm_res_hop_keycmp()&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;Each of them costs about 5% for data and metadata DLM lock matching on the client side; together they take about 20% of the total samples...&lt;br/&gt;
It seems the hash lookups for the lock handle and the resource take a lot of time.&lt;br/&gt;
And since &lt;tt&gt;ldlm_res_hop_keycmp()&lt;/tt&gt; reaches 5%, there must be many elements in the same bucket of the hash table. We should increase the hash table size and use a resizable hash table.&lt;/p&gt;

&lt;p&gt;For &lt;tt&gt;class_handle2object()&lt;/tt&gt; for locks, we should maintain the lock handles in a per-target (osc/mdc) lock namespace rather than sharing a single global handle hash table on the client, or even on the server.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;There is a WIP patch, &lt;a href=&quot;https://review.whamcloud.com/45882&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45882&lt;/a&gt; &quot;&lt;tt&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8130&quot; title=&quot;Migrate from libcfs hash to rhashtable&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8130&quot;&gt;LU-8130&lt;/a&gt; ldlm: convert ldlm_resource hash to rhashtable&lt;/tt&gt;&quot;, that would be interesting to try: rhashtable is supposed to be faster than cfs_hash, and since the code is being converted to rhashtable anyway, it would be good to know if the conversion makes performance &lt;b&gt;worse&lt;/b&gt; for some reason.  However, that patch is over a year old and has failed every test session, so it would need to be updated/fixed before it could be tested here.&lt;/p&gt;</comment>
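Yingjin's observation that ldlm_res_hop_keycmp() is hot implies long chains in a fixed-size table. A toy model of that effect (hypothetical, NOT Lustre code):

```python
# Toy model of the problem described above (NOT Lustre code): a chained
# hash table with a fixed, undersized bucket count.  With far more
# entries than buckets, every lookup walks a long chain, so the
# key-compare step dominates -- analogous to ldlm_res_hop_keycmp()
# showing up hot in the profile, and why a resizable table would help.
NBUCKETS = 8            # fixed table size (hypothetical)
buckets = [[] for _ in range(NBUCKETS)]
keycmp_calls = 0

def insert(key):
    buckets[key % NBUCKETS].append(key)

def lookup(key):
    global keycmp_calls
    for k in buckets[key % NBUCKETS]:
        keycmp_calls += 1          # one "keycmp" per chain element walked
        if k == key:
            return True
    return False

nkeys = 10000
for i in range(nkeys):
    insert(i)
for i in range(nkeys):
    assert lookup(i)

# Each bucket holds nkeys/NBUCKETS = 1250 entries, so an average lookup
# performs ~625 key compares instead of ~1:
print(keycmp_calls / nkeys)        # -> 625.5
```

A resizable table (as rhashtable provides) keeps the per-bucket chain short as the entry count grows, so the average compare count stays near 1 instead of scaling with the number of cached locks.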
                            <comment id="355209" author="simmonsja" created="Tue, 6 Dec 2022 01:39:21 +0000"  >&lt;p&gt;Sorry I haven&apos;t worked out the details yet. Need to find the cycles.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="61685">LU-14139</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="36869">LU-8130</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="79279">LU-17329</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="47101" name="ls.svg" size="144850" author="adilger" created="Tue, 6 Dec 2022 00:23:04 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0371z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>