<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:50:01 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12142] Hang in OSC on eviction - threads stuck in read() and ldlm_bl_NN</title>
                <link>https://jira.whamcloud.com/browse/LU-12142</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This appears to be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6271&quot; title=&quot;(osc_cache.c:3150:discard_cb()) ASSERTION( (!(page-&amp;gt;cp_type == CPT_CACHEABLE) || (!PageDirty(cl_page_vmpage(page)))) ) failed:&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6271&quot;&gt;&lt;del&gt;LU-6271&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A private customer ticket reported a hang on a client which was suffering repeated evictions.&lt;/p&gt;

&lt;p&gt;The client threads all seem to be waiting in two connected places.&lt;/p&gt;

&lt;p&gt;First, the eviction:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[&amp;lt;ffffffffc11f8c05&amp;gt;] osc_object_invalidate+0x115/0x290 [osc]
[&amp;lt;ffffffffc11e9f4f&amp;gt;] osc_ldlm_resource_invalidate+0xaf/0x190 [osc]
[&amp;lt;ffffffffc0ce8d10&amp;gt;] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[&amp;lt;ffffffffc0cec0a5&amp;gt;] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[&amp;lt;ffffffffc11f1427&amp;gt;] osc_import_event+0x497/0x1370 [osc]
[&amp;lt;ffffffffc13b3590&amp;gt;] ptlrpc_invalidate_import+0x220/0x8f0 [ptlrpc]
[&amp;lt;ffffffffc13b50c8&amp;gt;] ptlrpc_invalidate_import_thread+0x48/0x2b0 [ptlrpc]
[&amp;lt;ffffffffa52c1c71&amp;gt;] kthread+0xd1/0xe0
[&amp;lt;ffffffffa5974c1d&amp;gt;] ret_from_fork_nospec_begin+0x7/0x21
[&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And then the other side:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[&amp;lt;ffffffffc11faf65&amp;gt;] osc_lru_alloc+0x265/0x390 [osc]
[&amp;lt;ffffffffc11fb1c2&amp;gt;] osc_page_init+0x132/0x1d0 [osc]
[&amp;lt;ffffffffc0ff146f&amp;gt;] lov_page_init_composite+0x26f/0x4c0 [lov]
[&amp;lt;ffffffffc0fe8b11&amp;gt;] lov_page_init+0x21/0x60 [lov]
[&amp;lt;ffffffffc0e849bd&amp;gt;] cl_page_alloc+0x10d/0x280 [obdclass]
[&amp;lt;ffffffffc0e84ba4&amp;gt;] cl_page_find+0x74/0x280 [obdclass]
[&amp;lt;ffffffffc1111653&amp;gt;] ll_readpage+0x83/0x6e0 [lustre]
[&amp;lt;ffffffffa53b81f0&amp;gt;] generic_file_aio_read+0x3f0/0x790
[&amp;lt;ffffffffc1139037&amp;gt;] vvp_io_read_start+0x4b7/0x600 [lustre]
[&amp;lt;ffffffffc0e87b78&amp;gt;] cl_io_start+0x68/0x130 [obdclass]
[&amp;lt;ffffffffc0e89f5e&amp;gt;] cl_io_loop+0x12e/0xc90 [obdclass]
[&amp;lt;ffffffffc10e43c8&amp;gt;] ll_file_io_generic+0x498/0xc80 [lustre]
[&amp;lt;ffffffffc10e547a&amp;gt;] ll_file_aio_read+0x34a/0x3e0 [lustre]
[&amp;lt;ffffffffc10e55de&amp;gt;] ll_file_read+0xce/0x1e0 [lustre]
[&amp;lt;ffffffffa54414bf&amp;gt;] vfs_read+0x9f/0x170
[&amp;lt;ffffffffa544237f&amp;gt;] SyS_read+0x7f/0xf0
[&amp;lt;ffffffffa5974ddb&amp;gt;] system_call_fastpath+0x22/0x27
[&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The eviction side is waiting for:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;l_wait_event(osc-&amp;gt;oo_io_waitq, atomic_read(&amp;amp;osc-&amp;gt;oo_nr_ios) == 0, &amp;amp;lwi);&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is the first action in osc_object_invalidate.&lt;/p&gt;


&lt;p&gt;And the other side, in osc_lru_alloc, sleeps with no timeout on the osc_lru_waitq:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        struct l_wait_info lwi = LWI_INTR(LWI_ON_SIGNAL_NOOP, NULL);
[.....]
                rc = l_wait_event(osc_lru_waitq,
                                atomic_long_read(cli-&amp;gt;cl_lru_left) &amp;gt; 0,
                                &amp;amp;lwi); &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;osc_lru_alloc is called after osc_io_iter_init, which increases oo_nr_ios, so it&apos;s sleeping here with oo_nr_ios elevated.&lt;/p&gt;


&lt;p&gt;The OSC eviction path does not tickle osc_lru_waitq directly; it does so by freeing pages from objects. So if the first object to be invalidated has threads waiting for pages, I think it will get stuck here.&#160; (We would also expect that the failure of whatever is holding these LRU pages would free them up - we may have an ordering issue here.)&lt;/p&gt;

&lt;p&gt;Additionally, the osc_lru_alloc code does not &lt;b&gt;appear&lt;/b&gt; to have any way to fail if the import is being evicted.&#160; It looks like we have to successfully get a page here before we&apos;ll spool out into the larger I/O, which will eventually catch the eviction and fail.&lt;/p&gt;</description>
                <environment></environment>
        <key id="55319">LU-12142</key>
            <summary>Hang in OSC on eviction - threads stuck in read() and ldlm_bl_NN</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="wshilong">Wang Shilong</assignee>
                                    <reporter username="pfarrell">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Mon, 1 Apr 2019 16:57:10 +0000</created>
                <updated>Fri, 5 Nov 2021 15:30:14 +0000</updated>
                            <resolved>Tue, 6 Apr 2021 03:39:44 +0000</resolved>
                                    <version>Lustre 2.10.7</version>
                    <version>Lustre 2.12.3</version>
                                    <fixVersion>Lustre 2.15.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
<comment id="278682" author="adilger" created="Wed, 2 Sep 2020 22:21:52 +0000"  >&lt;p&gt;It looks like the root of this problem is caused by &lt;tt&gt;llite.*.max_cached_mb&lt;/tt&gt; being too small (&lt;tt&gt;=128&lt;/tt&gt; on the &lt;tt&gt;/home&lt;/tt&gt; filesystem and &lt;tt&gt;=2048&lt;/tt&gt; on the &lt;tt&gt;/scratch&lt;/tt&gt; filesystem) for multiple threads reading from the same filesystem to reserve enough pages for the RDMA reads at one time. This results in all of the threads being stuck holding some number of pages, but waiting for additional pages before they have enough to send the read RPC.  The clients have &lt;tt&gt;osc.&amp;#42;.max_pages_per_rpc=16M&lt;/tt&gt;, so it would only need 9+ threads preparing concurrent 16MB read RPCs from the &lt;tt&gt;/home&lt;/tt&gt; filesystem before the livelock could be hit.  With clients having 20-30 or more cores and &lt;tt&gt;max_pages_per_rpc&lt;/tt&gt; increasing, this is increasingly likely to be hit, as seen when &lt;tt&gt;unused_mb&lt;/tt&gt; is stuck at &lt;tt&gt;0&lt;/tt&gt;, as below:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ lctl get_param llite.home*.max_cached_mb
llite.home-ffff880c1c30ec00.max_cached_mb=
users: 8
max_cached_mb: 128
used_mb: 128
unused_mb: 0
reclaim_count: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Increasing the &lt;tt&gt;llite.&amp;#42;.max_cached_mb&lt;/tt&gt; values for both filesystems allowed the read threads to get the pages they needed and get unstuck. The &lt;tt&gt;llite.&amp;#42;.max_cached_mb&lt;/tt&gt; value had been reduced while debugging another issue related to a memory problem. &lt;/p&gt;</comment>
                            <comment id="278685" author="adilger" created="Wed, 2 Sep 2020 22:46:15 +0000"  >&lt;p&gt;I think there are a couple of ways to get out of this kind of deadlock situation:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;after some (semi-random?) number of loops without making progress, &lt;tt&gt;osc_lru_alloc()&lt;/tt&gt; returns an &lt;tt&gt;&amp;#45;EAGAIN&lt;/tt&gt; (or similar) error and unwinds the stack, freeing the pages it had previously reserved, and then tries again.  The freeing of the previous pages would allow some &lt;em&gt;other&lt;/em&gt; thread to make progress.  Having a semi-random number of loops (e.g. &lt;tt&gt;N + (pid%M)&lt;/tt&gt;) would avoid threads being stuck in a loop still contending with each other.  This has the drawback that the &lt;tt&gt;cl_loop-&amp;gt;osc_lru_alloc()&lt;/tt&gt; callchain is deep and probably hard to unwind, and would cause a lot of work to be re-done, but it is still better than the thread being stuck for hours doing nothing.&lt;/li&gt;
	&lt;li&gt;have readahead threads fail the allocation outright after some number of tries, since they shouldn&apos;t be forcing reads under memory pressure.  This has the advantage of being relatively simple to implement, but may hurt readahead performance, and may not solve all problems if normal threads are doing large reads&lt;/li&gt;
	&lt;li&gt;have the page cache reservation be done at a higher level, all at once for a given read request, rather than one page at a time at the low level.  This has the advantage that it is very efficient, but may lead to starvation if one thread can never get the pages it needs.  It may also require some significant code restructuring to move the &lt;tt&gt;max_cached_mb&lt;/tt&gt; handling up to a higher level, but at the same time since this is a &lt;tt&gt;llite&lt;/tt&gt; parameter instead of an &lt;tt&gt;osc&lt;/tt&gt; parameter it might simplify the code also?&lt;/li&gt;
&lt;/ul&gt;
</comment>
<comment id="278691" author="wshilong" created="Thu, 3 Sep 2020 00:57:15 +0000"  >&lt;p&gt;We definitely try to reserve LRU pages before that, see cl_io_iter_init-&amp;gt;osc_io_rw_iter_init-&amp;gt;osc_lru_reserve(), but this doesn&apos;t give a guarantee; it just tries to reserve LRU pages in batch in advance if there are plenty of free pages, and tries to trigger async reclaim if there are not enough free pages.&lt;/p&gt;

&lt;p&gt;Maybe we should modify the osc_lru_reserve() logic to block if there are not enough free LRU pages to form at least one RPC (or npages), so that other threads could go further and send RPCs out. At the same time we might modify readahead to be aware of LRU pages and not trigger too many pages beyond the limit.&lt;/p&gt;
</comment>
                            <comment id="278714" author="adilger" created="Thu, 3 Sep 2020 06:42:32 +0000"  >&lt;p&gt;Another alternative may be to send a smaller RPC if there are not enough pages to form a full-sized RPC?&lt;/p&gt;</comment>
<comment id="278716" author="wshilong" created="Thu, 3 Sep 2020 07:13:11 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=adilger&quot; class=&quot;user-hover&quot; rel=&quot;adilger&quot;&gt;adilger&lt;/a&gt; I am not sure about that; this might need some debugging and testing. For example, even if max_cached_mb is bigger it might still run out, and if we blindly send smaller RPCs in this case it might hurt performance, so we might only do that when max_cached_mb is small.&lt;/p&gt;</comment>
                            <comment id="282178" author="gerrit" created="Wed, 14 Oct 2020 02:58:47 +0000"  >&lt;p&gt;Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/40237&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40237&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12142&quot; title=&quot;Hang in OSC on eviction - threads stuck in read() and ldlm_bl_NN&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12142&quot;&gt;&lt;del&gt;LU-12142&lt;/del&gt;&lt;/a&gt; clio: fix hang on urgent cached pages&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a0c85f030166234f732628c38fffe573f841fec2&lt;/p&gt;</comment>
                            <comment id="295214" author="gerrit" created="Wed, 17 Mar 2021 10:09:28 +0000"  >&lt;p&gt;Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/42060&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/42060&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12142&quot; title=&quot;Hang in OSC on eviction - threads stuck in read() and ldlm_bl_NN&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12142&quot;&gt;&lt;del&gt;LU-12142&lt;/del&gt;&lt;/a&gt; readahead: limit over reservation&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 470d677b2eb05961067034afeb78b58302d65323&lt;/p&gt;</comment>
                            <comment id="297858" author="gerrit" created="Tue, 6 Apr 2021 03:01:49 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/42060/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/42060/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12142&quot; title=&quot;Hang in OSC on eviction - threads stuck in read() and ldlm_bl_NN&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12142&quot;&gt;&lt;del&gt;LU-12142&lt;/del&gt;&lt;/a&gt; readahead: limit over reservation&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 1058867c004bf19774218945631a691e8210b502&lt;/p&gt;</comment>
                            <comment id="297859" author="gerrit" created="Tue, 6 Apr 2021 03:01:53 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/40237/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40237/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12142&quot; title=&quot;Hang in OSC on eviction - threads stuck in read() and ldlm_bl_NN&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12142&quot;&gt;&lt;del&gt;LU-12142&lt;/del&gt;&lt;/a&gt; clio: fix hang on urgent cached pages&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 2a34dc95bd100c181573e231047ff8976e296a36&lt;/p&gt;</comment>
                            <comment id="297880" author="pjones" created="Tue, 6 Apr 2021 03:39:44 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
<comment id="310833" author="bzzz" created="Mon, 23 Aug 2021 08:16:06 +0000"  >&lt;p&gt;Hitting the following deadlock in racer quite often:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
schedule,vvp_io_setattr_start,cl_io_start,cl_io_loop,cl_setattr_ost,ll_setattr_raw,do_truncate,path_openat,do_filp_open,do_sys_open
	PIDs(2): &lt;span class=&quot;code-quote&quot;&gt;&quot;dir_create.sh&quot;&lt;/span&gt;:9061 &lt;span class=&quot;code-quote&quot;&gt;&quot;dir_create.sh&quot;&lt;/span&gt;:9134 

schedule,wait_for_common,osc_io_setattr_end,cl_io_end,lov_io_end_wrapper,lov_io_call,lov_io_end,cl_io_end,cl_io_loop,cl_setattr_ost,ll_setattr_raw,do_truncate,path_openat,do_filp_open,do_sys_open
	PIDs(1): &lt;span class=&quot;code-quote&quot;&gt;&quot;dir_create.sh&quot;&lt;/span&gt;:9363 

schedule,ldlm_completion_ast,ldlm_cli_enqueue_local,ofd_destroy_by_fid,ofd_destroy_hdl,tgt_request_handle,ptlrpc_main
	PIDs(1): &lt;span class=&quot;code-quote&quot;&gt;&quot;ll_ost00_007&quot;&lt;/span&gt;:12274 

schedule,osc_object_invalidate,osc_ldlm_resource_invalidate,cfs_hash_for_each_relax,cfs_hash_for_each_nolock,osc_import_event,ptlrpc_invalidate_import,ptlrpc_invalidate_import_thread
	PIDs(1): &lt;span class=&quot;code-quote&quot;&gt;&quot;ll_imp_inval&quot;&lt;/span&gt;:553105 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                                        </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="28805">LU-6271</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00eb3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>