<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:29:38 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2948] Client read hang in osc_page_init()</title>
                <link>https://jira.whamcloud.com/browse/LU-2948</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are seeing threads that get stuck indefinitely (I&apos;ve personally seen one stuck there for over an hour) sleeping in osc_page_init() while doing a read().  The backtrace looks like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2013-03-11 18:11:27.964356 {DefaultControlEventListener} [mmcs]{360}.0.0: sysiod        S 00000fff92016460     0 41292   3080 0x00000000
2013-03-11 18:11:27.964386 {DefaultControlEventListener} [mmcs]{360}.0.0: Call Trace:
2013-03-11 18:11:27.964416 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878e7c0] [c0000003c878e850] 0xc0000003c878e850 (unreliable)
2013-03-11 18:11:27.964446 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878e990] [c000000000009b2c] .__switch_to+0xc4/0x100
2013-03-11 18:11:27.964476 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878ea20] [c00000000044c660] .schedule+0x7d4/0x944
2013-03-11 18:11:27.964506 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878ecd0] [8000000000af2090] .cfs_waitq_wait+0x10/0x30 [libcfs]
2013-03-11 18:11:27.964536 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878ed40] [80000000046f8c58] .osc_page_init+0xb18/0x1130 [osc]
2013-03-11 18:11:27.964567 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878eea0] [800000000252c078] .cl_page_find0+0x378/0xb70 [obdclass]
2013-03-11 18:11:27.964597 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878efc0] [800000000519bf80] .lov_page_init_raid0+0x230/0xa20 [lov]
2013-03-11 18:11:27.964627 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f0e0] [8000000005196160] .lov_page_init+0x50/0xa0 [lov]
2013-03-11 18:11:27.964657 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f170] [800000000252c078] .cl_page_find0+0x378/0xb70 [obdclass]
2013-03-11 18:11:27.964686 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f290] [800000000699e098] .ll_readahead+0xdb8/0x1670 [lustre]
2013-03-11 18:11:27.964717 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f400] [80000000069dd38c] .vvp_io_read_page+0x3bc/0x4d0 [lustre]
2013-03-11 18:11:27.964747 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f510] [800000000253d8a4] .cl_io_read_page+0xf4/0x280 [obdclass]
2013-03-11 18:11:27.964777 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f5d0] [800000000699c5dc] .ll_readpage+0xdc/0x2c0 [lustre]
2013-03-11 18:11:27.964807 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f680] [c000000000099144] .generic_file_aio_read+0x500/0x728
2013-03-11 18:11:27.964837 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f7c0] [80000000069dfe24] .vvp_io_read_start+0x274/0x640 [lustre]
2013-03-11 18:11:27.964867 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f8e0] [80000000025398cc] .cl_io_start+0xcc/0x220 [obdclass]
2013-03-11 18:11:27.964897 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878f980] [8000000002541724] .cl_io_loop+0x194/0x2c0 [obdclass]
2013-03-11 18:11:27.964928 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878fa30] [800000000695a160] .ll_file_io_generic+0x410/0x670 [lustre]
2013-03-11 18:11:27.964958 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878fb30] [800000000695af04] .ll_file_aio_read+0x1d4/0x3a0 [lustre]
2013-03-11 18:11:27.964988 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878fc00] [800000000695b220] .ll_file_read+0x150/0x320 [lustre]
2013-03-11 18:11:27.965018 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878fce0] [c0000000000d429c] .vfs_read+0xd0/0x1c4
2013-03-11 18:11:27.965048 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878fd80] [c0000000000d448c] .SyS_read+0x54/0x98
2013-03-11 18:11:27.965078 {DefaultControlEventListener} [mmcs]{360}.0.0: [c0000003c878fe30] [c000000000000580] syscall_exit+0x0/0x2c
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Prakash pointed out that used_mb was maxed out when we found the hung process:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;seqio345-ib0:/proc/fs/lustre/llite/ls1-c0000003cc505000$ cat max_cached_mb
users: 384
max_cached_mb: 4096
used_mb: 4096
unused_mb: 0
reclaim_count: 11201
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When I tried to cat a file in the same ls1 filesystem, it got stuck in the same call path as the sysiod shown above.&lt;/p&gt;

&lt;p&gt;Issuing an &quot;echo 3 &amp;gt; /proc/sys/vm/drop_caches&quot; gets things moving again.&lt;/p&gt;

&lt;p&gt;This is on a Sequoia ION, a ppc64 Lustre client.  Lustre version 2.3.58-14chaos.&lt;/p&gt;</description>
                <environment>Lustre 2.3.58-14chaos (github.com/chaos/lustre)</environment>
        <key id="17830">LU-2948</key>
            <summary>Client read hang in osc_page_init()</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>LB</label>
                            <label>sequoia</label>
                            <label>topsequoia</label>
                    </labels>
                <created>Mon, 11 Mar 2013 21:50:02 +0000</created>
                <updated>Tue, 26 Mar 2013 21:33:32 +0000</updated>
                            <resolved>Tue, 26 Mar 2013 20:32:48 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="53789" author="bfaccini" created="Tue, 12 Mar 2013 09:14:48 +0000"  >&lt;p&gt;Hello Chris,&lt;br/&gt;
I wonder if it is possible that something is broken or missing in the Cache/OSC LRU list management here?&lt;br/&gt;
Sorry to ask, but how can we have a look at the source for the Lustre version you indicate?&lt;/p&gt;</comment>
                            <comment id="53792" author="pjones" created="Tue, 12 Mar 2013 11:18:54 +0000"  >&lt;p&gt;Bruno&lt;/p&gt;

&lt;p&gt;LLNL&apos;s source is on GitHub, and the details are listed under the environment field. Talk to me directly if you need fuller information about this.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="53797" author="prakash" created="Tue, 12 Mar 2013 12:53:28 +0000"  >&lt;p&gt;Bruno, this was on tag &lt;a href=&quot;https://github.com/chaos/lustre/tree/2.3.58-14chaos&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2.3.58-14chaos&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Some source level information:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;(gdb) l *osc_page_init+0xb18
0x28c58 is in osc_page_init (/builddir/build/BUILD/lustre-2.3.58/lustre/osc/osc_page.c:964).

964                 rc = l_wait_event(osc_lru_waitq,                                
965                                 cfs_atomic_read(cli-&amp;gt;cl_lru_left) &amp;gt; 0 ||        
966                                 (cfs_atomic_read(&amp;amp;cli-&amp;gt;cl_lru_in_list) &amp;gt; 0 &amp;amp;&amp;amp;   
967                                  gen != cfs_atomic_read(&amp;amp;cli-&amp;gt;cl_lru_in_list)), 
968                                 &amp;amp;lwi);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is definitely something &quot;broken&quot; in the OSC LRU implementation, as that&apos;s the waitq it&apos;s sleeping on, but I&apos;m not quite sure what yet.&lt;/p&gt;

&lt;p&gt;I&apos;m curious why &lt;tt&gt;osc_lru_reclaim&lt;/tt&gt; was not able to free up any LRU slots just prior to sleeping on the waitq. By the time I was able to look at the system, &lt;b&gt;none&lt;/b&gt; of the per-OSC pages on the LRU were busy, and a &lt;tt&gt;cat&lt;/tt&gt; of a Lustre file hung on the waitq listed in the description. I think getting a log with &lt;tt&gt;D_CACHE&lt;/tt&gt; enabled when this happens might prove useful.&lt;/p&gt;</comment>
                            <comment id="53890" author="bfaccini" created="Wed, 13 Mar 2013 06:33:30 +0000"  >&lt;p&gt;Thanks Prakash, I got the source tree and I am working on it now. Yes, you are right; try to enable the D_CACHE debug flag and get the log/trace, at least when running the cat command that hangs.&lt;/p&gt;
</comment>
                            <comment id="54030" author="bfaccini" created="Thu, 14 Mar 2013 12:58:08 +0000"  >&lt;p&gt;Yes, I agree that osc_lru_reclaim() should have been able to free some ...&lt;/p&gt;

&lt;p&gt;So yes, it would be nice to enable D_CACHE when this situation re-occurs and take a trace after some time, but also to get snapshots of /proc/fs/lustre/llite/&amp;lt;FS&amp;gt;/max_cached_mb and /proc/fs/lustre/osc/&amp;lt;OSC&amp;gt;/osc_cached_mb.&lt;/p&gt;

&lt;p&gt;Also, I am asking myself why we don&apos;t wait with a time-out, just in case nothing happens for a single OSC, to give osc_lru_reclaim() a new chance to steal LRU pages from others?... Hmm, but I need to double-check that this is not useless.&lt;/p&gt;

</comment>
                            <comment id="54033" author="prakash" created="Thu, 14 Mar 2013 13:44:43 +0000"  >&lt;p&gt;I did look at the &lt;tt&gt;osc_cached_mb&lt;/tt&gt; files when this hit, but unfortunately I didn&apos;t save them anywhere. IIRC, there were two OSCs with pages in the LRU and none of them were labelled as &quot;busy&quot;. Also, the sum of the pages in the two OSC LRUs didn&apos;t add up to 4096 (the max reported by &lt;tt&gt;max_cached_mb&lt;/tt&gt;). That seems like there might have been a leak of some kind, but I still need to verify that the &lt;tt&gt;osc_cached_mb&lt;/tt&gt; values should sum to &lt;tt&gt;max_cached_mb&lt;/tt&gt;. If I can reproduce the issue, I&apos;ll gather D_CACHE logs and post them.&lt;/p&gt;

&lt;p&gt;I&apos;m not convinced a time-out would help. A &quot;cat&quot; hung in the same location, and it ran osc_lru_reclaim prior to getting stuck. So I&apos;d imagine using a time-out would just cause the threads to repeatedly fail to reclaim as well.&lt;/p&gt;</comment>
                            <comment id="54276" author="bfaccini" created="Mon, 18 Mar 2013 17:14:44 +0000"  >&lt;p&gt;Yes, I agree the &quot;cat&quot; should have caused one more osc_lru_reclaim() call, so it is unlikely the blocking condition will change ...&lt;/p&gt;

&lt;p&gt;So we definitely need D_CACHE enabled at least during the &quot;cat&quot;, and from the beginning if possible. And the /proc counters already listed.&lt;/p&gt;

&lt;p&gt;I was also thinking that we may learn from a full stack-trace (Alt+SysRq+T) in case some threads are hung in the process of freeing pages.&lt;/p&gt;</comment>
                            <comment id="54283" author="prakash" created="Mon, 18 Mar 2013 17:56:24 +0000"  >&lt;p&gt;I attached `console-seqio345.txt` which contains sysrq-t information from a system stuck in this state.&lt;/p&gt;</comment>
                            <comment id="54316" author="jay" created="Mon, 18 Mar 2013 23:08:19 +0000"  >&lt;p&gt;Please try this patch: &lt;a href=&quot;http://review.whamcloud.com/5760&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5760&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This patch has not been verified yet so take your own risk &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="54323" author="prakash" created="Tue, 19 Mar 2013 00:04:56 +0000"  >&lt;p&gt;Out of curiosity, what makes you think that will fix the issue? What do you think the underlying issue is?&lt;/p&gt;</comment>
                            <comment id="54324" author="jay" created="Tue, 19 Mar 2013 00:21:26 +0000"  >&lt;p&gt;In the original implementation, it only tried to shrink the other client_obds once if there was no available LRU budget. This is racy because the just-freed slots may be immediately used by other threads, which will put the current thread to sleep. And nobody is likely to wake it up, since we only do that when a page is being released.&lt;/p&gt;</comment>
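<!--
The race described in the comment above (id 54324) can be sketched in user-space C with atomics. This is a hypothetical illustration, not Lustre code: the counter name cl_lru_left is borrowed from the discussion, and lru_reclaim()/lru_try_reserve() are invented stand-ins for the real reclaim/reserve paths.

```c
/* Sketch of the LRU-slot reservation race: a reclaimer frees slots into a
 * shared counter, but other threads may consume them before the reclaimer
 * itself reserves one.  Names are illustrative, not the real structures. */
#include <stdatomic.h>

static atomic_long cl_lru_left;   /* free LRU slots shared by all threads */

/* A reclaim pass returns `freed` slots to the shared pool. */
static long lru_reclaim(long freed)
{
    atomic_fetch_add(&cl_lru_left, freed);
    return freed;
}

/* Try to take one slot; returns 1 on success, 0 if none are left. */
static int lru_try_reserve(void)
{
    long left = atomic_load(&cl_lru_left);
    while (left > 0) {
        if (atomic_compare_exchange_weak(&cl_lru_left, &left, left - 1))
            return 1;
    }
    return 0;
}
```

The failure mode: thread A reclaims, say, two slots, but before A reserves one, two other threads each win lru_try_reserve(); A then sees zero slots and sleeps on the waitq, and since a wakeup is only issued when a page is released, it may sleep indefinitely.
-->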
                            <comment id="54358" author="bfaccini" created="Tue, 19 Mar 2013 09:43:27 +0000"  >&lt;p&gt;Prakash, thanks for the full-stacks log. But it does not show any issue/info other than the threads waiting in osc_page_init().&lt;/p&gt;

&lt;p&gt;Jinshan, I agree with you and your patch: scanning all OSC LRUs in osc_lru_reclaim() will do a much better garbage-collection job and ensure a maximum of pages become available. But then I think we need to find a way to do this full scan only once for a bunch of concurrent threads/requesters. It could simply be achieved in osc_lru_reclaim() by using spin_trylock() to test whether cache-&amp;gt;ccc_lru_lock is already held by somebody, returning 0 if so (to wait for LRU frees in osc_lru_reserve()), and by not releasing cache-&amp;gt;ccc_lru_lock inside the loop.&lt;/p&gt;

&lt;p&gt;What do you think?&lt;/p&gt;</comment>
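<!--
The spin_trylock idea in the comment above can be sketched in user-space C, with a pthread mutex standing in for the kernel spinlock. All names here are illustrative: ccc_lru_lock mirrors the field mentioned in the discussion, and scan_all_oscs() is an invented placeholder for the full per-OSC LRU walk.

```c
/* Sketch of "only one thread performs the full LRU scan at a time":
 * a contender that fails the trylock returns 0 immediately and simply
 * waits for the frees produced by the thread already scanning. */
#include <pthread.h>

static pthread_mutex_t ccc_lru_lock = PTHREAD_MUTEX_INITIALIZER;

static long scan_all_oscs(void)
{
    return 32;   /* pretend the walk freed 32 pages */
}

/* Returns the number of pages freed, or 0 if a scan is already running. */
static long osc_lru_reclaim_once(void)
{
    long freed;

    if (pthread_mutex_trylock(&ccc_lru_lock) != 0)
        return 0;              /* somebody else holds the lock: no rescan */
    freed = scan_all_oscs();   /* hold the lock across the whole walk */
    pthread_mutex_unlock(&ccc_lru_lock);
    return freed;
}
```

Returning 0 on a failed trylock matches the suggestion that the caller should then just wait for LRU frees rather than start a second, redundant scan.
-->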
                            <comment id="54386" author="prakash" created="Tue, 19 Mar 2013 16:46:43 +0000"  >&lt;p&gt;Hmm.. I&apos;m not fully convinced this patch will help in that case. Even if osc_lru_reclaim returns &amp;gt; 0, there is still a chance another thread comes in and &quot;steals&quot; those slots before we can decrement cl_lru_left. That hasn&apos;t changed as far as I can tell.&lt;/p&gt;

&lt;p&gt;This patch does appear to help the case where an OSC has cl_lru_in_list &amp;gt; 0 but returns zero from osc_lru_reclaim. With the patch, it will move on to the next OSC, where it would just return zero previously. It&apos;s worth a try, but without evidence pointing to exactly why it failed in the first place it&apos;s hard to say whether this will work or not.&lt;/p&gt;

&lt;p&gt;One thing I noticed when I was reading the code, is we skip any pages that are &quot;in use&quot;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;osc_lru_shrink:
697                 if (cl_page_in_use_noref(page)) {                               
698                         cfs_list_move_tail(&amp;amp;opg-&amp;gt;ops_lru, &amp;amp;cli-&amp;gt;cl_lru_list);   
699                         continue;                                               
700                 }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What are the chances that the first &lt;tt&gt;lru_shrink_min&lt;/tt&gt; number of pages will be in use? What circumstances marks a page in use?&lt;/p&gt;</comment>
                            <comment id="54677" author="jay" created="Fri, 22 Mar 2013 17:46:29 +0000"  >&lt;p&gt;Sorry for the delayed response.&lt;/p&gt;

&lt;p&gt;Hi Bruno, yes, that sounds pretty reasonable. It doesn&apos;t make any sense for a process to try again if someone is already doing the work. Let&apos;s see if the patch works, and I will do it when I have a chance to work out a new patch.&lt;/p&gt;

&lt;p&gt;Hi Prakash, probably there are fewer LRU pages than lru_shrink_min in that cli. Pages can be in use because they are being rewritten, because readahead pages are being read, or because of a write followed by a read. It&apos;s quite common for cached pages to be reused; otherwise we wouldn&apos;t need to cache them at all &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;. BTW, could you easily reproduce &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2576&quot; title=&quot;Hangs in osc_enter_cache due to dirty pages not being flushed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2576&quot;&gt;&lt;del&gt;LU-2576&lt;/del&gt;&lt;/a&gt; before? We recently found that the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2576&quot; title=&quot;Hangs in osc_enter_cache due to dirty pages not being flushed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2576&quot;&gt;&lt;del&gt;LU-2576&lt;/del&gt;&lt;/a&gt; causes another problem, so we want to come up with another fix for that issue.&lt;/p&gt;</comment>
                            <comment id="54772" author="prakash" created="Mon, 25 Mar 2013 17:19:11 +0000"  >&lt;p&gt;Jinshan, unfortunately I&apos;m not able to easily reproduce &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2576&quot; title=&quot;Hangs in osc_enter_cache due to dirty pages not being flushed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2576&quot;&gt;&lt;del&gt;LU-2576&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="54858" author="jlevi" created="Tue, 26 Mar 2013 17:38:08 +0000"  >&lt;p&gt;Reducing priority until this is hit again.&lt;/p&gt;</comment>
                            <comment id="54865" author="jay" created="Tue, 26 Mar 2013 20:32:48 +0000"  >&lt;p&gt;Please reopen this issue if this problem has been seen again&lt;/p&gt;</comment>
                            <comment id="54871" author="prakash" created="Tue, 26 Mar 2013 21:33:32 +0000"  >&lt;p&gt;I haven&apos;t been able to test this on Sequoia the past few weeks, but I&apos;m sure it hasn&apos;t gone away on its own. Although, without a reproducer, I&apos;m OK dropping the priority until I can get you guys some more information.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="12401" name="console-seqio345.txt" size="227" author="prakash" created="Mon, 18 Mar 2013 17:56:24 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvkmv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7073</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>