<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4561] threads stuck waiting on page bit</title>
                <link>https://jira.whamcloud.com/browse/LU-4561</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;As of yesterday, when testing master with mmstress, I saw a huge number of threads stuck waiting here, with IO failing to complete:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sleep_on_page+0xe/0x20;
wait_on_page_bit+0x74/0x80;
vvp_io_fault_start+0x855/0xc20 [lustre]; 
cl_io_start+0x72/0x140 [obdclass]; 
cl_io_loop+0xac/0x1a0 [obdclass]; 
ll_page_mkwrite+0x280/0x6c0 [lustre]; 
__do_fault+0xe7/0x570;
handle_pte_fault+0xa4/0xcc0;
handle_mm_fault+0x1ae/0x240; 
do_page_fault+0x18f/0x420; 
page_fault+0x1f/0x30; 0x200007ea; 0xffffffffffffffff
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In effect, these threads appear unable to complete page faults.  We also ran a quick Cray IO regression suite on a system, and many (perhaps most) of those tests failed as well.&lt;/p&gt;

&lt;p&gt;I looked through the commits added since I last built and used master successfully, and this one stood out:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3531&quot; title=&quot;DNE2: striped directory&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3531&quot;&gt;&lt;del&gt;LU-3531&lt;/del&gt;&lt;/a&gt; mdc: release dir page cache after accessing &lt;/p&gt;

&lt;p&gt;Release the dir page cache in llite/lmv, so the page will be held until the entries are filled by filldir. &lt;/p&gt;

&lt;p&gt;Signed-off-by: wang di &amp;lt;di.wang@intel.com&amp;gt; &lt;br/&gt;
Change-Id: I8b24bec74b14ff2b65130c02294821fc16ca1421 &lt;br/&gt;
Reviewed-on: &lt;a href=&quot;http://review.whamcloud.com/8935&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8935&lt;/a&gt; &lt;br/&gt;
Tested-by: Jenkins &lt;br/&gt;
Reviewed-by: John L. Hammond &amp;lt;john.hammond@intel.com&amp;gt; &lt;br/&gt;
Reviewed-by: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt; &lt;br/&gt;
Tested-by: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt;&lt;/p&gt;

&lt;p&gt;However, reverting only this commit did not help; the problems continued.&lt;/p&gt;

&lt;p&gt;I then rolled back about a week of commits to return to a state I knew was good.  With everything after the following commit rolled back, the problem went away:&lt;br/&gt;
commit b9b4614c1e302058ed9863b1ab73b7def2c5c924&lt;br/&gt;
Author: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt;&lt;br/&gt;
Date:   Mon Jan 20 23:10:06 2014 +0000&lt;/p&gt;

&lt;p&gt;    Revert &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3319&quot; title=&quot;Adapt to 3.10 upstream kernel proc_dir_entry change&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3319&quot;&gt;&lt;del&gt;LU-3319&lt;/del&gt;&lt;/a&gt; procfs: move osp proc handling to seq_files&quot;&lt;/p&gt;

&lt;p&gt;    This seems to be causing issues like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-45&quot; title=&quot;building e2fsprogs should not require lustre to be installed or a built source tree&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-45&quot;&gt;&lt;del&gt;LU-45&lt;/del&gt;&lt;/a&gt;-13 and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4510&quot; title=&quot;Oops (use after free) in osp_prealloc_next_seq_seq_show&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4510&quot;&gt;&lt;del&gt;LU-4510&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
    This reverts commit a97e4898ad9e0b65f457b01bdfa954f7d7cd272d.&lt;/p&gt;

&lt;p&gt;    Change-Id: I6066a255ded24dbdb76b4804e82a377f1069af5f&lt;br/&gt;
    Reviewed-on: &lt;a href=&quot;http://review.whamcloud.com/8931&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8931&lt;/a&gt;&lt;br/&gt;
    Reviewed-by: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt;&lt;br/&gt;
    Tested-by: Oleg Drokin &amp;lt;oleg.drokin@intel.com&amp;gt;&lt;br/&gt;
&amp;#8212;&lt;br/&gt;
That leaves me 11 commits behind master (at least it was 11 when I last checked).  I&apos;m not sure which of those patches caused the problem, but current master is broken.&lt;/p&gt;</description>
                <environment></environment>
        <key id="22919">LU-4561</key>
            <summary>threads stuck waiting on page bit</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                            <label>MB</label>
                    </labels>
                <created>Wed, 29 Jan 2014 16:12:06 +0000</created>
                <updated>Tue, 3 Jun 2014 15:17:01 +0000</updated>
                            <resolved>Tue, 3 Jun 2014 15:17:01 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="75875" author="green" created="Wed, 29 Jan 2014 18:19:10 +0000"  >&lt;p&gt;This is the patch from Jinshan that I am currently testing to combat this (I also see the problem in my own testing):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;--- a/lustre/llite/rw26.c
+++ b/lustre/llite/rw26.c
@@ -546,7 +546,8 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 
 	/* To avoid deadlock, try to lock page first. */
 	vmpage = grab_cache_page_nowait(mapping, index);
-	if (unlikely(vmpage == NULL || PageDirty(vmpage))) {
+	if (unlikely(vmpage == NULL || PageDirty(vmpage) ||
+	    PageWriteback(vmpage))) {
 		struct ccc_io *cio = ccc_env_io(env);
 		struct cl_page_list *plist = &amp;amp;cio-&amp;gt;u.write.cui_queue;
 
@@ -555,7 +556,7 @@ static int ll_write_begin(struct file *file, struct address_space *mapping,
 		 * because it holds page lock of a dirty page and request for
 		 * more grants. It&apos;s okay for the dirty page to be the first
 		 * one in commit page list, though. */
-		if (vmpage != NULL &amp;amp;&amp;amp; PageDirty(vmpage) &amp;amp;&amp;amp; plist-&amp;gt;pl_nr &amp;gt; 0) {
+		if (vmpage != NULL &amp;amp;&amp;amp; plist-&amp;gt;pl_nr &amp;gt; 0) {
 			unlock_page(vmpage);
 			page_cache_release(vmpage);
 			vmpage = NULL;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
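<!--
A minimal C sketch, under stated assumptions, of the branch logic that the patch above changes in ll_write_begin(). It assumes a hypothetical struct model_page whose two flags stand in for PageDirty() and PageWriteback(), and an int standing in for plist->pl_nr. The body of the branch is elided in the diff, so the model only records whether that branch is entered and whether the page is unlocked and released before it continues; the enum names are illustrative, not Lustre identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page flags tested in the patch. */
struct model_page {
    bool dirty;     /* models PageDirty(vmpage) */
    bool writeback; /* models PageWriteback(vmpage) */
};

/* Which way ll_write_begin() goes after grab_cache_page_nowait(). */
enum write_begin_action {
    USE_LOCKED_PAGE,     /* outer test false: keep using the locked page */
    SLOW_PATH_KEEP_PAGE, /* outer test true, page (if any) is kept */
    SLOW_PATH_DROP_PAGE  /* outer test true, unlock and release page first */
};

/* Logic before the patch; `queued` models plist->pl_nr. */
enum write_begin_action decide_old(const struct model_page *pg, int queued)
{
    assert(queued >= 0); /* pl_nr is a page count */
    if (!(pg == NULL || pg->dirty))
        return USE_LOCKED_PAGE;
    if (pg != NULL && pg->dirty && queued > 0)
        return SLOW_PATH_DROP_PAGE;
    return SLOW_PATH_KEEP_PAGE;
}

/* Logic after the patch: pages under writeback also enter the branch,
 * and any page that reaches it is dropped if the commit list is non-empty. */
enum write_begin_action decide_new(const struct model_page *pg, int queued)
{
    assert(queued >= 0);
    if (!(pg == NULL || pg->dirty || pg->writeback))
        return USE_LOCKED_PAGE;
    if (pg != NULL && queued > 0)
        return SLOW_PATH_DROP_PAGE;
    return SLOW_PATH_KEEP_PAGE;
}
```

For a page with only writeback set and two pages already queued, decide_old() returns USE_LOCKED_PAGE while decide_new() returns SLOW_PATH_DROP_PAGE: the patched code no longer carries on while holding the lock of a page that is still being written back. Dirty and NULL pages behave the same in both versions. This model does not establish that the change fully fixes the hang reported here.
-->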
                            <comment id="75916" author="jay" created="Thu, 30 Jan 2014 03:39:43 +0000"  >&lt;p&gt;Probably a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4540&quot; title=&quot;Test failure sanity-quota test_8: dbench hung in vvp_page_assume&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4540&quot;&gt;&lt;del&gt;LU-4540&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="85598" author="jlevi" created="Tue, 3 Jun 2014 15:16:50 +0000"  >&lt;p&gt;Reopening to remove the fix version, since this is a duplicate.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="22876">LU-4540</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwdu7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12451</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>