<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:09:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-727] application hang waiting on page lock</title>
                <link>https://jira.whamcloud.com/browse/LU-727</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The application hangs with the following stack:&lt;/p&gt;

&lt;p&gt;PID: 17906  TASK: ffff88063e3f94e0  CPU: 0   COMMAND: &quot;Migration_Clien&quot;&lt;br/&gt;
 #0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017918&amp;#93;&lt;/span&gt; schedule at ffffffff8147b5d9&lt;br/&gt;
 #1 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e0179e0&amp;#93;&lt;/span&gt; io_schedule at ffffffff8147bd83&lt;br/&gt;
 #2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017a00&amp;#93;&lt;/span&gt; sync_page at ffffffff810facdd&lt;br/&gt;
 #3 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017a10&amp;#93;&lt;/span&gt; sync_page_killable at ffffffff810facfe&lt;br/&gt;
 #4 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017a20&amp;#93;&lt;/span&gt; __wait_on_bit_lock at ffffffff8147c48a&lt;br/&gt;
 #5 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017a70&amp;#93;&lt;/span&gt; __lock_page_killable at ffffffff810fac07&lt;br/&gt;
 #6 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017ad0&amp;#93;&lt;/span&gt; generic_file_aio_read at ffffffff810fc864&lt;br/&gt;
 #7 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017bb0&amp;#93;&lt;/span&gt; vvp_io_read_start at ffffffffa08a971b&lt;br/&gt;
 #8 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017c60&amp;#93;&lt;/span&gt; cl_io_start at ffffffffa04ec9c8&lt;br/&gt;
 #9 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017cc0&amp;#93;&lt;/span&gt; cl_io_loop at ffffffffa04f0ad0&lt;br/&gt;
#10 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017d30&amp;#93;&lt;/span&gt; ll_file_io_generic at ffffffffa0853db2&lt;br/&gt;
#11 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017dd0&amp;#93;&lt;/span&gt; ll_file_aio_read at ffffffffa085406c&lt;br/&gt;
#12 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017e60&amp;#93;&lt;/span&gt; ll_file_read at ffffffffa085a751&lt;br/&gt;
#13 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017ef0&amp;#93;&lt;/span&gt; vfs_read at ffffffff8115d565&lt;br/&gt;
#14 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017f30&amp;#93;&lt;/span&gt; sys_pread64 at ffffffff8115d892&lt;br/&gt;
#15 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88060e017f80&amp;#93;&lt;/span&gt; system_call_fastpath at ffffffff81003172&lt;br/&gt;
    RIP: 0000003fef40ee13  RSP: 00007fff9037d5f8  RFLAGS: 00010246&lt;br/&gt;
    RAX: 0000000000000011  RBX: ffffffff81003172  RCX: 0000000000000000&lt;br/&gt;
    RDX: 0000000000000e10  RSI: 00007fff9037d620  RDI: 0000000000000016&lt;br/&gt;
    RBP: 0000000000000000   R8: 00007fff9037e508   R9: 0000000000000000&lt;br/&gt;
    R10: 0000000000000000  R11: 0000000000000293  R12: 0000000000000016&lt;br/&gt;
    R13: 00007fff9037d620  R14: 0000000000000000  R15: 00007fff90381c50&lt;br/&gt;
    ORIG_RAX: 0000000000000011  CS: 0033  SS: 002b&lt;/p&gt;

&lt;p&gt;The file being read is on a Lustre filesystem:&lt;br/&gt;
crash&amp;gt; files 17906&lt;br/&gt;
 22 ffff8803370ccb00 ffff88033e20cec0 ffff880337f48930 REG  /scratch_lustre/ossau/HPC2012_BENCH/data/Test_MIGAL/INPUT/Seismic/Line_3745.segy&lt;/p&gt;

&lt;p&gt;The page structure, whose address was retrieved from the process stack, shows the flags PG_locked and PG_lru are set.&lt;br/&gt;
crash&amp;gt; p/x *(struct page *)0xffffea0004f06f18&lt;br/&gt;
$4 = {&lt;br/&gt;
  flags = 0x800000000000021,   PG_locked | PG_lru&lt;br/&gt;
  _count = {&lt;br/&gt;
    counter = 0x2&lt;br/&gt;
  },&lt;br/&gt;
  {&lt;br/&gt;
    _mapcount = {&lt;br/&gt;
      counter = 0xffffffff&lt;br/&gt;
    },&lt;br/&gt;
    {&lt;br/&gt;
      inuse = 0xffff,&lt;br/&gt;
      objects = 0xffff&lt;br/&gt;
    }&lt;br/&gt;
  },&lt;br/&gt;
  {&lt;br/&gt;
    {&lt;br/&gt;
      private = 0x0,&lt;br/&gt;
      mapping = 0xffff880337f48a50&lt;br/&gt;
    },&lt;br/&gt;
    ptl = {&lt;br/&gt;
      raw_lock = {&lt;br/&gt;
        slock = 0x0&lt;br/&gt;
      }&lt;br/&gt;
    },&lt;br/&gt;
    slab = 0x0,&lt;br/&gt;
    first_page = 0x0&lt;br/&gt;
  },&lt;br/&gt;
  {&lt;br/&gt;
    index = 0x0,&lt;br/&gt;
    freelist = 0x0&lt;br/&gt;
  },&lt;br/&gt;
  lru = {&lt;br/&gt;
    next = 0xffffea0005dc40c8,&lt;br/&gt;
    prev = 0xffffea0004f06f08&lt;br/&gt;
  }&lt;br/&gt;
}&lt;/p&gt;

&lt;p&gt;Unfortunately, the page lock is never released. Looking at the dump, I am not able to find the current holder of the PG_locked page lock.&lt;/p&gt;

&lt;p&gt;The application is the MIGAL benchmark.&lt;br/&gt;
I am going to upload the dump file as an attachment to this ticket. The hang can be easily reproduced if more information is needed.&lt;/p&gt;


&lt;p&gt;The same problem has also been reproduced at CEA using DDT, a debugger environment. The stack is similar:&lt;br/&gt;
PID: 7234   TASK: ffff881856477600  CPU: 14  COMMAND: &quot;gdb&quot;&lt;br/&gt;
 #0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae558c8&amp;#93;&lt;/span&gt; schedule at ffffffff81467211&lt;br/&gt;
 #1 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55990&amp;#93;&lt;/span&gt; io_schedule at ffffffff814679a3&lt;br/&gt;
 #2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae559b0&amp;#93;&lt;/span&gt; sync_page at ffffffff810fa35d&lt;br/&gt;
 #3 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae559c0&amp;#93;&lt;/span&gt; __wait_on_bit_lock at ffffffff814680ba&lt;br/&gt;
 #4 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55a10&amp;#93;&lt;/span&gt; __lock_page at ffffffff810fa2f7&lt;br/&gt;
 #5 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55a70&amp;#93;&lt;/span&gt; filemap_fault at ffffffff810fc09f&lt;br/&gt;
 #6 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55ae0&amp;#93;&lt;/span&gt; vvp_io_fault_start at ffffffffa080ab1f&lt;br/&gt;
 #7 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55b80&amp;#93;&lt;/span&gt; cl_io_start at ffffffffa04949c8&lt;br/&gt;
 #8 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55be0&amp;#93;&lt;/span&gt; cl_io_loop at ffffffffa0498ad0&lt;br/&gt;
 #9 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55c50&amp;#93;&lt;/span&gt; ll_fault0 at ffffffffa07eea76&lt;br/&gt;
#10 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55ce0&amp;#93;&lt;/span&gt; ll_fault at ffffffffa07eee68&lt;br/&gt;
#11 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55d50&amp;#93;&lt;/span&gt; __do_fault at ffffffff811237b4&lt;br/&gt;
#12 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55de0&amp;#93;&lt;/span&gt; handle_pte_fault at ffffffff81123cfd&lt;br/&gt;
#13 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55eb0&amp;#93;&lt;/span&gt; handle_mm_fault at ffffffff811248dd&lt;br/&gt;
#14 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55f00&amp;#93;&lt;/span&gt; do_page_fault at ffffffff8146c3d3&lt;br/&gt;
#15 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff88185ae55f50&amp;#93;&lt;/span&gt; page_fault at ffffffff81469e45&lt;br/&gt;
    RIP: 00000000005cff90  RSP: 00007fff2a3e0758  RFLAGS: 00010206&lt;br/&gt;
    RAX: 0000000000868820  RBX: 0000000000d8a370  RCX: 0000000000001030&lt;br/&gt;
    RDX: 00007fff2a3e083c  RSI: 00002b672fc89052  RDI: 00002b672fc89052&lt;br/&gt;
    RBP: 00002b672fc89052   R8: 0000000000000003   R9: 00000000fffffff7&lt;br/&gt;
    R10: 0000000000000fe0  R11: 000000380217b448  R12: 00007fff2a3e083c&lt;br/&gt;
    R13: 00002b672f9d1f69  R14: 0000000000daebf0  R15: 00000000000003f2&lt;br/&gt;
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b&lt;/p&gt;</description>
                <environment>lustre 2.0.0.1, kernel 2.6.32-71, lustre client</environment>
        <key id="11987">LU-727</key>
            <summary>application hang waiting on page lock</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="pichong">Gregoire Pichon</reporter>
                        <labels>
                    </labels>
                <created>Fri, 30 Sep 2011 03:32:45 +0000</created>
                <updated>Tue, 4 Oct 2011 17:51:29 +0000</updated>
                            <resolved>Tue, 4 Oct 2011 17:51:29 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                            <comments>
                            <comment id="20564" author="pichong" created="Fri, 30 Sep 2011 04:29:16 +0000"  >&lt;p&gt;I have uploaded the dump files on ftp.whamcloud.com:/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-727&quot; title=&quot;application hang waiting on page lock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-727&quot;&gt;&lt;del&gt;LU-727&lt;/del&gt;&lt;/a&gt;.&lt;br/&gt;
ftp&amp;gt; ls&lt;br/&gt;
227 Entering Passive Mode (69,226,2,25,143,37).&lt;br/&gt;
150 Here comes the directory listing.&lt;br/&gt;
-rw-r--r--    1 123      114            85 Sep 30 01:27 files.md5sum&lt;br/&gt;
-rw-r--r--    1 123      114      80658925 Sep 30 01:13 vmcore&lt;br/&gt;
-rw-r--r--    1 123      114      118388663 Sep 30 01:19 vmlinux&lt;br/&gt;
226 Directory send OK.&lt;/p&gt;</comment>
                            <comment id="20589" author="jay" created="Sun, 2 Oct 2011 02:10:07 +0000"  >&lt;p&gt;I took a look at this bug. It seems the page is being read while it&apos;s being truncated, but I&apos;m not sure why the truncate process can&apos;t finish. I tried to figure it out by analyzing the crash file, but unfortunately that doesn&apos;t work:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@client-7 tmp&amp;#93;&lt;/span&gt;# crash -s vmlinux vmcore &lt;br/&gt;
crash: vmlinux: not a supported file format&lt;/p&gt;

&lt;p&gt;and the checksums seem to be wrong:&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@client-7 tmp&amp;#93;&lt;/span&gt;# md5sum vmlinux vmcore &lt;br/&gt;
2a290a894c709c08bc9855f41d6420d1  vmlinux&lt;br/&gt;
d736a0e451880ed519a8d8f3c018800e  vmcore&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@client-7 tmp&amp;#93;&lt;/span&gt;# cat files.md5sum &lt;br/&gt;
436acaa619023b4c3ec22162cb69a9fc  vmcore&lt;br/&gt;
b13de68997147e41d0adec71e5954910  vmlinux&lt;/p&gt;

&lt;p&gt;Can you please upload them once again? Thanks.&lt;/p&gt;</comment>
                            <comment id="20593" author="pichong" created="Mon, 3 Oct 2011 04:35:04 +0000"  >&lt;p&gt;I have uploaded the files again (in binary mode this time!).&lt;br/&gt;
ftp&amp;gt; ls &lt;br/&gt;
227 Entering Passive Mode (69,226,2,25,156,156).&lt;br/&gt;
150 Here comes the directory listing.&lt;br/&gt;
-rw-r--r--    1 123      114            85 Sep 30 01:27 files.md5sum&lt;br/&gt;
-rw-r--r--    1 123      114      80421429 Oct 03 01:24 vmcore3&lt;br/&gt;
-rw-r--r--    1 123      114      118009159 Oct 03 01:13 vmlinux2&lt;br/&gt;
226 Directory send OK.&lt;/p&gt;</comment>
                            <comment id="20594" author="pichong" created="Mon, 3 Oct 2011 04:37:07 +0000"  >&lt;p&gt;What is the path of the truncate that possibly does not complete?&lt;br/&gt;
I can get some traces, since I can reproduce the issue with full traces enabled.&lt;/p&gt;</comment>
                            <comment id="20613" author="pichong" created="Mon, 3 Oct 2011 09:11:58 +0000"  >&lt;p&gt;I have uploaded the Lustre trace logs of the system where the application hangs.&lt;br/&gt;
ftp&amp;gt; ls&lt;br/&gt;
-rw-r--r--    1 123      114         14582 Oct 03 05:08 crash.txt&lt;br/&gt;
-rw-r--r--    1 123      114      576956449 Oct 03 06:07 traces.txt&lt;br/&gt;
The crash.txt file contains context information (PID, file inode, locked page), since this is not the same instance of the problem as the uploaded vmcore file.&lt;/p&gt;
</comment>
                            <comment id="20617" author="pichong" created="Mon, 3 Oct 2011 10:09:13 +0000"  >&lt;p&gt;Finally, I have also uploaded the dump file (named vmcore5) in sync with the traces.txt file.&lt;/p&gt;

&lt;p&gt;-rw-r--r--    1 123      114      576956449 Oct 03 06:07 traces.txt&lt;br/&gt;
-rw-r--r--    1 123      114      104198548 Oct 03 06:51 vmcore5&lt;/p&gt;</comment>
                            <comment id="20731" author="jay" created="Mon, 3 Oct 2011 18:31:22 +0000"  >&lt;p&gt;Can you please check whether your code includes the patch for LU-148?&lt;/p&gt;

&lt;p&gt;commit 59c1a8e7cd69c31bce09695681e2c9f889fed567&lt;br/&gt;
Author: Jinshan Xiong &amp;lt;jay@whamcloud.com&amp;gt;&lt;br/&gt;
Date:   Sat Mar 19 12:44:03 2011 -0700&lt;/p&gt;

&lt;p&gt;    &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-148&quot; title=&quot;ll_readpage has to unlock vmpage by any means&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-148&quot;&gt;&lt;del&gt;LU-148&lt;/del&gt;&lt;/a&gt; ll_readpage has to unlock vmpage by any means&lt;/p&gt;

&lt;p&gt;    Unlock vmpage in case ll_cl_init fails.&lt;/p&gt;

&lt;p&gt;It looks like this program is using fadvise(2) to drop the cache, and uses WILLNEED to read ahead. Unfortunately, this isn&apos;t well supported by Lustre.&lt;/p&gt;</comment>
                            <comment id="20732" author="jay" created="Mon, 3 Oct 2011 18:35:47 +0000"  >&lt;p&gt;BTW, I really appreciate that you collected the logs and crash dumps in such a professional way; it helps debugging a lot.&lt;/p&gt;</comment>
                            <comment id="20760" author="pichong" created="Tue, 4 Oct 2011 16:09:08 +0000"  >&lt;p&gt;Thanks a lot, the problem has been fixed with the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-148&quot; title=&quot;ll_readpage has to unlock vmpage by any means&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-148&quot;&gt;&lt;del&gt;LU-148&lt;/del&gt;&lt;/a&gt;.&lt;br/&gt;
I think this LU can be marked as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-148&quot; title=&quot;ll_readpage has to unlock vmpage by any means&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-148&quot;&gt;&lt;del&gt;LU-148&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="20763" author="jay" created="Tue, 4 Oct 2011 17:51:29 +0000"  >&lt;p&gt;lu-148&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvhw7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6548</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>