<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:33:43 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3416] Hangs on write in osc_enter_cache()</title>
                <link>https://jira.whamcloud.com/browse/LU-3416</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It looks like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2576&quot; title=&quot;Hangs in osc_enter_cache due to dirty pages not being flushed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2576&quot;&gt;&lt;del&gt;LU-2576&lt;/del&gt;&lt;/a&gt; is back again.  The problem went away for a while, seemingly thanks to the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2576&quot; title=&quot;Hangs in osc_enter_cache due to dirty pages not being flushed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2576&quot;&gt;&lt;del&gt;LU-2576&lt;/del&gt;&lt;/a&gt;.  I note that a later fix for stack overflow in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2859&quot; title=&quot;Stack overflow issues on x86_64 clients in memory reclaim during writes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2859&quot;&gt;&lt;del&gt;LU-2859&lt;/del&gt;&lt;/a&gt; changed the same lines where the fix was applied, so perhaps that reintroduced the problem?&lt;/p&gt;

&lt;p&gt;We are seeing hangs during writes on BG/Q hardware.  We find tasks that appear to be stuck sleeping indefinitely here:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2013-05-29 16:15:36.363547  sysiod        S 00000fffa85363bc     0  4111   3070 0x00000000
2013-05-29 16:15:36.363582  Call Trace:
2013-05-29 16:15:36.363617  [c0000002e1e12440] [c000000302e95cc8] 0xc000000302e95cc8 (unreliable)
2013-05-29 16:15:36.363653  [c0000002e1e12610] [c000000000008de0] .__switch_to+0xc4/0x100
2013-05-29 16:15:36.363688  [c0000002e1e126a0] [c00000000044dc68] .schedule+0x858/0x9c0
2013-05-29 16:15:36.363723  [c0000002e1e12950] [80000000004820a0] .cfs_waitq_wait+0x10/0x30 [libcfs]
2013-05-29 16:15:36.363758  [c0000002e1e129c0] [80000000015b3ccc] .osc_enter_cache+0xb6c/0x1410 [osc]
2013-05-29 16:15:36.363793  [c0000002e1e12ba0] [80000000015bbf30] .osc_queue_async_io+0xcd0/0x2690 [osc]
2013-05-29 16:15:36.363828  [c0000002e1e12db0] [8000000001598598] .osc_page_cache_add+0xf8/0x2a0 [osc]
2013-05-29 16:15:36.363863  [c0000002e1e12e70] [8000000000a04248] .cl_page_cache_add+0xf8/0x420 [obdclass]
2013-05-29 16:15:36.363898  [c0000002e1e12fa0] [800000000179ed28] .lov_page_cache_add+0xc8/0x340 [lov]
2013-05-29 16:15:36.363934  [c0000002e1e13070] [8000000000a04248] .cl_page_cache_add+0xf8/0x420 [obdclass]
2013-05-29 16:15:36.363968  [c0000002e1e131a0] [8000000001d2ac74] .vvp_io_commit_write+0x464/0x910 [lustre]
2013-05-29 16:15:36.364003  [c0000002e1e132c0] [8000000000a1df6c] .cl_io_commit_write+0x11c/0x2d0 [obdclass]
2013-05-29 16:15:36.364038  [c0000002e1e13380] [8000000001cebc00] .ll_commit_write+0x120/0x3e0 [lustre]
2013-05-29 16:15:36.364074  [c0000002e1e13450] [8000000001d0f134] .ll_write_end+0x34/0x80 [lustre]
2013-05-29 16:15:36.364109  [c0000002e1e134e0] [c000000000097238] .generic_file_buffered_write+0x1f4/0x388
2013-05-29 16:15:36.364143  [c0000002e1e13620] [c000000000097928] .__generic_file_aio_write+0x374/0x3d8
2013-05-29 16:15:36.364178  [c0000002e1e13720] [c000000000097a04] .generic_file_aio_write+0x78/0xe8
2013-05-29 16:15:36.364213  [c0000002e1e137d0] [8000000001d2df00] .vvp_io_write_start+0x170/0x3b0 [lustre]
2013-05-29 16:15:36.364248  [c0000002e1e138a0] [8000000000a1849c] .cl_io_start+0xcc/0x220 [obdclass]
2013-05-29 16:15:36.364283  [c0000002e1e13940] [8000000000a202a4] .cl_io_loop+0x194/0x2c0 [obdclass]
2013-05-29 16:15:36.364317  [c0000002e1e139f0] [8000000001ca0780] .ll_file_io_generic+0x4f0/0x850 [lustre]
2013-05-29 16:15:36.364352  [c0000002e1e13b30] [8000000001ca0f64] .ll_file_aio_write+0x1d4/0x3a0 [lustre]
2013-05-29 16:15:36.364387  [c0000002e1e13c00] [8000000001ca1280] .ll_file_write+0x150/0x320 [lustre]
2013-05-29 16:15:36.364422  [c0000002e1e13ce0] [c0000000000d4328] .vfs_write+0xd0/0x1c4
2013-05-29 16:15:36.364458  [c0000002e1e13d80] [c0000000000d4518] .SyS_write+0x54/0x98
2013-05-29 16:15:36.364492  [c0000002e1e13e30] [c000000000000580] syscall_exit+0x0/0x2c
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This was with Lustre &lt;a href=&quot;https://github.com/chaos/lustre/tree/2.4.0-RC1_3chaos&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2.4.0-RC1_3chaos&lt;/a&gt;.&lt;/p&gt;</description>
                <environment>Lustre 2.4.0-RC1_3chaos.  &lt;a href=&quot;https://github.com/chaos/lustre/tree/2.4.0-RC1_3chaos&quot;&gt;https://github.com/chaos/lustre/tree/2.4.0-RC1_3chaos&lt;/a&gt;, ZFS servers</environment>
        <key id="19217">LU-3416</key>
            <summary>Hangs on write in osc_enter_cache()</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>mq313</label>
                    </labels>
                <created>Thu, 30 May 2013 00:07:14 +0000</created>
                <updated>Thu, 20 Oct 2022 13:42:31 +0000</updated>
                            <resolved>Thu, 29 Aug 2013 07:00:03 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.4.1</version>
                                    <fixVersion>Lustre 2.5.0</fixVersion>
                    <fixVersion>Lustre 2.4.2</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>12</watches>
                                                                            <comments>
                            <comment id="59594" author="morrone" created="Thu, 30 May 2013 00:36:48 +0000"  >&lt;p&gt;Just as reported in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2576&quot; title=&quot;Hangs in osc_enter_cache due to dirty pages not being flushed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2576&quot;&gt;&lt;del&gt;LU-2576&lt;/del&gt;&lt;/a&gt;, the processes will wake up and make progress if a sysadmin issues:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;echo 3 &amp;gt; /proc/sys/vm/drop_caches&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="59598" author="niu" created="Thu, 30 May 2013 03:12:44 +0000"  >&lt;p&gt;Hi, Chris&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Is the fix of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2909&quot; title=&quot;Failure on test suite sanity-benchmark test_fsx: fsx bus error, core dumped&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2909&quot;&gt;&lt;del&gt;LU-2909&lt;/del&gt;&lt;/a&gt; also applied?&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Is there any lli_write_mutex changes in your branch, like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3261&quot; title=&quot;Threads stuck in osc_enter_cache when testing LU-1669&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3261&quot;&gt;&lt;del&gt;LU-3261&lt;/del&gt;&lt;/a&gt;?&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;What&apos;s the &apos;cur_dirty_bytes&apos; &amp;amp; &apos;max_dirty_mb&apos; of each OSC? (this can be checked by the OSC proc file)&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;What&apos;s the &apos;dirty_expire_centisecs&apos; of your system? Is the thread being stuck longer than &apos;dirty_expire_centisecs&apos; (30 seconds by default)?&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="59675" author="morrone" created="Thu, 30 May 2013 18:18:54 +0000"  >&lt;blockquote&gt;&lt;p&gt;Is the fix of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2909&quot; title=&quot;Failure on test suite sanity-benchmark test_fsx: fsx bus error, core dumped&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2909&quot;&gt;&lt;del&gt;LU-2909&lt;/del&gt;&lt;/a&gt; also applied?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I pointed you to the exact tag that we are running.  Our patches in the branch are clearly based at 2.4.0-RC1.  So yes, of course we have that fix applied.  You guys applied it.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Is there any lli_write_mutex changes in your branch, like &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3261&quot; title=&quot;Threads stuck in osc_enter_cache when testing LU-1669&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3261&quot;&gt;&lt;del&gt;LU-3261&lt;/del&gt;&lt;/a&gt;?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Again, see the tag that I included in the bug report.  This branch does not include any experimental work to improve the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1669&quot; title=&quot;lli-&amp;gt;lli_write_mutex (single shared file performance)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1669&quot;&gt;&lt;del&gt;LU-1669&lt;/del&gt;&lt;/a&gt; problem, which is what Prakash was talking about in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3261&quot; title=&quot;Threads stuck in osc_enter_cache when testing LU-1669&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3261&quot;&gt;&lt;del&gt;LU-3261&lt;/del&gt;&lt;/a&gt;.  I also ran:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;git diff 2.4.0-RC1 2.4.0-RC1_3chaos | grep lli_write_mutex&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and it did not report any results, so no direct changes that I am aware of.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;What&apos;s the &apos;cur_dirty_bytes&apos; &amp;amp; &apos;max_dirty_mb&apos; of each OSC? (this can be checked by the OSC proc file)&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;max_dirty_mb is 32 for all OSCs.  I didn&apos;t get cur_dirty_bytes.  I did capture dump_page_cache.  See attached.  Keep in mind that BG/Q is ppc64, and pages are 64K.&lt;/p&gt;

&lt;p&gt;Looks like there were 7129 dirty pages.  At 64K each, that is 456,256 KiB.  There are 16 OSTs, so on average there were 28,516 KiB dirty per OST.  It seems pretty likely that some of the OSTs were at the max_dirty_mb limit.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;What&apos;s the &apos;dirty_expire_centisecs&apos; of your system? Is the thread being stuck longer than &apos;dirty_expire_centisecs&apos; (30 seconds by default)?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;3000&lt;/p&gt;

</comment>
                            <comment id="59678" author="morrone" created="Thu, 30 May 2013 18:29:40 +0000"  >&lt;p&gt;And yes, the threads are stuck for much, much longer than the 30 second dirty_expire_centisecs value.  As I said, they appear to be stuck indefinitely.  My guess is that they will never make progress without external intervention.  They appeared to be stuck for at least 2 hours before I intervened by triggering drop_caches.&lt;/p&gt;</comment>
                            <comment id="59714" author="morrone" created="Fri, 31 May 2013 00:28:21 +0000"  >&lt;p&gt;A &quot;sync&quot; command on the command line also successfully clears the outstanding dirty data.&lt;/p&gt;

&lt;p&gt;Here is cur_dirty_bytes for one of the clients that is hanging on write:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;osc.lsrzb-OST0001-osc-c0000003c6ed3400.cur_dirty_bytes=27852800
osc.lsrzb-OST0002-osc-c0000003c6ed3400.cur_dirty_bytes=25755648
osc.lsrzb-OST0003-osc-c0000003c6ed3400.cur_dirty_bytes=28246016
osc.lsrzb-OST0004-osc-c0000003c6ed3400.cur_dirty_bytes=26935296
osc.lsrzb-OST0005-osc-c0000003c6ed3400.cur_dirty_bytes=26411008
osc.lsrzb-OST0006-osc-c0000003c6ed3400.cur_dirty_bytes=29229056
osc.lsrzb-OST0007-osc-c0000003c6ed3400.cur_dirty_bytes=26083328
osc.lsrzb-OST0008-osc-c0000003c6ed3400.cur_dirty_bytes=27459584
osc.lsrzb-OST0009-osc-c0000003c6ed3400.cur_dirty_bytes=27787264
osc.lsrzb-OST000a-osc-c0000003c6ed3400.cur_dirty_bytes=29163520
osc.lsrzb-OST000b-osc-c0000003c6ed3400.cur_dirty_bytes=27656192
osc.lsrzb-OST000c-osc-c0000003c6ed3400.cur_dirty_bytes=27131904
osc.lsrzb-OST000d-osc-c0000003c6ed3400.cur_dirty_bytes=25362432
osc.lsrzb-OST000e-osc-c0000003c6ed3400.cur_dirty_bytes=27525120
osc.lsrzb-OST000f-osc-c0000003c6ed3400.cur_dirty_bytes=27197440
osc.lsrzb-OST0010-osc-c0000003c6ed3400.cur_dirty_bytes=27262976
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The dirty data is actually pretty evenly spread.  There were 512 files being written through this client at the time of the hang.&lt;/p&gt;

&lt;p&gt;I also note that the flush-lustre-1 thread is waking up regularly, but does not seem to result in any of the dirty data being written.  I enabled full lustre debugging, watched flush-lustre-1 run for a bit in top, and then dumped the lustre log.  See attached file &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/12969/12969_rzuseqio14_lustre_log.txt&quot; title=&quot;rzuseqio14_lustre_log.txt attached to LU-3416&quot;&gt;rzuseqio14_lustre_log.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;.  The flush-lustre-1 process was PID 10295 at the time of the log dump.&lt;/p&gt;

&lt;p&gt;I haven&apos;t stepped through the code and logs long enough yet to understand why it is doing so much but failing to trigger the sending of the dirty data.&lt;/p&gt;</comment>
                            <comment id="59718" author="niu" created="Fri, 31 May 2013 03:48:22 +0000"  >&lt;p&gt;The log (rzuseqio14_lustre_log.txt) shows that ll_writepages() didn&apos;t trigger any dirty flush at all:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;...
00000008:00000001:21.0:1369956379.455717:3872:10295:0:(osc_io.c:611:osc_io_fsync_start()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000008:00000001:21.0:1369956379.455724:4368:10295:0:(osc_cache.c:2914:osc_cache_writeback_range()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; entered
00000008:00000020:21.0:1369956379.455732:4368:10295:0:(osc_cache.c:3010:osc_cache_writeback_range()) obj c0000002eacf46e0 ready 0|-|- wr 15|-|- rd 0|- cache page out.
00000008:00000001:21.0:1369956379.455744:4496:10295:0:(osc_cache.c:3011:osc_cache_writeback_range()) &lt;span class=&quot;code-object&quot;&gt;Process&lt;/span&gt; leaving (rc=0 : 0 : 0)
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I think these dirty pages are supposed to be on oo_urgent_exts, but it seems they were on neither oo_hp_exts nor oo_urgent_exts. Jinshan, any idea how this could happen?&lt;/p&gt;</comment>
                            <comment id="59819" author="jay" created="Fri, 31 May 2013 22:47:52 +0000"  >&lt;p&gt;Obviously there existed an active extent, so osc_io_fsync_start() didn&apos;t move it into the urgent list.&lt;/p&gt;


&lt;p&gt;Hi Chris, just to clarify, do both `echo 3 &amp;gt; drop_caches&apos; and sync bring it out of the hang state?&lt;/p&gt;</comment>
                            <comment id="59820" author="jay" created="Fri, 31 May 2013 22:52:12 +0000"  >&lt;p&gt;It looks like this issue can only be seen on ppc clients, so let&apos;s drop the priority to not block 2.4 release while we&apos;re working on it.&lt;/p&gt;</comment>
                            <comment id="59821" author="morrone" created="Fri, 31 May 2013 22:54:28 +0000"  >&lt;blockquote&gt;&lt;p&gt;Hi Chris, just to clarify, do both `echo 3 &amp;gt; drop_caches&apos; and sync bring it out of the hang state?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Yes, that is correct.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;It looks like this issue can only be seen on ppc clients, so let&apos;s drop the priority to not block 2.4 release while we&apos;re working on it.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I disagree.  Just because we discovered it on ppc clients doesn&apos;t mean it is a ppc-only problem.&lt;/p&gt;</comment>
                            <comment id="59822" author="jay" created="Fri, 31 May 2013 22:58:31 +0000"  >&lt;p&gt;Hi Chris, if you see this issue again, check whether `echo 1 &amp;gt; drop_caches&apos; also releases the hang state. It would also be helpful to check how much grant is left at the corresponding OSC.&lt;/p&gt;</comment>
                            <comment id="59825" author="jay" created="Fri, 31 May 2013 23:32:59 +0000"  >&lt;p&gt;There are two more global dirty page parameters: obd_dirty_pages and obd_max_dirty_pages. Can you please check them as well?&lt;/p&gt;</comment>
                            <comment id="59826" author="adilger" created="Fri, 31 May 2013 23:39:14 +0000"  >&lt;p&gt;Chris, how frequent are the hangs, and how many systems does this affect?  We&apos;ve had several days of SWL runs on recent Lustre tags without seeing anything similar on x86_64 clients.  While I don&apos;t want to minimize the importance of this issue, the main question is whether this problem is serious and widespread enough that it needs to block 2.4.0 from being released to any users, or whether it is possible to fix this after the 2.4.0 release?&lt;/p&gt;

&lt;p&gt;LLNL will be running a patched version of Lustre regardless of whether this bug is fixed before 2.4.0 or afterward, so if this is more prevalent with PPC clients or ZFS servers it will not affect a majority of Lustre users.  Also, by making Lustre 2.4.0 available to a wider user base the 2.4.0 code will get more testing and fixes, which helps LLNL as well.  This bug is the only one holding the 2.4.0 release at this point, so we&apos;d really prefer to drop it from the blockers list and release 2.4.0.&lt;/p&gt;</comment>
                            <comment id="59828" author="pjones" created="Fri, 31 May 2013 23:54:35 +0000"  >&lt;p&gt;Andreas&lt;/p&gt;

&lt;p&gt;I just gave Chris a call and he agrees that, while this may not necessarily be a PPC issue, the amount of testing elsewhere demonstrates that this is a rare issue to hit and one that we can defer to 2.4.1.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="59829" author="morrone" created="Fri, 31 May 2013 23:58:07 +0000"  >&lt;p&gt;Where would I find those?&lt;/p&gt;

&lt;p&gt;Here&apos;s what I found in proc:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;rzuseqio16-ib0@root:lctl get_param &apos;osc.*.cur_dirty_bytes&apos;
osc.lsrzb-OST0001-osc-c0000003c737a400.cur_dirty_bytes=27656192
osc.lsrzb-OST0002-osc-c0000003c737a400.cur_dirty_bytes=27983872
osc.lsrzb-OST0003-osc-c0000003c737a400.cur_dirty_bytes=30146560
osc.lsrzb-OST0004-osc-c0000003c737a400.cur_dirty_bytes=28508160
osc.lsrzb-OST0005-osc-c0000003c737a400.cur_dirty_bytes=28901376
osc.lsrzb-OST0006-osc-c0000003c737a400.cur_dirty_bytes=30277632
osc.lsrzb-OST0007-osc-c0000003c737a400.cur_dirty_bytes=29294592
osc.lsrzb-OST0008-osc-c0000003c737a400.cur_dirty_bytes=27394048
osc.lsrzb-OST0009-osc-c0000003c737a400.cur_dirty_bytes=29884416
osc.lsrzb-OST000a-osc-c0000003c737a400.cur_dirty_bytes=28835840
osc.lsrzb-OST000b-osc-c0000003c737a400.cur_dirty_bytes=29753344
osc.lsrzb-OST000c-osc-c0000003c737a400.cur_dirty_bytes=27262976
osc.lsrzb-OST000d-osc-c0000003c737a400.cur_dirty_bytes=29818880
osc.lsrzb-OST000e-osc-c0000003c737a400.cur_dirty_bytes=27590656
osc.lsrzb-OST000f-osc-c0000003c737a400.cur_dirty_bytes=28049408
osc.lsrzb-OST0010-osc-c0000003c737a400.cur_dirty_bytes=27983872
rzuseqio16-ib0@root:lctl get_param &apos;osc.*.cur_grant_bytes&apos;
osc.lsrzb-OST0001-osc-c0000003c737a400.cur_grant_bytes=33554432
osc.lsrzb-OST0002-osc-c0000003c737a400.cur_grant_bytes=51511296
osc.lsrzb-OST0003-osc-c0000003c737a400.cur_grant_bytes=41746432
osc.lsrzb-OST0004-osc-c0000003c737a400.cur_grant_bytes=36831232
osc.lsrzb-OST0005-osc-c0000003c737a400.cur_grant_bytes=38010880
osc.lsrzb-OST0006-osc-c0000003c737a400.cur_grant_bytes=35848192
osc.lsrzb-OST0007-osc-c0000003c737a400.cur_grant_bytes=33488896
osc.lsrzb-OST0008-osc-c0000003c737a400.cur_grant_bytes=37093376
osc.lsrzb-OST0009-osc-c0000003c737a400.cur_grant_bytes=63897600
osc.lsrzb-OST000a-osc-c0000003c737a400.cur_grant_bytes=35454976
osc.lsrzb-OST000b-osc-c0000003c737a400.cur_grant_bytes=37683200
osc.lsrzb-OST000c-osc-c0000003c737a400.cur_grant_bytes=33488896
osc.lsrzb-OST000d-osc-c0000003c737a400.cur_grant_bytes=34275328
osc.lsrzb-OST000e-osc-c0000003c737a400.cur_grant_bytes=33751040
osc.lsrzb-OST000f-osc-c0000003c737a400.cur_grant_bytes=35454976
osc.lsrzb-OST0010-osc-c0000003c737a400.cur_grant_bytes=40828928
rzuseqio16-ib0@root:lctl get_param &apos;osc.*.cur_lost_grant_bytes&apos;
osc.lsrzb-OST0001-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST0002-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST0003-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST0004-osc-c0000003c737a400.cur_lost_grant_bytes=1073803264
osc.lsrzb-OST0005-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST0006-osc-c0000003c737a400.cur_lost_grant_bytes=1073803264
osc.lsrzb-OST0007-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST0008-osc-c0000003c737a400.cur_lost_grant_bytes=1073803264
osc.lsrzb-OST0009-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST000a-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST000b-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST000c-osc-c0000003c737a400.cur_lost_grant_bytes=1073803264
osc.lsrzb-OST000d-osc-c0000003c737a400.cur_lost_grant_bytes=1073803264
osc.lsrzb-OST000e-osc-c0000003c737a400.cur_lost_grant_bytes=2147606528
osc.lsrzb-OST000f-osc-c0000003c737a400.cur_lost_grant_bytes=0
osc.lsrzb-OST0010-osc-c0000003c737a400.cur_lost_grant_bytes=0
rzuseqio16-ib0@root:lctl get_param &apos;osc.*.max_dirty_mb&apos;
osc.lsrzb-OST0001-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0002-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0003-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0004-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0005-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0006-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0007-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0008-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0009-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST000a-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST000b-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST000c-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST000d-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST000e-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST000f-osc-c0000003c737a400.max_dirty_mb=32
osc.lsrzb-OST0010-osc-c0000003c737a400.max_dirty_mb=32
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="59830" author="morrone" created="Sat, 1 Jun 2013 00:04:59 +0000"  >&lt;blockquote&gt;&lt;p&gt;Chris, how frequent are the hangs, and how many systems does this affect?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Like Peter said, I don&apos;t particularly care whether this problem gets fixed in 2.4.0.  But it is a very frequent problem.  In fact, it may never have been fixed as we thought a few months ago.  I&apos;ve learned that a sysadmin was running &quot;echo 3 &amp;gt; /proc/sys/vm/drop_caches&quot; to get around the problem.  But I was not aware that it was still a problem.&lt;/p&gt;

&lt;p&gt;It affects all of our production systems that are running pre-2.4.  Of course, we&apos;re only doing that on BG/Q systems with ZFS.  &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;  I just don&apos;t want to call this a ppc issue until we understand it better.  I kind of suspect that it has more to do with the scale of simultaneous writers or something, rather than being endianness or page size related.  I could be wrong.  Too soon to say, really.&lt;/p&gt;</comment>
                            <comment id="59842" author="adilger" created="Sat, 1 Jun 2013 17:54:02 +0000"  >&lt;p&gt;Chris, I&apos;m just on my phone so can&apos;t verify, but I believe the parameters Jinshan is referencing are global parameters in llite or at the top level. &lt;/p&gt;</comment>
                            <comment id="59902" author="morrone" created="Mon, 3 Jun 2013 17:30:09 +0000"  >&lt;p&gt;This is from the same useqio16 node as last Friday.  Still hung in the same place after the full weekend.  The obd_dirty_pages number exactly matches the sum of the 16 cur_dirty_bytes numbers (keeping in mind that pages are 64K on this platform): 459,341,824 bytes.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; print obd_dirty_pages
$8 = {
  counter = 7009
}
crash&amp;gt; print obd_max_dirty_pages
$9 = 131071
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="60003" author="morrone" created="Tue, 4 Jun 2013 23:34:26 +0000"  >&lt;p&gt;Any ideas?&lt;/p&gt;</comment>
                            <comment id="60018" author="niu" created="Wed, 5 Jun 2013 08:57:09 +0000"  >&lt;p&gt;Chris, I have been looking at the related code these days, but haven&apos;t found any clue so far. Btw, were there many truncate operations?&lt;/p&gt;</comment>
                            <comment id="60046" author="jay" created="Wed, 5 Jun 2013 17:39:33 +0000"  >&lt;p&gt;I can take a look at this issue if it&apos;s okay for you guys.&lt;/p&gt;</comment>
                            <comment id="60047" author="pjones" created="Wed, 5 Jun 2013 17:50:31 +0000"  >&lt;p&gt;Yes please Jinshan!&lt;/p&gt;</comment>
                            <comment id="60054" author="jay" created="Wed, 5 Jun 2013 18:32:19 +0000"  >&lt;p&gt;Please apply patch: &lt;a href=&quot;http://review.whamcloud.com/6554&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6554&lt;/a&gt; to your tree to print accurate information when this problem happens. I tend to suspect a wakeup is missed somewhere but need to verify.&lt;/p&gt;</comment>
                            <comment id="60059" author="morrone" created="Wed, 5 Jun 2013 20:20:51 +0000"  >&lt;p&gt;Ok, I&apos;ll apply that and get the info.&lt;/p&gt;

&lt;p&gt;But there certainly seems to be an additional problem here.  The Lustre client seems to be rather poorly integrated with the kernel&apos;s way of doing things.  The kernel is clearly trying to write out these dirty pages, and Lustre completely ignores it.  That is Bad.  I would like to see that problem fixed as well.&lt;/p&gt;</comment>
                            <comment id="60061" author="jay" created="Wed, 5 Jun 2013 20:37:58 +0000"  >&lt;p&gt;Hi Chris, not sure I understand. You mean the dirty page won&apos;t be written back if EDQUOT is returned? Actually in vvp_io_commit_write(), if EDQUOT is seen, Lustre will try to write the page in sync mode.&lt;/p&gt;</comment>
                            <comment id="60062" author="morrone" created="Wed, 5 Jun 2013 20:50:10 +0000"  >&lt;p&gt;I don&apos;t know what EDQUOT is, so no.  I am referring to the kernel&apos;s normal BDI mechanism for regularly pushing dirty data to disk.  The calls are being made into Lustre, but Lustre decides to ignore them.  I don&apos;t understand the whole call stack yet (by a long shot), but clearly the kernel&apos;s normal mechanisms get the Lustre code into osc_cache_writeback_range, where it notes that dirty pages exist.  However, Lustre chooses to do absolutely nothing with that information, rather than obeying the kernel&apos;s request to write out the dirty pages.  That has got to have all kinds of bad implications for low memory situations on nodes.  And not too surprisingly, we &lt;em&gt;have&lt;/em&gt; all kinds of memory problems with the new Lustre client.&lt;/p&gt;</comment>
                            <comment id="60069" author="jay" created="Wed, 5 Jun 2013 22:17:38 +0000"  >&lt;p&gt;Do you mean the call path of -&amp;gt;writepage()? In this case, it will mark the dirty page as urgent and then wait for it to be written in ll_writepage().&lt;/p&gt;

&lt;p&gt;Yes, I agree that the current client has problems with memory and performance, and I will give you an explanation soon.&lt;/p&gt;</comment>
                            <comment id="60125" author="morrone" created="Thu, 6 Jun 2013 22:19:06 +0000"  >&lt;p&gt;Jinshan, your patch does hit ten minutes after the hang.  Here are the 6 lines that popped out on the console from one of the clients:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2013-06-06 15:05:48.739336 {DefaultControlEventListener} [mmcs]{22}.1.1: LustreError: 4820:0:(osc_cache.c:1559:osc_enter_cache()) lsrzb-OST000d-osc-c0000003c7540c00: {
 dirty: 33554432/33554432 dirty_pages: 6441/131071 unstable_pages: 0/131071 dropped: 2147606528 avail: 27459584, reserved: 0, flight: 0 } try to reserve 65536.
2013-06-06 15:05:48.740174 {DefaultControlEventListener} [mmcs]{22}.4.0: LustreError: 4726:0:(osc_cache.c:1559:osc_enter_cache()) lsrzb-OST000d-osc-c0000003c7540c00: {
 dirty: 33554432/33554432 dirty_pages: 6441/131071 unstable_pages: 0/131071 dropped: 2147606528 avail: 27459584, reserved: 0, flight: 0 } try to reserve 65536.
2013-06-06 15:05:48.740565 {DefaultControlEventListener} [mmcs]{22}.2.3: LustreError: 4751:0:(osc_cache.c:1559:osc_enter_cache()) lsrzb-OST000d-osc-c0000003c7540c00: {
 dirty: 33554432/33554432 dirty_pages: 6441/131071 unstable_pages: 0/131071 dropped: 0 avail: 27459584, reserved: 0, flight: 0 } try to reserve 65536.
2013-06-06 15:05:48.740946 {DefaultControlEventListener} [mmcs]{22}.7.2: LustreError: 4733:0:(osc_cache.c:1559:osc_enter_cache()) lsrzb-OST000d-osc-c0000003c7540c00: {
 dirty: 33554432/33554432 dirty_pages: 6441/131071 unstable_pages: 0/131071 dropped: 0 avail: 27459584, reserved: 0, flight: 0 } try to reserve 65536.
2013-06-06 15:05:48.741355 {DefaultControlEventListener} [mmcs]{22}.13.0: LustreError: 4830:0:(osc_cache.c:1559:osc_enter_cache()) lsrzb-OST000d-osc-c0000003c7540c00: 
{ dirty: 33554432/33554432 dirty_pages: 6441/131071 unstable_pages: 0/131071 dropped: 0 avail: 27459584, reserved: 0, flight: 0 } try to reserve 65536.
2013-06-06 15:05:48.741649 {DefaultControlEventListener} [mmcs]{22}.10.0: LustreError: 4724:0:(osc_cache.c:1559:osc_enter_cache()) lsrzb-OST000d-osc-c0000003c7540c00: { dirty: 33554432/33554432 dirty_pages: 6441/131071 unstable_pages: 0/131071 dropped: 0 avail: 27459584, reserved: 0, flight: 1 } try to reserve 65536.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Your patch seems to keep data flowing.  But with 10-minute pauses here and there, of course. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Do you understand how to fix this, then?&lt;/p&gt;

&lt;p&gt;Also, be aware that we are using the following settings on our BG/Q clients:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;sysctl vm.min_free_kbytes=1048576
/usr/sbin/lctl set_param &quot;llite.*.max_cached_mb=4096&quot;
/usr/sbin/lctl set_param &quot;llite.*.max_read_ahead_mb=10&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="60132" author="jay" created="Thu, 6 Jun 2013 23:58:33 +0000"  >&lt;p&gt;About the cache, another useful piece of information would be:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lctl get_param osc.*.osc_cached_mb
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;so that I know how busy it is.&lt;/p&gt;</comment>
                            <comment id="60133" author="jay" created="Fri, 7 Jun 2013 00:25:34 +0000"  >&lt;p&gt;Apparently I should have adjusted cl_dirty in osc_unreserve_grant() as well, working on a patch...&lt;/p&gt;</comment>
                            <comment id="60136" author="jay" created="Fri, 7 Jun 2013 01:14:01 +0000"  >&lt;p&gt;Can you please try patch set 2 of &lt;a href=&quot;http://review.whamcloud.com/6554?&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6554?&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="60137" author="morrone" created="Fri, 7 Jun 2013 01:19:05 +0000"  >&lt;p&gt;Yes, I&apos;ll try it out.  Thanks, Jinshan.&lt;/p&gt;</comment>
                            <comment id="60139" author="morrone" created="Fri, 7 Jun 2013 02:11:55 +0000"  >&lt;p&gt;The problem seems to have disappeared with Patch Set 2 in place.&lt;/p&gt;

&lt;p&gt;But now write performance is unacceptably bad.  Four BG/Q clients are only pushing a &lt;em&gt;total&lt;/em&gt; of around 720 MB/s (180 MB/s each).  That number should be closer to 1 GB/s per client.&lt;/p&gt;</comment>
                            <comment id="60141" author="jay" created="Fri, 7 Jun 2013 03:13:24 +0000"  >&lt;p&gt;This patch should have nothing to do with performance. Maybe you can roll back to the unpatched version and see if the drop is due to environment settings. Of course, we can simplify the problem by using dd to check performance.&lt;/p&gt;

&lt;p&gt;If you can still see the performance drop, please collect a log with D_CACHE enabled (lctl set_param debug=cache), and I will take a look to make sure it&apos;s in the correct path.&lt;/p&gt;</comment>
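The dd sanity check and D_CACHE log collection suggested above could look roughly like the following sketch. The mount point and file path are assumptions for illustration; only lctl set_param debug=cache comes from the comment itself, while lctl clear and lctl dk are the standard commands for resetting and dumping the Lustre kernel debug buffer.

```shell
# Hypothetical Lustre mount point; adjust for the system under test.
lctl set_param debug=cache      # enable D_CACHE debug messages
lctl clear                      # start from an empty debug buffer
dd if=/dev/zero of=/mnt/lustre/ddtest bs=1M count=4096   # dd reports MB/s on stderr
lctl dk /tmp/lustre-cache.log   # dump the debug buffer for inspection
```

Comparing the dd throughput with and without the patch isolates the client write path from application and environment effects.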
                            <comment id="60195" author="morrone" created="Fri, 7 Jun 2013 21:15:21 +0000"  >&lt;p&gt;I double checked, and I am certain that the patch is the only difference.  FYI, I&apos;ve moved to Lustre version &lt;a href=&quot;https://github.com/chaos/lustre/tree/2.4.0-RC2_2chaos&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;2.4.0-RC2_2chaos&lt;/a&gt; on both clients and servers.&lt;/p&gt;

&lt;p&gt;Without the patch, I see write throughput burst to 4.4 GB/s initially (as seen in ltop), and then taper off to a fairly constant rate in the low 3 GB/s range.&lt;/p&gt;

&lt;p&gt;With the patch, I see 720 MB/s. (These are aggregate throughput for 4 BG/Q clients.)&lt;/p&gt;

&lt;p&gt;Keep in mind that our tree has the unstable page tracking changes.  I don&apos;t know if there is a relation there or not, but you should be aware.&lt;/p&gt;

&lt;p&gt;Without your patch, at the beginning of an IOR write phase, you might see something like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;rzuseqio13-ib0@morrone:lctl get_param &quot;osc.*.cur_dirty_bytes&quot;
osc.lsrzb-OST0001-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST0002-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST0003-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST0004-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST0005-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST0006-osc-c0000003e18f6400.cur_dirty_bytes=33161216
osc.lsrzb-OST0007-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST0008-osc-c0000003e18f6400.cur_dirty_bytes=32899072
osc.lsrzb-OST0009-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST000a-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST000b-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST000c-osc-c0000003e18f6400.cur_dirty_bytes=33030144
osc.lsrzb-OST000d-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST000e-osc-c0000003e18f6400.cur_dirty_bytes=12582912
osc.lsrzb-OST000f-osc-c0000003e18f6400.cur_dirty_bytes=33554432
osc.lsrzb-OST0010-osc-c0000003e18f6400.cur_dirty_bytes=33554432
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And that seems fairly normal.&lt;/p&gt;

&lt;p&gt;But &lt;em&gt;with&lt;/em&gt; patch set 2 of &lt;a href=&quot;http://review.whamcloud.com/6554&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;6554&lt;/a&gt; applied I see these bogus numbers:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;rzuseqlac2:~/BG-191$ rsh rzuseqio13 &apos;/usr/sbin/lctl get_param &quot;osc.*.cur_dirty_bytes&quot;&apos;
osc.lsrzb-OST0001-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST0002-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST0003-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST0004-osc-c0000003ec957c00.cur_dirty_bytes=18446744073703260160
osc.lsrzb-OST0005-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST0006-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST0007-osc-c0000003ec957c00.cur_dirty_bytes=18446744073706930176
osc.lsrzb-OST0008-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST0009-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST000a-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST000b-osc-c0000003ec957c00.cur_dirty_bytes=18446744073706536960
osc.lsrzb-OST000c-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST000d-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST000e-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST000f-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
osc.lsrzb-OST0010-osc-c0000003ec957c00.cur_dirty_bytes=18446744073707454464
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;m guessing that those large numbers are simply an arithmetic problem, where we&apos;re allowing the counter to go negative and it wraps around as an unsigned value.  That somehow results in very slow write throughput, perhaps because all operations are not completely synchronous?&lt;/p&gt;</comment>
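The guess above checks out numerically: 18446744073707454464 is exactly 2^64 - 2097152, i.e. -2 MiB reinterpreted as an unsigned 64-bit integer. A quick shell sketch (bash arithmetic is signed 64-bit, and printf &apos;%u&apos; reinterprets the result as unsigned):

```shell
# -2097152 (-2 MiB) printed as unsigned wraps to 2^64 - 2097152,
# matching the cur_dirty_bytes values in the output above
printf '%u\n' $(( 0 - 2097152 ))
# prints 18446744073707454464
```

So cl_dirty has been decremented 2 MiB below zero, consistent with an unbalanced grant/dirty accounting path in the patch.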
                            <comment id="60197" author="jay" created="Fri, 7 Jun 2013 21:23:15 +0000"  >&lt;p&gt;ah yes, indeed. The dirty bytes are actually negative numbers. I&apos;ll check it.&lt;/p&gt;</comment>
                            <comment id="60199" author="jay" created="Fri, 7 Jun 2013 21:47:17 +0000"  >&lt;p&gt;I found the problem. Working on a patch.&lt;/p&gt;</comment>
                            <comment id="60299" author="jay" created="Mon, 10 Jun 2013 20:59:44 +0000"  >&lt;p&gt;Will you please try patch set 3 and see if it helps.&lt;/p&gt;</comment>
                            <comment id="60315" author="morrone" created="Tue, 11 Jun 2013 01:30:17 +0000"  >&lt;p&gt;The performance (and presumably the negative number) problem is gone, but it is still hanging for ten minutes like with patch set one.  I&apos;ll post some info tomorrow.&lt;/p&gt;</comment>
                            <comment id="60316" author="morrone" created="Tue, 11 Jun 2013 01:48:53 +0000"  >&lt;p&gt;Here&apos;s output from one of the debug messages before I leave for the day:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 3709:0:(osc_cache.c:1563:osc_enter_cache()) lsrzb-OST0004-osc-c0000003c70ca400: grant { dirty: 33554432/33554432 dirty_pages: 6934/131071 unstable_pages: 0/131071 dropped: 0 avail: 45613056, reserved: 0, flight: 0 } lru {in list: 2407, left: 515, waiters: 0 }try to reserve 65536.
LustreError: 3736:0:(osc_cache.c:1563:osc_enter_cache()) lsrzb-OST0004-osc-c0000003c70ca400: grant { dirty: 33554432/33554432 dirty_pages: 6934/131071 unstable_pages: 0/131071 dropped: 0 avail: 45613056, reserved: 0, flight: 0 } lru {in list: 2407, left: 515, waiters: 0 }try to reserve 65536.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="60317" author="jay" created="Tue, 11 Jun 2013 02:34:34 +0000"  >&lt;p&gt;This is awkward - I thought it must be sleeping for LRU slots while holding some dirty pages. The problem is obvious: cl_dirty couldn&apos;t be deducted for some reason. One possible reason would be that the ptlrpc threads were blocked somewhere so brw_interpret couldn&apos;t be called, or that processes are waiting for something while holding active osc_extents. Can you please take a look at the backtraces of running processes and tell me whether any other process is blocked on something?&lt;/p&gt;

&lt;p&gt;Also, I worked out patch set 4 which dumps the extent tree when this problem happens. This can help us isolate the root cause.&lt;/p&gt;</comment>
                            <comment id="60392" author="morrone" created="Tue, 11 Jun 2013 21:29:10 +0000"  >&lt;p&gt;See new attachment &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/13038/13038_rzuseqio15_console.txt.bz2&quot; title=&quot;rzuseqio15_console.txt.bz2 attached to LU-3416&quot;&gt;rzuseqio15_console.txt.bz2&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;.  It shows backtraces from all processes after a hang began, and then the messages that were printed by patch set 4.&lt;/p&gt;</comment>
                            <comment id="60402" author="jay" created="Wed, 12 Jun 2013 00:09:09 +0000"  >&lt;p&gt;Can you please try patch set 5?&lt;/p&gt;</comment>
                            <comment id="60410" author="morrone" created="Wed, 12 Jun 2013 00:45:44 +0000"  >&lt;p&gt;Will do.&lt;/p&gt;</comment>
                            <comment id="60415" author="morrone" created="Wed, 12 Jun 2013 02:10:54 +0000"  >&lt;p&gt;I have been running the reproducer, and haven&apos;t had a single hang yet with patch set 5 applied.  I will leave it running over night.&lt;/p&gt;</comment>
                            <comment id="60483" author="jay" created="Wed, 12 Jun 2013 21:55:42 +0000"  >&lt;p&gt;No news is good news, no?&lt;/p&gt;

&lt;p&gt;If this patch survives, can you please do me a favor: roll back this patch, make the hang happen, and collect logs while doing `echo 3 &amp;gt; drop_caches&apos;? I don&apos;t understand why that can make it go forward.&lt;/p&gt;</comment>
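The requested log collection, sketched out below; the log path is an assumption, and lctl dk (dump the kernel debug buffer) and debug=-1 (enable all debug flags) are the standard Lustre mechanisms for this. A later comment in this thread describes essentially this procedure being run on four clients.

```shell
lctl set_param debug=-1            # enable full Lustre debugging
lctl clear                         # start from an empty debug buffer
echo 3 > /proc/sys/vm/drop_caches  # drop page cache plus dentries/inodes
sleep 5                            # give the hung writers a chance to move
lctl dk /tmp/drop_caches.log       # dump the debug buffer for analysis
```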
                            <comment id="60495" author="morrone" created="Thu, 13 Jun 2013 00:00:37 +0000"  >&lt;p&gt;Yes, I haven&apos;t been able to reproduce the problem again.  Performance is in the acceptable range.&lt;/p&gt;

&lt;p&gt;I should be able to get that log tomorrow.&lt;/p&gt;</comment>
                            <comment id="60724" author="morrone" created="Fri, 14 Jun 2013 22:51:34 +0000"  >&lt;p&gt;See the attached rzuseq*_drop_caches.bz2 files.  When I got a write hang, I enabled full Lustre debugging, issued an &apos;echo 2 &amp;gt; drop_caches&apos;, waited a few seconds, and then dumped the logs on all four clients.&lt;/p&gt;

&lt;p&gt;Hopefully there is something useful in there.&lt;/p&gt;</comment>
                            <comment id="61006" author="jay" created="Fri, 21 Jun 2013 16:22:54 +0000"  >&lt;p&gt;Hi Chris, can you please try the last patch set? If this works, I will land it into master.&lt;/p&gt;</comment>
                            <comment id="61013" author="jay" created="Fri, 21 Jun 2013 17:19:16 +0000"  >&lt;p&gt;I took a look at the log but unfortunately I didn&apos;t find anything interesting.&lt;/p&gt;

&lt;p&gt;For example, in log rzuseqio15_drop_caches.txt, process 15336 is obviously the drop_caches process and 15302 is the flush process. So I tried to dig up activity from processes other than those two with:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[jinxiong@intel tmp]$ cat rzuseqio15_drop_caches.txt |grep -v &lt;span class=&quot;code-quote&quot;&gt;&apos;15336:&apos;&lt;/span&gt; |grep -v &lt;span class=&quot;code-quote&quot;&gt;&apos;:15302:&apos;&lt;/span&gt; |grep cl_io
[jinxiong@intel tmp]$ 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;but nothing is printed out. I don&apos;t know how this can happen, because the pages already exist but no process is related to them. It would be helpful to figure out which process was stuck before dropping caches.&lt;/p&gt;

&lt;p&gt;The purpose of doing this is to fully understand why dropping the cache can make the hung process go forward. Though we can fix the problem with the latest patch, it&apos;s really bad if we don&apos;t understand a code path clearly.&lt;/p&gt;</comment>
                            <comment id="61127" author="morrone" created="Mon, 24 Jun 2013 20:50:34 +0000"  >&lt;p&gt;It takes quite a while to run drop_caches.  I fear that getting the log of the stuck processes will be quite difficult with the other threads swamping the logs.  But I&apos;ll keep this as a background task when I have time.&lt;/p&gt;

&lt;p&gt;In the meantime, I&apos;ll update to the latest patch.&lt;/p&gt;</comment>
                            <comment id="61342" author="morrone" created="Wed, 26 Jun 2013 01:39:15 +0000"  >&lt;p&gt;Jinshan, testing of change &lt;a href=&quot;http://review.whamcloud.com/6554&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;6554&lt;/a&gt; patch set 7 went well.  No failures, and no new performance problems noted.&lt;/p&gt;

&lt;p&gt;The commit message will need to be rewritten before landing.&lt;/p&gt;
</comment>
                            <comment id="65344" author="niu" created="Thu, 29 Aug 2013 07:00:03 +0000"  >&lt;p&gt;patch landed for 2.5&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="17090">LU-2576</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="17690">LU-2859</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="13061" name="rzuseqio13_drop_caches.txt.bz2" size="256" author="morrone" created="Fri, 14 Jun 2013 22:50:13 +0000"/>
                            <attachment id="13062" name="rzuseqio14_drop_caches.txt.bz2" size="4847250" author="morrone" created="Fri, 14 Jun 2013 22:50:13 +0000"/>
                            <attachment id="12969" name="rzuseqio14_lustre_log.txt" size="237" author="morrone" created="Fri, 31 May 2013 00:23:31 +0000"/>
                            <attachment id="13038" name="rzuseqio15_console.txt.bz2" size="80541" author="morrone" created="Tue, 11 Jun 2013 21:27:59 +0000"/>
                            <attachment id="13063" name="rzuseqio15_drop_caches.txt.bz2" size="256" author="morrone" created="Fri, 14 Jun 2013 22:50:13 +0000"/>
                            <attachment id="13064" name="rzuseqio16_drop_caches.txt.bz2" size="3625347" author="morrone" created="Fri, 14 Jun 2013 22:50:13 +0000"/>
                            <attachment id="12968" name="rzuseqio16_dump_page_cache.txt.bz2" size="801480" author="morrone" created="Thu, 30 May 2013 18:13:29 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvs8f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8471</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>