<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:03:41 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-93] Kernel panic caused by Lustre 1.8.5/1.8.6 on RH6 (patchless client)</title>
                <link>https://jira.whamcloud.com/browse/LU-93</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;bug already described in &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=678175&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://bugzilla.redhat.com/show_bug.cgi?id=678175&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lustre 1.8.5 works without problems on RH5. After switching to RH6 I get kernel panics&lt;br/&gt;
that halt the system. As far as I can tell, this happens only if read and&lt;br/&gt;
write calls are issued at the same time; 12 reader threads alone or 12 writer threads&lt;br/&gt;
alone have not triggered the issue. Note that the issue cannot be reproduced&lt;br/&gt;
deterministically: sometimes the error happens after 2 min,&lt;br/&gt;
sometimes it takes 15 min. So I&apos;m fairly sure this is a race condition.&lt;/p&gt;

&lt;p&gt;The kernel panic happens in conjunction with the &quot;page&quot; structure. Lustre uses&lt;br/&gt;
&quot;page-&amp;gt;private&quot; to store a pointer to private information. Sometimes this value&lt;br/&gt;
(which obviously should be either NULL or a pointer to a valid kernel address)&lt;br/&gt;
is set to &quot;2&quot;. This defeats all checks in Lustre for &quot;page-&amp;gt;private ==&lt;br/&gt;
NULL&quot;, and a subsequent access to &quot;page-&amp;gt;private-&amp;gt;llap_magic&quot; causes the panic. I&lt;br/&gt;
could probably set &quot;page-&amp;gt;private&quot; to NULL in such a case, but that would only&lt;br/&gt;
hide the symptom, not remedy the root cause, and would most likely introduce a&lt;br/&gt;
new error.&lt;/p&gt;
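&lt;p&gt;The following is a small userspace model of the assertion in llap_cast_private(). It is not Lustre code. It shows why the value 2 gets past the check: LASSERTF(llap == NULL || llap-&amp;gt;llap_magic == LLAP_MAGIC, ...) only short-circuits when page_private() is exactly 0, so any other value is treated as a pointer and dereferenced. The struct ll_async_page_model, the cast_result enum and the min_valid_addr threshold are hypothetical; the real kernel code performs no address-range test. LLAP_MAGIC uses the constant 0x5e30621 from the cmp instruction in the disassembly. This is a diagnostic sketch, not a fix: as noted above, masking the value would only hide the symptom.&lt;/p&gt;

```c
/* Value compared against in the disassembly: cmp $0x5e30621,%edx */
#define LLAP_MAGIC 0x5e30621

/* Hypothetical stand-in for struct ll_async_page; only the magic matters. */
struct ll_async_page_model {
    int llap_magic;
};

enum cast_result {
    CAST_NULL,        /* page_private() == 0: the assertion short-circuits */
    CAST_OK,          /* plausible pointer carrying the expected magic */
    CAST_BAD_MAGIC,   /* plausible pointer, wrong magic: LASSERTF would fire */
    CAST_WOULD_FAULT  /* non-zero but unusable: the kernel reads llap_magic
                         from it anyway and oopses with CR2 == that value */
};

/*
 * Model of LASSERTF(llap == NULL || llap->llap_magic == LLAP_MAGIC, ...).
 * min_valid_addr is an assumed threshold below which no object can live;
 * it exists only so the model can report the fault instead of taking it.
 */
static enum cast_result model_cast(unsigned long priv, unsigned long min_valid_addr)
{
    const struct ll_async_page_model *llap;

    if (priv == 0)
        return CAST_NULL;
    if (min_valid_addr > priv)
        return CAST_WOULD_FAULT;   /* e.g. priv == 2, as in this report */

    llap = (const struct ll_async_page_model *)priv;
    return llap->llap_magic == LLAP_MAGIC ? CAST_OK : CAST_BAD_MAGIC;
}
```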


&lt;p&gt;From the kernel traces I was able to reconstruct the call path:&lt;/p&gt;

&lt;p&gt;llap_cast_private+0x18/0xa0&lt;br/&gt;
llap_from_page_with_lockh.clone.8+0x1484/0x1cf0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
(inline: ll_readahead_page)&lt;br/&gt;
(inline: ll_readahead_pages)&lt;br/&gt;
ll_readahead+0xf59/0x1ca0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
ll_readpage+0x29b/0x2270 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
generic_file_aio_read+0x1f0/0x730&lt;br/&gt;
ll_file_aio_read+0xcd6/0x2720 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
ll_file_read+0xd0/0xf0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
vfs_read+0xb5/0x1a0&lt;br/&gt;
sys_read+0x51/0x90&lt;br/&gt;
system_call_fastpath+0x16/0x1b&lt;/p&gt;

&lt;p&gt;Tracing back the origin of page-&amp;gt;private == 2 led me to the call&lt;/p&gt;

&lt;p&gt;  page = grab_cache_page_nowait(mapping, index);&lt;/p&gt;

&lt;p&gt;This function sometimes returns a page whose page-&amp;gt;private is already set to &quot;2&quot;. From what&lt;br/&gt;
I could see in the Lustre sources, &quot;page-&amp;gt;private&quot; is always treated as storing a&lt;br/&gt;
pointer into kernel space. It is therefore very unlikely that Lustre causes&lt;br/&gt;
this error itself, especially as the same code works on RH5.&lt;/p&gt;

&lt;p&gt;This leads me to the following questions: &lt;/p&gt;

&lt;p&gt;a) If the page was not in the cache and has been freshly allocated, should&lt;br/&gt;
page-&amp;gt;private be 0? It looks to me as if this is not always the case: the&lt;br/&gt;
page-&amp;gt;private == 2 value is already present when the function returns.&lt;br/&gt;
b) Is there a simple way to distinguish between a freshly allocated page and&lt;br/&gt;
one already residing in the page cache?&lt;/p&gt;

&lt;p&gt;I have attached a sosreport as well.&lt;/p&gt;

&lt;p&gt;thanks&lt;br/&gt;
Michael&lt;/p&gt;


&lt;p&gt;############################################################################&lt;/p&gt;

&lt;p&gt;############################################################################&lt;/p&gt;


&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@et06 crt&amp;#93;&lt;/span&gt;# uname -r&lt;br/&gt;
2.6.32-71.14.1.el6.x86_64.crt&lt;/p&gt;



&lt;p&gt;############################################################################&lt;br/&gt;
A typical panic originally looked like:&lt;/p&gt;

&lt;p&gt;############################################################################&lt;/p&gt;


&lt;p&gt;BUG: unable to handle kernel NULL pointer dereference at 0000000000000002&lt;/p&gt;

&lt;p&gt;IP: &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0889d08&amp;gt;&amp;#93;&lt;/span&gt; llap_cast_private+0x18/0xa0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
PGD 32d72a067 PUD 330da0067 PMD 0&lt;br/&gt;
Oops: 0000 &amp;#91;#1&amp;#93; SMP&lt;br/&gt;
last sysfs file: /sys/module/ipv6/initstate&lt;br/&gt;
CPU 20&lt;/p&gt;

&lt;p&gt;Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U)&lt;br/&gt;
ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs lockd fscache nfs_acl&lt;br/&gt;
auth_rpcgss acpi_cpufreq freq_table sunrpc rdma_ucm(U) ib_sdp(U) rdma_cm(U)&lt;br/&gt;
iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U)&lt;br/&gt;
mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_core(U) ext2 dm_mirror dm_region_hash&lt;br/&gt;
dm_log dm_mod radeon ttm drm_kms_helper drm i2c_algo_bit serio_raw i2c_i801&lt;br/&gt;
i2c_core sg iTCO_wdt iTCO_vendor_support pata_jmicron i7core_edac edac_core&lt;br/&gt;
shpchp igb dca ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif usb_storage ahci&lt;br/&gt;
ata_generic pata_acpi &lt;span class=&quot;error&quot;&gt;&amp;#91;last unloaded: cpufreq_ondemand&amp;#93;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Pid: 3477, comm: bonnie Not tainted 2.6.32-71.14.1.el6.x86_64.crt #1 X8DTN&lt;br/&gt;
RIP: 0010:&lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0889d08&amp;gt;&amp;#93;&lt;/span&gt;  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0889d08&amp;gt;&amp;#93;&lt;/span&gt;&lt;br/&gt;
llap_cast_private+0x18/0xa0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
RSP: 0018:ffff880553371748  EFLAGS: 00010202&lt;br/&gt;
RAX: 0000000000000002 RBX: ffff880264d18000 RCX: ffff88009e7487e8&lt;br/&gt;
RDX: ffff8805533719a8 RSI: 0000000000000002 RDI: ffffea0002988c48&lt;br/&gt;
RBP: ffff880553371788 R08: 0000000000000002 R09: ffffea0002988c50&lt;br/&gt;
R10: ffff8805fe3b43c0 R11: 0000000000000010 R12: ffff88032f6e2df0&lt;br/&gt;
R13: ffffea0002988c48 R14: ffff88032f6e2cd0 R15: ffff88032fca8800&lt;br/&gt;
FS:  00002b219d82fb20(0000) GS:ffff88034ad00000(0000) knlGS:0000000000000000&lt;br/&gt;
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b&lt;br/&gt;
CR2: 0000000000000002 CR3: 000000032d7ae000 CR4: 00000000000006e0&lt;br/&gt;
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000&lt;br/&gt;
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400&lt;br/&gt;
Process bonnie (pid: 3477, threadinfo ffff880553370000, task ffff8805761f54a0)&lt;br/&gt;
Stack:&lt;br/&gt;
 000000003f818000 ffffea0002988c10 ffff8805533717b8 ffffffffa05cbaef&lt;br/&gt;
&amp;lt;0&amp;gt; ffff88062f0a2200 0000000000000000 ffff880553371798 ffff88062f0a2200&lt;br/&gt;
&amp;lt;0&amp;gt; ffff880553371868 ffffffffa088e921 ffff880553371818 ffffffffa0648daa&lt;br/&gt;
Call Trace:&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa05cbaef&amp;gt;&amp;#93;&lt;/span&gt; ? class_handle2object+0x8f/0x1c0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa088e921&amp;gt;&amp;#93;&lt;/span&gt; llap_from_page_with_lockh.clone.8+0xa1/0x1120 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0648daa&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_decref_internal+0xba/0x880 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0645bbf&amp;gt;&amp;#93;&lt;/span&gt; ? __ldlm_handle2lock+0x9f/0x3d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa088d961&amp;gt;&amp;#93;&lt;/span&gt; ? ll_issue_page_read+0x131/0x420 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8110c07e&amp;gt;&amp;#93;&lt;/span&gt; ? find_get_page+0x1e/0xa0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0891869&amp;gt;&amp;#93;&lt;/span&gt; ll_readahead+0xf59/0x1ca0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643651&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643879&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_add_to_lru+0x49/0x110 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa088eb21&amp;gt;&amp;#93;&lt;/span&gt; ? llap_from_page_with_lockh.clone.8+0x2a1/0x1120 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0894475&amp;gt;&amp;#93;&lt;/span&gt; ll_readpage+0x165/0x1da0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643984&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_remove_from_lru+0x44/0x110 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643bb2&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_fast_match+0xc2/0x130 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff812637ed&amp;gt;&amp;#93;&lt;/span&gt; ? copy_user_generic_string+0x2d/0x40&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8110d3e0&amp;gt;&amp;#93;&lt;/span&gt; generic_file_aio_read+0x1f0/0x730&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa086ee8c&amp;gt;&amp;#93;&lt;/span&gt; ll_file_aio_read+0xc8c/0x2610 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa08708e0&amp;gt;&amp;#93;&lt;/span&gt; ll_file_read+0xd0/0xf0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81091de0&amp;gt;&amp;#93;&lt;/span&gt; ? autoremove_wake_function+0x0/0x40&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff811ffb76&amp;gt;&amp;#93;&lt;/span&gt; ? security_file_permission+0x16/0x20&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8116cf7d&amp;gt;&amp;#93;&lt;/span&gt; ? rw_verify_area+0x5d/0xc0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8116d905&amp;gt;&amp;#93;&lt;/span&gt; vfs_read+0xb5/0x1a0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8116da41&amp;gt;&amp;#93;&lt;/span&gt; sys_read+0x51/0x90&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81013172&amp;gt;&amp;#93;&lt;/span&gt; system_call_fastpath+0x16/0x1b&lt;br/&gt;
Code: ff ff 90 4c 8b 5d c8 4d 29 f3 4c 03 5d c0 e9 d3 fe ff ff 55 48 89 e5 48&lt;br/&gt;
83 ec 40 0f 1f 44 00 00 48 8b 47 10 48 85 c0 75 02 c9 c3 &amp;lt;8b&amp;gt; 10 81 fa 21 06 e3&lt;br/&gt;
05 74 f4 89 54 24 28 48 89 44 24 20 ba 00&lt;/p&gt;

&lt;p&gt;RIP  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0889d08&amp;gt;&amp;#93;&lt;/span&gt; llap_cast_private+0x18/0xa0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 RSP &amp;lt;ffff880553371748&amp;gt;&lt;br/&gt;
CR2: 0000000000000002&lt;br/&gt;
---[ end trace 2a22ba0ec9d8ad43 ]---&lt;br/&gt;
Kernel panic - not syncing: Fatal exception&lt;br/&gt;
Pid: 3477, comm: bonnie Tainted: G      D    ---------------- &lt;br/&gt;
2.6.32-71.14.1.el6.x86_64.crt #1&lt;br/&gt;
Call Trace:&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff814c8183&amp;gt;&amp;#93;&lt;/span&gt; panic+0x78/0x137&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff814cc254&amp;gt;&amp;#93;&lt;/span&gt; oops_end+0xe4/0x100&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8104651b&amp;gt;&amp;#93;&lt;/span&gt; no_context+0xfb/0x260&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810467a5&amp;gt;&amp;#93;&lt;/span&gt; __bad_area_nosemaphore+0x125/0x1e0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810468ce&amp;gt;&amp;#93;&lt;/span&gt; bad_area+0x4e/0x60&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff814cdda0&amp;gt;&amp;#93;&lt;/span&gt; do_page_fault+0x390/0x3a0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff814cb5a5&amp;gt;&amp;#93;&lt;/span&gt; page_fault+0x25/0x30&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0889d08&amp;gt;&amp;#93;&lt;/span&gt; ? llap_cast_private+0x18/0xa0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa05cbaef&amp;gt;&amp;#93;&lt;/span&gt; ? class_handle2object+0x8f/0x1c0 &lt;span class=&quot;error&quot;&gt;&amp;#91;obdclass&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa088e921&amp;gt;&amp;#93;&lt;/span&gt; llap_from_page_with_lockh.clone.8+0xa1/0x1120 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0648daa&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_decref_internal+0xba/0x880 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0645bbf&amp;gt;&amp;#93;&lt;/span&gt; ? __ldlm_handle2lock+0x9f/0x3d0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa088d961&amp;gt;&amp;#93;&lt;/span&gt; ? ll_issue_page_read+0x131/0x420 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8110c07e&amp;gt;&amp;#93;&lt;/span&gt; ? find_get_page+0x1e/0xa0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0891869&amp;gt;&amp;#93;&lt;/span&gt; ll_readahead+0xf59/0x1ca0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643651&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643879&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_add_to_lru+0x49/0x110 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa088eb21&amp;gt;&amp;#93;&lt;/span&gt; ? llap_from_page_with_lockh.clone.8+0x2a1/0x1120 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0894475&amp;gt;&amp;#93;&lt;/span&gt; ll_readpage+0x165/0x1da0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643984&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_remove_from_lru+0x44/0x110 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0643bb2&amp;gt;&amp;#93;&lt;/span&gt; ? ldlm_lock_fast_match+0xc2/0x130 &lt;span class=&quot;error&quot;&gt;&amp;#91;ptlrpc&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff812637ed&amp;gt;&amp;#93;&lt;/span&gt; ? copy_user_generic_string+0x2d/0x40&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8110d3e0&amp;gt;&amp;#93;&lt;/span&gt; generic_file_aio_read+0x1f0/0x730&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa086ee8c&amp;gt;&amp;#93;&lt;/span&gt; ll_file_aio_read+0xc8c/0x2610 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa08708e0&amp;gt;&amp;#93;&lt;/span&gt; ll_file_read+0xd0/0xf0 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81091de0&amp;gt;&amp;#93;&lt;/span&gt; ? autoremove_wake_function+0x0/0x40&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff811ffb76&amp;gt;&amp;#93;&lt;/span&gt; ? security_file_permission+0x16/0x20&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8116cf7d&amp;gt;&amp;#93;&lt;/span&gt; ? rw_verify_area+0x5d/0xc0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8116d905&amp;gt;&amp;#93;&lt;/span&gt; vfs_read+0xb5/0x1a0&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8116da41&amp;gt;&amp;#93;&lt;/span&gt; sys_read+0x51/0x90&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff81013172&amp;gt;&amp;#93;&lt;/span&gt; system_call_fastpath+0x16/0x1b&lt;br/&gt;
Feb  4 10:56:01 et06 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000002&lt;br/&gt;
Feb  4 10:56:01 et06 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000002&lt;br/&gt;
Feb  4 10:56:01 et06 kernel: IP: [&amp;lt;ffffffffa0889d08&amp;gt;] llap_cast_private+0x18/0xa0 [lustre]&lt;br/&gt;
Feb  4 10:56:01 et06 kernel: IP: [&amp;lt;ffffffffa0889d08&amp;gt;] llap_cast_private+0x18/0xa0 [lustre]&lt;br/&gt;
Feb  4 10:56:01 et06 kernel: PGD 32d72a067 PUD 330da0067 PMD 0&lt;br/&gt;
Feb  4 10:56:01 et06 kernel:panic occurred, switching back to text console&lt;/p&gt;



&lt;p&gt;###########################################################################&lt;br/&gt;
############################################################################&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@et06 crt&amp;#93;&lt;/span&gt;# modinfo lustre&lt;br/&gt;
filename:       /lib/modules/2.6.32-71.14.1.el6.x86_64.crt/updates/kernel/fs/lustre/lustre.ko&lt;br/&gt;
license:        GPL&lt;br/&gt;
description:    Lustre Lite Client File System&lt;br/&gt;
author:         Sun Microsystems, Inc. &amp;lt;&lt;a href=&quot;http://www.lustre.org/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.lustre.org/&lt;/a&gt;&amp;gt;&lt;br/&gt;
srcversion:     83AC865556E2AE50715D831&lt;br/&gt;
depends:        obdclass,mdc,ptlrpc,libcfs,lvfs,lov,osc,lnet&lt;br/&gt;
vermagic:       2.6.32-71.14.1.el6.x86_64.crt SMP mod_unload modversions&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@et06 crt&amp;#93;&lt;/span&gt;# gdb /lib/modules/2.6.32-71.14.1.el6.x86_64.crt/updates/kernel/fs/lustre/lustre.ko&lt;br/&gt;
...&lt;/p&gt;

&lt;p&gt;Reading symbols from /lib/modules/2.6.32-71.14.1.el6.x86_64.crt/updates/kernel/fs/lustre/lustre.ko...done.&lt;br/&gt;
(gdb) disassemble /m *(llap_cast_private+0x18/0xa0)&lt;br/&gt;
Dump of assembler code for function llap_cast_private:&lt;br/&gt;
538     &lt;/p&gt;
{
   0x0000000000036d20 &amp;lt;+0&amp;gt;:     push   %rbp
   0x0000000000036d21 &amp;lt;+1&amp;gt;:     mov    %rsp,%rbp
   0x0000000000036d24 &amp;lt;+4&amp;gt;:     sub    $0x40,%rsp
   0x0000000000036d28 &amp;lt;+8&amp;gt;:     callq  0x36d2d &amp;lt;llap_cast_private+13&amp;gt;

539             struct ll_async_page *llap = (struct ll_async_page *)page_private(page);
   0x0000000000036d2d &amp;lt;+13&amp;gt;:    mov    0x10(%rdi),%rax

540
541             LASSERTF(llap == NULL || llap-&amp;gt;llap_magic == LLAP_MAGIC,
   0x0000000000036d31 &amp;lt;+17&amp;gt;:    test   %rax,%rax
   0x0000000000036d34 &amp;lt;+20&amp;gt;:    jne    0x36d38 &amp;lt;llap_cast_private+24&amp;gt;
   0x0000000000036d38 &amp;lt;+24&amp;gt;:    mov    (%rax),%edx
   0x0000000000036d3a &amp;lt;+26&amp;gt;:    cmp    $0x5e30621,%edx
   0x0000000000036d40 &amp;lt;+32&amp;gt;:    je     0x36d36 &amp;lt;llap_cast_private+22&amp;gt;
   0x0000000000036d42 &amp;lt;+34&amp;gt;:    mov    %edx,0x28(%rsp)
   0x0000000000036d46 &amp;lt;+38&amp;gt;:    mov    %rax,0x20(%rsp)
   0x0000000000036d4b &amp;lt;+43&amp;gt;:    mov    $0x40000,%edx
   0x0000000000036d50 &amp;lt;+48&amp;gt;:    mov    %rdi,0x18(%rsp)
   0x0000000000036d55 &amp;lt;+53&amp;gt;:    mov    $0x80,%esi
   0x0000000000036d5a &amp;lt;+58&amp;gt;:    xor    %edi,%edi
   0x0000000000036d5c &amp;lt;+60&amp;gt;:    mov    $0x21f,%r9d
   0x0000000000036d62 &amp;lt;+66&amp;gt;:    mov    $0x0,%r8
   0x0000000000036d69 &amp;lt;+73&amp;gt;:    mov    $0x0,%rcx
   0x0000000000036d70 &amp;lt;+80&amp;gt;:    xor    %eax,%eax
   0x0000000000036d72 &amp;lt;+82&amp;gt;:    movl   $0x5e30621,0x30(%rsp)
   0x0000000000036d7a &amp;lt;+90&amp;gt;:    movq   $0x0,0x10(%rsp)
   0x0000000000036d83 &amp;lt;+99&amp;gt;:    movq   $0x0,0x8(%rsp)
   0x0000000000036d8c &amp;lt;+108&amp;gt;:   movq   $0x0,(%rsp)
   0x0000000000036d94 &amp;lt;+116&amp;gt;:   callq  0x36d99 &amp;lt;llap_cast_private+121&amp;gt;
   0x0000000000036d99 &amp;lt;+121&amp;gt;:   mov    $0x21f,%edx
   0x0000000000036d9e &amp;lt;+126&amp;gt;:   mov    $0x0,%rsi
   0x0000000000036da5 &amp;lt;+133&amp;gt;:   mov    $0x0,%rdi
   0x0000000000036dac &amp;lt;+140&amp;gt;:   callq  0x36db1
   0x0000000000036db1:  data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)

542                      &quot;page %p private %lu gave magic %d which != %d\n&quot;,
543                      page, page_private(page), llap-&amp;gt;llap_magic, LLAP_MAGIC);
544
545             return llap;
546     }
&lt;p&gt;   0x0000000000036d36 &amp;lt;+22&amp;gt;:    leaveq&lt;br/&gt;
   0x0000000000036d37 &amp;lt;+23&amp;gt;:    retq&lt;/p&gt;

&lt;p&gt;End of assembler dump.&lt;/p&gt;
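&lt;p&gt;The register dump and the disassembly fit together as follows. RDI holds the struct page pointer, and mov 0x10(%rdi),%rax loads page-&amp;gt;private into RAX (= 2). The faulting mov (%rax),%edx (the &amp;lt;8b&amp;gt; 10 bytes in the Code: line) then reads llap_magic at offset 0 of that &quot;pointer&quot;, hence CR2 = 0000000000000002. The sketch below checks those offsets against a simplified model of the leading fields of struct page on x86_64 for 2.6.32-era kernels. The layout is an approximation written for illustration (unions collapsed to one member each), not the kernel header.&lt;/p&gt;

```c
#include "stddef.h"

/* LLAP_MAGIC as compared in the dump (cmp $0x5e30621,%edx). */
#define LLAP_MAGIC 0x5e30621

/* Simplified model of the first fields of struct page (x86_64, LP64). */
struct page_model {
    unsigned long flags;     /* 0x00 */
    int _count;              /* 0x08: atomic_t in the kernel */
    int _mapcount;           /* 0x0c: atomic_t, first member of a union */
    unsigned long private;   /* 0x10: loaded by mov 0x10(%rdi),%rax */
    void *mapping;           /* 0x18 */
};

/* Simplified model of struct ll_async_page: the magic is read at offset 0,
 * matching mov (%rax),%edx in the faulting instruction. */
struct ll_async_page_model {
    int llap_magic;
};

_Static_assert(offsetof(struct page_model, private) == 0x10,
               "page->private expected at offset 0x10 on LP64");
_Static_assert(offsetof(struct ll_async_page_model, llap_magic) == 0,
               "llap_magic expected at offset 0");

/* Address the kernel dereferences for a given page->private value:
 * private + offsetof(llap_magic), i.e. the value itself, so CR2 == 2. */
static unsigned long magic_read_address(unsigned long private_value)
{
    return private_value + offsetof(struct ll_async_page_model, llap_magic);
}
```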
</description>
                <environment>RH6, recompiled kernel, OFED 1.5.3 RC3/4/5 (bug does not depend on OFED version)</environment>
        <key id="10390">LU-93</key>
            <summary>Kernel panic caused by Lustre 1.8.5/1.8.6 on RH6 (patchless client)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="michael.hebenstreit@intel.com">Michael Hebensteit</reporter>
                        <labels>
                    </labels>
                <created>Wed, 23 Feb 2011 10:49:55 +0000</created>
                <updated>Tue, 28 Jun 2011 15:01:41 +0000</updated>
                            <resolved>Wed, 1 Jun 2011 05:59:03 +0000</resolved>
                                    <version>Lustre 1.8.6</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="10723" author="michael.hebenstreit@intel.com" created="Wed, 23 Feb 2011 11:02:19 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;By inserting a lot of conditional printk statements (checking whether page-&amp;gt;private is equal to 2), I got the following result:

LDEBUGZ0064: ffffea00014f57c8                    Lustre: llap_shrink_cache_internal(), after calling page_cache_release()
...
LDEBUGMM:26 ffffea00014f57c8                     Kernel:mm/filemap.c, find_get_page()
LDEBUGMM:27 ffffea00014f57c8
LDEBUGMM:28 ffffea00014f57c8
LDEBUGZ0175: ffffea00014f57c8                    Lustre: copy of grab_cache_page_nowait(), copied into my Lustre code and modified to differentiate between &quot;page found in cache&quot; and &quot;new page created&quot;
LDEBUGLe: ffffea00014f57c8                        Lustre: Read-ahead function after grab_cache_page returns
LDEBUGLi: ffffea00014f57c8
LDEBUGA1: ffffea00014f57c8                        Lustre: llap_from_page_with_lockh()

The first item is the location tag; the second is the page (i.e. the kernel address of the page structure). So the same page that later causes the trouble had been released once before.

As part of shrinking a Lustre-internal cache in llap_shrink_cache_internal(), Lustre calls

                unlock_page(page);
                page_cache_release(page);


                ll_pglist_cpu_lock(sbi, cpu);

After page_cache_release() (location LDEBUGZ0064), page-&amp;gt;private == 2; it never had that value before.


#define page_cache_release(page) __free_pages(page, 0)

void __free_pages(struct page *page, unsigned int order)
{
        if (put_page_testzero(page)) {
                trace_mm_page_free_direct(page, order);
                if (order == 0)
                        free_hot_page(page);
                else
                        __free_pages_ok(page, order);
        }
}

void free_hot_page(struct page *page)
{
        trace_mm_page_free_direct(page, 0);
        free_hot_cold_page(page, 0);
}

static void free_hot_cold_page(struct page *page, int cold)
{
        struct zone *zone = page_zone(page);
        struct per_cpu_pages *pcp;
        unsigned long flags;
        int migratetype;
        int wasMlocked = __TestClearPageMlocked(page);

        kmemcheck_free_shadow(page, 0);

        if (PageAnon(page))
                page-&amp;gt;mapping = NULL;
        if (free_pages_check(page))
                return;

        if (!PageHighMem(page)) {
                debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
                debug_check_no_obj_freed(page_address(page), PAGE_SIZE);
        }
        arch_free_page(page, 0);
        kernel_map_pages(page, 1, 0);

        pcp = &amp;amp;zone_pcp(zone, get_cpu())-&amp;gt;pcp;
        migratetype = get_pageblock_migratetype(page);
        set_page_private(page, migratetype);
        local_irq_save(flags);
        if (unlikely(wasMlocked))
                free_page_mlock(page);
        __count_vm_event(PGFREE);

        /*
         * We only track unmovable, reclaimable and movable on pcp lists.
         * Free ISOLATE pages back to the allocator because they are being
         * offlined but treat RESERVE as movable pages so we can get those
         * areas back if necessary. Otherwise, we may have to free
         * excessively into the page allocator
         */
        if (migratetype &amp;gt;= MIGRATE_PCPTYPES) {
                if (unlikely(migratetype == MIGRATE_ISOLATE)) {
                        free_one_page(zone, page, 0, migratetype);
                        goto out;
                }
                migratetype = MIGRATE_MOVABLE;
        }

        if (cold)
                list_add_tail(&amp;amp;page-&amp;gt;lru, &amp;amp;pcp-&amp;gt;lists[migratetype]);
        else
                list_add(&amp;amp;page-&amp;gt;lru, &amp;amp;pcp-&amp;gt;lists[migratetype]);
        pcp-&amp;gt;count++;
        if (pcp-&amp;gt;count &amp;gt;= pcp-&amp;gt;high) {
                free_pcppages_bulk(zone, pcp-&amp;gt;batch, pcp);
                pcp-&amp;gt;count -= pcp-&amp;gt;batch;
        }

out:
        local_irq_restore(flags);
        put_cpu();
}
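Here is a hedged reading of the code quoted above. free_hot_cold_page() reuses page-&amp;gt;private to remember the pageblock's migrate type (set_page_private(page, migratetype)). In mainline 2.6.32 include/linux/mmzone.h, MIGRATE_MOVABLE is defined as 2, and page-cache pages are normally allocated from movable pageblocks. That would be consistent with page-&amp;gt;private reading 2 right after the last reference is dropped. (As far as we can tell, mainline 2.6.32 defines page_cache_release(page) as put_page(page) rather than __free_pages(page, 0). For an order-0 page both paths end in free_hot_page(), so the conclusion is the same.) The userspace model below is illustrative only; page_model and model_put_page are hypothetical names.

```c
/* Migrate types as defined in mainline 2.6.32 include/linux/mmzone.h. */
enum {
    MIGRATE_UNMOVABLE   = 0,
    MIGRATE_RECLAIMABLE = 1,
    MIGRATE_MOVABLE     = 2,
    MIGRATE_PCPTYPES    = 3,  /* the number of types on the pcp lists */
    MIGRATE_RESERVE     = 3,
    MIGRATE_ISOLATE     = 4
};

/* Minimal stand-in for struct page: reference count and private word. */
struct page_model {
    int count;
    unsigned long private;
};

/*
 * Model of dropping a reference on an order-0 page: when the count reaches
 * zero, the free path stores the pageblock's migrate type in page->private,
 * overwriting whatever the filesystem kept there.
 */
static void model_put_page(struct page_model *page, int pageblock_migratetype)
{
    page->count--;
    if (page->count == 0)
        page->private = (unsigned long)pageblock_migratetype;
}
```

In this model, any code that still treats page-&amp;gt;private as an llap pointer after the final put would read 2.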


/*
 * Drop a ref, return true if the refcount fell to zero (the page has no users)
 */
static inline int put_page_testzero(struct page *page)
{
        VM_BUG_ON(atomic_read(&amp;amp;page-&amp;gt;_count) == 0);
        return atomic_dec_and_test(&amp;amp;page-&amp;gt;_count);
}

/*
 * Try to grab a ref unless the page has a refcount of zero, return false if
 * that is the case.
 */
static inline int get_page_unless_zero(struct page *page)
{
        return atomic_inc_not_zero(&amp;amp;page-&amp;gt;_count);
}



Read-ahead
-&amp;gt; grab_cache_page_nowait()
        -&amp;gt; find_get_page()
                Success, but page-&amp;gt;private == 2

 llap = llap_from_page_with_lockh(page, LLAP_ORIGIN_READAHEAD, &amp;amp;lockh, flags);
 -&amp;gt; cast_private
        PANIC


struct page * grab_cache_page_nowait(struct address_space *mapping, pgoff_t index)
{
        struct page *page = find_get_page(mapping, index);

        if (page) {
                if (trylock_page(page))
                        return page;

                page_cache_release(page);
                return NULL;
        }
        page = __page_cache_alloc(mapping_gfp_mask(mapping) &amp;amp; ~__GFP_FS);
        if (page &amp;amp;&amp;amp; add_to_page_cache_lru(page, mapping, index, GFP_NOFS)) {
                page_cache_release(page);
                page = NULL;
        }
        return page;
}

struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
{
        void **pagep;
        struct page *page;

        rcu_read_lock();
repeat:
        page = NULL;
        pagep = radix_tree_lookup_slot(&amp;amp;mapping-&amp;gt;page_tree, offset);
        if (pagep) {
                page = radix_tree_deref_slot(pagep);
                if (unlikely(!page || page == RADIX_TREE_RETRY))
                        goto repeat;

# At this point page-&amp;gt;private is already == 2


Speculation: could there be a race condition, e.g. a read process on CPU A grabbing the page while it is still being &quot;freed&quot;?
In any case, find_get_page(mapping, index) sometimes returns a valid page at the same address, and naturally page-&amp;gt;private is STILL == 2.
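On the race question: find_get_page() follows the lockless page-cache protocol described in include/linux/pagemap.h. Between reading the radix-tree slot and taking a reference, the page may be freed and even reused at the same address. The reader therefore (1) takes a reference only if the count is non-zero and (2) re-reads the slot afterwards, dropping the reference and retrying if the slot no longer holds the same page.

The single-threaded sketch below models only those two checks. Several parts are assumptions, not kernel API: the model_ names, the plain slot pointer standing in for the radix tree, the non-atomic count, and the retry bound (there only so the model always terminates). Note what the protocol does not check: page-&amp;gt;private is passed through untouched, so a page that passes both checks can still carry a stale private value.

```c
struct model_page {
        int count;                      /* stands in for page->_count */
        unsigned long private_;         /* stands in for page->private; never checked */
};

/* Non-atomic for brevity; the kernel uses atomic_inc_not_zero(). */
static int model_get_unless_zero(struct model_page *page)
{
        if (page->count == 0)
                return 0;
        page->count++;
        return 1;
}

/* Read slot, speculative get, re-check slot. Returns the page with a
 * reference held, or 0 (NULL) if nothing is cached or the tries run out. */
static struct model_page *model_find_get_page(struct model_page **slot,
                                              int max_tries)
{
        while (max_tries-- > 0) {
                struct model_page *page = *slot;        /* radix_tree_deref_slot() */

                if (!page)
                        return 0;                       /* nothing cached */
                if (!model_get_unless_zero(page))
                        continue;                       /* being freed: retry */
                if (page != *slot) {                    /* "Has the page moved?" */
                        page->count--;                  /* page_cache_release() */
                        continue;
                }
                return page;                            /* private_ not validated */
        }
        return 0;                                       /* model-only give-up */
}
```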
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="10725" author="michael.hebenstreit@intel.com" created="Wed, 23 Feb 2011 17:51:19 +0000"  >&lt;p&gt;First crash dump file uploaded:&lt;br/&gt;
/uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-93&quot; title=&quot;Kernel panic caused by Lustre 1.8.5/1.8.6 on RH6 (patchless client)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-93&quot;&gt;&lt;del&gt;LU-93&lt;/del&gt;&lt;/a&gt;/vmcore1&lt;/p&gt;

&lt;p&gt;second file in progress&lt;/p&gt;</comment>
                            <comment id="10740" author="michael.hebenstreit@intel.com" created="Thu, 24 Feb 2011 15:09:21 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;The first upload might not have included everything necessary; specifically, I no longer had a vmlinux. So I recompiled the kernel/OFED/Lustre, recreated the crashes, and here you go. Note: vmcore4 is incomplete, use vmcore4_try2; vmcore3 and vmcore4 are two crashes observed on the same kernel.

ftp&amp;gt; cd LU-93
250 Directory successfully changed.
ftp&amp;gt; dir
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
-rw-r--r--    1 123      114       2226496 Feb 24 08:46 System.map-2.6.32-71.14.1.el6.x86_64.crt.debug
-rw-r--r--    1 123      114          5070 Feb 24 08:46 et06_201102231954.log
-rw-r--r--    1 123      114          4992 Feb 24 08:46 et06_201102232023.log
-rw-r--r--    1 123      114       8975635 Feb 24 08:47 initramfs-2.6.32-71.14.1.el6.x86_64.crt.debug.img
-rw-r--r--    1 123      114       4592694 Feb 24 08:47 initrd-2.6.32-71.14.1.el6.x86_64.crt.debugkdump.img
-rw-r--r--    1 123      114      155613481 Feb 24 08:54 linux-2.6.32-71.14.1.el6.x86_64.crt.debug.tgz
-rw-r--r--    1 123      114       3523116 Feb 24 08:54 lustre-1.8.5-2.6.32_71.14.1.el6.x86_64.crt.debug_201102231933.x86_64.rpm
-rw-r--r--    1 123      114       7614664 Feb 24 08:54 lustre-debuginfo-1.8.5-2.6.32_71.14.1.el6.x86_64.crt.debug_201102231933.x86_64.rpm
-rw-r--r--    1 123      114         39056 Feb 24 08:54 lustre-iokit-1.2-201102231934.noarch.rpm
-rw-r--r--    1 123      114      11542584 Feb 24 08:55 lustre-modules-1.8.5-2.6.32_71.14.1.el6.x86_64.crt.debug_201102231933.x86_64.rpm
-rw-r--r--    1 123      114       3909728 Feb 24 08:55 lustre-source-1.8.5-2.6.32_71.14.1.el6.x86_64.crt.debug_201102231933.x86_64.rpm
-rw-r--r--    1 123      114       2876788 Feb 24 08:55 lustre-tests-1.8.5-2.6.32_71.14.1.el6.x86_64.crt.debug_201102231933.x86_64.rpm
-rw-r--r--    1 123      114      36536375 Feb 24 08:57 lustre.1.8.5.tgz
-rw-r--r--    1 123      114      380339516 Feb 24 09:15 modules.2.6.32-71.14.1.el6.x86_64.crt.debug.tgz
-rw-r--r--    1 123      114      1661148251 Feb 23 16:53 vmcore1
-rw-r--r--    1 123      114      1835102847 Feb 23 19:39 vmcore2
-rw-r--r--    1 123      114      2334515256 Feb 24 11:26 vmcore3
-rw-r--r--    1 123      114      148038120 Feb 24 13:29 vmcore4
-rw-r--r--    1 123      114      1057381374 Feb 24 14:59 vmcore4_try2
-rw-r--r--    1 123      114      123037042 Feb 24 13:18 vmlinux
-rw-r--r--    1 123      114      287984166 Feb 24 13:09 vmlinux.o
-rw-r--r--    1 123      114       3820416 Feb 24 13:09 vmlinuz-2.6.32-71.14.1.el6.x86_64.crt.debug
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="10752" author="rread" created="Fri, 25 Feb 2011 10:18:03 +0000"  >&lt;p&gt;Michael - jira tip: the little screen icon to the right of the comment edit window will switch to preview mode, and I think the {noformat} block command is probably what you were looking for. &lt;/p&gt;</comment>
                            <comment id="10753" author="michael.hebenstreit@intel.com" created="Fri, 25 Feb 2011 10:34:03 +0000"  >&lt;p&gt;Thanks, that works, but I cannot edit the original entry.&lt;br/&gt;
Could we have a quick phone call?&lt;/p&gt;

&lt;p&gt;thanks&lt;br/&gt;
Michael&lt;/p&gt;</comment>
                            <comment id="10939" author="pjones" created="Mon, 7 Mar 2011 19:13:59 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please work on this?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="10947" author="laisiyao" created="Tue, 8 Mar 2011 07:12:29 +0000"  >&lt;p&gt;Michael, your investigation into this issue is quite helpful. I agree that it&apos;s suspicious that page-&amp;gt;private turns into 2 after page_cache_release.&lt;/p&gt;

&lt;p&gt;Also, I think there is a misunderstanding: page_cache_release is not redefined in the Linux kernel. The definition&lt;br/&gt;
#define page_cache_release(page) __free_pages(page, 0)&lt;br/&gt;
is for liblustre only (user space).&lt;/p&gt;

&lt;p&gt;I couldn&apos;t reproduce this panic in my local test environment, could you tell me how you ran &apos;bonnie&apos; to trigger this?&lt;/p&gt;

&lt;p&gt;BTW, if you can, please also dump page-&amp;gt;flags in your printk statements; it may help.&lt;/p&gt;</comment>
                            <comment id="10950" author="michael.hebenstreit@intel.com" created="Tue, 8 Mar 2011 08:03:28 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;I&apos;m already a step further on this; that&apos;s why I wanted a phone meeting before you go hunting :)

The crash happened when running multiple Bonnies at the same time (I&apos;ve seen it happen with as few as 4, but mostly I experienced it with 12 bonnie threads per server, i.e. one per CPU; each command was &quot;bonnie -d $DIR -s 2000 -v 16 -y&quot;, with $DIR set differently for every process).

This bug might be influenced by a BIOS bug. I had this problem only with a specific Supermicro Westmere platform and a specific BIOS version. After upgrading the BIOS the panic occurred less often; in one week of testing it happened only once. Unfortunately, at that time the kernel was not set up to create a crash dump, but I have the console output below. In one week of testing with 2 Intel whitebox servers I did not have any crash.

At this point I&apos;d suggest focusing on the following:

a) take a close look at my patch and see if there is room to improve it
b) take a look at the Red Hat bug report (linked at the top of the LU-93 description), especially the end: are the findings there correct?
c) did anyone at Whamcloud try to port 1.8.5 to RH6? What was the result?
d) we previously used 1.6.7.1 clients; with the new client I see reduced write performance (writing a 50 GB file at 1.1 GB/s instead of 1.3 GB/s; same backend, a 1.8.1 server; checksums OFF in both cases)

I will try to get another crash dump, but as long as it is not happening, hunting it down might prove impossible.

Regards
Michael

LDEBUG2: ffffea000aac8438 2
BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [&amp;lt;ffffffffa0896d47&amp;gt;] llap_cast_private+0x57/0xe0 [lustre]
PGD 62fbe9067 PUD 51c03c067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/module/ipv6/initstate
CPU 12
Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs lockd fscache nfs_acl auth_rpcgss acpi_cpufreq freq_table sunrpc rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_core(U) ext2 dm_mirror dm_region_hash dm_log dm_mod serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core shpchp igb dca ext3 jbd mbcache sd_mod crc_t10dif sr_mod cdrom usb_storage ahci pata_acpi ata_generic pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: ipmi_msghandler]

Modules linked in: mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) nfs lockd fscache nfs_acl auth_rpcgss acpi_cpufreq freq_table sunrpc rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_core(U) ext2 dm_mirror dm_region_hash dm_log dm_mod serio_raw i2c_i801 sg iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core shpchp igb dca ext3 jbd mbcache sd_mod crc_t10dif sr_mod cdrom usb_storage ahci pata_acpi ata_generic pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: ipmi_msghandler]
Pid: 30539, comm: bonnie Not tainted 2.6.32-71.18.1.el6.crt.1.x86_64 #3 X8DTN
RIP: 0010:[&amp;lt;ffffffffa0896d47&amp;gt;]  [&amp;lt;ffffffffa0896d47&amp;gt;] llap_cast_private+0x57/0xe0 [lustre]
RSP: 0018:ffff880125659738  EFLAGS: 00010202
RAX: 0000000000000022 RBX: 0000000000000002 RCX: 00000000000040b3
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff880125659788 R08: ffffffff818a7da0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea000aac8438
R13: ffffea000aac8438 R14: ffff8802fcd99210 R15: ffff88031b690600
FS:  00002adadaebbb20(0000) GS:ffff8800282c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000002 CR3: 000000062ebf8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bonnie (pid: 30539, threadinfo ffff880125658000, task ffff880233dae100)
Stack:
 ffff8801256597c8 ffffffffa08270f8 ffff880100000000 ffffffffa0808c1f
&amp;lt;0&amp;gt; 0000000000000000 ffff880300001000 000000001500e000 000000001500efff
&amp;lt;0&amp;gt; ffff88031b684000 ffff8802fcd99330 ffff880125659868 ffffffffa089b961
Call Trace:
 [&amp;lt;ffffffffa08270f8&amp;gt;] ? lov_stripe_size+0x248/0x3b0 [lov]
 [&amp;lt;ffffffffa0808c1f&amp;gt;] ? lov_queue_group_io+0x14f/0x490 [lov]
 [&amp;lt;ffffffffa089b961&amp;gt;] llap_from_page_with_lockh.clone.8+0xa1/0x1120 [lustre]
 [&amp;lt;ffffffffa0827527&amp;gt;] ? lov_merge_lvb+0xb7/0x240 [lov]
 [&amp;lt;ffffffffa0826cce&amp;gt;] ? lov_stripe_offset+0x28e/0x340 [lov]
 [&amp;lt;ffffffff8110c0be&amp;gt;] ? find_get_page+0x1e/0xa0
 [&amp;lt;ffffffffa089ec89&amp;gt;] ll_readahead+0xf59/0x1ca0 [lustre]
 [&amp;lt;ffffffffa0808c1f&amp;gt;] ? lov_queue_group_io+0x14f/0x490 [lov]
 [&amp;lt;ffffffffa08a2663&amp;gt;] ll_readpage+0x1313/0x1dd0 [lustre]
 [&amp;lt;ffffffffa0650984&amp;gt;] ? ldlm_lock_remove_from_lru+0x44/0x110 [ptlrpc]
 [&amp;lt;ffffffffa0650bb2&amp;gt;] ? ldlm_lock_fast_match+0xc2/0x130 [ptlrpc]
 [&amp;lt;ffffffff81263a4d&amp;gt;] ? copy_user_generic_string+0x2d/0x40
 [&amp;lt;ffffffff8110d420&amp;gt;] generic_file_aio_read+0x1f0/0x730
 [&amp;lt;ffffffffa087be8c&amp;gt;] ll_file_aio_read+0xc8c/0x2610 [lustre]
 [&amp;lt;ffffffffa087b201&amp;gt;] ? ll_file_aio_read+0x1/0x2610 [lustre]
 [&amp;lt;ffffffffa087d8e0&amp;gt;] ll_file_read+0xd0/0xf0 [lustre]
 [&amp;lt;ffffffff81091df0&amp;gt;] ? autoremove_wake_function+0x0/0x40
 [&amp;lt;ffffffff811ffdd6&amp;gt;] ? security_file_permission+0x16/0x20
 [&amp;lt;ffffffff8116d13d&amp;gt;] ? rw_verify_area+0x5d/0xc0
 [&amp;lt;ffffffff8116dac5&amp;gt;] vfs_read+0xb5/0x1a0
 [&amp;lt;ffffffff8116dc01&amp;gt;] sys_read+0x51/0x90
 [&amp;lt;ffffffff81013172&amp;gt;] system_call_fastpath+0x16/0x1b
Code: c2 76 12 48 85 db 75 23 48 89 d8 4c 8b 65 f8 48 8b 5d f0 c9 c3 48 89 fe 48 89 da 48 c7 c7 2c 45 8c a0 31 c0 e8 54 17 c3 e0 eb d8 &amp;lt;8b&amp;gt; 03 3d 21 06 e3 05 74 d4 89 44 24 28 49 8b 44 24 10 ba 00 00
RIP  [&amp;lt;ffffffffa0896d47&amp;gt;] llap_cast_private+0x57/0xe0 [lustre]
 RSP &amp;lt;ffff880125659738&amp;gt;
CR2: 0000000000000002
---[ end trace 84b72bca9b7c4710 ]---
Kernel panic - not syncing: Fatal exception
Pid: 30539, comm: bonnie Tainted: G      D    ----------------  2.6.32-71.18.1.el6.crt.1.x86_64 #3
Call Trace:
 [&amp;lt;ffffffff814c83da&amp;gt;] panic+0x78/0x137
 [&amp;lt;ffffffff814cc4a4&amp;gt;] oops_end+0xe4/0x100
 [&amp;lt;ffffffff8104652b&amp;gt;] no_context+0xfb/0x260
 [&amp;lt;ffffffff810467b5&amp;gt;] __bad_area_nosemaphore+0x125/0x1e0
 [&amp;lt;ffffffff810468de&amp;gt;] bad_area+0x4e/0x60
 [&amp;lt;ffffffff8106c621&amp;gt;] ? vprintk+0x1d1/0x4f0
 [&amp;lt;ffffffff814cdff0&amp;gt;] do_page_fault+0x390/0x3a0
 [&amp;lt;ffffffff814cb7f5&amp;gt;] page_fault+0x25/0x30
 [&amp;lt;ffffffffa0896d47&amp;gt;] ? llap_cast_private+0x57/0xe0 [lustre]
 [&amp;lt;ffffffffa0896d45&amp;gt;] ? llap_cast_private+0x55/0xe0 [lustre]
 [&amp;lt;ffffffffa08270f8&amp;gt;] ? lov_stripe_size+0x248/0x3b0 [lov]
 [&amp;lt;ffffffffa0808c1f&amp;gt;] ? lov_queue_group_io+0x14f/0x490 [lov]
 [&amp;lt;ffffffffa089b961&amp;gt;] llap_from_page_with_lockh.clone.8+0xa1/0x1120 [lustre]
 [&amp;lt;ffffffffa0827527&amp;gt;] ? lov_merge_lvb+0xb7/0x240 [lov]
 [&amp;lt;ffffffffa0826cce&amp;gt;] ? lov_stripe_offset+0x28e/0x340 [lov]
 [&amp;lt;ffffffff8110c0be&amp;gt;] ? find_get_page+0x1e/0xa0
 [&amp;lt;ffffffffa089ec89&amp;gt;] ll_readahead+0xf59/0x1ca0 [lustre]
 [&amp;lt;ffffffffa0808c1f&amp;gt;] ? lov_queue_group_io+0x14f/0x490 [lov]
 [&amp;lt;ffffffffa08a2663&amp;gt;] ll_readpage+0x1313/0x1dd0 [lustre]
 [&amp;lt;ffffffffa0650984&amp;gt;] ? ldlm_lock_remove_from_lru+0x44/0x110 [ptlrpc]
 [&amp;lt;ffffffffa0650bb2&amp;gt;] ? ldlm_lock_fast_match+0xc2/0x130 [ptlrpc]
 [&amp;lt;ffffffff81263a4d&amp;gt;] ? copy_user_generic_string+0x2d/0x40
 [&amp;lt;ffffffff8110d420&amp;gt;] generic_file_aio_read+0x1f0/0x730
 [&amp;lt;ffffffffa087be8c&amp;gt;] ll_file_aio_read+0xc8c/0x2610 [lustre]
 [&amp;lt;ffffffffa087b201&amp;gt;] ? ll_file_aio_read+0x1/0x2610 [lustre]
 [&amp;lt;ffffffffa087d8e0&amp;gt;] ll_file_read+0xd0/0xf0 [lustre]
 [&amp;lt;ffffffff81091df0&amp;gt;] ? autoremove_wake_function+0x0/0x40
 [&amp;lt;ffffffff811ffdd6&amp;gt;] ? security_file_permission+0x16/0x20
 [&amp;lt;ffffffff8116d13d&amp;gt;] ? rw_verify_area+0x5d/0xc0
 [&amp;lt;ffffffff8116dac5&amp;gt;] vfs_read+0xb5/0x1a0
 [&amp;lt;ffffffff8116dc01&amp;gt;] sys_read+0x51/0x90
 [&amp;lt;ffffffff81013172&amp;gt;] system_call_fastpath+0x16/0x1b
panic occurred, switching back to text console
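Reading the Code: line above (a hedged interpretation, not verified against the vmcore): the bytes at the marker, 8b 03, decode as mov (%rbx),%eax, and RBX is 0000000000000002, the same value as CR2. The next bytes, 3d 21 06 e3 05, are cmp $0x05e30621,%eax. So llap_cast_private() appears to have used page-&amp;gt;private (2) as a pointer and tried to read a 32-bit magic number through it.

If LLAP_MAGIC in the 1.8.5 source is 98764321 (an assumption to check in lustre/llite), that immediate is exactly LLAP_MAGIC, i.e. the magic check itself is what faults. The hypothetical helper below only reassembles the little-endian immediate:

```c
/* Assemble a 32-bit immediate from its little-endian x86 encoding:
 * byte k contributes b[k] * 256^k. */
static unsigned int model_le32(const unsigned char b[4])
{
        return b[0] + b[1] * 0x100u + b[2] * 0x10000u + b[3] * 0x1000000u;
}
```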

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="10964" author="michael.hebenstreit@intel.com" created="Tue, 8 Mar 2011 12:08:27 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;I created a special kernel with a lot of if (page-&amp;gt;private == 2) printk(&quot;LDEBUG##&quot;) statements. This test showed quite a few occurrences of this condition long before the panic. So the pages were in the page cache, but most likely got reclaimed before they were used.

first appearance of any page-&amp;gt;private == 2:

LDEBUGMN16 ffffea000b20de88									__set_page_dirty_nobuffers()
LDEBUGMM18 ffffea000b20de88 2 2 ffff88062f139511 40000000100059			unlock_page()
LDEBUGMM14 ffffea000b20de88 2 2 ffff88062f139511 40000000100058			wake_up_page()
LDEBUGMM13 ffffea000b20de88 2 2 ffff88062f139511 40000000100058			page_waitqueue()
LDEBUGMM19 ffffea000b20de88 2 2 ffff88062f139511 40000000100058			unlock_page()
LDEBUGMQ20 ffffea000b20de88									lru_cache_add_lru()
LDEBUGMQ18 ffffea000b20de88									__lru_cache_add() ;
LDEBUGMQ19 ffffea000b20de88									__lru_cache_add() ;
LDEBUGMQ21 ffffea000b20de88									lru_cache_add_lru()
LDEBUGMQ08a ffffea000b20de88									put_page()
&#8230; Many more examples of the same block

First appearance of page-&amp;gt;private == 2 actually leading to panic

LDEBUGMM18 ffffea0015a228d8 2 2 ffff880630f1ba30 c000000000000d			unlock_page()
LDEBUGMM14 ffffea0015a228d8 2 2 ffff880630f1ba30 c000000000000c			wake_up_page()
LDEBUGMM13 ffffea0015a228d8 2 2 ffff880630f1ba30 c000000000000c			page_waitqueue()
LDEBUGMM19 ffffea0015a228d8 2 2 ffff880630f1ba30 c000000000000c			unlock_page()
..
.. repeated a number of times
..
LDEBUGMM26 ffffea0015a10888 2 1 ffff880630f1ba30 c0000000000020			find_get_page()
LDEBUGMM27 ffffea0015a10888 2 2 ffff880630f1ba30 c0000000000020			find_get_page()
LDEBUGMM28 ffffea0015a10888 2 2 ffff880630f1ba30 c0000000000020			find_get_page()
LDEBUGMM36 ffffea0015a10888 2 2 ffff880630f1ba30 c0000000000020			do_generic_file_read()
LDEBUGMM42 ffffea0015a10888 2 2 ffff880630f1ba30 c0000000000021			do_generic_file_read()
LDEBUGMM43 ffffea0015a10888 2 2 ffff880630f1ba30 c0000000000021			do_generic_file_read()
LDEBUGB1a: ffffea0015a10888 2 2 ffff880630f1ba30 c0000000000021			lustre/rw.c/ll_readpage()
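
Decoding the last field (page-&amp;gt;flags) of these lines is informative. The sketch below assumes the 2.6.32 enum pageflags numbering: PG_locked = 0, PG_uptodate = 3, PG_dirty = 4, PG_lru = 5, PG_active = 6, PG_private = 11. Check it against include/linux/page-flags.h of the kernel actually running.

Under that assumption, none of the flag words logged above has PG_private set, although page-&amp;gt;private is 2. Those words are 40000000100059, 40000000100058, c000000000000d, c000000000000c, c0000000000020 and c0000000000021. On a page-cache page, page-&amp;gt;private is normally meaningful to the filesystem only while PG_private is set. This is therefore worth comparing with how llap_cast_private() decides to trust the field. The MODEL_ names and the helper are hypothetical.

```c
/* Bit numbers ASSUME the 2.6.32 enum pageflags order. */
enum {
        MODEL_PG_locked   = 0,
        MODEL_PG_uptodate = 3,
        MODEL_PG_dirty    = 4,
        MODEL_PG_lru      = 5,
        MODEL_PG_active   = 6,
        MODEL_PG_private  = 11
};

/* Return 1 if flag bit 'bit' is set in a raw page->flags word as printed
 * by MYDEBUG (the last %lx field), else 0. */
static int model_page_flag(unsigned long long flags, int bit)
{
        return (int)((flags >> bit) % 2);   /* move the bit to position 0, keep it */
}
```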


Kernel code with my printk:

#define MYDEBUG(A) do { if (page &amp;amp;&amp;amp; page-&amp;gt;private == 2) printk(A &quot; %lx %lx %d %lx %lx\n&quot;, (unsigned long)page, page-&amp;gt;private, page_count(page), (unsigned long)page-&amp;gt;mapping, page-&amp;gt;flags); } while (0)

### mm/page-writeback.c

int __set_page_dirty_nobuffers(struct page *page)
{
        if (!TestSetPageDirty(page)) {
                struct address_space *mapping = page_mapping(page);
                struct address_space *mapping2;
MYDEBUG(&quot;LDEBUGMN16&quot;);
                if (!mapping)
                        return 1;

                spin_lock_irq(&amp;amp;mapping-&amp;gt;tree_lock);
                mapping2 = page_mapping(page);
                if (mapping2) { /* Race with truncate? */
                        BUG_ON(mapping2 != mapping);
                        WARN_ON_ONCE(!PagePrivate(page) &amp;amp;&amp;amp; !PageUptodate(page));
                        account_page_dirtied(page, mapping);
                        radix_tree_tag_set(&amp;amp;mapping-&amp;gt;page_tree,
                                page_index(page), PAGECACHE_TAG_DIRTY);
                }
                spin_unlock_irq(&amp;amp;mapping-&amp;gt;tree_lock);
MYDEBUG(&quot;LDEBUGMN17&quot;);
                if (mapping-&amp;gt;host) {
                        /* !PageAnon &amp;amp;&amp;amp; !swapper_space */
                        __mark_inode_dirty(mapping-&amp;gt;host, I_DIRTY_PAGES);
                }
                return 1;
        }
        return 0;
}

### mm/filemap.c

void unlock_page(struct page *page)
{
MYDEBUG(&quot;LDEBUGMM18&quot;);
        VM_BUG_ON(!PageLocked(page));
        clear_bit_unlock(PG_locked, &amp;amp;page-&amp;gt;flags);
        smp_mb__after_clear_bit();
        wake_up_page(page, PG_locked);
MYDEBUG(&quot;LDEBUGMM19&quot;);
}

static wait_queue_head_t *page_waitqueue(struct page *page)
{
        const struct zone *zone = page_zone(page);

MYDEBUG(&quot;LDEBUGMM13&quot;);
        return &amp;amp;zone-&amp;gt;wait_table[hash_ptr(page, zone-&amp;gt;wait_table_bits)];
}

static inline void wake_up_page(struct page *page, int bit)
{
MYDEBUG(&quot;LDEBUGMM14&quot;);
        __wake_up_bit(page_waitqueue(page), &amp;amp;page-&amp;gt;flags, bit);
}

struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
{
        void **pagep;
        struct page *page;

        rcu_read_lock();
repeat:
        page = NULL;
        pagep = radix_tree_lookup_slot(&amp;amp;mapping-&amp;gt;page_tree, offset);
        if (pagep) {
                page = radix_tree_deref_slot(pagep);
                if (unlikely(!page || page == RADIX_TREE_RETRY))
                        goto repeat;
MYDEBUG(&quot;LDEBUGMM26&quot;);

                if (!page_cache_get_speculative(page))
                        goto repeat;
MYDEBUG(&quot;LDEBUGMM27&quot;);
                /*
                 * Has the page moved?
                 * This is part of the lockless pagecache protocol. See
                 * include/linux/pagemap.h for details.
                 */
                if (unlikely(page != *pagep)) {
                        page_cache_release(page);
                        goto repeat;
                }
        }
        rcu_read_unlock();

MYDEBUG(&quot;LDEBUGMM28&quot;);
        return page;
}

static void do_generic_file_read(struct file *filp, loff_t *ppos,
                read_descriptor_t *desc, read_actor_t actor)
{
        struct address_space *mapping = filp-&amp;gt;f_mapping;
        struct inode *inode = mapping-&amp;gt;host;
        struct file_ra_state *ra = &amp;amp;filp-&amp;gt;f_ra;
        pgoff_t index;
        pgoff_t last_index;
        pgoff_t prev_index;
        unsigned long offset;      /* offset into pagecache page */
        unsigned int prev_offset;
        int error;

        index = *ppos &amp;gt;&amp;gt; PAGE_CACHE_SHIFT;
        prev_index = ra-&amp;gt;prev_pos &amp;gt;&amp;gt; PAGE_CACHE_SHIFT;
        prev_offset = ra-&amp;gt;prev_pos &amp;amp; (PAGE_CACHE_SIZE-1);
        last_index = (*ppos + desc-&amp;gt;count + PAGE_CACHE_SIZE-1) &amp;gt;&amp;gt; PAGE_CACHE_SHIFT;
        offset = *ppos &amp;amp; ~PAGE_CACHE_MASK;

        for (;;) {
                struct page *page;
                pgoff_t end_index;
                loff_t isize;
                unsigned long nr, ret;

                cond_resched();
find_page:
                page = find_get_page(mapping, index);
                if (!page) {
                        page_cache_sync_readahead(mapping,
                                        ra, filp,
                                        index, last_index - index);
                        page = find_get_page(mapping, index);
                        if (unlikely(page == NULL))
                                goto no_cached_page;
                }
MYDEBUG(&quot;LDEBUGMM36&quot;);
                if (PageReadahead(page)) {
                        page_cache_async_readahead(mapping,
                                        ra, filp, page,
                                        index, last_index - index);
MYDEBUG(&quot;LDEBUGMM37&quot;);
                }
                if (!PageUptodate(page)) {
                        if (inode-&amp;gt;i_blkbits == PAGE_CACHE_SHIFT ||
                                        !mapping-&amp;gt;a_ops-&amp;gt;is_partially_uptodate)
                                goto page_not_up_to_date;
                        if (!trylock_page(page))
                                goto page_not_up_to_date;
                        if (!mapping-&amp;gt;a_ops-&amp;gt;is_partially_uptodate(page,
                                                                desc, offset))
                                goto page_not_up_to_date_locked;
MYDEBUG(&quot;LDEBUGMM38&quot;);
                        unlock_page(page);
                }
page_ok:
                /*
                 * i_size must be checked after we know the page is Uptodate.
                 *
                 * Checking i_size after the check allows us to calculate
                 * the correct value for &quot;nr&quot;, which means the zero-filled
                 * part of the page is not copied back to userspace (unless
                 * another truncate extends the file - this is desired though).
                 */
MYDEBUG(&quot;LDEBUGMM39&quot;);
                isize = i_size_read(inode);
                end_index = (isize - 1) &amp;gt;&amp;gt; PAGE_CACHE_SHIFT;
                if (unlikely(!isize || index &amp;gt; end_index)) {
                        page_cache_release(page);
                        goto out;
                }

                /* nr is the maximum number of bytes to copy from this page */
                nr = PAGE_CACHE_SIZE;
                if (index == end_index) {
                        nr = ((isize - 1) &amp;amp; ~PAGE_CACHE_MASK) + 1;
                        if (nr &amp;lt;= offset) {
                                page_cache_release(page);
                                goto out;
                        }
                }
                nr = nr - offset;

                /* If users can be writing to this page using arbitrary
                 * virtual addresses, take care about potential aliasing
                 * before reading the page on the kernel side.
                 */
                if (mapping_writably_mapped(mapping))
                        flush_dcache_page(page);
MYDEBUG(&quot;LDEBUGMM40&quot;);
                /*
                 * When a sequential read accesses a page several times,
                 * only mark it as accessed the first time.
                 */
                if (prev_index != index || offset != prev_offset)
                        mark_page_accessed(page);
                prev_index = index;

                /*
                 * Ok, we have the page, and it&apos;s up-to-date, so
                 * now we can copy it to user space...
                 *
                 * The actor routine returns how many bytes were actually used..
                 * NOTE! This may not be the same as how much of a user buffer
                 * we filled up (we may be padding etc), so we can only update
                 * &quot;pos&quot; here (the actor routine has to update the user buffer
                 * pointers and the remaining count).
                 */
                ret = actor(desc, page, offset, nr);
                offset += ret;
                index += offset &amp;gt;&amp;gt; PAGE_CACHE_SHIFT;
                offset &amp;amp;= ~PAGE_CACHE_MASK;
                prev_offset = offset;
MYDEBUG(&quot;LDEBUGMM41&quot;);
                page_cache_release(page);
                if (ret == nr &amp;amp;&amp;amp; desc-&amp;gt;count)
                        continue;
                goto out;

page_not_up_to_date:
                /* Get exclusive access to the page ... */
                error = lock_page_killable(page);
                if (unlikely(error))
                        goto readpage_error;
MYDEBUG(&quot;LDEBUGMM42&quot;);
page_not_up_to_date_locked:
                /* Did it get truncated before we got the lock? */
                if (!page-&amp;gt;mapping) {
                        unlock_page(page);
                        page_cache_release(page);
                        continue;
                }

                /* Did somebody else fill it already? */
                if (PageUptodate(page)) {
                        unlock_page(page);
                        goto page_ok;
                }

readpage:
                /*
                 * A previous I/O error may have been due to temporary
                 * failures, eg. multipath errors.
                 * PG_error will be set again if readpage fails.
                 */
MYDEBUG(&quot;LDEBUGMM43&quot;);
                ClearPageError(page);
                /* Start the actual read. The read will unlock the page. */
                error = mapping-&amp;gt;a_ops-&amp;gt;readpage(filp, page);
MYDEBUG(&quot;LDEBUGMM44&quot;);
                if (unlikely(error)) {
                        if (error == AOP_TRUNCATED_PAGE) {
                                page_cache_release(page);
                                goto find_page;
                        }
                        goto readpage_error;
                }

                if (!PageUptodate(page)) {
MYDEBUG(&quot;LDEBUGMM45&quot;);
                        error = lock_page_killable(page);
                        if (unlikely(error))
                                goto readpage_error;
                        if (!PageUptodate(page)) {
                                if (page-&amp;gt;mapping == NULL) {
                                        /*
                                         * invalidate_inode_pages got it
                                         */
                                        unlock_page(page);
                                        page_cache_release(page);
                                        goto find_page;
                                }
MYDEBUG(&quot;LDEBUGMM46&quot;);
                                unlock_page(page);
                                shrink_readahead_size_eio(filp, ra);
                                error = -EIO;
                                goto readpage_error;
                        }
MYDEBUG(&quot;LDEBUGMM47&quot;);
                        unlock_page(page);
                }

                goto page_ok;

readpage_error:
                /* UHHUH! A synchronous read error occurred. Report it */
                desc-&amp;gt;error = error;
                page_cache_release(page);
                goto out;

no_cached_page:
                /*
                 * Ok, it wasn&apos;t cached, so we need to create a new
                 * page..
                 */
                page = page_cache_alloc_cold(mapping);
                if (!page) {
                        desc-&amp;gt;error = -ENOMEM;
                        goto out;
                }
                error = add_to_page_cache_lru(page, mapping,
                                                index, GFP_KERNEL);
                if (error) {
                        page_cache_release(page);
                        if (error == -EEXIST)
                                goto find_page;
                        desc-&amp;gt;error = error;
                        goto out;
                }
MYDEBUG(&quot;LDEBUGMM48&quot;);
                goto readpage;
        }
out:
        ra-&amp;gt;prev_pos = prev_index;
        ra-&amp;gt;prev_pos &amp;lt;&amp;lt;= PAGE_CACHE_SHIFT;
        ra-&amp;gt;prev_pos |= prev_offset;

        *ppos = ((loff_t)index &amp;lt;&amp;lt; PAGE_CACHE_SHIFT) + offset;
        file_accessed(filp);
}

### mm/swap.c

void __lru_cache_add(struct page *page, enum lru_list lru)
{
        struct pagevec *pvec = &amp;amp;get_cpu_var(lru_add_pvecs)[lru];
MYDEBUG(&quot;LDEBUGMQ18&quot;);
        page_cache_get(page);
        if (!pagevec_add(pvec, page))
                ____pagevec_lru_add(pvec, lru);
        put_cpu_var(lru_add_pvecs);
MYDEBUG(&quot;LDEBUGMQ19&quot;);
}

/**
 * lru_cache_add_lru - add a page to a page list
 * @page: the page to be added to the LRU.
 * @lru: the LRU list to which the page is added.
 */
void lru_cache_add_lru(struct page *page, enum lru_list lru)
{
MYDEBUG(&quot;LDEBUGMQ20&quot;);
        if (PageActive(page)) {
                VM_BUG_ON(PageUnevictable(page));
                ClearPageActive(page);
        } else if (PageUnevictable(page)) {
                VM_BUG_ON(PageActive(page));
                ClearPageUnevictable(page);
        }

        VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
        __lru_cache_add(page, lru);
MYDEBUG(&quot;LDEBUGMQ21&quot;);
}

void put_page(struct page *page)
{
MYDEBUG(&quot;LDEBUGMQ08a&quot;);
        if (unlikely(PageCompound(page)))
                put_compound_page(page);
        else if (put_page_testzero(page))
                __put_single_page(page);
}


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="10965" author="michael.hebenstreit@intel.com" created="Tue, 8 Mar 2011 12:51:51 +0000"  >&lt;p&gt;Patch used to compile Lustre 1.8.5 on RH6&lt;/p&gt;</comment>
                            <comment id="10969" author="laisiyao" created="Tue, 8 Mar 2011 18:42:27 +0000"  >&lt;p&gt;Michael, the patchless-client patch against 1.8.6 is at &lt;a href=&quot;http://review.whamcloud.com/#change,282&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,282&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="10978" author="laisiyao" created="Wed, 9 Mar 2011 00:11:17 +0000"  >&lt;p&gt;Regarding &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=678175#c8&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Red Hat Bugzilla comment 8&lt;/a&gt;: that piece of code is in ll_removepage(), which is called from these address_space operations:&lt;br/&gt;
  .commit_write   (ll_commit_write)&lt;br/&gt;
  .invalidatepage (ll_invalidatepage)&lt;br/&gt;
  .releasepage    (ll_releasepage)&lt;br/&gt;
All of these functions require the page to be locked on entry, so the comment just documents this.&lt;/p&gt;

&lt;p&gt;Regarding &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=678175#c11&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Red Hat Bugzilla comment 11&lt;/a&gt;, as I explained above, page_cache_release is not redefined for the Linux kernel; that definition is only used by liblustre.&lt;/p&gt;</comment>
                            <comment id="10988" author="laisiyao" created="Wed, 9 Mar 2011 17:39:50 +0000"  >&lt;p&gt;Michael, you can check out the 1.8.6 code with:&lt;br/&gt;
git clone git://git.whamcloud.com/fs/lustre-release.git&lt;/p&gt;

&lt;p&gt;and then apply the RHEL6 patch.&lt;/p&gt;</comment>
                            <comment id="10990" author="michael.hebenstreit@intel.com" created="Wed, 9 Mar 2011 18:06:12 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Note: at this point it is NOT clear whether this is a Lustre bug or a problem root-caused by hardware/BIOS. In fact, a BIOS upgrade greatly reduced the frequency of the kernel panic. Another indicator of a hardware problem is that on a different platform the error could not be reproduced even after 10 days of testing.

Current hypothesis: a page is cached by the kernel, and unlock_page() acts on this page multiple times while page-&amp;gt;private is already equal to 2. When ll_readpage() is then called on this page, the kernel panics.

The only function found so far in this context that sets page-&amp;gt;private is free_hot_page(). In some cases, by inserting printk statements, it was possible to show that the page_cache_release() call in Lustre (which leads to free_hot_page()) had acted on the same page that later caused the panic (see the example below).

Tasks after meeting 2011-03-09:
1) Verify whether the panic is a software or a hardware problem:
  a) Analyze the code paths: is it possible, due to a race condition, to execute free_hot_page() but keep the page in the page cache?
  b) Analyze the code paths: is it possible that reusing a page does not clear page-&amp;gt;private correctly?
2) Performance hit:
  c) Reproduce single-client performance at Whamcloud. Intel reaches 1.3 GB/s writing a single 50 GB file with dd; the client is NHM/WSM (2.8-3 GHz), RH 5.4, Lustre 1.6.7; the server is Lustre 1.8.1 (or 1.6.x on a different system, with identical dd performance). No checksums.
3) Next meeting via Skype: Thursday 2011-03-10, evening PST.



&amp;gt; &amp;gt; LDEBUGZ0064: ffffea00014f57c8                    Lustre: llap_shrink_cache_internal() after calling page_cache_release()
&amp;gt; &amp;gt; ...
&amp;gt; &amp;gt; LDEBUGMM:26 ffffea00014f57c8                     Kernel:mm/filemap.c, find_get_page()
&amp;gt; &amp;gt; LDEBUGMM:27 ffffea00014f57c8
&amp;gt; &amp;gt; LDEBUGMM:28 ffffea00014f57c8
&amp;gt; &amp;gt; LDEBUGZ0175: ffffea00014f57c8                    copy of grab_cache_page_nowait(), copied into my Lustre code and modified to distinguish &quot;page found in cache&quot; from &quot;new page created&quot;
&amp;gt; &amp;gt; LDEBUGLe: ffffea00014f57c8                        Lustre: Read-ahead function after grab_cache_page returns
&amp;gt; &amp;gt; LDEBUGLi: ffffea00014f57c8
&amp;gt; &amp;gt; LDEBUGA1: ffffea00014f57c8                        Lustre: llap_from_page_with_lockh()
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; The first item is the location; the second is the page (i.e. the kernel address of its struct page). So the same page that later causes the trouble had already been released once. Let&apos;s go over the relevant code in detail:
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; As part of shrinking a Lustre-internal cache, llap_shrink_cache_internal() calls
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;                 unlock_page(page);
&amp;gt; &amp;gt;                 page_cache_release(page);
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;                 ll_pglist_cpu_lock(sbi, cpu);
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; After page_cache_release() (location LDEBUGZ0064), page-&amp;gt;private == 2. page_cache_release is a simple #define leading to free_hot_cold_page(). My guess is that page-&amp;gt;private is set to 2 at the lines marked XX:
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; #define page_cache_release(page) __free_pages(page, 0)
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; void __free_pages(struct page *page, unsigned int order)
&amp;gt; &amp;gt; {
&amp;gt; &amp;gt;         if (put_page_testzero(page)) {
&amp;gt; &amp;gt;                 trace_mm_page_free_direct(page, order);
&amp;gt; &amp;gt;                 if (order == 0)
&amp;gt; &amp;gt;                         free_hot_page(page);
&amp;gt; &amp;gt;                 else
&amp;gt; &amp;gt;                         __free_pages_ok(page, order);
&amp;gt; &amp;gt;         }
&amp;gt; &amp;gt; }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; void free_hot_page(struct page *page)
&amp;gt; &amp;gt; {
&amp;gt; &amp;gt;         trace_mm_page_free_direct(page, 0);
&amp;gt; &amp;gt;         free_hot_cold_page(page, 0);
&amp;gt; &amp;gt; }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; static void free_hot_cold_page(struct page *page, int cold)
&amp;gt; &amp;gt; {
&amp;gt; &amp;gt;         struct zone *zone = page_zone(page);
&amp;gt; &amp;gt;         struct per_cpu_pages *pcp;
&amp;gt; &amp;gt;         unsigned long flags;
&amp;gt; &amp;gt;         int migratetype;
&amp;gt; &amp;gt;         int wasMlocked = __TestClearPageMlocked(page);
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         kmemcheck_free_shadow(page, 0);
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         if (PageAnon(page))
&amp;gt; &amp;gt;                 page-&amp;gt;mapping = NULL;
&amp;gt; &amp;gt;         if (free_pages_check(page))
&amp;gt; &amp;gt;                 return;
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         if (!PageHighMem(page)) {
&amp;gt; &amp;gt;                 debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
&amp;gt; &amp;gt;                 debug_check_no_obj_freed(page_address(page), PAGE_SIZE);
&amp;gt; &amp;gt;         }
&amp;gt; &amp;gt;         arch_free_page(page, 0);
&amp;gt; &amp;gt;         kernel_map_pages(page, 1, 0);
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         pcp = &amp;amp;zone_pcp(zone, get_cpu())-&amp;gt;pcp;
&amp;gt; &amp;gt; XX      migratetype = get_pageblock_migratetype(page);
&amp;gt; &amp;gt; XX      set_page_private(page, migratetype);
&amp;gt; &amp;gt;         local_irq_save(flags);
&amp;gt; &amp;gt;         if (unlikely(wasMlocked))
&amp;gt; &amp;gt;                 free_page_mlock(page);
&amp;gt; &amp;gt;         __count_vm_event(PGFREE);
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         /*
&amp;gt; &amp;gt;          * We only track unmovable, reclaimable and movable on pcp lists.
&amp;gt; &amp;gt;          * Free ISOLATE pages back to the allocator because they are being
&amp;gt; &amp;gt;          * offlined but treat RESERVE as movable pages so we can get those
&amp;gt; &amp;gt;          * areas back if necessary. Otherwise, we may have to free
&amp;gt; &amp;gt;          * excessively into the page allocator
&amp;gt; &amp;gt;          */
&amp;gt; &amp;gt;         if (migratetype &amp;gt;= MIGRATE_PCPTYPES) {
&amp;gt; &amp;gt;                 if (unlikely(migratetype == MIGRATE_ISOLATE)) {
&amp;gt; &amp;gt;                         free_one_page(zone, page, 0, migratetype);
&amp;gt; &amp;gt;                         goto out;
&amp;gt; &amp;gt;                 }
&amp;gt; &amp;gt;                 migratetype = MIGRATE_MOVABLE;
&amp;gt; &amp;gt;         }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         if (cold)
&amp;gt; &amp;gt;                 list_add_tail(&amp;amp;page-&amp;gt;lru, &amp;amp;pcp-&amp;gt;lists[migratetype]);
&amp;gt; &amp;gt;         else
&amp;gt; &amp;gt;                 list_add(&amp;amp;page-&amp;gt;lru, &amp;amp;pcp-&amp;gt;lists[migratetype]);
&amp;gt; &amp;gt;         pcp-&amp;gt;count++;
&amp;gt; &amp;gt;         if (pcp-&amp;gt;count &amp;gt;= pcp-&amp;gt;high) {
&amp;gt; &amp;gt;                 free_pcppages_bulk(zone, pcp-&amp;gt;batch, pcp);
&amp;gt; &amp;gt;                 pcp-&amp;gt;count -= pcp-&amp;gt;batch;
&amp;gt; &amp;gt;         }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; out:
&amp;gt; &amp;gt;         local_irq_restore(flags);
&amp;gt; &amp;gt;         put_cpu();
&amp;gt; &amp;gt; }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; /*
&amp;gt; &amp;gt;  * Drop a ref, return true if the refcount fell to zero (the page has no users)
&amp;gt; &amp;gt;  */
&amp;gt; &amp;gt; static inline int put_page_testzero(struct page *page)
&amp;gt; &amp;gt; {
&amp;gt; &amp;gt;         VM_BUG_ON(atomic_read(&amp;amp;page-&amp;gt;_count) == 0);
&amp;gt; &amp;gt;         return atomic_dec_and_test(&amp;amp;page-&amp;gt;_count);
&amp;gt; &amp;gt; }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; /*
&amp;gt; &amp;gt;  * Try to grab a ref unless the page has a refcount of zero, return false if
&amp;gt; &amp;gt;  * that is the case.
&amp;gt; &amp;gt;  */
&amp;gt; &amp;gt; static inline int get_page_unless_zero(struct page *page)
&amp;gt; &amp;gt; {
&amp;gt; &amp;gt;         return atomic_inc_not_zero(&amp;amp;page-&amp;gt;_count);
&amp;gt; &amp;gt; }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; The Lustre part follows this calling structure:
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; read_page (called by kernel via standard address_space operations)
&amp;gt; &amp;gt;   -&amp;gt; Read-ahead 
&amp;gt; &amp;gt;     -&amp;gt; grab_cache_page_nowait()
&amp;gt; &amp;gt;         -&amp;gt; find_get_page() (success, page is in page cache but page-&amp;gt;private == 2)
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;   llap = llap_from_page_with_lockh(page, LLAP_ORIGIN_READAHEAD, &amp;amp;lockh, flags);
&amp;gt; &amp;gt;     -&amp;gt; cast_private (tries to dereference page-&amp;gt;private-&amp;gt;llap_magic)
&amp;gt; &amp;gt;         PANIC
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; struct page * grab_cache_page_nowait(struct address_space *mapping, pgoff_t index)
&amp;gt; &amp;gt; {
&amp;gt; &amp;gt;         struct page *page = find_get_page(mapping, index);
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         if (page) {
&amp;gt; &amp;gt;                 if (trylock_page(page))
&amp;gt; &amp;gt;                         return page;
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;                 page_cache_release(page);
&amp;gt; &amp;gt;                 return NULL;
&amp;gt; &amp;gt;         }
&amp;gt; &amp;gt;         page = __page_cache_alloc(mapping_gfp_mask(mapping) &amp;amp; ~__GFP_FS);
&amp;gt; &amp;gt;         if (page &amp;amp;&amp;amp; add_to_page_cache_lru(page, mapping, index, GFP_NOFS)) {
&amp;gt; &amp;gt;                 page_cache_release(page);
&amp;gt; &amp;gt;                 page = NULL;
&amp;gt; &amp;gt;         }
&amp;gt; &amp;gt;         return page;
&amp;gt; &amp;gt; }
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; struct page *find_get_page(struct address_space *mapping, pgoff_t offset)
&amp;gt; &amp;gt; {
&amp;gt; &amp;gt;         void **pagep;
&amp;gt; &amp;gt;         struct page *page;
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;         rcu_read_lock();
&amp;gt; &amp;gt; repeat:
&amp;gt; &amp;gt;         page = NULL;
&amp;gt; &amp;gt;         pagep = radix_tree_lookup_slot(&amp;amp;mapping-&amp;gt;page_tree, offset);
&amp;gt; &amp;gt;         if (pagep) {
&amp;gt; &amp;gt;                 page = radix_tree_deref_slot(pagep);
&amp;gt; &amp;gt;                 if (unlikely(!page || page == RADIX_TREE_RETRY))
&amp;gt; &amp;gt;                         goto repeat;
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt;                 /* AT THIS point page-&amp;gt;private is already == 2 */
&amp;gt; &amp;gt; 
&amp;gt; &amp;gt; 


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="10996" author="johann" created="Thu, 10 Mar 2011 06:24:16 +0000"  >&lt;p&gt;For the record, another customer also reported some weird assertion failures (although not the&lt;br/&gt;
same assertion as in this bug) with the SLES11 SP1 kernel. Somehow PG_locked was cleared&lt;br/&gt;
behind our back. Surprisingly, it also turned out to be a BIOS issue.&lt;/p&gt;

&lt;p&gt;Unfortunately, the bugzilla ticket (i.e. bug 23880) is private.&lt;/p&gt;</comment>
                            <comment id="10997" author="laisiyao" created="Thu, 10 Mar 2011 06:38:39 +0000"  >&lt;p&gt;Dump of the bad page:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; struct page ffffea000b1d09a8
struct page {
  flags = 18014398509482017, 
  _count = {
    counter = 2
  }, 
  {
    _mapcount = {
      counter = -1
    }, 
    {
      inuse = 65535, 
      objects = 65535
    }
  }, 
  {
    {
      private = 2, 
      mapping = 0xffff88062f1cea70
    }, 
    ptl = {
      raw_lock = {
        slock = 2
      }
    }, 
    slab = 0x2, 
    first_page = 0x2
  }, 
  {
    index = 379628, 
    freelist = 0x5caec
  }, 
  lru = {
    next = 0xffffea000b1d0a08, 
    prev = 0xffffea000b1d0998
  }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="11009" author="laisiyao" created="Thu, 10 Mar 2011 17:27:17 +0000"  >&lt;p&gt;Michael, as you pointed out, in RH6 free_hot_cold_page() borrows page-&amp;gt;private to store the migratetype (the value might be 2). Although this is suspicious, I can&apos;t imagine a freed page still being in the page cache (if that happened, the panic would not be so hard to trigger, and it would not be limited to RH6 either).&lt;/p&gt;
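To make this scenario concrete, here is a minimal user-space model in C. It is a hypothetical sketch, not kernel or Lustre code. The names model_page, model_page_cache_release() and model_private_is_stale() are made up. The count field stands in for the page refcount, and the value 2 is simply the migratetype seen in the crash dump. The model shows that the free path only overwrites the private field when the last reference is dropped. A page that is still reachable from the page cache with private equal to 2 would therefore imply that one reference too many was released, or that the memory was corrupted.

```c
/*
 * Hypothetical user-space model (not kernel or Lustre code) of the
 * suspected sequence. Names and values are illustrative assumptions.
 */
#define MODEL_MIGRATETYPE 2UL /* value found in page->private in the dump */

struct model_page {
        int           count;   /* stands in for page->_count */
        unsigned long private; /* stands in for page->private */
};

/*
 * Mimics put_page_testzero() followed by free_hot_cold_page():
 * only the put that drops the last reference frees the page, and
 * the free path then reuses the private field for the migratetype.
 * Returns 1 if the page was freed, 0 otherwise.
 */
static int model_page_cache_release(struct model_page *page)
{
        page->count--;
        if (page->count != 0)
                return 0;                  /* other holders remain */
        page->private = MODEL_MIGRATETYPE; /* like set_page_private() */
        return 1;
}

/*
 * Mimics what a later page-cache lookup would see: if private no longer
 * holds the value Lustre stored there (standing in for the llap pointer
 * that cast_private() dereferences), the page was freed behind our back.
 */
static int model_private_is_stale(const struct model_page *page,
                                  unsigned long llap_cookie)
{
        return page->private != llap_cookie;
}
```

In this model, a page held by both the page cache and Lustre starts with a count of 2. Lustre's single release leaves private untouched. Only an extra, unbalanced release sets private to 2 while the page-cache reference would still be outstanding.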

&lt;p&gt;I will wait for the latest test results, or until you get more direct evidence of the above scenario.&lt;/p&gt;</comment>
                            <comment id="11011" author="laisiyao" created="Thu, 10 Mar 2011 19:54:34 +0000"  >&lt;p&gt;Work done on this issue on my side (16 hours total):&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Analyzed the changes in the RH6 kernel page allocation/free code paths compared with the RH5 kernel.&lt;/li&gt;
	&lt;li&gt;Reviewed the Lustre page allocation/free code paths.&lt;/li&gt;
	&lt;li&gt;Analyzed the vmcore and test logs.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="11061" author="pjones" created="Mon, 14 Mar 2011 10:27:25 +0000"  >&lt;p&gt;update from Michael&lt;/p&gt;

&lt;p&gt;&quot;thanks for the excellent work so far&lt;/p&gt;

&lt;p&gt;Lai, can you please answer the following question: based on your professional experience and the data we have so far, how likely are the following scenarios?&lt;/p&gt;

&lt;p&gt;a) it&apos;s a bug in the Red Hat Linux kernel&lt;br/&gt;
b) it&apos;s a bug in Lustre&lt;br/&gt;
c) it&apos;s a hardware/BIOS related bug&lt;br/&gt;
d) it&apos;s something completely different&lt;/p&gt;

&lt;p&gt;Could you assign probabilities to those 4 options (they should add up to 100%)? My own assessment would currently be:&lt;/p&gt;

&lt;p&gt;1%   it&apos;s a bug in the Red Hat Linux kernel&lt;br/&gt;
4%   it&apos;s a bug in Lustre&lt;br/&gt;
90% it&apos;s a hardware/BIOS related bug&lt;br/&gt;
5%   it&apos;s something completely different&lt;/p&gt;

&lt;p&gt;For the next steps I need to do some preparation first:&lt;/p&gt;

&lt;p&gt;a) create a new image based on RH6.1 and start exhaustive test procedure to see if the error creeps up again&lt;br/&gt;
b) create 2 systems based on RH5, one with 1.6.7.2, one with 1.8.2 to show performance problems&lt;/p&gt;

&lt;p&gt;so there is no need for a meeting today&lt;br/&gt;
&quot;&lt;/p&gt;</comment>
                            <comment id="11062" author="pjones" created="Mon, 14 Mar 2011 10:48:55 +0000"  >&lt;p&gt;Michael&lt;/p&gt;

&lt;p&gt;I spoke with Johann Lombardi, who was the tech lead for 1.6.x and 1.8.x for many years, and he agrees with your rough probabilities based upon the information presently available.&lt;/p&gt;

&lt;p&gt;Regards&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="11136" author="johann" created="Wed, 16 Mar 2011 04:15:36 +0000"  >&lt;p&gt;Another customer hit the same bug with lustre 2.0. See &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-130&quot; title=&quot;Kernel crash on lustre 2.0 client (page fault in ll_file_read, NULL pointer dereference)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-130&quot;&gt;&lt;del&gt;LU-130&lt;/del&gt;&lt;/a&gt;.&lt;br/&gt;
Michael, could you please tell us what BIOS version has fixed the problem for you?&lt;/p&gt;</comment>
                            <comment id="11146" author="michael.hebenstreit@intel.com" created="Wed, 16 Mar 2011 08:38:41 +0000"  >&lt;p&gt;Commented in the LU-130 thread.&lt;/p&gt;</comment>
                            <comment id="11320" author="sebastien.buisson" created="Thu, 24 Mar 2011 02:24:12 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Michael, I have a question for you. At Bull we are trying to identify which microcode can solve this issue. For our MESCA machines we are testing a new BIOS that integrates the microcode update named M04206E6_00000008.&lt;br/&gt;
Do you think you could retrieve the microcode revision brought by the BIOS update of your SuperMicro machines? That would be very helpful for us!&lt;/p&gt;

&lt;p&gt;People from the Kernel Team here at Bull pointed us to a kernel bug dealing with TLB entries that is fixed in vanilla 2.6.32 and in the RHEL6.1 beta:&lt;br/&gt;
&lt;a href=&quot;https://patchwork.kernel.org/patch/564801/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://patchwork.kernel.org/patch/564801/&lt;/a&gt;&lt;br/&gt;
Do you think it could be related to the present bug, and fix this issue?&lt;/p&gt;

&lt;p&gt;TIA,&lt;br/&gt;
Sebastien.&lt;/p&gt;</comment>
                            <comment id="11325" author="michael.hebenstreit@intel.com" created="Thu, 24 Mar 2011 08:33:23 +0000"  >&lt;p&gt;It would fit the circumstances in which it happened, but in my case a BIOS upgrade fixed it, which rather speaks against it.&lt;/p&gt;</comment>
                            <comment id="11329" author="michael.hebenstreit@intel.com" created="Thu, 24 Mar 2011 08:53:16 +0000"  >&lt;p&gt;Do you have a good way to retrieve this number? Otherwise, here is the output from our Linux system:&lt;/p&gt;

&lt;p&gt;From dmesg on et66, intel-ucode/06-2c-02&lt;/p&gt;

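The intel-ucode file name in the dmesg line quoted here can be derived from its sig= value. These files are named family-model-stepping in two-digit hex, and those three fields come from the CPUID signature. The following is a minimal C sketch, assuming the standard Intel CPUID leaf 1 EAX layout. The names cpu_fms and decode_cpu_signature() are hypothetical and not part of any tool.

```c
/*
 * Sketch (hypothetical helper, not part of any tool): split an Intel CPUID
 * leaf 1 EAX signature, as printed in "sig=" by the microcode driver, into
 * the family/model/stepping values used in intel-ucode/FF-MM-SS file names.
 * For unsigned values, "x % 16" is equivalent to masking with 0xf.
 */
struct cpu_fms {
        unsigned int family;
        unsigned int model;
        unsigned int stepping;
};

static struct cpu_fms decode_cpu_signature(unsigned int sig)
{
        unsigned int stepping    = sig % 16;          /* bits 3:0   */
        unsigned int base_model  = (sig >> 4) % 16;   /* bits 7:4   */
        unsigned int base_family = (sig >> 8) % 16;   /* bits 11:8  */
        unsigned int ext_model   = (sig >> 16) % 16;  /* bits 19:16 */
        unsigned int ext_family  = (sig >> 20) % 256; /* bits 27:20 */
        struct cpu_fms r;

        r.stepping = stepping;
        r.family = base_family;
        if (base_family == 0xf)                /* extended family only for 0xf */
                r.family += ext_family;
        r.model = base_model;
        if (base_family == 0x6 || base_family == 0xf)
                r.model += ext_model * 16;     /* extended model is the high nibble */
        return r;
}
```

For sig=0x206c2 this yields family 0x6, model 0x2c and stepping 0x2, i.e. the 06-2c-02 file requested in the log. The revision= field on the same line is the microcode revision reported for that CPU.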
&lt;p&gt;microcode: CPU23 sig=0x206c2, pf=0x1, revision=0x13&lt;br/&gt;
platform microcode: firmware: requesting intel-ucode/06-2c-02&lt;/p&gt;</comment>
                            <comment id="11332" author="michael.hebenstreit@intel.com" created="Thu, 24 Mar 2011 09:46:44 +0000"  >&lt;p&gt;the BIOS used came from &lt;a href=&quot;http://www.supermicro.com/support/bios/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.supermicro.com/support/bios/&lt;/a&gt;&lt;br/&gt;
File: X8DTN1.zip&lt;/p&gt;</comment>
                            <comment id="11388" author="patrick.valentin" created="Fri, 25 Mar 2011 10:52:25 +0000"  >&lt;p&gt;If the dmidecode command is available, you can run it with the following options to print the BIOS and system information:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;dmidecode -t0 -t1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="11389" author="michael.hebenstreit@intel.com" created="Fri, 25 Mar 2011 11:06:01 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# dmidecode 2.10
SMBIOS 2.6 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: 080016
        Release Date: 02/11/2011
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 4096 kB
        Characteristics:
                ISA is supported
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                ESCD support is available
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                EDD is supported
                5.25&quot;/1.2 MB floppy services are supported (int 13h)
                3.5&quot;/720 kB floppy services are supported (int 13h)
                3.5&quot;/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                8042 keyboard services are supported (int 9h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                CGA/mono video services are supported (int 10h)
                ACPI is supported
                USB legacy is supported
                LS-120 boot is supported
                ATAPI Zip drive boot is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
        BIOS Revision: 8.16

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: Supermicro
        Product Name: X8DTN
        Version: 1234567890
        Serial Number: 1234567890
        UUID: 54443858-4E54-3000-48F4-003048F40FA6
        Wake-up Type: Power Switch
        SKU Number: 1234567890
        Family: Server

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="15351" author="pjones" created="Wed, 1 Jun 2011 05:59:03 +0000"  >&lt;p&gt;This work is now complete&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="10890">LU-339</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="10140" name="lustre.patch" size="6685" author="michael.hebenstreit@intel.com" created="Tue, 8 Mar 2011 12:51:51 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>client</label>
            <label>patchless_client</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvso7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8549</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10023"><![CDATA[4]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>