<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:03:51 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-6854] Setting page_writeback on a non-dirty page</title>
                <link>https://jira.whamcloud.com/browse/LU-6854</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;A recent change in the upstream kernel uncovered what I think is a bug in our handling of the writeback bit on pages.&lt;/p&gt;

&lt;p&gt;If we go by the &quot;sync io&quot; path: vvp_io_commit_write-&amp;gt;vvp_page_sync_io-&amp;gt;&#8230;..-&amp;gt;vvp_page_prep_write&lt;/p&gt;

&lt;p&gt;then the page is never dirty; vvp_page_prep_write asserts that the page is not dirty and then sets the writeback bit on it:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;static int vvp_page_prep_write(const struct lu_env *env,
                               const struct cl_page_slice *slice,
                               struct cl_io *unused)
{
        struct page *vmpage = cl2vm_page(slice);

        LASSERT(PageLocked(vmpage));
        LASSERT(!PageDirty(vmpage));

        set_page_writeback(vmpage);
        vvp_write_pending(cl2ccc(slice-&amp;gt;cpl_obj), cl2ccc_page(slice));

        return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The problem is that, from the kernel&apos;s perspective, page writeback means a cached page is in flight to the device, so the page must have been dirty in the first place, and we violate that assumption.&lt;/p&gt;

&lt;p&gt;Kernel 4.2.0 adds new cgroup dirty page accounting logic, and as part of that set_page_writeback updates a writeback structure hanging off the inode (i_wb). This structure is only initialized when either the inode or a page is dirtied, so we crash otherwise (that&apos;s how I uncovered this).&lt;/p&gt;
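
&lt;p&gt;As an illustration only, the failure mode described above can be modelled in plain userspace C. Everything in this sketch is hypothetical: the toy_* types and functions are simplified stand-ins, not kernel or Lustre APIs, and a return value of -1 stands in for the NULL pointer dereference. The sync_io flag sketches one possible way out (skipping the writeback bit for sync writes); it is not the actual patch.&lt;/p&gt;

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-inode writeback accounting structure. */
struct toy_wb {
        long nr_writeback;      /* pages currently under writeback */
};

/* Toy inode: i_wb stays NULL until something is dirtied. */
struct toy_inode {
        struct toy_wb *i_wb;
        struct toy_wb wb_storage;
};

/* Toy page with only the flags this discussion cares about. */
struct toy_page {
        struct toy_inode *host;
        bool locked;
        bool dirty;
        bool writeback;
};

/* Dirtying a page is what attaches the accounting structure (assumption
 * taken from the description above, heavily simplified). */
void toy_set_page_dirty(struct toy_page *page)
{
        if (page->host->i_wb == NULL) {
                page->host->wb_storage.nr_writeback = 0;
                page->host->i_wb = &page->host->wb_storage;
        }
        page->dirty = true;
}

/* Models clearing the dirty bit just before the page is written out. */
void toy_clear_page_dirty_for_io(struct toy_page *page)
{
        page->dirty = false;
}

/* Returns 0 on success, -1 where the real code would dereference NULL. */
int toy_set_page_writeback(struct toy_page *page)
{
        if (page->host->i_wb == NULL)
                return -1;
        page->host->i_wb->nr_writeback++;
        page->writeback = true;
        return 0;
}

/* Same shape as the prep function quoted above, plus a hypothetical
 * sync_io flag that skips the writeback bit for sync writes. */
int toy_prep_write(struct toy_page *page, bool sync_io)
{
        assert(page->locked);
        assert(!page->dirty);

        if (sync_io)
                return 0;
        return toy_set_page_writeback(page);
}
```
]]>

&lt;p&gt;In this model, calling toy_prep_write(&amp;page, false) on a locked page that was never dirtied returns -1, while the same call with sync_io set returns 0 and leaves the writeback bit clear. A page that was dirtied first and then cleaned for I/O goes through the writeback accounting normally.&lt;/p&gt;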

&lt;p&gt;Since the path I am talking about is a sync write, I can imagine two possible trains of thought here.&lt;br/&gt;
1: This is a sync write, which is why we do not set the dirty bit and instead do a sync writeout =&amp;gt; we probably don&apos;t need to set page_writeback either.&lt;br/&gt;
2: The page is already in cache anyway (that&apos;s how we got to it in the first place), so even though we cannot add it to OUR cache, we still need to set it dirty; it will become clean once the write completes, and we can throw it out of cache at the same time if we want to.&lt;/p&gt;</description>
                <environment></environment>
        <key id="31095">LU-6854</key>
            <summary>Setting page_writeback on a non-dirty page</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Wed, 15 Jul 2015 17:00:05 +0000</created>
                <updated>Mon, 17 Apr 2017 16:12:37 +0000</updated>
                            <resolved>Tue, 10 Jan 2017 13:44:27 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                    <version>Lustre 2.8.0</version>
                                    <fixVersion>Lustre 2.10.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="121364" author="green" created="Wed, 15 Jul 2015 17:01:30 +0000"  >&lt;p&gt;Jinshan told me a similar problem (though the code is different) exists in master too, which is why 2.8.0 is in the list of affected versions.&lt;/p&gt;</comment>
                            <comment id="121399" author="gerrit" created="Wed, 15 Jul 2015 21:09:39 +0000"  >&lt;p&gt;Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/15610&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/15610&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6854&quot; title=&quot;Setting page_writeback on a non-dirty page&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6854&quot;&gt;&lt;del&gt;LU-6854&lt;/del&gt;&lt;/a&gt; llite: Do not set writeback for sync write pages&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7c44f46769de80862d252d90af3a56852b0aef83&lt;/p&gt;</comment>
                            <comment id="173446" author="jpiles" created="Mon, 14 Nov 2016 13:28:26 +0000"  >&lt;p&gt;We are hitting this bug using an Ubuntu 16.04 kernel (4.4.0-34) as reported &lt;a href=&quot;https://patchwork.kernel.org/patch/6907341/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt;, or at least the stack trace is pretty much the same:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[Mon Nov 14 11:44:36 2016] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[Mon Nov 14 11:44:36 2016] IP: [&amp;lt;ffffffff8141995f&amp;gt;] __percpu_counter_add+0xf/0x70
[Mon Nov 14 11:44:36 2016] PGD 3fb7428067 PUD 3de0fb7067 PMD 0 
[Mon Nov 14 11:44:36 2016] Oops: 0000 [#13] SMP 
[Mon Nov 14 11:44:36 2016] Modules linked in: nfsv3 nfs_acl nfs lockd grace fscache nvidia_uvm(POE) ipmi_devintf 8021q garp mrp osc(OE) mgc(OE) lustre(OE) lmv(OE) fld(OE) mdc(OE) fid(OE) lov(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) ib_netlink(OE) mlx4_en(OE) mlx4_core(OE) mlx_compat(OE) nvidia(POE) intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
[Mon Nov 14 11:44:36 2016]  input_leds joydev cryptd ipmi_ssif sb_edac mei_me mei edac_core lpc_ich ioatdma shpchp mac_hid acpi_power_meter ipmi_si 8250_fintek ipmi_msghandler acpi_pad binfmt_misc knem(OE) parport_pc ppdev lp sunrpc parport autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear ast ttm ixgbe drm_kms_helper syscopyarea sysfillrect vxlan igb sysimgblt ip6_udp_tunnel dca udp_tunnel fb_sys_fops raid1 hid_generic ptp usbhid ahci pps_core mdio drm hid libahci i2c_algo_bit wmi fjes
[Mon Nov 14 11:44:36 2016] CPU: 19 PID: 30139 Comm: python Tainted: P      D    OE   4.4.0-34-generic #53-Ubuntu
[Mon Nov 14 11:44:36 2016] Hardware name: Supermicro SYS-2028GR-TRH/X10DRG-H, BIOS 1.0c 05/20/2015
[Mon Nov 14 11:44:36 2016] task: ffff883fed1ca940 ti: ffff883b2b758000 task.ti: ffff883b2b758000
[Mon Nov 14 11:44:36 2016] RIP: 0010:[&amp;lt;ffffffff8141995f&amp;gt;]  [&amp;lt;ffffffff8141995f&amp;gt;] __percpu_counter_add+0xf/0x70
[Mon Nov 14 11:44:36 2016] RSP: 0018:ffff883b2b75b938  EFLAGS: 00010006
[Mon Nov 14 11:44:36 2016] RAX: 0000000000000005 RBX: ffffea00f51f47c0 RCX: 000000000000001b
[Mon Nov 14 11:44:36 2016] RDX: 0000000000000030 RSI: 0000000000000001 RDI: 0000000000000088
[Mon Nov 14 11:44:36 2016] RBP: ffff883b2b75b950 R08: ffff883cc575e600 R09: 0000000000000000
[Mon Nov 14 11:44:36 2016] R10: 0000000000000000 R11: ffff883fd8b907d0 R12: 0000000000000088
[Mon Nov 14 11:44:36 2016] R13: 0000000000000001 R14: ffff8839fcbf0090 R15: ffff8839fcbf0210
[Mon Nov 14 11:44:36 2016] FS:  00002b5017e18a40(0000) GS:ffff883fff240000(0000) knlGS:0000000000000000
[Mon Nov 14 11:44:36 2016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Nov 14 11:44:36 2016] CR2: 00000000000000a8 CR3: 0000003fd5ce0000 CR4: 00000000001406e0
[Mon Nov 14 11:44:36 2016] Stack:
[Mon Nov 14 11:44:36 2016]  ffffea00f51f47c0 ffff8839fcbf01f8 ffff881f69099000 ffff883b2b75b9a0
[Mon Nov 14 11:44:36 2016]  ffffffff8119b4d0 0000000000000287 0000000000000000 ffff883ff23fd118
[Mon Nov 14 11:44:36 2016]  ffff883cc575e650 0000000000000068 ffff883fd844e128 ffff883fd2768d20
[Mon Nov 14 11:44:36 2016] Call Trace:
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffff8119b4d0&amp;gt;] __test_set_page_writeback+0x190/0x1d0
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0ba3d82&amp;gt;] vvp_page_prep_write+0x22/0x90 [lustre]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0847e8a&amp;gt;] cl_page_invoke+0x5a/0x160 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0848175&amp;gt;] cl_page_prep+0x35/0x1e0 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0c24f98&amp;gt;] osc_io_submit+0x138/0x5c0 [osc]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc084ca80&amp;gt;] cl_io_submit_rw+0x60/0x150 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0aca6ae&amp;gt;] lov_io_submit+0x29e/0x480 [lov]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc084ca80&amp;gt;] cl_io_submit_rw+0x60/0x150 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc084eb68&amp;gt;] cl_io_submit_sync+0xb8/0x1a0 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0ba80d3&amp;gt;] vvp_io_write_commit+0x5a3/0x900 [lustre]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0ba890b&amp;gt;] vvp_io_write_start+0x4db/0x610 [lustre]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc084ac76&amp;gt;] ? cl_lock_request+0x66/0x1d0 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc084c82c&amp;gt;] cl_io_start+0x5c/0x110 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc084e8c1&amp;gt;] cl_io_loop+0xa1/0x180 [obdclass]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0b57418&amp;gt;] ll_file_io_generic+0x768/0xad0 [lustre]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffffc0b579cd&amp;gt;] ll_file_write_iter+0x7d/0xe0 [lustre]
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffff8120c97b&amp;gt;] new_sync_write+0x9b/0xe0
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffff8120c9e6&amp;gt;] __vfs_write+0x26/0x40
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffff8120d369&amp;gt;] vfs_write+0xa9/0x1a0
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffff8120e025&amp;gt;] SyS_write+0x55/0xc0
[Mon Nov 14 11:44:36 2016]  [&amp;lt;ffffffff8182def2&amp;gt;] entry_SYSCALL_64_fastpath+0x16/0x71
[Mon Nov 14 11:44:36 2016] Code: 40 41 00 48 89 d8 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 &amp;lt;48&amp;gt; 8b 47 20 48 63 ca 65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a 
[Mon Nov 14 11:44:36 2016] RIP  [&amp;lt;ffffffff8141995f&amp;gt;] __percpu_counter_add+0xf/0x70
[Mon Nov 14 11:44:36 2016]  RSP &amp;lt;ffff883b2b75b938&amp;gt;
[Mon Nov 14 11:44:36 2016] CR2: 00000000000000a8
[Mon Nov 14 11:44:36 2016] ---[ end trace f8cd37dfb7aa1008 ]---
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We have found that the proposed patch has not been picked up by current master. Is it safe to apply?&lt;/p&gt;</comment>
                            <comment id="173859" author="ake_s" created="Wed, 16 Nov 2016 16:31:06 +0000"  >&lt;p&gt;I&apos;ve been running with the proposed patch on Ubuntu 16.04, based on the 2.8.56 tag plus the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6808&quot; title=&quot;Interop 2.5.3&amp;lt;-&amp;gt;master sanity test_224c: Bulk IO write error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6808&quot;&gt;&lt;del&gt;LU-6808&lt;/del&gt;&lt;/a&gt;, for a while on our cluster and haven&apos;t yet seen that specific problem reappear.&lt;/p&gt;

&lt;p&gt;It looks safe enough to me so far, but since I&apos;m having other issues I&apos;m not 100% sure.&lt;/p&gt;

&lt;p&gt;Joan, which version of lustre client are you running?&lt;/p&gt;</comment>
                            <comment id="174032" author="jpiles" created="Thu, 17 Nov 2016 08:33:02 +0000"  >&lt;p&gt;So far we were also using 2.8.56, but without this patch. I asked because the proposed patch has been available for about a year, yet has not been picked up by master, and the bug is still open.&lt;/p&gt;

&lt;p&gt;We have now tried applying it on the master branch (a bit after 2.8.60), and it&apos;s apparently working well, so we&apos;ll deploy it.&lt;/p&gt;

&lt;p&gt;We were also having other issues, but I think some of them could be secondary effects from this problem... also we were likely hitting &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7927&quot; title=&quot;Deadlock between ll_setattr() and ll_file_write()-&amp;gt;ll_fsync()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7927&quot;&gt;&lt;del&gt;LU-7927&lt;/del&gt;&lt;/a&gt; and / or &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7981&quot; title=&quot;double read of lli_trunc_sem in ll_page_mkwrite and vvp_io_fault_start leads to deadlock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7981&quot;&gt;&lt;del&gt;LU-7981&lt;/del&gt;&lt;/a&gt;, whose fixes are also included I think.&lt;/p&gt;</comment>
                            <comment id="174065" author="jay" created="Thu, 17 Nov 2016 16:25:15 +0000"  >&lt;p&gt;We&apos;re in the process of landing this patch.&lt;/p&gt;</comment>
                            <comment id="174074" author="jpiles" created="Thu, 17 Nov 2016 16:38:38 +0000"  >&lt;p&gt;Great! Thank you very much!&lt;/p&gt;</comment>
                            <comment id="180009" author="gerrit" created="Mon, 9 Jan 2017 05:43:34 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/15610/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/15610/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6854&quot; title=&quot;Setting page_writeback on a non-dirty page&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6854&quot;&gt;&lt;del&gt;LU-6854&lt;/del&gt;&lt;/a&gt; llite: Do not set writeback for sync write pages&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 610ac5c64d92f95924da839d3a2da28e9909956a&lt;/p&gt;</comment>
                            <comment id="180229" author="pjones" created="Tue, 10 Jan 2017 13:44:27 +0000"  >&lt;p&gt;Landed for 2.10&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="42649">LU-8971</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxi93:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>