<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:28:44 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16638] LustreError: 18531:0:(osc_object.c:410:osc_req_attr_set()) LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-16638</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We&apos;re seeing a regular crash on one of our clients that re-exports a Lustre volume via NFS to other clients. For a while I thought it was related to atime updates, since those showed up in the stack trace, but it&apos;s still crashing in the same spot in osc_object.c after disabling atime.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[238496.543455] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) page@00000000124db7f5[4 000000001cc24e6a 4 1 0000000000000000]
[238496.543488] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) vvp-page@00000000644ae261(0:0) vm@0000000013d555b5 17ffffc0002001 4:0 ffff8b33354a0500 540 lru
[238496.543514] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) lov-page@00000000fac88b5b
[238496.543532] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) osc-page@00000000c5423838 540: 1&amp;lt; 0x845fed 1 + + &amp;gt; 2&amp;lt; 2211840 0 4096 0x7 0x9 | 0000000000000000 0000000032e9a87e 00000000b31fd886 &amp;gt; 3&amp;lt; 1 0 0 &amp;gt; 4&amp;lt; 0 0 8 156499967 - | - - - + &amp;gt; 5&amp;lt; - - - + | 0 - | 0 - -&amp;gt;
[238496.543569] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) end page@00000000124db7f5
[238496.543585] LustreError: 18531:0:(osc_object.c:396:osc_req_attr_set()) uncovered page!
[238496.543598] LustreError: 18531:0:(ldlm_resource.c:1783:ldlm_resource_dump()) --- Resource: [0xd3409f:0x0:0x0].0x0 (000000004660d5d9) refcount = 3
[238496.543618] LustreError: 18531:0:(ldlm_resource.c:1787:ldlm_resource_dump()) Granted locks (in reverse order):
[238496.543635] LustreError: 18531:0:(ldlm_resource.c:1790:ldlm_resource_dump()) ### ### ns: work-OST0003-osc-ffff8b3d1067e800 lock: 00000000552d990c/0x2904edfb430539b2 lrc: 3/1,0 mode: PR/PR res: [0xd3409f:0x0:0x0].0x0 rrc: 4 type: EXT [0-&amp;gt;2211839] (req 2146304-&amp;gt;2211839) gid 0 flags: 0x800420400020000 nid: local remote: 0x27d356efda730f51 expref: -99 pid: 18893 timeout: 0 lvb_type: 1
[238496.543687] LustreError: 18531:0:(ldlm_resource.c:1802:ldlm_resource_dump()) Waiting locks:
[238496.543701] LustreError: 18531:0:(ldlm_resource.c:1804:ldlm_resource_dump()) ### ### ns: work-OST0003-osc-ffff8b3d1067e800 lock: 0000000049878f3e/0x2904edfb430539b9 lrc: 4/1,0 mode: --/PR res: [0xd3409f:0x0:0x0].0x0 rrc: 4 type: EXT [2211840-&amp;gt;2277375] (req 2211840-&amp;gt;2277375) gid 0 flags: 0x20000 nid: local remote: 0x27d356efda730f5f expref: -99 pid: 18894 timeout: 0 lvb_type: 1
[238496.543746] Pid: 18531, comm: ptlrpcd_03_06 5.14.0-70.36.1.el9_0.x86_64 #1 SMP PREEMPT Thu Nov 24 11:28:21 EST 2022
[238496.543762] Call Trace TBD:
[238496.543767] LustreError: 18531:0:(osc_object.c:410:osc_req_attr_set()) LBUG
[238496.543779] Pid: 18531, comm: ptlrpcd_03_06 5.14.0-70.36.1.el9_0.x86_64 #1 SMP PREEMPT Thu Nov 24 11:28:21 EST 2022
[238496.543794] Call Trace TBD:
[238496.543799] Kernel panic - not syncing: LBUG
[238496.543807] CPU: 46 PID: 18531 Comm: ptlrpcd_03_06 Kdump: loaded Tainted: P           OE    --------- ---  5.14.0-70.36.1.el9_0.x86_64 #1
[238496.543827] Hardware name: Supermicro AS -1114S-WN10RT/H12SSW-NTR, BIOS 2.3 12/03/2021
[238496.543840] Call Trace:
[238496.543848]  dump_stack_lvl+0x34/0x48
[238496.543860]  panic+0x102/0x2d4
[238496.543869]  lbug_with_loc.cold+0x18/0x18 [libcfs]
[238496.543887]  osc_req_attr_set+0x32a/0x540 [osc]
[238496.543905]  cl_req_attr_set+0x5e/0x160 [obdclass]
[238496.543939]  osc_build_rpc+0x4a7/0x11f0 [osc]
[238496.544421]  osc_send_read_rpc+0x6de/0x810 [osc]
[238496.545787]  osc_check_rpcs+0x335/0x3c0 [osc]
[238496.546230]  osc_io_unplug0+0x75/0x90 [osc]
[238496.546662]  brw_queue_work+0x2f/0xd0 [osc]
[238496.547086]  work_interpreter+0x32/0x170 [ptlrpc]
[238496.547527]  ptlrpc_check_set+0x415/0x1ea0 [ptlrpc]
[238496.547966]  ptlrpcd_check+0x3d0/0x5c0 [ptlrpc]
[238496.548787]  ptlrpcd+0x20d/0x4a0 [ptlrpc]
[238496.550000]  kthread+0x149/0x170
[238496.550732]  ret_from_fork+0x22/0x30
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This crash is relatively new for us. We started to notice it after we switched from o2ib to tcp to address stability issues in our environment that we believe (we&apos;re still investigating) are related to RDMA on RHEL 9 with Omni-Path.&lt;/p&gt;

&lt;p&gt;We have a couple of vmcores from the crash kernel available if desired; however, I&apos;d rather not attach them here.&lt;/p&gt;</description>
                <environment>RHEL 9.0 client running 2.15.2 with tcp networking.</environment>
        <key id="75024">LU-16638</key>
            <summary>LustreError: 18531:0:(osc_object.c:410:osc_req_attr_set()) LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="snehring">Shane Nehring</reporter>
                        <labels>
                    </labels>
                <created>Mon, 13 Mar 2023 15:45:35 +0000</created>
                <updated>Wed, 28 Jun 2023 17:58:56 +0000</updated>
                            <resolved>Mon, 13 Mar 2023 17:20:36 +0000</resolved>
                                    <version>Lustre 2.15.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="365748" author="snehring" created="Mon, 13 Mar 2023 16:57:24 +0000"  >&lt;p&gt;Sorry, don&apos;t know how I missed that this is a duplicate. I&apos;m assuming this&apos;ll end up in b2_15 in time for 2.15.3?&lt;/p&gt;</comment>
                            <comment id="365756" author="pjones" created="Mon, 13 Mar 2023 17:20:36 +0000"  >&lt;p&gt;Yes - likely&lt;/p&gt;</comment>
                            <comment id="365764" author="adilger" created="Mon, 13 Mar 2023 17:44:12 +0000"  >&lt;p&gt;This looks like a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16412&quot; title=&quot;check truncated page in -&amp;gt;read page()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16412&quot;&gt;&lt;del&gt;LU-16412&lt;/del&gt;&lt;/a&gt;, which exposed a bug in the kernel. Patches were recently landed in the mainline kernel and backported to stable kernels, but they likely still need to be added to vendor kernels. You could potentially speed that process up by filing a ticket with your OS vendor and referencing the lore.kernel.org links in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16412&quot; title=&quot;check truncated page in -&amp;gt;read page()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16412&quot;&gt;&lt;del&gt;LU-16412&lt;/del&gt;&lt;/a&gt; to request that the patch be added to their kernel (it also improves performance in some workloads, so it is a win-win).&lt;/p&gt;

&lt;p&gt;A workaround has also been added to Lustre:&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/50277&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/50277&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and will likely appear in 2.15.3.&lt;/p&gt;</comment>
                            <comment id="365768" author="adilger" created="Mon, 13 Mar 2023 17:55:58 +0000"  >&lt;p&gt;Shane, I was in a meeting while writing my last comment, so I didn&apos;t see your previous exchange with Peter until after I submitted my comment.&lt;/p&gt;

&lt;p&gt;You likely missed that it was a duplicate because I just recently edited &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16412&quot; title=&quot;check truncated page in -&amp;gt;read page()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16412&quot;&gt;&lt;del&gt;LU-16412&lt;/del&gt;&lt;/a&gt; to include enough information to find it, which was previously only in a customer ticket. Luckily, you don&apos;t have to wait through the long time it took us to debug this issue and find that it was a kernel bug.&lt;/p&gt;</comment>
                            <comment id="365772" author="snehring" created="Mon, 13 Mar 2023 18:02:21 +0000"  >&lt;p&gt;Thank you, Andreas. I&apos;ve reached out to Red Hat to ask about incorporating that patch into el9.&lt;/p&gt;</comment>
                            <comment id="376641" author="snehring" created="Tue, 27 Jun 2023 17:05:04 +0000"  >&lt;p&gt;The kernel patch that corrects this has been incorporated into RHEL 9.2 in RHSA-2023:3723.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="73678">LU-16412</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03g5j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>