<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:15:35 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15116] crash when writing files in parallel on LTS lustre version</title>
                <link>https://jira.whamcloud.com/browse/LU-15116</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are attempting to use Lustre as a parallel file system for STAR-CCM+, which can write large files in parallel. We have a combined MGS/MDS server and a single OSS. All file systems are ZFS-backed.&lt;/p&gt;

&lt;p&gt;Based on the compatibility matrix for the LTS release (&lt;a href=&quot;https://wiki.whamcloud.com/display/PUB/Lustre+Support+Matrix&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.whamcloud.com/display/PUB/Lustre+Support+Matrix&lt;/a&gt;), I&apos;m using Lustre 2.12.7 with CentOS 7.9 on the servers and Rocky Linux 8.4 on the clients. So far I have only installed Lustre from the Whamcloud repositories. I have tried both the in-kernel driver version and the MOFED version of the packages for CentOS 7.9.&lt;/p&gt;

&lt;p&gt;Whenever we attempt to write a file in parallel from STAR-CCM+, we get the following errors:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[60662.673922] LustreError: 3073:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check
[60662.674340] LustreError: 3073:0:(pack_generic.c:605:__lustre_unpack_msg()) Skipped 348 previous similar messages
[60662.674654] LustreError: 3073:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-172.24.33.1@o2ib1 x1713655326257920
[60662.675251] LustreError: 3073:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) Skipped 348 previous similar messages
[60721.038766] LustreError: 3023:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io
[60779.408713] LustreError: 3023:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io
[60837.774776] LustreError: 3023:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io
[60896.137635] LustreError: 3027:0:(events.c:310:request_in_callback()) event type 2, status -5, service ost_io

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
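&lt;p&gt;For reference, the negative status values in the log above are negated Linux errno codes: -103 is ECONNABORTED and -5 is EIO. The following is a minimal Python sketch that decodes them from the log lines. It assumes a Linux host, because errno numbering differs on other platforms.&lt;/p&gt;

```python
import errno
import os
import re

# Two status lines copied from the console log in this report.
LOG_LINES = [
    "(events.c:310:request_in_callback()) event type 2, status -103, service ost_io",
    "(events.c:310:request_in_callback()) event type 2, status -5, service ost_io",
]

# Assumption: the "status" field is a negated Linux errno value.
STATUS_RE = re.compile(r"status (-\d+)")


def decode_status(line):
    """Return (errno name, message) for the status in a log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))  # "-103" becomes errno 103
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


for line in LOG_LINES:
    print(decode_status(line))
```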
&lt;p&gt;STAR-CCM+ also supports a serial (non-parallel) write mode, which works some of the time but fails intermittently with the same errors.&lt;/p&gt;

&lt;p&gt;All of the Lustre-related software, including ZFS, was installed from the Whamcloud repositories (&lt;a href=&quot;https://downloads.whamcloud.com/public/lustre/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://downloads.whamcloud.com/public/lustre/&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;One thing to note is that I had a lot of trouble getting the Lustre ZFS OSD kernel module to install. When I try to install the kmod-lustre-osd-zfs package, yum reports a long list of unresolved symbols and refuses to install it. I also tried building the module with DKMS using the lustre-zfs-dkms package, but it seems to be broken: it does not build against the zfs-dkms and spl-dkms packages. I spent a fair bit of time going through the configure files it generated, and it appeared to be looking in the wrong directories for the ZFS and SPL sources. In the end I forced the kmod-lustre-osd-zfs package to install using:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
rpm -Uvh --nodeps $(repoquery --location kmod-lustre-osd-zfs)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
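&lt;p&gt;After a forced install like this, one way to check for missing symbols is to scan the dmesg output for the &quot;Unknown symbol&quot; messages that the kernel prints when a module cannot resolve a symbol. The following is a Python sketch of that check. The message format is an assumption based on mainline kernels, and the sample log line is made up for illustration.&lt;/p&gt;

```python
import re

# Assumed kernel message format when a module cannot resolve a symbol:
#   MODULE: Unknown symbol SYMBOL (err N)
UNKNOWN_SYMBOL_RE = re.compile(r"(\S+): Unknown symbol (\S+) \(err (-?\d+)\)")


def find_unknown_symbols(log_text):
    """Return (module, symbol) pairs for every unresolved-symbol message."""
    return [(m.group(1), m.group(2)) for m in UNKNOWN_SYMBOL_RE.finditer(log_text)]


# Made-up dmesg line for illustration only; feed real `dmesg` output instead.
SAMPLE = "[ 1234.567890] osd_zfs: Unknown symbol dmu_tx_hold_write (err 0)"

print(find_unknown_symbols(SAMPLE))
```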
&lt;p&gt;I do not get any errors when I load the kernel modules, including the ZFS modules, and there are no errors in dmesg that obviously relate to missing symbols. I therefore suspect the dependency errors reported by yum are spurious, but I mention them here in case I missed something.&lt;/p&gt;</description>
                <environment>CentOS 7.9 servers with Rocky Linux 8.4 clients. InfiniBand network using HP-branded Mellanox ConnectX-3 Pro cards.</environment>
        <key id="66690">LU-15116</key>
            <summary>crash when writing files in parallel on LTS lustre version</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="godbolt">Bryan Godbolt</reporter>
                        <labels>
                    </labels>
                <created>Fri, 15 Oct 2021 18:19:51 +0000</created>
                <updated>Mon, 18 Oct 2021 22:19:38 +0000</updated>
                            <resolved>Mon, 18 Oct 2021 20:25:40 +0000</resolved>
                                    <version>Lustre 2.12.7</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="315910" author="adilger" created="Mon, 18 Oct 2021 20:25:25 +0000"  >&lt;p&gt;The error reported looks similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14733&quot; title=&quot;brw_bulk_ready() BRW bulk READ failed for RPC from 12345-192.168.128.126@o2ib18: -103&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14733&quot;&gt;&lt;del&gt;LU-14733&lt;/del&gt;&lt;/a&gt; on RHEL 8.4.&lt;/p&gt;

&lt;p&gt;The patches from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14773&quot; title=&quot;reduce run_one() overhead&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14773&quot;&gt;LU-14773&lt;/a&gt; have already landed on b2_12 for the upcoming 2.12.8 release. In the meantime, you could test the b2_12 branch; the most recent build is at &lt;a href=&quot;https://build.whamcloud.com/job/lustre-b2_12&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://build.whamcloud.com/job/lustre-b2_12&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="315923" author="JIRAUSER17100" created="Mon, 18 Oct 2021 22:19:38 +0000"  >&lt;p&gt;Hi Andreas,&lt;/p&gt;

&lt;p&gt;Thanks very much for the prompt reply.&#160; We have installed the version you mentioned and so far it has resolved the issue!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="64515">LU-14733</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="41005" name="kmod-lustre-osd-zfs unsatisfied dependencies.log" size="30362" author="godbolt" created="Fri, 15 Oct 2021 18:17:06 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i027hj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>