<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:02:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13541] Application hang and kernel &quot;divide error&quot; in ll_readpage</title>
                <link>https://jira.whamcloud.com/browse/LU-13541</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;While launching Slurm jobs on our cluster, some of the jobs hung quite early, in a &quot;dd&quot; command that copies a dataset to a local ext4 filesystem:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;ll_file_read_iter+0xa1/0x290 [lustre]
new_sync_read+0x122/0x1b0
__vfs_read+0x29/0x40
vfs_read+0x8e/0x130
ksys_read+0xa7/0xe0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It happened on multiple nodes in the cluster, but on other nodes it works fine. It seems to be correlated with a kernel &quot;divide error&quot; (division by zero?) in the kernel log of the affected nodes:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 2171.682001] divide error: 0000 1 SMP NOPTI
[ 2171.686888] CPU: 133 PID: 35015 Comm: python Tainted: P OE 5.3.0-24-generic #26~18.04.2-Ubuntu
[ 2171.706858] RIP: 0010:ll_readpage+0x25d/0x730 [lustre]
[ 2171.802801] Call Trace:
[ 2171.805539] filemap_fault+0x9be/0x9f0
[ 2171.830810] ll_fault+0xdb/0x710 [lustre]
[ 2171.839869] __do_fault+0x57/0x117
[ 2171.843668] __handle_mm_fault+0xda0/0x1230
[ 2171.848344] handle_mm_fault+0xcb/0x210
[ 2171.852634] __do_page_fault+0x2a1/0x4d0
[ 2171.857018] do_page_fault+0x2c/0xe0
[ 2171.861014] page_fault+0x34/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Judging by the timestamps, some of these errors were caused by these jobs, while others were not (probably caused by another, unrelated job). In either case, the bad state lingers and blocks anyone trying to access this particular file. Other files seem fine, but this file is now poisoned.&lt;/p&gt;</description>
                <environment></environment>
        <key id="59123">LU-13541</key>
            <summary>Application hang and kernel &quot;divide error&quot; in ll_readpage</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="wshilong">Wang Shilong</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Fri, 8 May 2020 21:48:43 +0000</created>
                <updated>Fri, 21 Aug 2020 20:40:14 +0000</updated>
                            <resolved>Sat, 16 May 2020 13:55:45 +0000</resolved>
                                                    <fixVersion>Lustre 2.14.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="269727" author="gerrit" created="Fri, 8 May 2020 22:01:54 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/38545&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38545&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13541&quot; title=&quot;Application hang and kernel &amp;quot;divide error&amp;quot; in ll_readpage&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13541&quot;&gt;&lt;del&gt;LU-13541&lt;/del&gt;&lt;/a&gt; llite: fix possible divide zero in ll_use_fast_io()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: bb54a8ac674f2ed4de3cbc1f5183bab2b43d5dd6&lt;/p&gt;</comment>
                            <comment id="269729" author="adilger" created="Fri, 8 May 2020 22:16:32 +0000"  >&lt;blockquote&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;(gdb) x/i ll_readpage+0x25d
0x38a2d &amp;lt;ll_readpage+605&amp;gt;: idiv %rcx

(gdb) disas /s ll_readpage

1524 skip_pages = (ras-&amp;gt;ras_stride_length +
0x0000000000038a20 &amp;lt;+592&amp;gt;: mov 0x50(%r10),%rax
0x0000000000038a24 &amp;lt;+596&amp;gt;: add %rcx,%rax

1525 ras-&amp;gt;ras_stride_bytes - 1) / ras-&amp;gt;ras_stride_bytes;
0x0000000000038a27 &amp;lt;+599&amp;gt;: sub $0x1,%rax

1524 skip_pages = (ras-&amp;gt;ras_stride_length +
0x0000000000038a2b &amp;lt;+603&amp;gt;: cqto
0x0000000000038a2d &amp;lt;+605&amp;gt;: idiv %rcx
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, I guess &quot;ll_use_fast_io&quot; was inlined and ras-&amp;gt;ras_stride_bytes == 0?&lt;/p&gt;&lt;/blockquote&gt;</comment>
                            <comment id="270405" author="gerrit" created="Sat, 16 May 2020 06:23:08 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/38545/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38545/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13541&quot; title=&quot;Application hang and kernel &amp;quot;divide error&amp;quot; in ll_readpage&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13541&quot;&gt;&lt;del&gt;LU-13541&lt;/del&gt;&lt;/a&gt; llite: fix possible divide zero in ll_use_fast_io()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 7cd0afe583211a11cfe3c1041e5b982e65769f37&lt;/p&gt;</comment>
                            <comment id="270413" author="pjones" created="Sat, 16 May 2020 13:55:45 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                                        </outwardlinks>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="56613">LU-12644</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00zzj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>