<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:30:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9952] soft lockup in osd_inode_iteration() for lustre 2.8.1</title>
                <link>https://jira.whamcloud.com/browse/LU-9952</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;One of our production file systems running Lustre 2.8.1 experienced a soft lockup very similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9488&quot; title=&quot;soft lockup in osd_inode_iteration()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9488&quot;&gt;&lt;del&gt;LU-9488&lt;/del&gt;&lt;/a&gt;. I attempted to backport the patch, but far too many changes have happened between 2.8.1 and Lustre 2.10.0, and I am unsure I would get the port right. I have attached the backtrace.&lt;/p&gt;</description>
                <environment>Lustre ldiskfs server back end running version 2.8.1 with a few additional patches. The OS is RHEL6.9</environment>
        <key id="48188">LU-9952</key>
            <summary>soft lockup in osd_inode_iteration() for lustre 2.8.1</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="simmonsja">James A Simmons</reporter>
                        <labels>
                    </labels>
                <created>Wed, 6 Sep 2017 18:56:52 +0000</created>
                <updated>Tue, 5 Jun 2018 16:42:57 +0000</updated>
                            <resolved>Tue, 5 Jun 2018 16:42:57 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="207786" author="pjones" created="Thu, 7 Sep 2017 17:21:44 +0000"  >&lt;p&gt;Fan Yong&lt;/p&gt;

&lt;p&gt;Can you please advise on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="207848" author="yong.fan" created="Fri, 8 Sep 2017 04:31:35 +0000"  >&lt;p&gt;The known patches on master related to the OI scrub soft lockup have been backported as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://review.whamcloud.com/28903&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28903&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/28904&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28904&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/28905&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28905&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/28906&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28906&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="208633" author="simmonsja" created="Mon, 18 Sep 2017 15:49:03 +0000"  >&lt;p&gt;We are in the process of testing these patches. I attempted to recreate the problem with &quot;lctl set_param fail_loc=0x1504&quot; but that didn&apos;t work. What would you recommend to recreate this problem on a 2.8 system? Note we removed the offending files to make our production file system usable again.&lt;/p&gt;</comment>
                            <comment id="208713" author="yong.fan" created="Tue, 19 Sep 2017 01:17:01 +0000"  >&lt;p&gt;I think we need a new fail_loc to simulate the osd_inode_iteration() trouble. For example, we could inject a new failure stub in osd_iit_next() to simulate various bitmap layout cases.&lt;/p&gt;</comment>
                            <comment id="208761" author="simmonsja" created="Tue, 19 Sep 2017 16:47:07 +0000"  >&lt;p&gt;Could you create a test condition before the 30th of September? &lt;/p&gt;</comment>
                            <comment id="209024" author="yong.fan" created="Thu, 21 Sep 2017 10:11:58 +0000"  >&lt;p&gt;Let&apos;s check whether this one &lt;a href=&quot;https://review.whamcloud.com/#/c/29133/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/29133/&lt;/a&gt; works.&lt;/p&gt;</comment>
                            <comment id="209645" author="simmonsja" created="Tue, 26 Sep 2017 19:00:30 +0000"  >&lt;p&gt;We wouldn&apos;t be running the test framework on our production system. It looks like I just need to create a bunch of files on the file system. &lt;/p&gt;

&lt;p&gt;lctl set_param -n osd*.*MDT*.force_sync=1&lt;br/&gt;
lctl set_param fail_val=1 fail_loc=0x190&lt;br/&gt;
lctl lfsck_start -M lustre-MDT0000&lt;br/&gt;
lctl set_param fail_val=0 fail_loc=0x198&lt;/p&gt;

&lt;p&gt;While you check status:&lt;br/&gt;
lctl get_param -n osd-ldiskfs.lustre-MDT0000.oi_scrub | grep status&lt;/p&gt;

&lt;p&gt;Does this look right? What values do I use to reset it back to normal working conditions?&lt;/p&gt;</comment>
                            <comment id="209733" author="simmonsja" created="Wed, 27 Sep 2017 16:10:19 +0000"  >&lt;p&gt;We have been trying your reproducer by itself, and we see it get stuck but no soft lockups. Why are no soft lockups being reported?&lt;/p&gt;</comment>
                            <comment id="209891" author="yong.fan" created="Fri, 29 Sep 2017 03:13:41 +0000"  >&lt;p&gt;&quot;fail_loc=0x190&quot; slows down the OI scrub scanning, which gives us time to inject other failures before the OI scrub completes.&lt;br/&gt;
&quot;fail_loc=0x198&quot; makes the OI scrub iteration repeatedly scan the same bits of the inode table. Without our earlier patches (28903/4/5/6), the OI scrub would fall into a soft lockup. With those patches applied, the OI scrub detects the endless repetition and moves forward, so seeing no soft lockup is the expected behavior.&lt;/p&gt;</comment>
                            <comment id="209968" author="simmonsja" created="Fri, 29 Sep 2017 16:33:41 +0000"  >&lt;p&gt;We have been trying patch 28903 by itself and have not seen the soft lockup &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt; BTW, do we need to inject another failure?&lt;/p&gt;</comment>
                            <comment id="210597" author="yong.fan" created="Mon, 9 Oct 2017 10:48:29 +0000"  >&lt;blockquote&gt;
&lt;p&gt;We have been trying patch 28903 by itself and have not seen the soft lockup  BTW do we need to inject another failure?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;You mean you can reproduce the soft lockup every time without any of the above patches, right? And the soft lockup disappears if only 28903 is applied, right?&lt;/p&gt;</comment>
                            <comment id="214584" author="yong.fan" created="Fri, 24 Nov 2017 13:43:06 +0000"  >&lt;p&gt;Any further feedback?&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</comment>
                            <comment id="229100" author="yong.fan" created="Tue, 5 Jun 2018 16:22:21 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=simmonsja&quot; class=&quot;user-hover&quot; rel=&quot;simmonsja&quot;&gt;simmonsja&lt;/a&gt;,&lt;br/&gt;
Any further feedback for this ticket?&lt;/p&gt;</comment>
                            <comment id="229106" author="simmonsja" created="Tue, 5 Jun 2018 16:38:33 +0000"  >&lt;p&gt;You can close this. With various patches applied we haven&apos;t seen problems in some time. Thanks.&lt;/p&gt;</comment>
                            <comment id="229109" author="yong.fan" created="Tue, 5 Jun 2018 16:42:57 +0000"  >&lt;p&gt;The issue has been resolved by backporting the following patches:&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/28903&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28903&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/28904&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28904&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/28905&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28905&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://review.whamcloud.com/28906&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28906&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="43212">LU-9040</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="46015">LU-9488</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="28209" name="vmcore-dmesg-f1.txt" size="524288" author="simmonsja" created="Wed, 6 Sep 2017 18:56:50 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzjo7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10021"><![CDATA[2]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>