<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:05:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13980] Kernel panic on OST after removing files under &apos;/O&apos; folder</title>
                <link>https://jira.whamcloud.com/browse/LU-13980</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I removed some data stripes under the &apos;/O&apos; folder on an OST and started LFSCK. The OST was then forced to reboot because of a kernel panic. Looking into the vmcore, I found the specific error line:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 1057.367833] Lustre: lustre-OST0000: new disk, initializing
[ 1057.367877] Lustre: srv-lustre-OST0000: No data found on store. Initialize space
[ 1057.417121] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 300-900
[ 1062.018722] Lustre: lustre-OST0000: Connection restored to lustre-MDT0000-mdtlov_UUID (at 10.0.0.122@tcp)
[ 1089.010284] Lustre: lustre-OST0000: Connection restored to 89c68bff-12c8-9f48-f01e-f6306c666eb9 (at 10.0.0.98@tcp)
[ 1281.516928] LustreError: 10410:0:(osd_handler.c:1982:osd_object_release()) LBUG
[ 1281.516939] Pid: 10410, comm: ll_ost_out00_00 3.10.0-957.1.3.el7_lustre.x86_64 #1 SMP Mon May 27 03:45:37 UTC 2019
[ 1281.516944] Call Trace:
[ 1281.516960]  [&amp;lt;ffffffffc05fd7cc&amp;gt;] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 1281.516986]  [&amp;lt;ffffffffc05fd87c&amp;gt;] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 1281.517004]  [&amp;lt;ffffffffc0b93820&amp;gt;] osd_get_ldiskfs_dirent_param+0x0/0x130 [osd_ldiskfs]
[ 1281.517173]  [&amp;lt;ffffffffc07442b0&amp;gt;] lu_object_put+0x190/0x3e0 [obdclass]
[ 1281.517244]  [&amp;lt;ffffffffc09d8bc3&amp;gt;] out_handle+0x1503/0x1bc0 [ptlrpc]
[ 1281.517369]  [&amp;lt;ffffffffc09ce7ca&amp;gt;] tgt_request_handle+0x92a/0x1370 [ptlrpc]
[ 1281.517481]  [&amp;lt;ffffffffc097705b&amp;gt;] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
[ 1281.517582]  [&amp;lt;ffffffffc097a7a2&amp;gt;] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(The full dmesg log collected from the vmcore is in the attachment.)&lt;/p&gt;

&lt;p&gt;Then I found that after removing files under &apos;/O&apos; on the OST, even a simple write operation can trigger the same kernel panic.&lt;/p&gt;

&lt;p&gt;I&apos;m curious why &apos;osd_object_release&apos; is invoked in the above situations, and where the LFSCK functions sit in the error call trace.&lt;/p&gt;

&lt;p&gt;Thanks a lot!&lt;/p&gt;</description>
                <environment>CentOS Linux release 7.7.1908 (Core) with kernel 3.10.0-957.1.3.el7_lustre.x86_64 for Lustre 2.10.8, and CentOS Linux release 7.7.1908 (Core) with kernel 3.10.0-1062.9.1.el7_lustre.x86_64 for Lustre 2.12.4.</environment>
        <key id="60906">LU-13980</key>
            <summary>Kernel panic on OST after removing files under &apos;/O&apos; folder</summary>
                <type id="3" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11318&amp;avatarType=issuetype">Task</type>
                                            <priority id="5" iconUrl="https://jira.whamcloud.com/images/icons/priorities/trivial.svg">Trivial</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="rzhan">Runzhou Han</reporter>
                        <labels>
                    </labels>
                <created>Wed, 23 Sep 2020 02:40:43 +0000</created>
                <updated>Wed, 19 May 2021 17:35:47 +0000</updated>
                                            <version>Lustre 2.10.8</version>
                    <version>Lustre 2.12.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="280321" author="adilger" created="Wed, 23 Sep 2020 02:59:12 +0000"  >&lt;p&gt;Can you please provide some more information about this problem?&lt;/p&gt;

&lt;p&gt;How did you remove the objects under the &lt;tt&gt;/O&lt;/tt&gt; directory?  Was the filesystem mounted as both type &quot;&lt;tt&gt;lustre&lt;/tt&gt;&quot; and type &quot;&lt;tt&gt;ldiskfs&lt;/tt&gt;&quot; at the same time, or was the &quot;&lt;tt&gt;lustre&lt;/tt&gt;&quot; OST unmounted first?&lt;/p&gt;

&lt;p&gt;Which files were removed?  The log messages make it appear that the filesystem was sufficiently corrupted that the OST startup process wasn&apos;t able to detect the Lustre configuration files.&lt;/p&gt;

&lt;p&gt;If you are able to reproduce this, please enable full debugging with &quot;&lt;tt&gt;lctl set_param debug=-1&lt;/tt&gt;&quot; on the OST before starting LFSCK, and then attach the debug log, which should be written to &lt;tt&gt;/tmp/lustre_log.&amp;lt;timestamp&amp;gt;&lt;/tt&gt; when the LBUG is triggered, or can be dumped manually with &quot;&lt;tt&gt;lctl dk /tmp/lustre_log.txt&lt;/tt&gt;&quot;.&lt;/p&gt;</comment>
                            <comment id="280392" author="rzhan" created="Wed, 23 Sep 2020 15:21:43 +0000"  >&lt;p&gt;Thank you for your reply.&lt;/p&gt;

&lt;p&gt;I mounted it as both type &quot;&lt;tt&gt;lustre&lt;/tt&gt;&quot; and type &quot;&lt;tt&gt;ldiskfs&lt;/tt&gt;&quot; at the same time.&lt;/p&gt;

&lt;p&gt;The removed file is a data stripe of a client file. In my configuration the stripe setting is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lfs setstripe -i 0 -c -1 -S 64K /lustre
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For example, on client node I create a file with the following command:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;dd if=/dev/zero of=/lustre/10M bs=1M count=10
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then I use &quot;&lt;tt&gt;lfs getstripe /lustre/10M&lt;/tt&gt;&quot; to locate its data stripes on the OSTs:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@mds Desktop]# lfs getstripe 10M 
10M
lmm_stripe_count:  3
lmm_stripe_size:   65536
lmm_pattern:       1
lmm_layout_gen:    0
lmm_stripe_offset: 0
	obdidx		 objid		 objid		 group
	     0	             2	          0x2	             0
	     1	             2	          0x2	             0
	     2	             2	          0x2	             0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Next I remove one of them under the &quot;&lt;tt&gt;ldiskfs&lt;/tt&gt;&quot; mount point of one OST:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@oss0 osboxes]# rm -f /ost0_ldiskfs/O/0/d2/2
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then running LFSCK on the MDT triggers the kernel panic caused by the LBUG.&lt;/p&gt;

&lt;p&gt;I&apos;m able to reproduce the LBUG. However, after the kernel panic takes place, I&apos;m not able to manipulate the VM any more (I was using a virtual machine cluster). The VM either freezes, or I configure the kernel to reboot x seconds after the kernel panic; after the next boot I&apos;m not able to find &lt;tt&gt;/tmp/lustre_log.&amp;lt;timestamp&amp;gt;&lt;/tt&gt;.&lt;/p&gt;</comment>
                            <comment id="280497" author="adilger" created="Thu, 24 Sep 2020 09:34:57 +0000"  >&lt;p&gt;Mounting the OST filesystem as both &quot;&lt;tt&gt;lustre&lt;/tt&gt;&quot; and &quot;&lt;tt&gt;ldiskfs&lt;/tt&gt;&quot; at the same time is not supported, since (as you can see with this assertion) the state is being changed from underneath the filesystem in an unexpected manner.  It would be the same as modifying the blocks underneath ext4 while it is mounted.&lt;/p&gt;

&lt;p&gt;I mistakenly thought that the &quot;&lt;tt&gt;lustre-OST0000: new disk, initializing&lt;/tt&gt;&quot; was caused by a large number of files being deleted from the filesystem before startup, but I now can see from the low OST object numbers in your &quot;&lt;tt&gt;lfs getstripe&lt;/tt&gt;&quot; output that this &lt;b&gt;is&lt;/b&gt; a new filesystem, so this message is expected.&lt;/p&gt;

&lt;p&gt;I agree that it would be good to handle this error more gracefully (e.g. return an error instead of LBUG).  Looking elsewhere in Jira, it seems that this LBUG is hit often enough that the error handling should really be more tolerant, since the design policy is that the server should not &lt;tt&gt;LASSERT()&lt;/tt&gt; on bad values that come from the client or disk.&lt;/p&gt;</comment>
                            <comment id="280529" author="rzhan" created="Thu, 24 Sep 2020 16:12:48 +0000"  >&lt;p&gt;I see. Maybe I should not mount them at the same time. &lt;/p&gt;

&lt;p&gt;Actually, I was trying to emulate some special cases in which the underlying file system is corrupted by accident while the system is still running. I want to see how Lustre reacts to these unexpected failures (especially LFSCK&apos;s reaction).&lt;/p&gt;

&lt;p&gt;In fact, to help develop a more robust fsck for PFS/DFS, I&apos;m also doing the same thing to other systems (e.g., BeeGFS, OrangeFS and Ceph). Since Lustre relies heavily on kernel modules, I observed more kernel crashes in Lustre when injecting faults. That&apos;s why I&apos;m here to learn more about Lustre &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;. If possible, I&apos;m willing to help with solutions to these unexpected crashes.&lt;/p&gt;</comment>
                            <comment id="280710" author="gerrit" created="Sat, 26 Sep 2020 00:19:03 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/40058&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40058&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13980&quot; title=&quot;Kernel panic on OST after removing files under &amp;#39;/O&amp;#39; folder&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13980&quot;&gt;LU-13980&lt;/a&gt; osd: remove osd_object_release LASSERT&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 50d7b40a457a733130f346515ab37ad3e1b54424&lt;/p&gt;</comment>
                            <comment id="285850" author="gerrit" created="Tue, 24 Nov 2020 07:49:51 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/40738&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40738&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13980&quot; title=&quot;Kernel panic on OST after removing files under &amp;#39;/O&amp;#39; folder&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13980&quot;&gt;LU-13980&lt;/a&gt; osd-ldiskfs: print label instead of device&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1e1137a66c634288e40f3a2a017ce6d5c003fa2d&lt;/p&gt;</comment>
                            <comment id="286581" author="gerrit" created="Thu, 3 Dec 2020 07:27:51 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/40738/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/40738/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13980&quot; title=&quot;Kernel panic on OST after removing files under &amp;#39;/O&amp;#39; folder&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13980&quot;&gt;LU-13980&lt;/a&gt; osd-ldiskfs: print label instead of device&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 8f793f14bf9928352623e61122f005252605b136&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="50469">LU-10581</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="42881">LU-8992</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="55784">LU-12357</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="56867">LU-12741</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="56220">LU-12485</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="57443">LU-13000</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="36089" name="vmcore-dmesg.txt" size="35807" author="rzhan" created="Wed, 23 Sep 2020 02:21:27 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01ajr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>