<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:50:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5353] GPF in __mdd_orphan_cleanup()</title>
                <link>https://jira.whamcloud.com/browse/LU-5353</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I have a crashed FS with 4 MDTs. On each restart I see the following GPF at the end of recovery:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[  239.572023] Lustre: lustre-MDT0001-osp-MDT0002: Connection restored to lustre-MDT0001 (at 0@lo)
[  239.572031] Lustre: lustre-MDT0002: Recovery over after 1:10, of 5 clients 3 recovered and 2 were evicted.
[  239.593697] general protection fault: 0000 [#1] SMP
[  239.594184] last sysfs file: /sys/devices/system/cpu/possible
[  239.594184] CPU 0
[  239.594184] Modules linked in: lustre(U) ofd(U) osp(U) lod(U) ost(U) mdt(U) mdd(U) mgs(U) nodemap(U) osd_ldiskfs(U) ldiskfs(U) exportfs lquota(U) lfsck(U) jbd obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
[  239.594184]
[  239.594184] Pid: 3307, comm: orph_cleanup_lu Not tainted 2.6.32-431.5.1.el6.lustre.x86_64 #1 Bochs Bochs
[  239.594184] RIP: 0010:[&amp;lt;ffffffffa0b9b4dc&amp;gt;]  [&amp;lt;ffffffffa0b9b4dc&amp;gt;] ldiskfs_htree_lock+0xc/0xe0 [ldiskfs]
[  239.594184] RSP: 0018:ffff8801e2f21d80  EFLAGS: 00010246
[  239.594184] RAX: ffff8801e2ef0000 RBX: ffff8801e2ef0570 RCX: 0000000000000003
[  239.594184] RDX: 6b6b6b6b6b6b6b6b RSI: 6b6b6b6b6b6b6b6b RDI: ffff8801f7f76720
[  239.594184] RBP: ffff8801e2f21d90 R08: 0000000000000000 R09: ffff8801f52b5c70
[  239.594184] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801e38f9488
[  239.594184] R13: ffff8801f7f76720 R14: 6b6b6b6b6b6b6b6b R15: ffff8801e2ef0578
[  239.594184] FS:  0000000000000000(0000) GS:ffff88002f800000(0000) knlGS:0000000000000000
[  239.594184] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  239.594184] CR2: 00007f4526ddd000 CR3: 0000000217e23000 CR4: 00000000000006f0
[  239.594184] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  239.594184] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  239.594184] Process orph_cleanup_lu (pid: 3307, threadinfo ffff8801e2f20000, task ffff8801e2f1e2c0)
[  239.594184] Stack:
[  239.594184]  ffff8801e2ef0570 ffff8801e38f9488 ffff8801e2f21de0 ffffffffa0bfe43c
[  239.594184] &amp;lt;d&amp;gt; ffff8801e2ef0570 ffffffffa0455c4d ffff88021908a4b0 ffff8801e2ef0570
[  239.594184] &amp;lt;d&amp;gt; ffff88021908a4b0 ffffffffa0d71380 ffff8801f52b46e8 ffff8801f50dc9e8
[  239.594184] Call Trace:
[  239.594184]  [&amp;lt;ffffffffa0bfe43c&amp;gt;] osd_ldiskfs_it_fill+0x9c/0x240 [osd_ldiskfs]
[  239.594184]  [&amp;lt;ffffffffa0455c4d&amp;gt;] ? lu_object_put+0xad/0x330 [obdclass]
[  239.594184]  [&amp;lt;ffffffffa0bfe779&amp;gt;] osd_it_ea_next+0x99/0x190 [osd_ldiskfs]
[  239.594184]  [&amp;lt;ffffffffa0d4e941&amp;gt;] lod_it_next+0x21/0x90 [lod]
[  239.594184]  [&amp;lt;ffffffffa081c0da&amp;gt;] __mdd_orphan_cleanup+0x30a/0xfb0 [mdd]
[  239.594184]  [&amp;lt;ffffffffa081bdd0&amp;gt;] ? __mdd_orphan_cleanup+0x0/0xfb0 [mdd]
[  239.594184]  [&amp;lt;ffffffff8109eab6&amp;gt;] kthread+0x96/0xa0
[  239.594184]  [&amp;lt;ffffffff8100c30a&amp;gt;] child_rip+0xa/0x20
[  239.594184]  [&amp;lt;ffffffff81554710&amp;gt;] ? _spin_unlock_irq+0x30/0x40
[  239.594184]  [&amp;lt;ffffffff8100bb10&amp;gt;] ? restore_args+0x0/0x30
[  239.594184]  [&amp;lt;ffffffff8109ea20&amp;gt;] ? kthread+0x0/0xa0
[  239.594184]  [&amp;lt;ffffffff8100c300&amp;gt;] ? child_rip+0x0/0x20
[  239.594184] Code: 4c 89 e7 e8 37 ee ff ff 8b 55 cc 89 55 b0 e9 7c ff ff ff c7 45 b0 00 00 00 00 e9 68 ff ff ff 55 48 89 e5 41 54 53 0f 1f 44 00 00 &amp;lt;48&amp;gt; 8b 82 18 02 00 00 48 89 fb 49 89 d4 48 8b 80 f0 04 00 00 48
[  239.594184] RIP  [&amp;lt;ffffffffa0b9b4dc&amp;gt;] ldiskfs_htree_lock+0xc/0xe0 [ldiskfs]
[  239.594184]  RSP &amp;lt;ffff8801e2f21d80&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Messages leading up to this are unremarkable.&lt;/p&gt;

&lt;p&gt;It seems noteworthy that at the bottom of orph_index_iterate() the return code from iops-&amp;gt;get() and iops-&amp;gt;load() is immediately clobbered by the unconditional call to iops-&amp;gt;next():&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                &lt;span class=&quot;code-comment&quot;&gt;/* after index delete reset iterator */&lt;/span&gt;
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc == 0)
                        rc = iops-&amp;gt;get(env, it, (&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; void *)&quot;&quot;);
                &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
                        rc = iops-&amp;gt;load(env, it, cookie);
next:
                rc = iops-&amp;gt;next(env, it);
        } &lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; (rc == 0);
&lt;/pre&gt;
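&lt;p&gt;For illustration only (this is a sketch of how the clobbering could be avoided, not necessarily the fix that landed): the error from iops-&amp;gt;get()/iops-&amp;gt;load() would need to be checked before falling through to iops-&amp;gt;next(), e.g.:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;                /* after index delete reset iterator */
                if (rc == 0)
                        rc = iops-&amp;gt;get(env, it, (const void *)&quot;&quot;);
                else
                        rc = iops-&amp;gt;load(env, it, cookie);
                if (rc &amp;lt; 0)
                        break;  /* do not overwrite the error from get()/load() */
next:
                rc = iops-&amp;gt;next(env, it);
        } while (rc == 0);
&lt;/pre&gt;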
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="25591">LU-5353</key>
            <summary>GPF in __mdd_orphan_cleanup()</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="jhammond">John Hammond</reporter>
                        <labels>
                            <label>MB</label>
                            <label>mdd</label>
                    </labels>
                <created>Tue, 15 Jul 2014 15:33:21 +0000</created>
                <updated>Mon, 27 Apr 2015 20:31:05 +0000</updated>
                            <resolved>Mon, 27 Apr 2015 20:31:05 +0000</resolved>
                                    <version>Lustre 2.6.0</version>
                    <version>Lustre 2.7.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="89104" author="jhammond" created="Tue, 15 Jul 2014 18:52:57 +0000"  >&lt;p&gt;To reproduce, it&apos;s enough to have an orphaned striped directory.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;export MDSCOUNT=4
sh lustre/tests/llmount.sh
cd /mnt/lustre
lfs mkdir -c4 d0
exec 7&amp;lt;d0
rmdir d0
echo &apos;Hi Alex!&apos; &amp;gt; /proc/sys/lnet/force_lbug
...
export MDSCOUNT=4
export NOFORMAT=1
sh lustre/tests/llmount.sh
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="89507" author="jhammond" created="Fri, 18 Jul 2014 17:15:10 +0000"  >&lt;p&gt;This is because the same osd_it_ea is being used twice in orph_index_iterate(). Once to start iterating through the orphans:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000004:00000001:2.0:1405700574.465351:0:6666:0:(mdd_orphans.c:451:orph_index_iterate()) Process entered
00080000:00000001:2.0:1405700574.468385:0:6666:0:(osd_handler.c:4735:osd_it_ea_init()) Process entered
00080000:00000001:2.0:1405700574.468392:0:6666:0:(osd_handler.c:4762:osd_it_ea_init()) Process leaving (rc=18446612139451008368 : -131934258543248 : ffff8801a962c570)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and once to destroy an orphaned striped directory:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;orph_key_test_and_del()
 orphan_object_destroy()
  lod_declare_object_destroy()
   lod_load_striping_locked()
    lod_load_lmv_shards()
     osd_it_ea_init()
00080000:00000001:2.0:1405700574.468853:0:6666:0:(osd_handler.c:4762:osd_it_ea_init()) Process leaving (rc=18446612139451008368 : -131934258543248 : ffff8801a962c570)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="104257" author="adilger" created="Wed, 21 Jan 2015 20:04:15 +0000"  >&lt;p&gt;Fan Yong, did this problem with the duplicate iterator usage get fixed in another patch of yours?  If yes, please link this to the other bug as a duplicate and then close as a duplicate. Please also paste in the Gerrit patch URL here. &lt;/p&gt;</comment>
                            <comment id="104513" author="yong.fan" created="Fri, 23 Jan 2015 15:17:23 +0000"  >&lt;p&gt;Andreas, we have hit similar trouble before, and introduced osd_thread_info::oti_it_inline to allocate the osd_it_ea dynamically via the patch:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/11847/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/11847/&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/11847/12/lustre/osd-ldiskfs/osd_handler.c,cm&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/11847/12/lustre/osd-ldiskfs/osd_handler.c,cm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But that patch is not enough, because osd_thread_info::oti_it_dentry/oti_it_ea_buf may still be shared by multiple iterations. Wangdi has a patch to fix that:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/13223/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/13223/&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#/c/13223/7/lustre/osd-ldiskfs/osd_handler.c,cm&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/13223/7/lustre/osd-ldiskfs/osd_handler.c,cm&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="26086">LU-5519</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="19632">LU-3536</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwrhz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>14925</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>