<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:03:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
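Assuming the standard JIRA XML issue view URL pattern (an assumption, not part of this export), a filtered request for this issue might look like:

    https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-13656/LU-13656.xml?field=key&field=summary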
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13656] lfs migrate -m hangs a few minutes at start (sometimes)</title>
                <link>https://jira.whamcloud.com/browse/LU-13656</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;With Lustre 2.12.5 RC1, &lt;tt&gt;lfs migrate -m&lt;/tt&gt; now sometimes hangs at start. The directory being migrated is not accessible from any client for a few minutes. For us, this appears to be a regression in 2.12.5 relative to 2.12.4. Often the hang doesn&apos;t generate any trace, but this morning I saw one on the source MDT, MDT0001. The goal here is to migrate a directory from MDT0001 to MDT0003:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lfs migrate -m 3 -v /fir/users/galvisf
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
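&lt;p&gt;For reference, a quick way to confirm which MDT holds the directory before migrating is &lt;tt&gt;lfs getdirstripe -m&lt;/tt&gt;, which prints the MDT index (a sketch, reusing the same path as above):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# print the index of the MDT containing the directory; 1 here would mean MDT0001
lfs getdirstripe -m /fir/users/galvisf
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;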
&lt;p&gt;The first backtrace seen is pasted below; it appeared after a few minutes of hang. I&apos;m also attaching the kernel logs from MDT1 as &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/35126/35126_fir-md1-s2_2.12.5_20200609_kern.log&quot; title=&quot;fir-md1-s2_2.12.5_20200609_kern.log attached to LU-13656&quot;&gt;fir-md1-s2_2.12.5_20200609_kern.log&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jun 09 09:28:01 fir-md1-s2 kernel: LNet: Service thread pid 23883 was inactive for 200.15s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Jun 09 09:28:01 fir-md1-s2 kernel: Pid: 23883, comm: mdt01_069 3.10.0-957.27.2.el7_lustre.pl2.x86_64 #1 SMP Thu Nov 7 15:26:16 PST 2019
Jun 09 09:28:01 fir-md1-s2 kernel: Call Trace:
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc0cefc80&amp;gt;] ldlm_completion_ast+0x430/0x860 [ptlrpc]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc0cf1dff&amp;gt;] ldlm_cli_enqueue_fini+0x96f/0xdf0 [ptlrpc]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc0cf46be&amp;gt;] ldlm_cli_enqueue+0x40e/0x920 [ptlrpc]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc16cfec2&amp;gt;] osp_md_object_lock+0x162/0x2d0 [osp]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc15e38c4&amp;gt;] lod_object_lock+0xf4/0x780 [lod]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc1665bbe&amp;gt;] mdd_object_lock+0x3e/0xe0 [mdd]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc14ce401&amp;gt;] mdt_remote_object_lock_try+0x1e1/0x750 [mdt]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc14ce99a&amp;gt;] mdt_remote_object_lock+0x2a/0x30 [mdt]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc14e356e&amp;gt;] mdt_rename_lock+0xbe/0x4b0 [mdt]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc14e58d5&amp;gt;] mdt_reint_rename+0x2c5/0x2b90 [mdt]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc14ee963&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc14cb273&amp;gt;] mdt_reint_internal+0x6e3/0xaf0 [mdt]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc14d66e7&amp;gt;] mdt_reint+0x67/0x140 [mdt]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc0d9066a&amp;gt;] tgt_request_handle+0xada/0x1570 [ptlrpc]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc0d3344b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffc0d36db4&amp;gt;] ptlrpc_main+0xb34/0x1470 [ptlrpc]
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffff86ec2e81&amp;gt;] kthread+0xd1/0xe0
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffff87577c24&amp;gt;] ret_from_fork_nospec_begin+0xe/0x21
Jun 09 09:28:01 fir-md1-s2 kernel:  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then the thread finally completes:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jun 09 09:28:11 fir-md1-s2 kernel: LNet: Service thread pid 23636 completed after 210.08s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On the client, operations have resumed and the migration is now in progress:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@fir-rbh01 robinhood]# lfs getdirstripe /fir/users/galvisf/
lmv_stripe_count: 2 lmv_stripe_offset: 3 lmv_hash_type: fnv_1a_64,migrating
mdtidx		 FID[seq:oid:ver]
     3		 [0x2800401af:0x132aa:0x0]		
     1		 [0x240057d88:0xa6e5:0x0]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
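&lt;p&gt;One way to watch for the migration to finish (a rough sketch; the 60s interval is arbitrary) is to poll until the &lt;tt&gt;migrating&lt;/tt&gt; flag disappears from the &lt;tt&gt;lmv_hash_type&lt;/tt&gt; line:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# re-check every 60s; the first output line drops &quot;migrating&quot; once the migration completes
watch -n 60 &quot;lfs getdirstripe /fir/users/galvisf/ | head -1&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;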
&lt;p&gt;But this had an impact on production during that time: while the lfs migrate was starting, jobs for this specific user were blocked on I/O. This is why we wanted to report this problem.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;br/&gt;
 Stephane&lt;/p&gt;</description>
                <environment>Lustre 2.12.5 RC1 - CentOS 7.6 3.10.0-957.27.2.el7_lustre.pl2.x86_64</environment>
        <key id="59501">LU-13656</key>
            <summary>lfs migrate -m hangs a few minutes at start (sometimes)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Tue, 9 Jun 2020 16:46:18 +0000</created>
                <updated>Wed, 4 Nov 2020 21:58:25 +0000</updated>
                            <resolved>Wed, 4 Nov 2020 21:58:25 +0000</resolved>
                                    <version>Lustre 2.12.5</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                    <comments>
                            <comment id="272477" author="pjones" created="Wed, 10 Jun 2020 17:07:51 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="273825" author="sthiell" created="Fri, 26 Jun 2020 16:03:34 +0000"  >&lt;p&gt;Note that I&apos;ve not seen that problem anymore, and we&apos;ve been using &lt;tt&gt;lfs migrate&lt;/tt&gt; non-stop since then. Perhaps just a random glitch after the 2.12.4 -&amp;gt; 2.12.5 upgrade. I&apos;ll report back if we see this problem again.&lt;/p&gt;</comment>
                            <comment id="284279" author="adilger" created="Wed, 4 Nov 2020 21:58:25 +0000"  >&lt;p&gt;Closing this as &quot;Cannot Reproduce&quot; based on Stephane&apos;s last comments from 5 months ago.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="35126" name="fir-md1-s2_2.12.5_20200609_kern.log" size="358722" author="sthiell" created="Tue, 9 Jun 2020 16:43:40 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i012bb:</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>