<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:36:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3697] racing HSM release and restore vs cat leaves cat hung in ll_layout_refresh()</title>
                <link>https://jira.whamcloud.com/browse/LU-3697</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Using 2.4.52-90-g8b8b7b3 with patch set 12 of the coordinator main thread, I see what looks like a lost agent request or a missing layout lock release. To reproduce, start HSM with two client mounts (/mnt/lustre and /mnt/lustre2) and one mount for the CT (/mnt/lustre-hsm). Then run:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cd /mnt/lustre
# touch f0
# lfs hsm_archive f0
# # Wait for archive to complete.
# while true; do cat f0; done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cd /mnt/lustre
# while true; do lfs hsm_release f0; lfs hsm_restore f0; done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After a few seconds, cat hangs in layout refresh, while the CT, the coordinator, and all of the mdt/ldlm threads are idle:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;q:lustre2# p-all cat
22461 cat
[&amp;lt;ffffffffa04e866e&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
[&amp;lt;ffffffffa07f61fa&amp;gt;] ldlm_completion_ast+0x57a/0x960 [ptlrpc]
[&amp;lt;ffffffffa07f0746&amp;gt;] ldlm_cli_enqueue_fini+0x936/0xe70 [ptlrpc]
[&amp;lt;ffffffffa07f1025&amp;gt;] ldlm_cli_enqueue+0x3a5/0x770 [ptlrpc]
[&amp;lt;ffffffffa0a6ca2e&amp;gt;] mdc_enqueue+0x2ce/0x1a00 [mdc]
[&amp;lt;ffffffffa0a1da76&amp;gt;] lmv_enqueue+0x336/0x1060 [lmv]
[&amp;lt;ffffffffa0ee1e26&amp;gt;] ll_layout_refresh+0x556/0x1150 [lustre]
[&amp;lt;ffffffffa0f2fc4b&amp;gt;] vvp_io_fini+0x16b/0x260 [lustre]
[&amp;lt;ffffffffa0f310ec&amp;gt;] vvp_io_read_fini+0x5c/0x70 [lustre]
[&amp;lt;ffffffffa06adcf7&amp;gt;] cl_io_fini+0x77/0x280 [obdclass]
[&amp;lt;ffffffffa0ed0687&amp;gt;] ll_file_io_generic+0xe7/0x610 [lustre]
[&amp;lt;ffffffffa0ed0cef&amp;gt;] ll_file_aio_read+0x13f/0x2c0 [lustre]
[&amp;lt;ffffffffa0ed158c&amp;gt;] ll_file_read+0x16c/0x2a0 [lustre]
[&amp;lt;ffffffff81182e05&amp;gt;] vfs_read+0xb5/0x1a0
[&amp;lt;ffffffff81182f41&amp;gt;] sys_read+0x51/0x90
[&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
[&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff

q:lustre2# p-all hsm
6332 hsm_cdtr
[&amp;lt;ffffffffa077d641&amp;gt;] cfs_waitq_timedwait+0x11/0x20 [libcfs]
[&amp;lt;ffffffffa05ebe0c&amp;gt;] mdt_coordinator+0xcac/0x1820 [mdt]
[&amp;lt;ffffffff81096936&amp;gt;] kthread+0x96/0xa0
[&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
[&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff

6344 lhsmtool_posix
[&amp;lt;ffffffff8118cc3b&amp;gt;] pipe_wait+0x5b/0x80
[&amp;lt;ffffffff8118d6e6&amp;gt;] pipe_read+0x3e6/0x4e0
[&amp;lt;ffffffff8118251a&amp;gt;] do_sync_read+0xfa/0x140
[&amp;lt;ffffffff81182e05&amp;gt;] vfs_read+0xb5/0x1a0
[&amp;lt;ffffffff81182f41&amp;gt;] sys_read+0x51/0x90
[&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b
[&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff

5792 mdt00_000
[&amp;lt;ffffffffa077d66e&amp;gt;] cfs_waitq_wait+0xe/0x10 [libcfs]
[&amp;lt;ffffffffa0a5e6da&amp;gt;] ptlrpc_wait_event+0x28a/0x290 [ptlrpc]
[&amp;lt;ffffffffa0a681a7&amp;gt;] ptlrpc_main+0x7f7/0x1700 [ptlrpc]
[&amp;lt;ffffffff81096936&amp;gt;] kthread+0x96/0xa0
[&amp;lt;ffffffff8100c0ca&amp;gt;] child_rip+0xa/0x20
[&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff

...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On /mnt/lustre the HSM state of f0 shows as exists and archived. The coordinator does not show any actions or requests in flight.&lt;/p&gt;</description>
                <environment></environment>
        <key id="20185">LU-3697</key>
            <summary>racing HSM release and restore vs cat leaves cat hung in ll_layout_refresh()</summary>
                <type id="7" iconUrl="https://jira.whamcloud.com/images/icons/issuetypes/task_agile.png">Technical task</type>
                            <parent id="20020">LU-3647</parent>
                                    <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jay">Jinshan Xiong</assignee>
                                    <reporter username="jhammond">John Hammond</reporter>
                        <labels>
                            <label>HSM</label>
                    </labels>
                <created>Mon, 5 Aug 2013 18:37:30 +0000</created>
                <updated>Thu, 5 Sep 2013 03:10:57 +0000</updated>
                            <resolved>Thu, 5 Sep 2013 03:10:57 +0000</resolved>
                                    <version>Lustre 2.5.0</version>
                                    <fixVersion>Lustre 2.5.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="63681" author="jay" created="Mon, 5 Aug 2013 23:08:55 +0000"  >&lt;p&gt;I have seen this kind of problem quite often before. The root cause was that the CT hit errors, so the layout lock was never released, which blocked the process on the client.&lt;/p&gt;

&lt;p&gt;Did you see any error messages printed by the CT daemon?&lt;/p&gt;</comment>
                            <comment id="63683" author="jhammond" created="Tue, 6 Aug 2013 00:22:57 +0000"  >&lt;p&gt;No.&lt;/p&gt;</comment>
                            <comment id="65790" author="jay" created="Thu, 5 Sep 2013 03:10:57 +0000"  >&lt;p&gt;This is solved.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvx5r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9543</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>