<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:12:07 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-959] kuc channels not reestablished after MDS crash</title>
                <link>https://jira.whamcloud.com/browse/LU-959</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;It seems the kuc channels are not reestablished after an MDS crash.&lt;br/&gt;
In particular, processes that are listening for changelogs remain stuck listening to the kernel-&amp;gt;userspace pipe&lt;br/&gt;
whereas no message is sent from the MDS, as clients do not re-register kuc listeners after reconnecting to the MDS.&lt;/p&gt;

&lt;p&gt;It would probably need an action in mdc_import_event() to reregister kuc listeners,&lt;br/&gt;
something like:&lt;/p&gt;

&lt;p&gt;in mdc_import_event():&lt;/p&gt;

&lt;p&gt;         case IMP_EVENT_ACTIVE: {&lt;br/&gt;
                 rc = obd_notify_observer(obd, obd, OBD_NOTIFY_ACTIVE, NULL);&lt;br/&gt;
+                /* re-establish kuc registration after reconnecting */&lt;br/&gt;
+                if (rc == 0)&lt;br/&gt;
+                        rc = mdc_kuc_reregister(imp);&lt;/p&gt;</description>
                <environment></environment>
        <key id="12785">LU-959</key>
            <summary>kuc channels not reestablished after MDS crash</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="leibovici-cea">Thomas LEIBOVICI - CEA</reporter>
                        <labels>
                    </labels>
                <created>Wed, 4 Jan 2012 05:40:33 +0000</created>
                <updated>Wed, 11 Apr 2012 02:52:27 +0000</updated>
                            <resolved>Wed, 11 Apr 2012 02:52:27 +0000</resolved>
                                    <version>Lustre 2.0.0</version>
                    <version>Lustre 2.1.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="25477" author="pjones" created="Wed, 4 Jan 2012 09:14:12 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please look into this one&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="25850" author="niu" created="Thu, 5 Jan 2012 02:08:47 +0000"  >&lt;p&gt;Hi, Thomas&lt;/p&gt;

&lt;p&gt;I don&apos;t quite follow your description of this ticket, and I didn&apos;t find mdc_kuc_reregister() either. Could you elaborate on this ticket a little bit more or post the patch on gerrit for review? Thank you.&lt;/p&gt;</comment>
                            <comment id="25853" author="leibovici-cea" created="Thu, 5 Jan 2012 03:55:04 +0000"  >&lt;p&gt;OK, I&apos;ll try to explain the issue with more details.&lt;/p&gt;

&lt;p&gt;To receive MDT changelogs from a client, llapi_changelog_start() is called by the user space program (like lfs):&lt;br/&gt;
1) this creates a &quot;kuc&quot; channel (communication channel between mdc and user space, which is implemented in libcfs/libcfs/kernel_user_comm.c)&lt;br/&gt;
2) it calls an ioctl which results in calling mdc_ioc_changelog_send() in mdc, which sends an RPC to the MDS to notify it that it must send CL records to this client.&lt;br/&gt;
Then, the user space program calls llapi_changelog_recv() to get changelog records. This listens to incoming data from the kuc channel.&lt;/p&gt;

&lt;p&gt;The problem is there is no recovery mechanism of KUC channels when the MDS restarts:&lt;br/&gt;
the client remains blocked in llapi_changelog_recv() and no more data is sent from the MDS (it forgot a client was listening for changelogs).&lt;br/&gt;
I think there should be an internal mechanism in MDC to call mdc_ioc_changelog_send() again after a MDC/MDS reconnection.&lt;/p&gt;

&lt;p&gt;This is what I suggested: a mdc_kuc_reregister() should be implemented to be called in mdc_import_event(),&lt;br/&gt;
so mdc_ioc_changelog_send() is called for each registered process in the kuc layer.&lt;/p&gt;
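
&lt;p&gt;To make the suggestion concrete, here is a minimal toy model in Python of the re-registration idea. It is not Lustre code: ToyMds, ToyMdc, changelog_send() and on_import_active() are hypothetical names that only mimic the roles of the MDS, the MDC, mdc_ioc_changelog_send() and the IMP_EVENT_ACTIVE case in mdc_import_event(). The assumption that the MDS forgets its listeners on restart is the hypothesis under discussion, not a confirmed behaviour.&lt;/p&gt;

```python
# Toy model only: class and method names are hypothetical, not Lustre code.

class ToyMds:
    """Stands in for the MDS; it forgets its changelog listeners on restart."""

    def __init__(self):
        self.listeners = set()

    def register(self, listener_id):
        self.listeners.add(listener_id)

    def restart(self):
        # Assumption: a crash loses the in-memory set of waiting listeners.
        self.listeners.clear()


class ToyMdc:
    """Stands in for the client-side MDC that tracks local kuc listeners."""

    def __init__(self, mds):
        self.mds = mds
        self.local_listeners = []

    def changelog_send(self, listener_id):
        # Remember the listener locally, then ask the MDS to serve it.
        if listener_id not in self.local_listeners:
            self.local_listeners.append(listener_id)
        self.mds.register(listener_id)

    def on_import_active(self):
        # The proposed step: repeat the registration for every known listener
        # once the connection to the MDS is active again.
        for listener_id in self.local_listeners:
            self.mds.register(listener_id)


mds = ToyMds()
mdc = ToyMdc(mds)
mdc.changelog_send("reader-1")
mds.restart()
print(sorted(mds.listeners))   # state seen by the MDS right after the restart
mdc.on_import_active()
print(sorted(mds.listeners))   # state after the reconnect hook has run
```

&lt;p&gt;The first print shows an empty list after the simulated restart and the second shows the listener again once the reconnect hook has run, which is the behaviour a mdc_kuc_reregister() helper would aim for.&lt;/p&gt;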

&lt;p&gt;Do you have a better understanding of this issue?&lt;br/&gt;
Thanks&lt;/p&gt;</comment>
                            <comment id="25856" author="niu" created="Thu, 5 Jan 2012 06:37:18 +0000"  >&lt;p&gt;Thanks a lot for the details, Thomas. I think I have a much better understanding now, but I still don&apos;t see why the client was blocked in llapi_changelog_recv() after the MDS restarted: mdc_ioc_changelog_send() just uses the llog APIs to read the changelog on the MDS and then puts it in the pipe, so when the MDS restarts, even if the client&apos;s llog processing breaks off early because of an RPC error, CL_EOF will always be written, and llapi_changelog_recv() should receive this EOF record and stop reading.&lt;/p&gt;
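
&lt;p&gt;The expected behaviour can be sketched as a toy model in Python. This is an illustration under assumptions, not the Lustre implementation: send_changelog() and recv_all() are hypothetical stand-ins for mdc_ioc_changelog_send() and llapi_changelog_recv(), a queue stands in for the pipe, and CL_EOF is modelled as a plain sentinel object.&lt;/p&gt;

```python
# Toy model only: CL_EOF and the function names here are illustrative stand-ins.
import queue

CL_EOF = object()  # sentinel meaning "no more records on this channel"


def send_changelog(records, channel):
    """Push records into the channel; always finish with CL_EOF."""
    try:
        for rec in records:
            if rec is None:
                # Stand-in for an RPC error while reading the log.
                raise ConnectionError("MDS went away")
            channel.put(rec)
    except ConnectionError:
        pass  # reading stopped early, but the reader must still be released
    finally:
        channel.put(CL_EOF)


def recv_all(channel):
    """Read until CL_EOF so the reader does not block forever."""
    received = []
    while True:
        item = channel.get()
        if item is CL_EOF:
            return received
        received.append(item)


chan = queue.Queue()
send_changelog(["rec1", "rec2", None, "rec3"], chan)
print(recv_all(chan))
```

&lt;p&gt;With this pattern the reader receives the two records sent before the simulated error and then returns instead of blocking, so a reader that stays stuck would point to a different cause.&lt;/p&gt;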

&lt;p&gt;Do you have the debug log and stack trace from when the process was stuck in llapi_changelog_recv()?&lt;/p&gt;</comment>
                            <comment id="25861" author="leibovici-cea" created="Thu, 5 Jan 2012 08:17:21 +0000"  >&lt;p&gt;Right, I see what you mean. Maybe my initial understanding of the problem is wrong.&lt;br/&gt;
Unfortunately, I have no stack for you right now. Just something we noticed...&lt;br/&gt;
So we&apos;ll have to wait for the next MDS crash to get a detailed stack, which will hopefully not be too soon &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="34361" author="niu" created="Tue, 10 Apr 2012 02:16:48 +0000"  >&lt;p&gt;Thomas, is this still relevant? Can we close it?&lt;/p&gt;</comment>
                            <comment id="34496" author="leibovici-cea" created="Wed, 11 Apr 2012 02:45:53 +0000"  >&lt;p&gt;OK, let&apos;s close it. I&apos;ll reopen it in case of new occurrence.&lt;/p&gt;</comment>
                            <comment id="34497" author="niu" created="Wed, 11 Apr 2012 02:52:27 +0000"  >&lt;p&gt;Not reproduced; closing it for now.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvhlb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6499</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>