<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:18:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1687] Unloading lustre modules and reloading again leaves MDS with an empty /proc/fs/lustre</title>
                <link>https://jira.whamcloud.com/browse/LU-1687</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;The sanity-quota test_32 failed in my testing.&lt;br/&gt;
After the modules were unloaded and reloaded, the mds was left with an empty /proc/fs/lustre/. No files or subdirs under that directory. I had to reboot the mds and the clients to get them back. There were no error messages on service337 (mds) about problems unloading and loading modules.&lt;/p&gt;

&lt;p&gt;The problem is reproducible. Fortunately, we do not need to perform this sequence of operations on the mds often.&lt;/p&gt;</description>
                <environment>&lt;a href=&quot;https://github.com/jlan/lustre-nas/tree/nas-2.1.2&quot;&gt;https://github.com/jlan/lustre-nas/tree/nas-2.1.2&lt;/a&gt;&lt;br/&gt;
sanity-quota test_32&lt;br/&gt;
mds: service337&lt;br/&gt;
oss1: service261&lt;br/&gt;
oss2: service262&lt;br/&gt;
client: service331, service332</environment>
        <key id="15337">LU-1687</key>
            <summary>Unloading lustre modules and reloading again leaves MDS with an empty /proc/fs/lustre</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="jaylan">Jay Lan</reporter>
                        <labels>
                    </labels>
                <created>Fri, 27 Jul 2012 20:17:54 +0000</created>
                <updated>Fri, 19 Apr 2019 00:14:36 +0000</updated>
                            <resolved>Mon, 6 Aug 2012 13:47:03 +0000</resolved>
                                    <version>Lustre 2.1.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>1</watches>
                                                                            <comments>
                            <comment id="42418" author="pjones" created="Fri, 27 Jul 2012 21:09:56 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="42441" author="laisiyao" created="Mon, 30 Jul 2012 12:22:36 +0000"  >&lt;p&gt;I can&apos;t reproduce this in my setup; I&apos;ll look more into the debug log and code to find the cause.&lt;/p&gt;</comment>
                            <comment id="42466" author="jaylan" created="Mon, 30 Jul 2012 16:20:30 +0000"  >&lt;p&gt;Fortunately I am still able to reproduce. Let me know what I can do to help debug this problem.&lt;/p&gt;

&lt;p&gt;BTW, our lustre source can be cloned from&lt;br/&gt;
&lt;a href=&quot;https://github.com/jlan/lustre-nas/commits/nas-2.1.2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/jlan/lustre-nas/commits/nas-2.1.2&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="42486" author="laisiyao" created="Tue, 31 Jul 2012 03:36:56 +0000"  >&lt;p&gt;I can&apos;t build lustre against your git code because LUSTRE_KERNEL_VERSION is undefined. Git log shows it&apos;s removed in commit c2751b31e55518d1791cd5b87adc842f4fbbee83, could you help verify it? And if the code can be built on your system, could you output /proc/fs/lustre/version?&lt;/p&gt;</comment>
                            <comment id="42505" author="jaylan" created="Tue, 31 Jul 2012 13:40:53 +0000"  >&lt;p&gt;I checked again; commit c2751b3 is in the nas-2.1.2 branch. It was not removed or reverted.&lt;/p&gt;

&lt;p&gt;service337 ~ # cat /proc/fs/lustre/version&lt;br/&gt;
lustre: 2.1.2&lt;br/&gt;
kernel: 2.6.32-220.4.1.el6.20120607.x86_64.lustre212&lt;br/&gt;
build:  2nasS_ofed154&lt;br/&gt;
service337 ~ #&lt;/p&gt;

&lt;p&gt;I built the 2.6.32-220.4.1.el6 kernel with the kernel_patches for el6 from the 2.1.2 branch. Your kernel may be named differently. Also, I used the 1.5.4.1 ofa_kernel modules.&lt;/p&gt;

&lt;p&gt;I hereby attach the two script files I used for my build: nas-config.sh.rhel62.212 and nas-make.sh.rhel62.212.&lt;br/&gt;
They are for your reference; you should adjust them for your target system.&lt;/p&gt;</comment>
                            <comment id="42550" author="laisiyao" created="Wed, 1 Aug 2012 12:18:00 +0000"  >&lt;p&gt;Yes, I can compile with your script, but this test still passes on my system. After this test fails, could you still successfully mount lustre on the MDS?&lt;/p&gt;</comment>
                            <comment id="42571" author="jaylan" created="Wed, 1 Aug 2012 20:43:17 +0000"  >&lt;p&gt;I am able to reproduce &quot;good&quot; and &quot;bad&quot; cases without running sanity-quota test_32.&lt;/p&gt;

&lt;p&gt;After a lustre system has been set up, the &quot;good&quot; case operation sequence will be:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;#Start to tear down mds&lt;br/&gt;
#1 lustre_rmmod&lt;br/&gt;
#2 umount /mnt/mds1&lt;/li&gt;
	&lt;li&gt;#Now to recover mds&lt;br/&gt;
#1 modprobe lustre&lt;br/&gt;
#2 mount -t lustre -o errors=panic,acl,noextents  /dev/sdb1 /mnt/mds1&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;The filesystem will be in good shape and usable.&lt;/p&gt;

&lt;p&gt;If I &apos;umount /mnt/mds1&apos; before running &apos;lustre_rmmod&apos;, the empty /proc/fs/lustre problem will happen after I do the same mds recovery operations. The mount command will return and &apos;mount&apos; will show mds1 mounted. However, the filesystem is not usable.&lt;/p&gt;

&lt;p&gt;Can you try the operation sequence and let me know if you can reproduce the problem?&lt;/p&gt;</comment>
                            <comment id="42577" author="laisiyao" created="Wed, 1 Aug 2012 23:20:51 +0000"  >&lt;p&gt;No, I can&apos;t reproduce it here. BTW, could you explain why you need to do mds recovery? IMO it&apos;s shut down and started up normally in your case.&lt;/p&gt;</comment>
                            <comment id="42580" author="jaylan" created="Wed, 1 Aug 2012 23:49:52 +0000"  >&lt;p&gt;Ah, I meant to say restart. &lt;/p&gt;

&lt;p&gt;Somehow the restart after shutdown does not behave the same way as the initial start. I will do more debugging tomorrow.&lt;/p&gt;
</comment>
                            <comment id="42629" author="jaylan" created="Thu, 2 Aug 2012 21:17:33 +0000"  >&lt;p&gt;I put printk calls into lprocfs_seq_create():&lt;/p&gt;

&lt;p&gt;int lprocfs_seq_create(cfs_proc_dir_entry_t *parent, char *name, mode_t mode,&lt;br/&gt;
                       struct file_operations *seq_fops, void *data)&lt;br/&gt;
{&lt;br/&gt;
        struct proc_dir_entry *entry;&lt;br/&gt;
        ENTRY;&lt;/p&gt;

&lt;p&gt;        LPROCFS_WRITE_ENTRY();&lt;br/&gt;
        entry = create_proc_entry(name, mode, parent);&lt;br/&gt;
        if (entry) {&lt;br/&gt;
                entry-&amp;gt;proc_fops = seq_fops;&lt;br/&gt;
                entry-&amp;gt;data = data;&lt;br/&gt;
        }&lt;br/&gt;
        LPROCFS_WRITE_EXIT();&lt;/p&gt;

&lt;p&gt;        if (entry == NULL) {&lt;br/&gt;
                printk(&quot;lprocfs_seq_create: failed to create %s\n&quot;, name);&lt;br/&gt;
                RETURN(-ENOMEM);&lt;br/&gt;
        } else&lt;br/&gt;
                printk(&quot;lprocfs_seq_create: successfully created %s\n&quot;, name);&lt;/p&gt;

&lt;p&gt;        RETURN(0);&lt;br/&gt;
}&lt;/p&gt;

&lt;p&gt;And the syslog showed:&lt;br/&gt;
Aug  2 15:42:32 kern:info:service337 Lustre: Lustre: Build Version: 2.1nasS_ofed154&lt;br/&gt;
Aug  2 15:42:32 kern:warning:service337 class_procfs_init: registering /proc/fs/lustre&lt;br/&gt;
Aug  2 15:42:32 kern:warning:service337 lprocfs_seq_create: successfully created devices&lt;br/&gt;
Aug  2 15:42:32 kern:warning:service337 class_procfs_init: successfully created lustre/devices&lt;/p&gt;

&lt;p&gt;So lprocfs_seq_create() thought /proc/fs/lustre/devices was created successfully. Yet `ls /proc/fs/lustre` returned empty. This is weird.&lt;/p&gt;

&lt;p&gt;I will continue looking into this.&lt;/p&gt;</comment>
                            <comment id="42630" author="jaylan" created="Thu, 2 Aug 2012 21:36:29 +0000"  >&lt;p&gt;Oh, in addition to the changes to lprocfs_seq_create(), I also made changes to class_procfs_init(). Without these, the syslog output I quoted above would not make sense.&lt;/p&gt;


&lt;p&gt;diff --git a/lustre/obdclass/linux/linux-module.c b/lustre/obdclass/linux/linux-module.c&lt;br/&gt;
index 06cda1f..05b0390 100644&lt;br/&gt;
--- a/lustre/obdclass/linux/linux-module.c&lt;br/&gt;
+++ b/lustre/obdclass/linux/linux-module.c&lt;br/&gt;
@@ -421,13 +421,17 @@ int class_procfs_init(void)&lt;br/&gt;
         int rc;&lt;br/&gt;
         ENTRY;&lt;/p&gt;

&lt;p&gt;+        printk(&quot;class_procfs_init: registering /proc/fs/lustre\n&quot;);&lt;br/&gt;
         obd_sysctl_init();&lt;br/&gt;
         proc_lustre_root = lprocfs_register(&quot;fs/lustre&quot;, NULL,&lt;br/&gt;
                                             lprocfs_base, NULL);&lt;br/&gt;
         rc = lprocfs_seq_create(proc_lustre_root, &quot;devices&quot;, 0444,&lt;br/&gt;
                                 &amp;amp;obd_device_list_fops, NULL);&lt;/p&gt;

&lt;p&gt;-        if (rc)&lt;br/&gt;
+        if (rc) {&lt;br/&gt;
+                printk(&quot;class_procfs_init: Failed to add lustre/devices, rc=%d\n&quot;, rc);&lt;br/&gt;
                 CERROR(&quot;error adding /proc/fs/lustre/devices file\n&quot;);&lt;br/&gt;
+        }&lt;br/&gt;
+        else&lt;br/&gt;
+                printk(&quot;class_procfs_init: successfully created lustre/devices\n&quot;);&lt;br/&gt;
 #else&lt;br/&gt;
         ENTRY;&lt;br/&gt;
 #endif&lt;/p&gt;
</comment>
                            <comment id="42693" author="jaylan" created="Fri, 3 Aug 2012 20:45:12 +0000"  >&lt;p&gt;In the test case, when /proc/fs/lustre appeared to have been removed, it actually was not.&lt;/p&gt;

&lt;p&gt;service337 /proc/fs # ls&lt;br/&gt;
fscache  jbd2  nfs  nfsd  nfsfs&lt;br/&gt;
service337 /proc/fs # ls -lid lustre&lt;br/&gt;
4026532497 dr-xr-xr-x 17 root root 0 Aug  3 16:30 lustre&lt;/p&gt;

&lt;p&gt;You need to use &apos;ls -lid&apos; to see it. So the next time we restarted the mds, another /proc/fs/lustre was created (with a different inode number). All other inodes were created successfully under the new /proc/fs/lustre. Unfortunately the system cannot see it. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/sad.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;What caused the original /proc/fs/lustre to hang around? From lustre&apos;s perspective, the removal completed:&lt;br/&gt;
Aug  3 16:40:27 kern:warning:service337 lprocfs_remove: removing lustre, parent=fs&lt;br/&gt;
Aug  3 16:40:27 kern:warning:service337 	about to remove lustre&lt;br/&gt;
Aug  3 16:40:27 kern:warning:service337 	removing devices from lustre&lt;br/&gt;
Aug  3 16:40:27 kern:warning:service337 	removing health_check from lustre&lt;br/&gt;
Aug  3 16:40:27 kern:warning:service337 	removing pinger from lustre&lt;br/&gt;
Aug  3 16:40:27 kern:warning:service337 	removing version from lustre&lt;br/&gt;
Aug  3 16:40:27 kern:warning:service337 	removing lustre from fs&lt;/p&gt;


&lt;p&gt;Systemtap into kernel showed:&lt;br/&gt;
remove_proc_entry: lustre/devices&lt;br/&gt;
free_proc_entry: devices, inode=4026532501&lt;br/&gt;
remove_proc_entry: lustre/health_check&lt;br/&gt;
free_proc_entry: health_check, inode=4026532500&lt;br/&gt;
remove_proc_entry: lustre/pinger&lt;br/&gt;
free_proc_entry: pinger, inode=4026532499&lt;br/&gt;
remove_proc_entry: lustre/version&lt;br/&gt;
remove_proc_entry: fs/lustre&lt;/p&gt;

&lt;p&gt;So, /proc/fs/lustre/version was not removed for some reason. Function remove_proc_entry() did not call free_proc_entry() in the case of lustre/version. Consequently, /proc/fs/lustre could not be removed, I suppose.&lt;/p&gt;

&lt;p&gt;service337 /proc/fs # ls -lid lustre/version&lt;br/&gt;
4026532498 -r--r--r-- 1 root root 0 Aug  3 16:30 lustre/version&lt;/p&gt;


&lt;p&gt;Why? I will investigate more...&lt;/p&gt;</comment>
                            <comment id="42748" author="jaylan" created="Mon, 6 Aug 2012 13:37:57 +0000"  >&lt;p&gt;service337 /proc/fs # fuser lustre/version&lt;br/&gt;
lustre/version:       4937&lt;br/&gt;
service337 /proc/fs # ps -ef |grep 4937&lt;br/&gt;
root      4937  4933  0 Aug03 ?        00:00:01 perl /var/lib/pcp/pmdas/lustre/pmdalustre.pl&lt;br/&gt;
root     28872 28473  0 10:35 pts/0    00:00:00 grep 4937&lt;br/&gt;
service337 /proc/fs #&lt;/p&gt;

&lt;p&gt;Please close this ticket. The problem was caused by /proc/fs/lustre/version being used by a NASA admin script.&lt;/p&gt;</comment>
                            <comment id="42749" author="pjones" created="Mon, 6 Aug 2012 13:47:03 +0000"  >&lt;p&gt;ok thanks Jay!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="11750" name="nas-config.sh.rhel62.212" size="1017" author="jaylan" created="Tue, 31 Jul 2012 13:40:53 +0000"/>
                            <attachment id="11751" name="nas-make.sh.rhel62.212" size="234" author="jaylan" created="Tue, 31 Jul 2012 13:40:53 +0000"/>
                            <attachment id="11738" name="sanity-quota.test_32.tgz" size="309185" author="jaylan" created="Fri, 27 Jul 2012 20:17:54 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvfmv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6179</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>