<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:38:19 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3949] umount is lazy and takes hours</title>
                <link>https://jira.whamcloud.com/browse/LU-3949</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;With lustre 2.1.4-5chaos, we are finding that clients are not honoring umount correctly.  The sysadmins are using the normal &quot;umount&quot; command with no additional options, and it returns relatively quickly.&lt;/p&gt;

&lt;p&gt;Linux no longer has a record of the mount in /proc/mounts after the command returns, and the mount point (/p/lscratchrza) appears to be empty.  However the directory clearly still has a reference and cannot be removed:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# rzzeus26 /p &amp;gt; ls -la lscratchrza
total 0
drwxr-xr-x 2 root root  40 Aug 13 11:24 .
drwxr-xr-x 4 root root 140 Aug 13 11:24 ..
# rzzeus26 /p &amp;gt; rmdir lscratchrza
rmdir: failed to remove `lscratchrza&apos;: Device or resource busy
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When we look in /proc/fs/lustre it is clear that most, if not all, objects for this filesystem are still present in llite, osc, mdc, ldlm/namespace, etc.&lt;/p&gt;

&lt;p&gt;The sysadmins issued the &quot;umount /p/lscratchrza&quot; command at around 9:42am, but this message did not appear on one of the nodes until over five hours later:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2013-09-13 15:18:11 Lustre: Unmounted lsa-client
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So there appear to be at least two problems here:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;umount is taking far too long&lt;/li&gt;
	&lt;li&gt;umount for lustre is not blocking until umount is complete (it is exhibiting umount &quot;lazy&quot; behavior)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I should note that this lustre client node is mounting &lt;em&gt;two&lt;/em&gt; lustre filesystems, and only one was being umounted.  I don&apos;t know if it is significant yet, but the servers that we were trying to umount are running Lustre 2.1.4-5chaos with ldiskfs, and servers for the other filesystem are running Lustre 2.4.0-15chaos with zfs.&lt;/p&gt;

&lt;p&gt;I was not able to speed up the umount process by running the sync command or &quot;echo 3 &amp;gt; /proc/sys/vm/drop_caches&quot;.&lt;/p&gt;

&lt;p&gt;I did a &quot;foreach bt&quot; under crash, but I don&apos;t see any processes that are obviously stuck sleeping in umount related call paths.&lt;/p&gt;

&lt;p&gt;Real user applications are running on the client nodes while the umounts are going on.  &quot;lsof&quot; does not list any usage under /p/lscratchrza (the filesystem that we are trying to unmount).&lt;/p&gt;
</description>
                <environment>Lustre 2.1.4-5chaos on client, Lustre 2.1.4-5chaos on ldiskfs servers, Lustre 2.4.0-15chaos on zfs servers</environment>
        <key id="20961">LU-3949</key>
            <summary>umount is lazy and takes hours</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="hongchao.zhang">Hongchao Zhang</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Fri, 13 Sep 2013 23:46:12 +0000</created>
                <updated>Wed, 13 Oct 2021 03:14:18 +0000</updated>
                            <resolved>Wed, 13 Oct 2021 03:14:18 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.1.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="66685" author="pjones" created="Sun, 15 Sep 2013 03:30:43 +0000"  >&lt;p&gt;Hongchao&lt;/p&gt;

&lt;p&gt;Could you please help with this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="66712" author="hongchao.zhang" created="Mon, 16 Sep 2013 10:51:07 +0000"  >&lt;p&gt;Hi Chris,&lt;/p&gt;

&lt;p&gt;In the output of &quot;foreach bt&quot;, was there any Lustre-related process that was actively processing something?&lt;br/&gt;
And would you have any chance to capture the debug log while waiting for the umount to complete?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="67307" author="morrone" created="Tue, 24 Sep 2013 00:46:38 +0000"  >&lt;p&gt;There are lustre processes listed, of course, but they all looked like the normal contingent of processes, waiting for something to work on.  The only one that might be interesting is this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;PID: 5833   TASK: ffff8802cec68040  CPU: 3   COMMAND: &quot;flush-lustre-2&quot;
 #0 [ffff8802b33c7ce0] schedule at ffffffff8150d6e2
 #1 [ffff8802b33c7da8] schedule_timeout at ffffffff8150e522
 #2 [ffff8802b33c7e58] schedule_timeout_interruptible at ffffffff8150e6ce
 #3 [ffff8802b33c7e68] bdi_writeback_task at ffffffff811ad100
 #4 [ffff8802b33c7eb8] bdi_start_fn at ffffffff8113c686
 #5 [ffff8802b33c7ee8] kthread at ffffffff81096c76
 #6 [ffff8802b33c7f48] kernel_thread at ffffffff8100c0ca
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is certainly traffic in the lustre logs that I dumped, but nothing that appeared to be for the filesystem in question.  As far as I can tell it was all related to the other filesystem (which was in use by an application).&lt;/p&gt;</comment>
                            <comment id="67334" author="hongchao.zhang" created="Tue, 24 Sep 2013 11:16:41 +0000"  >&lt;p&gt;the mnt_count of the vfsmount used by Lustre must not have dropped to zero during umount; it could still be held by some other process, because the log&lt;br/&gt;
&quot;2013-09-13 15:18:11 Lustre: Unmounted lsa-client&quot; is printed from ll_put_super (there is no other kernel thread that calls it).&lt;/p&gt;

&lt;p&gt;I will try to reproduce it with Lustre 2.1.4-5chaos.&lt;/p&gt;</comment>
                            <comment id="67394" author="morrone" created="Tue, 24 Sep 2013 17:12:33 +0000"  >&lt;p&gt;Even without reproducing, I should think that we can figure out from the code why umount is exhibiting lazy behavior.&lt;/p&gt;

&lt;p&gt;Obviously the unmount is not complete until ll_put_super is called and Lustre prints that &quot;Unmounted lsa-client&quot; message.  However the umount command returns well before that.  Is that not a bug?&lt;/p&gt;</comment>
                            <comment id="67513" author="hongchao.zhang" created="Wed, 25 Sep 2013 10:32:21 +0000"  >&lt;p&gt;Normally, lazy umount happens only with the &quot;umount -l&quot; option (the kernel represents it with MNT_DETACH), but that option is not used anywhere in Lustre (2.1.4-llnl).&lt;br/&gt;
I also tried with two mounts: &quot;umount /mnt/lustre &amp;amp;&amp;amp; rmdir /mnt/lustre&quot; succeeds while a test (dd in a loop) is running&lt;br/&gt;
in the other mount &quot;/mnt/lustre2&quot;; the kernel is RHEL6 (2.6.32-279.2.1).&lt;/p&gt;

&lt;p&gt;Is there any chance the &quot;-l&quot; option is being used implicitly by umount in your environment?&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</comment>
                            <comment id="67568" author="morrone" created="Wed, 25 Sep 2013 17:12:42 +0000"  >&lt;p&gt;No, I don&apos;t see anything to suggest an implicit use of -l.  But I&apos;m not sure what I should look for that could cause an &quot;implicit&quot; use.  I don&apos;t see -l used.&lt;/p&gt;

&lt;p&gt;And just to be clear, I&apos;m not trying to imply that the code thinks it is in lazy mode.  I am just saying that the behavior is &lt;em&gt;similar&lt;/em&gt; to lazy mode.  In other words, Lustre allows umount to return before the client really disconnects from the servers and removes the final references on the mount point.  It certainly is allowing umount to return before ll_put_super is complete.  I suspect that umount returns before ll_put_super is even called, but I don&apos;t have hard evidence for that.&lt;/p&gt;</comment>
                            <comment id="67697" author="hongchao.zhang" created="Thu, 26 Sep 2013 15:01:20 +0000"  >&lt;p&gt;umount must be returning before ll_put_super is called, since the &quot;Unmounted lsa-client&quot; message was printed a long time after the umount command.&lt;br/&gt;
Lustre disconnects from the servers in ll_put_super, and it is the kernel&apos;s responsibility to call the put_super operation of a specific&lt;br/&gt;
filesystem when the vfsmount is about to be destroyed.&lt;/p&gt;

&lt;p&gt;Hi Chris, has the issue occurred again at your site?&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</comment>
                            <comment id="67770" author="morrone" created="Thu, 26 Sep 2013 21:14:23 +0000"  >&lt;p&gt;I think that the kernel pretty clearly does not wait until ll_put_super is complete before allowing whatever syscall umount is using to return.  That implies to me that lustre is doing the &lt;em&gt;wrong thing&lt;/em&gt;.  The Lustre client needs to disconnect from the servers &lt;em&gt;before&lt;/em&gt; umount returns.&lt;/p&gt;

&lt;p&gt;I have high confidence that we will see it again on thousands of nodes the next time we use umount.&lt;/p&gt;</comment>
                            <comment id="67771" author="morrone" created="Thu, 26 Sep 2013 21:27:20 +0000"  >&lt;p&gt;Maybe I should clarify my &quot;disconnect from the servers&quot; comment.  I mean things like the imports and other components associated with that filesystem should be cleanly shut down before umount returns.  I don&apos;t expect the lnet connections to necessarily be shut down.&lt;/p&gt;</comment>
                            <comment id="67906" author="morrone" created="Sat, 28 Sep 2013 01:20:29 +0000"  >&lt;p&gt;Apparently there isn&apos;t much in the way of explicit callbacks into Lustre at umount time.  The callbacks mostly trigger when sufficient references are dropped.  There is a call path that looks like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;umount
mntput_no_expire
__mntput
deactivate_super
fs-&amp;gt;kill_sb
lustre_kill_super
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;deactivate_super() also calls put_super() after the call to fs-&amp;gt;kill_sb(), which should in theory kick off ll_put_super() if the super_block s_count is decremented to zero.&lt;/p&gt;

&lt;p&gt;So that leaves us to wonder what part of this failed...&lt;/p&gt;

&lt;p&gt;I think it seems most likely that the super_block s_count was higher than 1 when __put_super() was called, but it is not clear to me what else would be holding the reference.&lt;/p&gt;</comment>
                            <comment id="67957" author="hongchao.zhang" created="Mon, 30 Sep 2013 16:04:15 +0000"  >&lt;p&gt;the most likely case is that super_block.s_active is higher than 1:&lt;br/&gt;
1. the umount returns successfully (without the -l option), then vfsmnt.mnt_count drops to zero in &quot;do_umount&quot;, which triggers deactivate_super&lt;br/&gt;
2. if super_block.s_active drops to zero, then &quot;lustre_kill_super&quot; is called, which calls &quot;kill_anon_super&quot;&lt;br/&gt;
3. &quot;kill_anon_super&quot; calls generic_shutdown_super, which in turn calls &quot;ll_put_super&quot;&lt;/p&gt;

&lt;p&gt;super_block.s_active is increased in &quot;sget&quot;, &quot;get_active_super&quot;, &quot;copy_tree&quot; and &quot;do_loopback&quot;:&lt;br/&gt;
&quot;sget&quot; is mainly called during mount,&lt;br/&gt;
&quot;get_active_super&quot; is called by &quot;freeze_bdev&quot;,&lt;br/&gt;
&quot;copy_tree&quot; and &quot;do_loopback&quot; are called when mounting with the --bind option.&lt;/p&gt;

&lt;p&gt;btw, no &quot;sget&quot; or &quot;get_active_super&quot; calls are found in Lustre itself.&lt;/p&gt;</comment>
                            <comment id="76014" author="morrone" created="Fri, 31 Jan 2014 23:51:49 +0000"  >&lt;p&gt;Still a problem.&lt;/p&gt;</comment>
                            <comment id="77214" author="hongchao.zhang" created="Tue, 18 Feb 2014 03:14:56 +0000"  >&lt;p&gt;status update:&lt;br/&gt;
I have investigated the kernel and Lustre, particularly the mount code, but didn&apos;t find anything interesting.&lt;/p&gt;

&lt;p&gt;Has the issue occurred again recently, and on which kernel versions?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="77247" author="hongchao.zhang" created="Tue, 18 Feb 2014 14:55:33 +0000"  >&lt;p&gt;How about printing the debug info to the console (so it is not overwritten by the debug logs of the other Lustre mount) in the two functions related to umount&lt;br/&gt;
(ll_umount_begin, ll_put_super), to check whether it is Lustre that spends so much time during umount:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;diff --git a/lustre/llite/llite_lib.c b/lustre/llite/llite_lib.c
index aff4ccb..5787f5d 100644
--- a/lustre/llite/llite_lib.c
+++ b/lustre/llite/llite_lib.c
@@ -1138,7 +1138,7 @@ void ll_put_super(struct super_block *sb)
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ccc_count, next, force = 1, rc = 0;
         ENTRY;
 
-        CDEBUG(D_VFSTRACE, &lt;span class=&quot;code-quote&quot;&gt;&quot;VFS Op: sb %p - %s\n&quot;&lt;/span&gt;, sb, profilenm);
+       LCONSOLE_WARN(&lt;span class=&quot;code-quote&quot;&gt;&quot;VFS Op: sb %p - %s\n&quot;&lt;/span&gt;, sb, profilenm);
 
         ll_print_capa_stat(sbi);
 
@@ -2116,8 +2116,8 @@ void ll_umount_begin(struct super_block *sb)
         }
 #endif
 
-        CDEBUG(D_VFSTRACE, &lt;span class=&quot;code-quote&quot;&gt;&quot;VFS Op: superblock %p count %d active %d\n&quot;&lt;/span&gt;, sb,
-               sb-&amp;gt;s_count, atomic_read(&amp;amp;sb-&amp;gt;s_active));
+       LCONSOLE_WARN(&lt;span class=&quot;code-quote&quot;&gt;&quot;VFS Op: superblock %p count %d active %d\n&quot;&lt;/span&gt;, sb,
+                     sb-&amp;gt;s_count, atomic_read(&amp;amp;sb-&amp;gt;s_active));
 
         obd = class_exp2obd(sbi-&amp;gt;ll_md_exp);
         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (obd == NULL) {
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;btw, did you see any of these debug messages in the previous debug logs when the issue occurred?&lt;/p&gt;</comment>
                            <comment id="77710" author="hongchao.zhang" created="Mon, 24 Feb 2014 13:30:46 +0000"  >&lt;p&gt;I have tested it with the following script:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
#!/bin/bash

&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; [ $# != 1 ]; then
    echo &lt;span class=&quot;code-quote&quot;&gt;&quot;Usage: $(basename $0) Server_IP&quot;&lt;/span&gt;
    exit 0
fi

ADDR=$1

mkdir -p /mnt/lustre_test || { echo &lt;span class=&quot;code-quote&quot;&gt;&quot;can&apos;t mkdir /mnt/lustre_test&quot;&lt;/span&gt;; exit 1; }

&lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; true;
&lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt;
    mkdir -p /mnt/lustre_test || exit 1
    mount -t lustre $ADDR@tcp:/lustre /mnt/lustre_test || exit 2
    echo -e &lt;span class=&quot;code-quote&quot;&gt;&quot;$(date) Lustre mounted&quot;&lt;/span&gt;

    dd &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt;=/dev/zero of=/mnt/lustre_test/dd.file bs=1024 count=10000 || exit 3

    umount /mnt/lustre_test || exit 4
    rmdir /mnt/lustre_test || exit 5
    echo -e &lt;span class=&quot;code-quote&quot;&gt;&quot;$(date) Lustre umounted&quot;&lt;/span&gt;

    echo -e &lt;span class=&quot;code-quote&quot;&gt;&quot;\n\n&quot;&lt;/span&gt;
    sleep 1
done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There is another Lustre mount on /mnt/lustre (tested both against the same server and against a different server), which keeps running some load (say, tar, dd).&lt;br/&gt;
No umount problem has appeared for a long time.&lt;/p&gt;</comment>
                            <comment id="77763" author="cliffw" created="Mon, 24 Feb 2014 22:39:55 +0000"  >&lt;p&gt;On Hyperion (12 OSS, 36 OSTs) we normally remount between test passes and have not noticed this issue. I am currently running the mount test script on some idle clients and will report any errors. &lt;/p&gt;</comment>
                            <comment id="77828" author="morrone" created="Tue, 25 Feb 2014 16:49:20 +0000"  >&lt;p&gt;In patch &lt;a href=&quot;http://review.whamcloud.com/6392&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;6392&lt;/a&gt; from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3270&quot; title=&quot;ptlrpcd strnlen crash trying to log a message&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3270&quot;&gt;&lt;del&gt;LU-3270&lt;/del&gt;&lt;/a&gt;, Lai is adding a wait for running statahead threads to ll_kill_super().  Perhaps that relates to the issues in this ticket?&lt;/p&gt;</comment>
                            <comment id="78225" author="hongchao.zhang" created="Mon, 3 Mar 2014 15:20:16 +0000"  >&lt;p&gt;yes, it could be related to this issue, and there are also some &quot;mntget&quot; calls in &quot;push_ctxt&quot; (lustre/lvfs/lvfs_linux.c) and&lt;br/&gt;
&quot;llog_lvfs_destroy&quot; (lustre/obdclass/llog_lvfs.c), which could also cause a lazy umount.&lt;/p&gt;

&lt;p&gt;I will create a debug patch to verify it.&lt;/p&gt;</comment>
                            <comment id="78337" author="hongchao.zhang" created="Tue, 4 Mar 2014 15:07:01 +0000"  >&lt;p&gt;status update:&lt;br/&gt;
the debug patch is still being tested.&lt;/p&gt;</comment>
                            <comment id="78466" author="hongchao.zhang" created="Wed, 5 Mar 2014 15:29:25 +0000"  >&lt;p&gt;the debug patch against b2_1 is tracked at &lt;a href=&quot;http://review.whamcloud.com/#/c/9502/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9502/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="81011" author="morrone" created="Fri, 4 Apr 2014 00:26:09 +0000"  >&lt;p&gt;FYI, we are no longer using 2.1 in production.  We have completely migrated to 2.4.&lt;/p&gt;</comment>
                            <comment id="81043" author="hongchao.zhang" created="Fri, 4 Apr 2014 14:57:11 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Did this issue occur again in 2.4?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="81076" author="morrone" created="Fri, 4 Apr 2014 18:39:11 +0000"  >&lt;p&gt;Yes, it is still a problem in 2.4.&lt;/p&gt;</comment>
                            <comment id="81937" author="hongchao.zhang" created="Fri, 18 Apr 2014 13:32:22 +0000"  >&lt;p&gt;the debug patch against b2_4 is tracked at &lt;a href=&quot;http://review.whamcloud.com/#/c/10012/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/10012/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="83135" author="hongchao.zhang" created="Sun, 4 May 2014 11:22:24 +0000"  >&lt;p&gt;Hi, &lt;/p&gt;

&lt;p&gt;Did you try the debug patch?  Did the issue occur again and are there any new logs generated by this debug patch?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="88710" author="cliffw" created="Thu, 10 Jul 2014 15:50:47 +0000"  >&lt;p&gt;I ran the umount script on a single client while running SWL on a different mount point. I was also mounting and starting other clients.&lt;br/&gt;
Within 10 minutes, the client failed to delete the mount point; the error was &apos;Device or resource busy&apos;. Attached are the console log and lctl dk output for the client. &lt;/p&gt;</comment>
                            <comment id="88807" author="hongchao.zhang" created="Fri, 11 Jul 2014 11:10:48 +0000"  >&lt;p&gt;Hi, &lt;/p&gt;

&lt;p&gt;Do you run mount &amp;amp; umount in a loop? As per the console log, the client was mounted &amp;amp; umounted successfully 5 times.&lt;br/&gt;
Which mount point (or which iteration) gave the error &quot;Device or resource busy&quot;?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="88830" author="cliffw" created="Fri, 11 Jul 2014 14:39:39 +0000"  >&lt;p&gt;I used your script from this bug.  The final mount was /mnt/lustre_test, and the script exits upon failure, so the final mount was the error.&lt;/p&gt;</comment>
                            <comment id="89005" author="hongchao.zhang" created="Tue, 15 Jul 2014 04:37:09 +0000"  >&lt;p&gt;the kill_super/put_super in the first 4 umounts was triggered by the &quot;sys_umount&quot; syscall, while the final one&lt;br/&gt;
(which caused the &quot;Device or resource busy&quot;) was triggered by the exit of a process (the syscall is sys_exit_group,&lt;br/&gt;
PID 126735, comm &quot;slurmstepd&quot;), which is very likely the cause of this issue. &lt;/p&gt;

&lt;p&gt;Hi Cliff, could you please check what the &quot;slurmstepd&quot; process on the client is, and how that process got&lt;br/&gt;
a reference to the namespace of the newly mounted Lustre client?  Thanks!&lt;/p&gt;

&lt;p&gt;It could be the same issue as at LLNL (some background process holds an extra reference to the namespace of&lt;br/&gt;
the Lustre client that was umounted explicitly by the &quot;umount&quot; command).  &lt;/p&gt;</comment>
                            <comment id="89679" author="cliffw" created="Mon, 21 Jul 2014 21:22:08 +0000"  >&lt;p&gt;Slurmstepd is an LLNL product.&lt;br/&gt;
&quot;slurmstepd is a job step manager for SLURM. It is spawned by the&lt;br/&gt;
slurmd daemon when a job step is launched and terminates when the job&lt;br/&gt;
step does. It is responsible for managing input and output (stdin,&lt;br/&gt;
stdout and stderr) for the job step along with its accounting and&lt;br/&gt;
signal processing. slurmstepd should not be initiated by users or system&lt;br/&gt;
administrators.&quot;&lt;/p&gt;</comment>
                            <comment id="91035" author="green" created="Thu, 7 Aug 2014 05:09:02 +0000"  >&lt;p&gt;After rereading everything in the ticket again and re-examining the in-kernel code, I think HongChao is right here.&lt;/p&gt;

&lt;p&gt;At least for the test Cliff has performed, it is quite clear that slurmstepd has a private namespace that holds an extra reference to the Lustre fs being unmounted by the admin.&lt;br/&gt;
It was caught red-handed: the release of its last reference to the namespace is what triggers the final mntput.&lt;br/&gt;
Now, how it ended up holding the namespace I am not really sure. My examination of the code implies that the easiest way is to do a clone with CLONE_NEWNS, but I got a copy of the code and there&apos;s nothing like that unless you have some really strange pthreads library on your system. Another easy way is when unshare() is called; I do not see any such calls in your code, though.&lt;br/&gt;
I see cgroups are used, which also play with namespaces and might influence this, but I am not really sure about that.&lt;/p&gt;

&lt;p&gt;As such, there&apos;s really nothing Lustre can do here until this additional namespace goes away; in fact, Lustre is not even called to try to perform an actual unmount until that happens (note, this is very different from a task having a file open on Lustre, or its CWD there, because then umount itself will fail with EBUSY).&lt;/p&gt;

&lt;p&gt;This is somewhat similar to the case when you do mount --bind /mnt/lustre /mnt/lustre2 &amp;#8211; after this you can do umount /mnt/lustre, but because there&apos;s still another reference - /mnt/lustre2 - the lustre is not going to be unmounted (though in this case you would be able to delete /mnt/lustre at least unlike this discussed case).&lt;/p&gt;

&lt;p&gt;So to summarize: in my view this is not a Lustre bug at all; Lustre has no control over other namespaces referencing it.&lt;br/&gt;
For the next step it would be great to understand how this new namespace is really getting created. I tried to make a simple proof of concept, but have not figured out how to create this additional namespace that would reference Lustre.&lt;br/&gt;
I imagine you cannot just kill all slurmstepd processes on the node as a workaround, since that kills the jobs they happen to be running, and that&apos;s not really acceptable. I imagine you have a lot of logging going somewhere that would let you correlate job termination times on the affected nodes with the Lustre &quot;Unmounted&quot; message, though.&lt;br/&gt;
This is assuming there are really no stray slurmstepd processes (do you have any easy way to check?) that might be hogging this separate namespace needlessly long after the job using it has terminated.&lt;br/&gt;
I suspect that when the problem occurs, cat /proc/mounts might show something useful.&lt;/p&gt;
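On kernels that expose /proc/[pid]/ns/mnt (3.8 and later; the RHEL6 kernels in this ticket predate it, where per-process /proc/PID/mounts can serve a similar purpose), a shell loop can flag processes living in a mount namespace different from the current shell's. This is a sketch for hunting such stray namespace holders, not a tool from this ticket:

```shell
# Print processes whose mount namespace differs from this shell's.
# Any such process may be pinning mounts that appear unmounted elsewhere.
self_ns=$(readlink /proc/self/ns/mnt)
for link in /proc/[0-9]*/ns/mnt; do
    ns=$(readlink "$link" 2>/dev/null) || continue   # skip exited/denied pids
    if [ "$ns" != "$self_ns" ]; then
        pid=${link#/proc/}; pid=${pid%/ns/mnt}
        echo "pid $pid ($(cat /proc/$pid/comm 2>/dev/null)) holds $ns"
    fi
done
```

A process flagged this way can then be inspected through its /proc/PID/mountinfo to see whether the supposedly unmounted Lustre filesystem is still pinned there. Note that reading another user's ns links generally requires root.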

&lt;p&gt;Cliff - can you please run slurm and slurmstepd under strace somehow and see if clone is called with CLONE_NEWNS anywhere? CLONE_NEWNS is designed to be exactly like we see here - to separate parent namespace from child, so that they can do further mounts and unmounts without affecting each other.&lt;/p&gt;</comment>
                            <comment id="91157" author="green" created="Fri, 8 Aug 2014 00:42:51 +0000"  >&lt;p&gt;Here&apos;s a simple reproducer for the CLONE_NEWNS behavior that matches what you are seeing I believe (it certainly does for me):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#define _GNU_SOURCE
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;sched.h&amp;gt;
#include &amp;lt;signal.h&amp;gt;

/* child just holds a reference to the cloned mount namespace */
static int child_fn(void *ptr)
{
	sleep(100);
	return 0;
}

int main(void)
{
	char *stack = malloc(65536);
	int rc;

	/* the child stack grows down, so pass the top of the allocation;
	 * CLONE_NEWNS requires root (CAP_SYS_ADMIN) */
	rc = clone(child_fn, stack + 65536, CLONE_NEWNS | SIGCHLD, NULL);
	if (rc == -1)
		perror(&quot;Cannot clone&quot;);

	return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Just compile and run it; it&apos;ll spawn another process that&apos;ll hang there for 100 seconds. During these 100 seconds, try to unmount a Lustre filesystem and you&apos;ll see that while umount exits right away with success and the filesystem is gone from /proc/mounts, the &quot;Unmounted&quot; Lustre message is not printed.&lt;br/&gt;
I experimented a little, and umount -f does not help too much. It does mark the import invalid forcefully, but the servers are never informed of that; moreover, they do not even get any messages, so they end up evicting the client once the ping timeout hits:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[383407.089403] Lustre: Mounted lustre-client
[383419.633431] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
[383465.064335] Lustre: lustre-OST0000: haven&apos;t heard from client 9ecf059c-5c43-49c9-61c1-7df1b424d51d (at 0@lo) in 53 seconds. I think it&apos;s dead, and I am evicting it. exp ffff8800b00f7bf0, cur 1407457950 expire 1407457920 last 1407457897
[383465.084392] LustreError: 4666:0:(ofd_grant.c:183:ofd_grant_sanity_check()) ofd_statfs: tot_granted 281728 != fo_tot_granted 2378880
[383467.052644] Lustre: lustre-MDT0000: haven&apos;t heard from client 9ecf059c-5c43-49c9-61c1-7df1b424d51d (at 0@lo) in 55 seconds. I think it&apos;s dead, and I am evicting it. exp ffff88008dd3abf0, cur 1407457952 expire 1407457922 last 1407457897
[383467.056074] Lustre: Skipped 1 previous similar message
[383478.941396] Lustre: Unmounted lustre-client
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Granted, the entire idea behind umount -f was to just cut all communications, so we probably cannot easily turn around and change the semantics to &quot;tell the server we are gone too&quot;.&lt;/p&gt;</comment>
                            <comment id="91274" author="cliffw" created="Mon, 11 Aug 2014 14:18:30 +0000"  >&lt;p&gt;I straced while running SWL. I did not find any use of CLONE_NEWNS.&lt;/p&gt;</comment>
                            <comment id="91333" author="morrone" created="Mon, 11 Aug 2014 20:21:58 +0000"  >&lt;blockquote&gt;&lt;p&gt;I straced while running SWL&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Could you please explain exactly what you did?&lt;/p&gt;</comment>
                            <comment id="91344" author="cliffw" created="Mon, 11 Aug 2014 22:14:07 +0000"  >&lt;p&gt;While running SWL, on multiple clients I straced slurmd, like this (with SPID set to the slurmd pid):&lt;/p&gt;

&lt;p&gt;strace -ff -p $SPID -e trace=clone -o $DIR/$FILE&lt;/p&gt;

&lt;p&gt;Then I grepped through the results for &apos;flags=&apos;, sorted for unique entries, and looked for CLONE_NEWNS. &lt;/p&gt;</comment>
                            <comment id="91383" author="cliffw" created="Tue, 12 Aug 2014 14:30:55 +0000"  >&lt;p&gt;Running the new test patch, I see this backtrace repeatedly&lt;br/&gt;
2014-08-12 00:13:47 Call Trace:&lt;br/&gt;
2014-08-12 00:13:47  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810a06dc&amp;gt;&amp;#93;&lt;/span&gt; ? create_new_namespaces+0x3c/0x1b0&lt;br/&gt;
2014-08-12 00:13:47  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810a0ae6&amp;gt;&amp;#93;&lt;/span&gt; ? unshare_nsproxy_namespaces+0x76/0xd0&lt;br/&gt;
2014-08-12 00:13:47  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8106e7df&amp;gt;&amp;#93;&lt;/span&gt; ? sys_unshare+0x12f/0x2d0&lt;br/&gt;
2014-08-12 00:13:47  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100b072&amp;gt;&amp;#93;&lt;/span&gt; ? system_call_fastpath+0x16/0x1b&lt;br/&gt;
2014-08-12 00:13:54 New namespace created&lt;/p&gt;</comment>
                            <comment id="91384" author="green" created="Tue, 12 Aug 2014 14:36:12 +0000"  >&lt;p&gt;So we clearly see that slurmd, or one of its children, calls the unshare syscall, which clones the mount namespace and prevents all filesystems, not just Lustre, from unmounting for the life of the cloned namespace (which ends when the child does).&lt;/p&gt;

&lt;p&gt;What&apos;s strange is that I do not see any unshare calls in the slurmd source I downloaded from GitHub, so I wonder if this is coming from one of the libraries in use at LLNL.&lt;/p&gt;

&lt;p&gt;In any case, it appears there is absolutely nothing we (on the Lustre side) can do to speed up the unmount; userspace asked the kernel not to allow any real unmounts, after all.&lt;/p&gt;</comment>
                            <comment id="91390" author="cliffw" created="Tue, 12 Aug 2014 15:04:48 +0000"  >&lt;p&gt;I then straced slurmd with -e trace=unshare and saw this result:&lt;br/&gt;
root     110934      1  0 07:53 ?        00:00:00 slurmstepd: &lt;span class=&quot;error&quot;&gt;&amp;#91;1742267.0&amp;#93;&lt;/span&gt;&lt;br/&gt;
strace of that gives&lt;br/&gt;
unshare(CLONE_NEWNS|0x8000000)&lt;/p&gt;

&lt;p&gt;I see the same call in several other traces.&lt;/p&gt;</comment>
                            <comment id="91393" author="cliffw" created="Tue, 12 Aug 2014 15:06:24 +0000"  >&lt;p&gt;Sorry, previous comment was lost.&lt;br/&gt;
Using Oleg&apos;s patch, we see this backtrace:&lt;br/&gt;
2014-08-12 07:52:20 Pid: 109868, comm: slurmstepd Tainted: P           ---------------    2.6.32-431.23.3.el6_lustre.g549407e.x86_64 #1&lt;br/&gt;
2014-08-12 07:52:20 Call Trace:&lt;br/&gt;
2014-08-12 07:52:20  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810a06dc&amp;gt;&amp;#93;&lt;/span&gt; ? create_new_namespaces+0x3c/0x1b0&lt;br/&gt;
2014-08-12 07:52:20  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff810a0ae6&amp;gt;&amp;#93;&lt;/span&gt; ? unshare_nsproxy_namespaces+0x76/0xd0&lt;br/&gt;
2014-08-12 07:52:20  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8106e7df&amp;gt;&amp;#93;&lt;/span&gt; ? sys_unshare+0x12f/0x2d0&lt;br/&gt;
2014-08-12 07:52:20  &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffff8100b288&amp;gt;&amp;#93;&lt;/span&gt; ? tracesys+0xd9/0xde&lt;br/&gt;
2014-08-12 07:52:53 New namespace created&lt;br/&gt;
2014-08-12 07:52:53 Pid: 110584, comm: slurmstepd Tainted: P           ---------------    2.6.32-431.23.3.el6_lustre.g549407e.x86_64 #1&lt;/p&gt;</comment>
                            <comment id="91945" author="cliffw" created="Tue, 19 Aug 2014 15:56:48 +0000"  >&lt;p&gt;Slurm plugins on Hyperion (from /etc/slurm/plugstack.conf):&lt;br/&gt;
 use-env.so&lt;br/&gt;
 auto-affinity.so&lt;br/&gt;
 io-watchdog.so&lt;br/&gt;
 renice.so&lt;br/&gt;
 lua.so&lt;/p&gt;

&lt;p&gt;I am not sure if this is the LLNL version, but I do see this in the use-env source:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt; /* Unshare file namespace.  This means only &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; process and its children
     * will see the following mounts, and when &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; process and its children
     * terminate, the mounts go away automatically.
     */
    &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (unshare (CLONE_NEWNS) &amp;lt; 0) {
        slurm_error (&lt;span class=&quot;code-quote&quot;&gt;&quot;unshare CLONE_NEWNS: %m&quot;&lt;/span&gt;);
        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; (-1);
    }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="92092" author="morrone" created="Thu, 21 Aug 2014 00:11:12 +0000"  >&lt;blockquote&gt;&lt;p&gt;but I do see this in the use-env source&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Are you sure about that?  It looks to me like that code is from the iorelay plugin, not use-env.  The other place mentioning CLONE_NEWNS is the private_mount plugin.&lt;/p&gt;

&lt;p&gt;I got my sources from:&lt;/p&gt;

&lt;p&gt;  &lt;a href=&quot;https://code.google.com/p/slurm-spank-plugins/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://code.google.com/p/slurm-spank-plugins/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="92094" author="morrone" created="Thu, 21 Aug 2014 00:24:31 +0000"  >&lt;p&gt;Perhaps /etc/slurm/lua.d/ns.lua is more likely.&lt;/p&gt;</comment>
                            <comment id="92239" author="morrone" created="Fri, 22 Aug 2014 17:32:10 +0000"  >&lt;p&gt;I am thinking that ns.lua is the likely culprit.  I&apos;ll have to get someone at LLNL to look into why we are doing that, and how to do it better.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="15363" name="iwc175.console.txt" size="10174" author="cliffw" created="Thu, 10 Jul 2014 15:54:17 +0000"/>
                            <attachment id="15362" name="iwc175.dump.txt.gz" size="5921989" author="cliffw" created="Thu, 10 Jul 2014 15:54:17 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 22 Aug 2014 23:46:12 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw2h3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10483</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 13 Sep 2013 23:46:12 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>