<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:12:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14713] Process hung with waiting for mmap_sem</title>
                <link>https://jira.whamcloud.com/browse/LU-14713</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Write and Truncate IO will be serialized on &lt;tt&gt;ll_trunc_sem::ll_trunc_{readers|waiters}&lt;/tt&gt;; if one process quits abruptly (is killed), the other will keep waiting for the semaphore (task state set to &lt;tt&gt;TASK_INTERRUPTIBLE&lt;/tt&gt;):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; INFO: task a.out:109684 blocked for more than 120 seconds.
      Tainted: G          IOE    --------- -  - 4.18.0-240.15.1.el8_3.x86_64 #1
 &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
 Call Trace:
 __schedule+0x2a6/0x700
 schedule+0x38/0xa0
 trunc_sem_down_read+0xa6/0xb0 [lustre]
 vvp_io_write_start+0x107/0xb80 [lustre]
 cl_io_start+0x59/0x110 [obdclass]
 cl_io_loop+0x9a/0x1e0 [obdclass]
 ll_file_io_generic+0x380/0xb10 [lustre]
 ll_file_write_iter+0x136/0x5a0 [lustre]
 new_sync_write+0x124/0x170
 vfs_write+0xa5/0x1a0
 ksys_write+0x4f/0xb0
 do_syscall_64+0x5b/0x1a0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="64416">LU-14713</key>
            <summary>Process hung with waiting for mmap_sem</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="bobijam">Zhenyu Xu</reporter>
                        <labels>
                    </labels>
                <created>Thu, 27 May 2021 03:49:01 +0000</created>
                <updated>Wed, 8 Jun 2022 15:43:31 +0000</updated>
                            <resolved>Sat, 20 Nov 2021 14:42:48 +0000</resolved>
                                    <version>Lustre 2.12.7</version>
                    <version>Lustre 2.12.8</version>
                                    <fixVersion>Lustre 2.15.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="302757" author="adilger" created="Thu, 27 May 2021 06:19:40 +0000"  >&lt;p&gt;Making the wait interruptible is good.  It would also be good if the interrupted process woke up any waiters for the lock, so that they do not wait forever.&lt;/p&gt;</comment>
                            <comment id="302774" author="gerrit" created="Thu, 27 May 2021 09:00:53 +0000"  >&lt;p&gt;Bobi Jam (bobijam@hotmail.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/43844&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43844&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14713&quot; title=&quot;Process hung with waiting for mmap_sem&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14713&quot;&gt;&lt;del&gt;LU-14713&lt;/del&gt;&lt;/a&gt; llite: make trunc_sem_down_xxx waiting killable&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: e335dad76a6edc7ede7a45b4ed60098dfa2c356f&lt;/p&gt;</comment>
                            <comment id="302932" author="neilb" created="Fri, 28 May 2021 05:46:25 +0000"  >&lt;p&gt;Thanks for the stack trace.&lt;/p&gt;

&lt;p&gt;I think there must have been at least one other process blocked on a mutex or semaphore.&#160; Was anything else in the logs?&lt;/p&gt;

&lt;p&gt;trunc_sem_down_read() can only block in two circumstances: 1/ ll_trunc_readers is negative, so a write lock is held; 2/ ll_trunc_waiters is non-zero, so some process is waiting for a write lock.&lt;/p&gt;

&lt;p&gt;In case 2, that process will be blocked on trunc_sem_down_write() and should get a soft-lockup warning.&lt;/p&gt;

&lt;p&gt;In case 1, the writer must have called vvp_io_setattr_start and either&lt;/p&gt;

&lt;p&gt;A/ not yet called vvp_io_setattr_end(), or&lt;/p&gt;

&lt;p&gt;B/ changed io-&amp;gt;ci_type or cio-&amp;gt;u.ci_setattr.sa_subtype so that when vvp_io_setattr_end() was called it didn&apos;t release the lock.&lt;/p&gt;

&lt;p&gt;B seems extremely unlikely.&lt;/p&gt;

&lt;p&gt;A implies that the process is still waiting as _end is &lt;b&gt;always&lt;/b&gt; called after _start before the process can exit or return to user space.&lt;/p&gt;

&lt;p&gt;The only scenario that I can think of (other than 1B) where there would be no other process triggering the soft-lockup detector is that the process holding the write lock is in a TASK_INTERRUPTIBLE or TASK_IDLE wait.&#160; That seems unlikely, but should be easy to find if it is the case.&lt;/p&gt;

&lt;p&gt;Given that the program that generated the stack trace was &quot;a.out&quot; I assume you can reproduce the problem.&#160; Could you do that, then &quot;echo t &amp;gt; /proc/sysrq-trigger&quot; to generate a stack trace of all tasks.&#160; Then provide me with the full set of stack traces?&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;I agree that it would be good to be able to kill tasks blocked on mutexes or semaphores, but that tends to take a lot more careful effort than just returning an error code.&lt;/p&gt;

</comment>
                            <comment id="303229" author="mnishizawa" created="Wed, 2 Jun 2021 01:43:00 +0000"  >&lt;p&gt;I attached stack traces in DDN-2177.  Please note that this issue occurred twice on this client and the process &quot;a.out:109684&quot; in problem description is when this occurred first time.  The customer restarted the client, the issue occurred again and then vmcore was captured (vmcore is available in DDN-2177).  The stack traces attached is from this vmcore and so the process is &quot;a.out:4718&quot;.  Hope this helps, &lt;/p&gt;</comment>
                            <comment id="303238" author="neilb" created="Wed, 2 Jun 2021 06:05:53 +0000"  >&lt;p&gt;I don&apos;t know how to access &quot;DDN-2177&quot;.&#160; I don&apos;t suppose you can attach them here?&lt;/p&gt;

</comment>
                            <comment id="303239" author="mnishizawa" created="Wed, 2 Jun 2021 06:26:32 +0000"  >&lt;p&gt;Ah, we usually don&apos;t share our customer&apos;s data publicly, so I attached it to DDN-2177; but it should help with digging into the root cause.&lt;br/&gt;
Here is the stack trace:  &lt;span class=&quot;error&quot;&gt;&amp;#91;^crash.log&amp;#93;&lt;/span&gt; &lt;/p&gt;

&lt;p&gt;I may delete the attachment later.  &lt;/p&gt;</comment>
                            <comment id="303395" author="neilb" created="Thu, 3 Jun 2021 00:44:40 +0000"  >&lt;p&gt;Thanks for the crash data.&#160; I could not find the other process that I was sure must exist.&lt;/p&gt;

&lt;p&gt;The only process that looks at all suspicious is PID 46556 (ll_imp_inval) and I suspect it is waiting on the a.out thread, not the other way around.&lt;/p&gt;

&lt;p&gt;It is very likely that the ll_trunc_sem that a.out is blocked on has ll_trunc_waiters == 0 and ll_trunc_readers == -1.&lt;/p&gt;

&lt;p&gt;There are two ways this can happen:&lt;/p&gt;

&lt;p&gt;1/ a thread calls trunc_sem_down_write, but never calls trunc_sem_up_write&lt;/p&gt;

&lt;p&gt;2/ a thread calls trunc_sem_up_write without having first called trunc_sem_down_write.&#160; If it does this while some other thread is holding a read lock, then the up_write call will set ll_trunc_readers to 0, and when the other thread calls trunc_sem_up_read, ll_trunc_readers will be decremented to -1.&lt;/p&gt;

&lt;p&gt;I cannot see a code path that leads to option 1.&#160; I &lt;b&gt;can&lt;/b&gt; see a code path that might lead to option 2, but there may be considerations that I&apos;m not aware of which make it impossible.&#160; I&apos;ll explain it in case it is possible.&lt;/p&gt;

&lt;p&gt;trunc_sem_up/down_write are only called from vvp_io_setattr_start/end.&#160; These are only called from cl_io_start/end which are always paired.&lt;/p&gt;

&lt;p&gt;They are called directly in ll_readahead_handle_work and cl_io_loop, where cl_io_end will &lt;b&gt;always&lt;/b&gt; follow cl_io_start, and indirectly through lov_io_start/end, which will be paired properly exactly if vvp_io_setattr_start/end are paired properly.&lt;/p&gt;

&lt;p&gt;cl_io_end will &lt;b&gt;always&lt;/b&gt; call all of the -&amp;gt;cio_end functions, so missing a call to trunc_sem_up_write is (should be) impossible.&lt;/p&gt;

&lt;p&gt;However cl_io_start will &lt;b&gt;sometimes not&lt;/b&gt; call all cio_start functions.&#160; If one of the cio_start functions in the stack returns non-zero, the remainder are not called.&#160; However, the matching cio_end function will be called.&#160; This seems &lt;b&gt;very strange&lt;/b&gt;.&#160; I might be missing something, but it certainly seems that is what would happen.&lt;/p&gt;

&lt;p&gt;So if some cio_start function before vvp_io_setattr_start returned non-zero, then trunc_sem_down_write would not be called, but trunc_sem_up_write would.&#160; This would often not cause a problem.&#160; But if some other thread had called trunc_sem_down_read(), then it would produce exactly the symptom you observe.&lt;/p&gt;

&lt;p&gt;@Andreas does this seem at all possible?&lt;/p&gt;

</comment>
                            <comment id="303408" author="adilger" created="Thu, 3 Jun 2021 07:10:42 +0000"  >&lt;blockquote&gt;
&lt;p&gt;@Andreas does this seem at all possible?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;That is really a question for Bobijam to answer, as I&apos;m not nearly as expert in the client IO code as he is.&lt;/p&gt;</comment>
                            <comment id="303425" author="bobijam" created="Thu, 3 Jun 2021 09:15:51 +0000"  >&lt;p&gt;The clio layer always (at least for now) stacks like this: VVP upon LOV upon OSC/MDC, and cl_io_start() calls from top to bottom, i.e. vvp_io_xxx_start(), then lov_io_xxx_start(), which is nested with {mdc|osc}_io_xxx_start(); that makes vvp_io_setattr_start() the first to be called in the CIT_SETATTR cl_io_loop call path.&lt;/p&gt;

&lt;p&gt;As the following example shows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000020:00000001:0.0:1622711194.323939:0:20024:0:(cl_io.c:755:cl_io_loop()) Process entered
00000020:00000001:0.0:1622711194.323940:0:20024:0:(cl_io.c:390:cl_io_iter_init()) Process entered
...
00000020:00000001:0.0:1622711194.323955:0:20024:0:(cl_io.c:402:cl_io_iter_init()) Process leaving (rc=0 : 0 : 0)
00000020:00000001:0.0:1622711194.323958:0:20024:0:(cl_io.c:311:cl_io_lock()) Process entered
...
00000020:00000001:0.0:1622711194.324421:0:20024:0:(cl_io.c:331:cl_io_lock()) Process leaving (rc=0 : 0 : 0)
00000020:00000001:0.0:1622711194.324422:0:20024:0:(cl_io.c:516:cl_io_start()) Process entered
00000080:00000001:0.0:1622711194.324422:0:20024:0:(vvp_io.c:737:vvp_io_setattr_start()) Process entered
...
00000080:00000001:0.0:1622711194.324427:0:20024:0:(vvp_io.c:763:vvp_io_setattr_start()) Process leaving (rc=0 : 0 : 0)
00020000:00000001:0.0:1622711194.324427:0:20024:0:(lov_io.c:1487:lov_io_setattr_start()) Process entered
...
00000008:00000001:0.0:1622711194.324429:0:20024:0:(osc_io.c:582:osc_io_setattr_start()) Process entered
...
00000008:00000001:0.0:1622711194.324459:0:20024:0:(osc_io.c:687:osc_io_setattr_start()) Process leaving (rc=0 : 0 : 0)
...
00020000:00000001:0.0:1622711194.324461:0:20024:0:(lov_io.c:1502:lov_io_setattr_start()) Process leaving (rc=0 : 0 : 0)
00000020:00000001:0.0:1622711194.324462:0:20024:0:(cl_io.c:528:cl_io_start()) Process leaving (rc=0 : 0 : 0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So basically vvp_io_setattr_start() will always be called in the CIT_SETATTR cl_io_start() path.&lt;/p&gt;</comment>
                            <comment id="303433" author="neilb" created="Thu, 3 Jun 2021 11:30:35 +0000"  >&lt;p&gt;Thanks for the excellent explanation.&#160; The bug must be hiding somewhere else then.&#160; I&apos;ll keep digging.&lt;/p&gt;

</comment>
                            <comment id="303521" author="neilb" created="Fri, 4 Jun 2021 00:02:23 +0000"  >&lt;p&gt;Could you please report precisely which version of Lustre is in use (ideally the git commit hash).&#160; The &quot;Affected versions&quot; above says &quot;None&quot;, which isn&apos;t helpful.&lt;/p&gt;

</comment>
                            <comment id="303522" author="neilb" created="Fri, 4 Jun 2021 00:05:32 +0000"  >&lt;p&gt;Another question:&#160; If you still have the core file and can find the &apos;struct ll_trunc_sem&apos; that the process is blocked on, could you report its contents?&#160; I&apos;m guessing the two atomic_t&apos;s are -1 and 0, but it would be highly informative if they had other values.&#160; Maybe some sort of memory corruption caused the problem.&lt;/p&gt;

</comment>
                            <comment id="303523" author="bobijam" created="Fri, 4 Jun 2021 01:01:34 +0000"  >&lt;p&gt;The reported version is&#160;2.12.6-ddn19 (both client and server). And unfortunately I failed to load the vmcore with crash:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# crash ./120035_YITP_client_ioerror/usr/lib/debug/lib/modules/4.18.0-240.15.1.el8_3.x86_64/vmlinux 120035_YITP_client_ioerror/vmcore&#160;


crash 7.2.3-11.el7_9.1
Copyright (C) 2002-2017&#160; Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010&#160; IBM Corporation
Copyright (C) 1999-2006&#160; Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012&#160; Fujitsu Limited
Copyright (C) 2006, 2007&#160; VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011&#160; NEC Corporation
Copyright (C) 1999, 2002, 2007&#160; Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002&#160; Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.&#160; Enter &quot;help copying&quot; to see the conditions.
This program has absolutely no warranty.&#160; Enter &quot;help warranty&quot; for details.
&#160;
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later &amp;lt;http://gnu.org/licenses/gpl.html&amp;gt;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.&#160; Type &quot;show copying&quot;
and &quot;show warranty&quot; for details.
This GDB was configured as &quot;x86_64-unknown-linux-gnu&quot;...


WARNING: kernel relocated [222MB]: patching 97897 gdb minimal_symbol values


please wait... (gathering task table data)&#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;
crash: radix trees do not exist or have changed their format

 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="303528" author="neilb" created="Fri, 4 Jun 2021 01:44:48 +0000"  >&lt;p&gt;&amp;gt; The reported version is 2.12.6-ddn19&lt;/p&gt;

&lt;p&gt;The public &quot;lustre-release&quot; git tree doesn&apos;t have any branch or tag with a name like that, and the public &quot;2.12.6&quot; doesn&apos;t contain trunc_sem_down_read() at all.&lt;/p&gt;

&lt;p&gt;Is there somewhere I can get the source code for that release?&lt;/p&gt;</comment>
                            <comment id="303536" author="bobijam" created="Fri, 4 Jun 2021 03:12:31 +0000"  >&lt;blockquote&gt;&lt;p&gt;I can open it, but I&apos;m having difficulty to find &apos;struct ll_trunc_sem&apos;. vmcore is available from:&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;same here.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; struct ll_trunc_sem
struct: invalid data structure reference: ll_trunc_sem&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Seems to lack the Lustre module debug info.&lt;/p&gt;

&lt;p&gt;rpm -qpl lustre-client-debuginfo-2.12.6_ddn19-1.el8.x86_64.rpm&lt;/p&gt;

&lt;p&gt;shows that it does not contain lustre module info, only some tools.&lt;/p&gt;</comment>
                            <comment id="303547" author="neilb" created="Fri, 4 Jun 2021 04:19:46 +0000"  >&lt;p&gt;Looking at the vmcore, &quot;bt -FF 4718&quot; shows&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;#1 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffffbe8540f3bb88&amp;#93;&lt;/span&gt; schedule at ffffffff8f6d38a8&lt;br/&gt;
 ffffbe8540f3bb90: &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff9f9b3d8c37a0:lustre_inode_cache&amp;#93;&lt;/span&gt; trunc_sem_down_read+166 &lt;br/&gt;
 #2 &lt;span class=&quot;error&quot;&gt;&amp;#91;ffffbe8540f3bb98&amp;#93;&lt;/span&gt; trunc_sem_down_read at ffffffffc12523c6 &lt;span class=&quot;error&quot;&gt;&amp;#91;lustre&amp;#93;&lt;/span&gt;&lt;br/&gt;
 ffffbe8540f3bba0: &lt;span class=&quot;error&quot;&gt;&amp;#91;ffff9f9b3d8c37a0:lustre_inode_cache&amp;#93;&lt;/span&gt; 00000000ffffffff&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;So ffffbe8540f3bba0 must be the address of the ll_trunc_sem.&lt;/p&gt;

&lt;p&gt;crash&amp;gt; x/2wx 0xffff9f9b3d8c37a0&lt;br/&gt;
0xffff9f9b3d8c37a0: 0x00000000 0x00000000&lt;/p&gt;

&lt;p&gt;So both ll_trunc_waiters and ll_trunc_readers are zero, and trunc_sem_down_read() shouldn&apos;t block.&lt;/p&gt;

&lt;p&gt;This suggests a missed wake-up.&#160; It could only be the wakeup from trunc_sem_up_write().&#160; Maybe a barrier is needed after the atomic_set, and before the atomic_add_unless_negative.&lt;/p&gt;

&lt;p&gt;But I thought barriers like that weren&apos;t needed on x86.&lt;/p&gt;

&lt;p&gt;I&apos;ll read up about memory ordering again.&lt;/p&gt;

</comment>
                            <comment id="303557" author="neilb" created="Fri, 4 Jun 2021 06:02:46 +0000"  >&lt;p&gt;I&apos;m confident that memory barriers aren&apos;t a problem.&#160; atomic_set() and atomic_read() provide any memory barriers they need, and none are needed on x86.&lt;/p&gt;

&lt;p&gt;So I&apos;m stumped - for today at least.&lt;/p&gt;

&lt;p&gt;ll_trunc_readers is definitely 0 and whenever it is set to zero a wake_up happens.&#160; But the waiter didn&apos;t wake up.&lt;/p&gt;

&lt;p&gt;I&apos;d still like to see the exact lustre source code for these modules, but I doubt it will show anything.&lt;/p&gt;

</comment>
                            <comment id="307933" author="bobijam" created="Wed, 21 Jul 2021 07:07:58 +0000"  >&lt;p&gt;Neil,&lt;/p&gt;

&lt;p&gt;I&apos;m not familiar with memory barriers, but I noticed a kernel comment about wake_up_var()&amp;#45;&amp;gt;__wake_up_bit()&amp;#45;&amp;gt;waitqueue_active():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;  97 /**                                                                             
  98  * waitqueue_active -- locklessly test for waiters on the queue                 
  99  * @wq_head: the waitqueue to test for waiters                                  
 100  *                                                                              
 101  * returns true if the wait list is not empty                                   
 102  *                                                                              
 103  * NOTE: this function is lockless and requires care, incorrect usage _will_    
 104  * lead to sporadic and non-obvious failure.                                    
 105  *                                                                              
 106  * Use either while holding wait_queue_head::lock or when used for wakeups      
 107  * with an extra smp_mb() like::                                                
 108  *                                                                              
 109  *      CPU0 - waker                    CPU1 - waiter                           
 110  *                                                                              
 111  *                                      for (;;) {                              
 112  *      @cond = true;                     prepare_to_wait(&amp;amp;wq_head, &amp;amp;wait, state);
 113  *      smp_mb();                         // smp_mb() from set_current_state()  
 114  *      if (waitqueue_active(wq_head))         if (@cond)                       
 115  *        wake_up(wq_head);                      break;                         
 116  *                                        schedule();                           
 117  *                                      }                                       
 118  *                                      finish_wait(&amp;amp;wq_head, &amp;amp;wait);           
 119  *                                                                              
 120  * Because without the explicit smp_mb() it&apos;s possible for the                  
 121  * waitqueue_active() load to get hoisted over the @cond store such that we&apos;ll  
 122  * observe an empty wait list while the waiter might not observe @cond.  &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Does that mean there should be smp_mb() in trunc_sem_up_write() before wake_up_var()?&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
347 &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; inline void trunc_sem_up_write(struct ll_trunc_sem *sem)                 
 348 {                                                                               
 349         atomic_set(&amp;amp;sem-&amp;gt;ll_trunc_readers, 0);                                                       
 350         wake_up_var(&amp;amp;sem-&amp;gt;ll_trunc_readers);                                    
 351 } &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="308016" author="neilb" created="Wed, 21 Jul 2021 23:11:43 +0000"  >&lt;p&gt;It seems I was wrong about atomic_set() implying a barrier, and I&apos;m now slightly less convinced that x86 doesn&apos;t require them.&lt;/p&gt;

&lt;p&gt;So I&apos;d still be very surprised if barriers were causing the problem, but there is something worth fixing there.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;The atomic_set() in trunc_sem_up_write() should be atomic_set_release(), and the matching atomic_read()s should be atomic_read_acquire().&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;I would still like to see the exact source code for the kernel modules which produced the hang, if that is possible.&lt;/p&gt;

</comment>
                            <comment id="308031" author="gerrit" created="Thu, 22 Jul 2021 01:11:10 +0000"  >&lt;p&gt;Neil Brown (neilb@suse.de) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/44372&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/44372&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14713&quot; title=&quot;Process hung with waiting for mmap_sem&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14713&quot;&gt;&lt;del&gt;LU-14713&lt;/del&gt;&lt;/a&gt; llite: add memory barriier to the trunc_sem.&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1b20297bce772534dd557abaa11859447da8651b&lt;/p&gt;</comment>
                            <comment id="310758" author="gerrit" created="Fri, 20 Aug 2021 14:15:24 +0000"  >&lt;p&gt;&quot;Bobi Jam &amp;lt;bobijam@hotmail.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/44715&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/44715&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14713&quot; title=&quot;Process hung with waiting for mmap_sem&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14713&quot;&gt;&lt;del&gt;LU-14713&lt;/del&gt;&lt;/a&gt; llite: tighten condition for fault not drop mmap_sem&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 0cccc61c4bd48cc6942963394cd2ea70e30b8563&lt;/p&gt;</comment>
                            <comment id="310761" author="bobijam" created="Fri, 20 Aug 2021 14:37:43 +0000"  >&lt;p&gt;As ll_fault0() shows, if ll_filemap_fault() returns VM_FAULT_RETRY, it presumes mmap_sem has not been dropped and continues to a normal fault under the DLM lock, which will eventually drop the mmap_sem.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;ll_fault0()&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
279         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (ll_sbi_has_fast_read(ll_i2sbi(file_inode(vma-&amp;gt;vm_file)))) {         
280                 &lt;span class=&quot;code-comment&quot;&gt;/* &lt;span class=&quot;code-keyword&quot;&gt;do&lt;/span&gt; fast fault */&lt;/span&gt;                                             
281                 bool has_retry = vmf-&amp;gt;flags &amp;amp; FAULT_FLAG_RETRY_NOWAIT;          
282                                                                                 
283                 &lt;span class=&quot;code-comment&quot;&gt;/* To avoid loops, instruct downstream to not drop mmap_sem */&lt;/span&gt;  
284                 vmf-&amp;gt;flags |= FAULT_FLAG_RETRY_NOWAIT;                          
285                 ll_cl_add(vma-&amp;gt;vm_file, env, NULL, LCC_MMAP);                   
286                 fault_ret = ll_filemap_fault(vma, vmf);                         
287                 ll_cl_remove(vma-&amp;gt;vm_file, env);                                
288                 &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!has_retry)                                                 
289                         vmf-&amp;gt;flags &amp;amp;= ~FAULT_FLAG_RETRY_NOWAIT;                 
290                                                                                 
291                 /* - If there is no error, then the page was found in cache and 
292                  *   uptodate;                                                  
293                  * - If VM_FAULT_RETRY is set, the page existed but failed to   
294                  *   lock. We will &lt;span class=&quot;code-keyword&quot;&gt;try&lt;/span&gt; slow path to avoid loops.                
295                  * - Otherwise, it should &lt;span class=&quot;code-keyword&quot;&gt;try&lt;/span&gt; normal fault under DLM lock. */   
296                 &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!(fault_ret &amp;amp; VM_FAULT_RETRY) &amp;amp;&amp;amp;                            
297                     !(fault_ret &amp;amp; VM_FAULT_ERROR))                              
298                         GOTO(out, result = 0);                                  
299                                                                                 
300                 fault_ret = 0;                                                  
301         }            &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But filemap_fault() shows that returning VM_FAULT_RETRY without dropping mmap_sem requires that vmf-&amp;gt;flags contain both FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT; otherwise it could return VM_FAULT_RETRY with mmap_sem released.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;filemap_fault()&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
2421         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!lock_page_or_retry(page, vma-&amp;gt;vm_mm, vmf-&amp;gt;flags)) {                
2422                 page_cache_release(page);                                       
2423                 &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; ret | VM_FAULT_RETRY;                                            
2424         }      &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
410 &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; inline &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; lock_page_or_retry(struct page *page, struct mm_struct *mm,   
411                                      unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; flags)                                 
412 {                                                                               
413         might_sleep();                                                          
414         &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; trylock_page(page) || __lock_page_or_retry(page, mm, flags);     
415 } 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
 854 &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; __lock_page_or_retry(struct page *page, struct mm_struct *mm,               
 855                          unsigned &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; flags)                                    
 856 {                                                                               
 857         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (flags &amp;amp; FAULT_FLAG_ALLOW_RETRY) {                                   
 858                 /*                                                              
 859                  * CAUTION! In &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt;, mmap_sem is not released              
 860                  * even though &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0.                                        
 861                  */                                                             
 862                 &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (flags &amp;amp; FAULT_FLAG_RETRY_NOWAIT)                            
 863                         &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                               
 864                                                                                 
 865                 up_read(&amp;amp;mm-&amp;gt;mmap_sem);                                         
 866                 &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (flags &amp;amp; FAULT_FLAG_KILLABLE)                                
 867                         wait_on_page_locked_killable(page);                     
 868                 &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;                                                            
 869                         wait_on_page_locked(page);                                      
 870                 &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;  
 871         } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; {                                                                
 872                 &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (flags &amp;amp; FAULT_FLAG_KILLABLE) {                              
 873                         &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ret;                                                
 874                                                                                 
 875                         ret = __lock_page_killable(page);                       
 876                         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (ret) {                                              
 877                                 up_read(&amp;amp;mm-&amp;gt;mmap_sem);                         
 878                                 &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;                                               
 879                         }                                                       
 880                 } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;                                                          
 881                         __lock_page(page);                                      
 882                 &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 1;                                                       
 883         }     
 884 }                           
 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="318735" author="gerrit" created="Sat, 20 Nov 2021 06:25:39 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/44715/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/44715/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14713&quot; title=&quot;Process hung with waiting for mmap_sem&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14713&quot;&gt;&lt;del&gt;LU-14713&lt;/del&gt;&lt;/a&gt; llite: tighten condition for fault not drop mmap_sem&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 81aec05103558f57adb10fd319847304cdf44aa7&lt;/p&gt;</comment>
                            <comment id="318760" author="gerrit" created="Sat, 20 Nov 2021 06:30:57 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/43844/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/43844/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14713&quot; title=&quot;Process hung with waiting for mmap_sem&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14713&quot;&gt;&lt;del&gt;LU-14713&lt;/del&gt;&lt;/a&gt; llite: mend the trunc_sem_up_write()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 39745c8b5493159bbca62add54ca9be7cac6564f&lt;/p&gt;</comment>
                            <comment id="318775" author="pjones" created="Sat, 20 Nov 2021 14:42:48 +0000"  >&lt;p&gt;Landed for 2.15&lt;/p&gt;</comment>
                            <comment id="335532" author="gerrit" created="Thu, 19 May 2022 19:31:51 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/47404&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47404&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14713&quot; title=&quot;Process hung with waiting for mmap_sem&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14713&quot;&gt;&lt;del&gt;LU-14713&lt;/del&gt;&lt;/a&gt; llite: tighten condition for fault not drop mmap_sem&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: cdf66f48b5100b0b209be3f1741e6163000299e2&lt;/p&gt;</comment>
                            <comment id="335578" author="eaujames" created="Fri, 20 May 2022 12:47:01 +0000"  >&lt;p&gt;We likely hit this issue at the CEA in production.&lt;br/&gt;
After upgrading from 2.12.6 to 2.12.7, we observed regular hangs on the clients when canceling a job.&lt;br/&gt;
The hang is observed on mmap_sem when closing the /dev/infiniband/uverb* device:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;void ib_umem_release(struct ib_umem *umem)
{
        if (!umem)
                return;
.....

#ifdef HAVE_ATOMIC_PINNED_VM
        atomic64_sub(ib_umem_num_pages(umem), &amp;amp;umem-&amp;gt;owning_mm-&amp;gt;pinned_vm);
#else
        down_write(&amp;amp;umem-&amp;gt;owning_mm-&amp;gt;mmap_sem);                 &amp;lt;----- the thread hang here, rw_semaphore owner is garbage (already free)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;MOFED: 5.4&lt;br/&gt;
kernel: 3.10&lt;br/&gt;
lustre: client 2.12.7&lt;/p&gt;

&lt;p&gt;So I am not 100% sure, but I think we hit the regression from commit 21dc165991 &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13182&quot; title=&quot;MAP_POPULATE hangs with Linux 5.4&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13182&quot;&gt;&lt;del&gt;LU-13182&lt;/del&gt;&lt;/a&gt; llite: Avoid eternel retry loops with MAP_POPULATE&quot; (landed in 2.12.7).&lt;br/&gt;
This patch could cause mmap_sem to be released (up_read) twice (only for kernels &amp;lt; 5.1).&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="67734">LU-15397</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01vgn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>