<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:37:24 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10697] MDT locking issues after failing over OSTs from hung OSS</title>
                <link>https://jira.whamcloud.com/browse/LU-10697</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are seeing more server instabilities with 2.10.3.&lt;/p&gt;

&lt;p&gt;The problem started with a hung OSS, `oak-io2-s2`, this morning; it&apos;s not clear what happened, and it would require some analysis of the crash dump. The first stack trace in the log is the following:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[1369496.805186] INFO: task systemd:1 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.
[1369496.826365] &lt;span class=&quot;code-quote&quot;&gt;&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot;&lt;/span&gt; disables &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; message.
[1369496.852691] systemd         D ffffffff81a8ae08     0     1      0 0x00000000
[1369496.876463]  ffff88015368fb50 0000000000000082 ffff880153698000 ffff88015368ffd8
[1369496.901403]  ffff88015368ffd8 ffff88015368ffd8 ffff880153698000 ffffffff81a8ae00
[1369496.926336]  ffffffff81a8ae04 ffff880153698000 00000000ffffffff ffffffff81a8ae08
[1369496.951268] Call Trace:
[1369496.959866]  [&amp;lt;ffffffff816aa409&amp;gt;] schedule_preempt_disabled+0x29/0x70
[1369496.981629]  [&amp;lt;ffffffff816a8337&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[1369497.002523]  [&amp;lt;ffffffff816a774f&amp;gt;] mutex_lock+0x1f/0x2f
[1369497.019998]  [&amp;lt;ffffffff8127f9a2&amp;gt;] sysfs_permission+0x32/0x60
[1369497.039175]  [&amp;lt;ffffffff8120c31e&amp;gt;] __inode_permission+0x6e/0xc0
[1369497.058921]  [&amp;lt;ffffffff8120c388&amp;gt;] inode_permission+0x18/0x50
[1369497.078096]  [&amp;lt;ffffffff8120e44e&amp;gt;] link_path_walk+0x27e/0x8b0
[1369497.097273]  [&amp;lt;ffffffff8120ebdb&amp;gt;] path_lookupat+0x6b/0x7b0
[1369497.115877]  [&amp;lt;ffffffff811df555&amp;gt;] ? kmem_cache_alloc+0x35/0x1e0
[1369497.135925]  [&amp;lt;ffffffff81211d2f&amp;gt;] ? getname_flags+0x4f/0x1a0
[1369497.155099]  [&amp;lt;ffffffff8120f34b&amp;gt;] filename_lookup+0x2b/0xc0
[1369497.173988]  [&amp;lt;ffffffff81212ec7&amp;gt;] user_path_at_empty+0x67/0xc0
[1369497.193750]  [&amp;lt;ffffffff8124b919&amp;gt;] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
[1369497.216641]  [&amp;lt;ffffffff81212f31&amp;gt;] user_path_at+0x11/0x20
[1369497.234673]  [&amp;lt;ffffffff81206473&amp;gt;] vfs_fstatat+0x63/0xc0
[1369497.252433]  [&amp;lt;ffffffff81206a94&amp;gt;] SYSC_newfstatat+0x24/0x60
[1369497.271340]  [&amp;lt;ffffffff810eaaba&amp;gt;] ? get_monotonic_boottime+0x4a/0x100
[1369497.293088]  [&amp;lt;ffffffff810aec01&amp;gt;] ? posix_get_boottime+0x11/0x20
[1369497.313423]  [&amp;lt;ffffffff810b0271&amp;gt;] ? SyS_clock_gettime+0x81/0xc0
[1369497.333456]  [&amp;lt;ffffffff81206cde&amp;gt;] SyS_newfstatat+0xe/0x10
[1369497.351774]  [&amp;lt;ffffffff816b5009&amp;gt;] system_call_fastpath+0x16/0x1b


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because OSTs were unresponsive, we took a crash dump at 11:10:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;vmcore-oak-io2-s2-2018-02-21-11:10:27.gz &lt;a href=&quot;https://stanford.box.com/s/bm7kgo5vndy6wylmnmo24qx26d377wge&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://stanford.box.com/s/bm7kgo5vndy6wylmnmo24qx26d377wge&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Failing over the OSTs to the partner OSS seemed to work at first.&lt;/p&gt;

&lt;p&gt;Then users reported that some operations on directories were stuck, so I took a look at the MDS and found warnings and stack traces in its logs (see the attached file `oak-md1-s2.kernel.log` for full logs). This is just to show the beginning:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Feb 21 11:20:54 oak-md1-s2 kernel: LustreError: 102992:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) oak-OST003f-osc-MDT0000: cannot cleanup orphans: rc = -11
Feb 21 11:20:54 oak-md1-s2 kernel: LustreError: 102992:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) Skipped 21 previous similar messages
Feb 21 11:28:39 oak-md1-s2 kernel: Lustre: oak-OST0051-osc-MDT0000: Connection restored to 10.0.2.105@o2ib5 (at 10.0.2.105@o2ib5)
Feb 21 11:28:39 oak-md1-s2 kernel: Lustre: Skipped 946 previous similar messages
Feb 21 11:28:47 oak-md1-s2 kernel: LustreError: 102692:0:(client.c:3007:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff88020ad62a00 x1592931638579664/t4295689592(4295689592) o6-&amp;gt;oak-OST0053-osc-MDT0000@10.0.2.105@o2ib5:28/4 lens 664/400 e 5 to 0 dl 1519241348 ref 2 fl Interpret:R/4/0 rc -2/-2
Feb 21 11:31:33 oak-md1-s2 kernel: INFO: task mdt01_002:102837 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.
Feb 21 11:31:33 oak-md1-s2 kernel: &lt;span class=&quot;code-quote&quot;&gt;&quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot;&lt;/span&gt; disables &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; message.
Feb 21 11:31:33 oak-md1-s2 kernel: mdt01_002       D ffffffff00000000     0 102837      2 0x00000080
Feb 21 11:31:33 oak-md1-s2 kernel: ffff88102428f558 0000000000000046 ffff88027975cf10 ffff88102428ffd8
Feb 21 11:31:33 oak-md1-s2 kernel: ffff88102428ffd8 ffff88102428ffd8 ffff88027975cf10 ffff88027975cf10
Feb 21 11:31:33 oak-md1-s2 kernel: ffff88036428f248 ffff88036428f240 fffffffe00000001 ffffffff00000000
Feb 21 11:31:33 oak-md1-s2 kernel: Call Trace:
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff816a94e9&amp;gt;] schedule+0x29/0x70
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff816aadd5&amp;gt;] rwsem_down_write_failed+0x225/0x3a0
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff81332047&amp;gt;] call_rwsem_down_write_failed+0x17/0x30
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff816a87cd&amp;gt;] down_write+0x2d/0x3d
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc128f7f4&amp;gt;] lod_qos_prep_create+0xaa4/0x17f0 [lod]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc1017200&amp;gt;] ? qsd_op_begin+0xb0/0x4d0 [lquota]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc1094d10&amp;gt;] ? osd_declare_qid+0x1f0/0x480 [osd_ldiskfs]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc1290ab8&amp;gt;] lod_prepare_create+0x298/0x3f0 [lod]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc105807e&amp;gt;] ? osd_idc_find_and_init+0x7e/0x100 [osd_ldiskfs]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc128563e&amp;gt;] lod_declare_striped_create+0x1ee/0x970 [lod]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc1287b54&amp;gt;] lod_declare_create+0x1e4/0x540 [lod]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc12f3a1f&amp;gt;] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc12e4b63&amp;gt;] mdd_declare_create+0x53/0xe20 [mdd]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc12e8b69&amp;gt;] mdd_create+0x7d9/0x1320 [mdd]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc11ba9bc&amp;gt;] mdt_reint_open+0x218c/0x31a0 [mdt]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc09ef4ce&amp;gt;] ? upcall_cache_get_entry+0x20e/0x8f0 [obdclass]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc119faa3&amp;gt;] ? ucred_set_jobid+0x53/0x70 [mdt]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc11af8a0&amp;gt;] mdt_reint_rec+0x80/0x210 [mdt]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc119130b&amp;gt;] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc1191832&amp;gt;] mdt_intent_reint+0x162/0x430 [mdt]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc119c59e&amp;gt;] mdt_intent_policy+0x43e/0xc70 [mdt]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0bbc672&amp;gt;] ? ldlm_resource_get+0x5e2/0xa30 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0bb5277&amp;gt;] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0bde903&amp;gt;] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0c06ae0&amp;gt;] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0c63ea2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0c67da5&amp;gt;] tgt_request_handle+0x925/0x1370 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0c10b16&amp;gt;] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0c0d148&amp;gt;] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff810c4822&amp;gt;] ? default_wake_function+0x12/0x20
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff810ba588&amp;gt;] ? __wake_up_common+0x58/0x90
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0c14252&amp;gt;] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffffc0c137c0&amp;gt;] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff810b098f&amp;gt;] kthread+0xcf/0xe0
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff816b4f58&amp;gt;] ret_from_fork+0x58/0x90
Feb 21 11:31:33 oak-md1-s2 kernel: [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
Feb 21 11:31:33 oak-md1-s2 kernel: INFO: task mdt00_004:103144 blocked &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; more than 120 seconds.


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;then slightly later:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Feb 21 11:35:25 oak-md1-s2 kernel: LustreError: 103142:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241425, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff8801fa7b4200/0xec47b06a6b3b9c78 lrc: 3/0,1 mode: --/CW res: [0x20000f271:0x2562:0x0].0x0 bits 0x2 rrc: 11 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 103142 timeout: 0 lvb_type: 0
Feb 21 11:35:25 oak-md1-s2 kernel: LustreError: dumping log to /tmp/lustre-log.1519241725.103142
Feb 21 11:35:27 oak-md1-s2 kernel: LustreError: 102833:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241427, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff880afe56c400/0xec47b06a6b3d396a lrc: 3/0,1 mode: --/CW res: [0x20000f271:0x2562:0x0].0x0 bits 0x2 rrc: 11 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 102833 timeout: 0 lvb_type: 0
Feb 21 11:35:42 oak-md1-s2 kernel: LustreError: 220676:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241442, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff880e0908aa00/0xec47b06a6b44312f lrc: 3/1,0 mode: --/PR res: [0x20000db01:0xda33:0x0].0x0 bits 0x13 rrc: 4 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 220676 timeout: 0 lvb_type: 0
Feb 21 11:37:04 oak-md1-s2 kernel: LNet: Service thread pid 102832 completed after 399.98s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Feb 21 11:37:04 oak-md1-s2 kernel: LNet: Skipped 3 previous similar messages
Feb 21 11:37:06 oak-md1-s2 kernel: LustreError: 220680:0:(ldlm_request.c:130:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1519241526, 300s ago); not entering recovery in server code, just going back to sleep ns: mdt-oak-MDT0000_UUID lock: ffff880248745a00/0xec47b06a6b61994d lrc: 3/1,0 mode: --/PR res: [0x20000c387:0x1420f:0x0].0x0 bits 0x13 rrc: 6 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 220680 timeout: 0 lvb_type: 0
Feb 21 11:38:44 oak-md1-s2 kernel: LNet: Service thread pid 103154 completed after 499.94s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Feb 21 11:38:57 oak-md1-s2 kernel: LNet: Service thread pid 103147 was inactive &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 412.60s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; debugging purposes:


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We decided to do a stop/start of the MDT at 12:04:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Feb 21 12:04:28 oak-md1-s2 kernel: Lustre: Failing over oak-MDT0000


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The full logs show many granted locks.&lt;/p&gt;

&lt;p&gt;Recovery completed at 12:21 with 1 client evicted:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Feb 21 12:21:46 oak-md1-s2 kernel: Lustre: oak-MDT0000: Recovery over after 14:38, of 1263 clients 1262 recovered and 1 was evicted.


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This fixed the directory access issues.&lt;/p&gt;

&lt;p&gt;That&apos;s the third time I&apos;ve seen this issue (after OST failover) since the upgrade to 2.10.3. We never saw it with 2.10.1+patches or with 2.10.2 (without patches). Here, all servers are running 2.10.3; the MDS is running 2.10.3 plus the first patch from&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10680&quot; title=&quot;MDT becoming unresponsive in 2.10.3&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10680&quot;&gt;&lt;del&gt;LU-10680&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Note: most of our clients are still running 2.10.0 (~900) and 2.10.1 (~300), but we haven&apos;t seen any Lustre issues on the clients so far.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;</description>
                <environment>3.10.0-693.2.2.el7_lustre.pl1.x86_64</environment>
        <key id="50900">LU-10697</key>
            <summary>MDT locking issues after failing over OSTs from hung OSS</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="bfaccini">Bruno Faccini</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Wed, 21 Feb 2018 23:15:27 +0000</created>
                <updated>Thu, 14 Nov 2019 22:15:07 +0000</updated>
                                            <version>Lustre 2.10.3</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="221493" author="pjones" created="Thu, 22 Feb 2018 18:11:59 +0000"  >&lt;p&gt;Thanks Stephane.&lt;/p&gt;</comment>
                            <comment id="225970" author="bfaccini" created="Fri, 13 Apr 2018 11:53:46 +0000"  >&lt;p&gt;Hello Stephane,&lt;br/&gt;
Sorry to be late on this.&lt;br/&gt;
I have analyzed the OSS crash dump you provided, and it definitely shows the same situation/deadlock as in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10709&quot; title=&quot;OSS deadlock in 2.10.3&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10709&quot;&gt;LU-10709&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I am presently analyzing the MDS syslog to try to understand why the OSS crash did not resolve the situation and why you needed to restart it.&lt;/p&gt;</comment>
                            <comment id="226020" author="sthiell" created="Fri, 13 Apr 2018 19:04:34 +0000"  >&lt;p&gt;Thanks for checking, Bruno. Almost always after hitting&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10709&quot; title=&quot;OSS deadlock in 2.10.3&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10709&quot;&gt;LU-10709&lt;/a&gt;, we had to restart the MDT.&lt;/p&gt;</comment>
                            <comment id="258218" author="thomasr" created="Wed, 13 Nov 2019 08:33:56 +0000"  >&lt;p&gt;Same trouble here (GSI) with 2.10.6:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;In any case, it always looks like

Nov 13 10:23:58 lxmds19.gsi.de kernel: Pid: 6449, comm: mdt00_095 3.10.0-957.el7_lustre.x86_64 #1 SMP Wed Dec 12 15:03:08 UTC 2018
Nov 13 10:23:58 lxmds19.gsi.de kernel: Call Trace:
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffff87786cf7&amp;gt;] call_rwsem_down_write_failed+0x17/0x30
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc1716f04&amp;gt;] lod_qos_prep_create+0xaa4/0x17f0 [lod]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc171818d&amp;gt;] lod_prepare_create+0x25d/0x360 [lod]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc170c9ae&amp;gt;] lod_declare_striped_create+0x1ee/0x970 [lod]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc170ee24&amp;gt;] lod_declare_create+0x1e4/0x540 [lod]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc177ab22&amp;gt;] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc176c1a3&amp;gt;] mdd_declare_create+0x53/0xe30 [mdd]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc1770059&amp;gt;] mdd_create+0x879/0x1400 [mdd]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc166acc5&amp;gt;] mdt_reint_open+0x2175/0x3190 [mdt]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc165fb43&amp;gt;] mdt_reint_rec+0x83/0x210 [mdt]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc164137b&amp;gt;] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc16418a2&amp;gt;] mdt_intent_reint+0x162/0x430 [mdt]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc164c681&amp;gt;] mdt_intent_policy+0x441/0xc70 [mdt]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc0f5d2ba&amp;gt;] ldlm_lock_enqueue+0x38a/0x980 [ptlrpc]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc0f86b53&amp;gt;] ldlm_handle_enqueue0+0x9d3/0x16a0 [ptlrpc]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc100c4f2&amp;gt;] tgt_enqueue+0x62/0x210 [ptlrpc]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc101042a&amp;gt;] tgt_request_handle+0x92a/0x1370 [ptlrpc]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc0fb8e5b&amp;gt;] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffffc0fbc5a2&amp;gt;] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Nov 13 10:23:58 lxmds19.gsi.de kernel:  [&amp;lt;ffffffff874c1c31&amp;gt;] kthread+0xd1/0xe0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="258331" author="adilger" created="Thu, 14 Nov 2019 22:14:58 +0000"  >&lt;p&gt;Per Colin&apos;s comment on &lt;tt&gt;lustre-discuss&lt;/tt&gt;, this is related to an upstream kernel bug:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Which kernel are you running?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://access.redhat.com/solutions/3393611&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://access.redhat.com/solutions/3393611&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="50931">LU-10709</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="55309">LU-12136</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="29611" name="oak-md1-s2.kernel.log" size="195879" author="sthiell" created="Wed, 21 Feb 2018 23:14:51 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzt5b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>