<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:33:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
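As a concrete illustration (the URL below is only an example of the usual JIRA XML issue-view pattern; the exact path on this server may differ), a request for just those two fields could look like:
https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-10223/LU-10223.xml?field=key&field=summary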
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10223] OSS hang </title>
                <link>https://jira.whamcloud.com/browse/LU-10223</link>
                <project id="10000" key="LU">Lustre</project>
            <description>&lt;p&gt;After reprovisioning SOAK with MOFED build #16, one OSS (soak-7) hung after running for about 2 hours. Tried restarting the node and mounting again, but it quickly hung again.&lt;/p&gt;

&lt;p&gt;OSS dmesg&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[   59.983080] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[  577.564976] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
[  577.588287] alg: No test for adler32 (adler32-zlib)
[  577.604774] alg: No test for crc32 (crc32-table)
[  578.440401] Lustre: Lustre: Build Version: 2.10.55
[  578.626899] LNet: Added LNI 192.168.1.107@o2ib [8/256/0/180]
[  578.644034] LNetError: 126:0:(o2iblnd_cb.c:2305:kiblnd_passive_connect()) Can&apos;t accept conn from 192.168.1.119@o2ib on NA (ib0:1:192.168.1.107): bad dst nid 192.168.1.107@o2ib
[  579.250293] LustreError: 137-5: soaked-OST000b_UUID: not available for connect from 192.168.1.128@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[  579.404451] Lustre: soaked-OST0005: Not available for connect from 192.168.1.118@o2ib (not set up)
[  579.468193] Lustre: soaked-OST0005: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  579.509366] LustreError: 11-0: soaked-MDT0001-lwp-OST0005: operation mds_connect to node 192.168.1.109@o2ib failed: rc = -114
[  579.517897] Lustre: soaked-OST0005: Will be in recovery for at least 2:30, or until 32 clients reconnect
[  579.518181] Lustre: soaked-OST0005: Connection restored to  (at 192.168.1.126@o2ib)
[  580.759587] LustreError: 137-5: soaked-OST0017_UUID: not available for connect from 192.168.1.133@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[  580.818149] LustreError: Skipped 53 previous similar messages
[  580.844275] Lustre: soaked-OST0005: Connection restored to  (at 192.168.1.121@o2ib)
[  580.869576] Lustre: Skipped 4 previous similar messages
[  581.910615] LustreError: 137-5: soaked-OST0011_UUID: not available for connect from 192.168.1.120@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[  581.911026] Lustre: soaked-OST0005: Connection restored to  (at 192.168.1.120@o2ib)
[  581.911029] Lustre: Skipped 1 previous similar message
[  582.011699] LustreError: Skipped 6 previous similar messages
[  584.510520] Lustre: soaked-OST0005: Connection restored to soaked-MDT0001-mdtlov_UUID (at 192.168.1.109@o2ib)
[  584.543241] Lustre: Skipped 1 previous similar message
[  584.613345] LNet: 7784:0:(o2iblnd_cb.c:3198:kiblnd_check_conns()) Timed out tx for 192.168.1.114@o2ib: 4295251 seconds
[  585.156427] LustreError: 137-5: soaked-OST0017_UUID: not available for connect from 192.168.1.136@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[  585.214952] LustreError: Skipped 4 previous similar messages
[  585.613324] LNet: 7784:0:(o2iblnd_cb.c:3198:kiblnd_check_conns()) Timed out tx for 192.168.1.115@o2ib: 4295252 seconds
[  588.629665] Lustre: soaked-OST0005: Connection restored to  (at 192.168.1.123@o2ib)
[  588.654988] Lustre: Skipped 15 previous similar messages
[  588.718309] Lustre: soaked-OST000b: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  588.763453] LustreError: 11-0: soaked-MDT0001-lwp-OST000b: operation mds_connect to node 192.168.1.109@o2ib failed: rc = -114
[  588.800753] LustreError: Skipped 3 previous similar messages
[  589.290380] Lustre: soaked-OST000b: Will be in recovery for at least 2:30, or until 32 clients reconnect
[  590.882947] LustreError: 137-5: soaked-OST0017_UUID: not available for connect from 192.168.1.109@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[  596.295018] Lustre: soaked-OST0011: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  596.343902] LustreError: 11-0: soaked-MDT0001-lwp-OST0011: operation mds_connect to node 192.168.1.109@o2ib failed: rc = -114
[  596.349189] Lustre: soaked-OST0011: Will be in recovery for at least 2:30, or until 32 clients reconnect
[  596.412527] LustreError: Skipped 3 previous similar messages
[  596.989322] Lustre: soaked-OST0011: Connection restored to soaked-MDT0000-mdtlov_UUID (at 192.168.1.108@o2ib)
[  597.025080] Lustre: Skipped 27 previous similar messages
[  601.257640] LustreError: 137-5: soaked-OST0017_UUID: not available for connect from 192.168.1.110@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
[  601.322237] LustreError: Skipped 1 previous similar message
[  604.746939] LustreError: 11-0: soaked-MDT0001-lwp-OST0017: operation mds_connect to node 192.168.1.109@o2ib failed: rc = -114
[  604.751172] Lustre: soaked-OST0017: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  604.755435] Lustre: soaked-OST0017: Will be in recovery for at least 2:30, or until 32 clients reconnect
[  604.859781] LustreError: Skipped 3 previous similar messages
[  613.202223] Lustre: soaked-OST0017: Connection restored to  (at 192.168.1.120@o2ib)
[  613.230721] Lustre: Skipped 60 previous similar messages
[  625.052799] Lustre: soaked-OST0011: Recovery over after 0:28, of 32 clients 32 recovered and 0 were evicted.
[  625.178799] Lustre: soaked-OST0011: deleting orphan objects from 0x740000402:1954773 to 0x740000402:1956385
[  625.186027] Lustre: soaked-OST0011: deleting orphan objects from 0x740000401:2946444 to 0x740000401:2946577
[  625.247687] Lustre: soaked-OST0011: deleting orphan objects from 0x740000400:1324860 to 0x740000400:1326312
[  629.859616] LustreError: 11-0: soaked-MDT0003-lwp-OST0005: operation mds_connect to node 192.168.1.111@o2ib failed: rc = -114
[  629.900030] LustreError: Skipped 3 previous similar messages
[  632.686916] Lustre: soaked-OST0017: Recovery over after 0:28, of 32 clients 32 recovered and 0 were evicted.
[  632.812885] Lustre: soaked-OST0017: deleting orphan objects from 0x8c0000402:1330224 to 0x8c0000402:1330433
[  632.845820] Lustre: soaked-OST0017: deleting orphan objects from 0x8c0000401:1961276 to 0x8c0000401:1962209
[  632.892404] Lustre: soaked-OST0017: deleting orphan objects from 0x8c0000400:2942161 to 0x8c0000400:2942193
[  633.363106] LustreError: 7850:0:(lustre_dlm.h:1372:ldlm_res_lvbo_update()) delayed lvb init failed (rc -2)
[  633.370215] LustreError: 29224:0:(ofd_io.c:616:ofd_preprw_write()) soaked-OST0005: BRW to missing obj 0x440000402:1949013
[  633.434168] LustreError: 7850:0:(lustre_dlm.h:1372:ldlm_res_lvbo_update()) Skipped 1 previous similar message
[  633.468475] format at lustre_dlm.h:1098:ldlm_lvbo_fill doesn&apos;t end in newline
[  635.387737] Lustre: soaked-OST000b: Recovery over after 0:46, of 32 clients 32 recovered and 0 were evicted.
[  635.423428] Lustre: Skipped 1 previous similar message
[  633.493466] LustreError: 7850:0:(lustre_dlm.h:1098:ldlm_lvbo_fill()) lock ffff880797e70800: delayed lvb init failed (rc -2)
[  635.794185] LustreError: 7850:0:(lustre_dlm.h:1372:ldlm_res_lvbo_update()) delayed lvb init failed (rc -2)
[  635.830221] LustreError: 7850:0:(lustre_dlm.h:1372:ldlm_res_lvbo_update()) Skipped 10 previous similar messages
[  654.748696] LustreError: 11-0: soaked-MDT0000-lwp-OST0011: operation mds_connect to node 192.168.1.108@o2ib failed: rc = -114
[  654.789007] LustreError: Skipped 1 previous similar message
[  679.789097] LustreError: 11-0: soaked-MDT0003-lwp-OST0005: operation mds_connect to node 192.168.1.111@o2ib failed: rc = -114
[  679.829488] LustreError: Skipped 5 previous similar messages
[  729.808913] LustreError: 11-0: soaked-MDT0003-lwp-OST0005: operation mds_connect to node 192.168.1.111@o2ib failed: rc = -114
[  729.849328] LustreError: Skipped 7 previous similar messages
[  749.610650] LNet: 7784:0:(o2iblnd_cb.c:3198:kiblnd_check_conns()) Timed out tx for 192.168.1.114@o2ib: 0 seconds
[  804.746279] LustreError: 11-0: soaked-MDT0000-lwp-OST0017: operation mds_connect to node 192.168.1.108@o2ib failed: rc = -114
[  804.786597] LustreError: Skipped 9 previous similar messages
[  833.890248] LNet: Service thread pid 35547 was inactive for 200.52s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[  833.952191] Pid: 35547, comm: ll_ost_io01_022
[  833.969466] 
Call Trace:
[  833.987632]  [&amp;lt;ffffffff816a9589&amp;gt;] schedule+0x29/0x70
[  834.006382]  [&amp;lt;ffffffffc06b34d5&amp;gt;] cv_wait_common+0x125/0x150 [spl]
[  834.028931]  [&amp;lt;ffffffff810b1920&amp;gt;] ? autoremove_wake_function+0x0/0x40
[  834.052232]  [&amp;lt;ffffffffc06b3515&amp;gt;] __cv_wait+0x15/0x20 [spl]
[  834.072541]  [&amp;lt;ffffffffc1f7181f&amp;gt;] txg_wait_synced+0xef/0x140 [zfs]
[  834.094665]  [&amp;lt;ffffffffc0b562c8&amp;gt;] osd_trans_stop+0x468/0x590 [osd_zfs]
[  834.117804]  [&amp;lt;ffffffffc117629f&amp;gt;] ofd_trans_stop+0x1f/0x60 [ofd]
[  834.139098]  [&amp;lt;ffffffffc117c4c4&amp;gt;] ofd_commitrw_write+0x7e4/0x1c50 [ofd]
[  834.162354]  [&amp;lt;ffffffffc11805e9&amp;gt;] ofd_commitrw+0x4b9/0xac0 [ofd]
[  834.183547]  [&amp;lt;ffffffffc0f74677&amp;gt;] obd_commitrw+0x2ed/0x330 [ptlrpc]
[  834.205542]  [&amp;lt;ffffffffc0f47c41&amp;gt;] tgt_brw_write+0xff1/0x17c0 [ptlrpc]
[  834.227963]  [&amp;lt;ffffffff811de501&amp;gt;] ? __slab_free+0x81/0x2f0
[  834.247183]  [&amp;lt;ffffffff8118aeb4&amp;gt;] ? free_one_page+0x2e4/0x310
[  834.267230]  [&amp;lt;ffffffffc0e99f80&amp;gt;] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[  834.290662]  [&amp;lt;ffffffffc0f43c55&amp;gt;] tgt_request_handle+0x925/0x1370 [ptlrpc]
[  834.314361]  [&amp;lt;ffffffffc0eeca3e&amp;gt;] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[  834.340655]  [&amp;lt;ffffffffc0ee9908&amp;gt;] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[  834.364065]  [&amp;lt;ffffffff811dca2e&amp;gt;] ? kmalloc_order_trace+0x2e/0xa0
[  834.385198]  [&amp;lt;ffffffffc0ef01e2&amp;gt;] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[  834.406858]  [&amp;lt;ffffffff81029557&amp;gt;] ? __switch_to+0xd7/0x510
[  834.425923]  [&amp;lt;ffffffff816a904d&amp;gt;] ? __schedule+0x39d/0x8b0
[  834.444999]  [&amp;lt;ffffffffc0eef750&amp;gt;] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[  834.466633]  [&amp;lt;ffffffff810b099f&amp;gt;] kthread+0xcf/0xe0
[  834.483693]  [&amp;lt;ffffffff810b08d0&amp;gt;] ? kthread+0x0/0xe0
[  834.501018]  [&amp;lt;ffffffff816b4fd8&amp;gt;] ret_from_fork+0x58/0x90
[  834.519757]  [&amp;lt;ffffffff810b08d0&amp;gt;] ? kthread+0x0/0xe0

[  834.542850] LustreError: dumping log to /tmp/lustre-log.1510193654.35547
[  834.802579] Pid: 32862, comm: ll_ost_io00_019
[  834.819060] 
Call Trace:
[  834.835684]  [&amp;lt;ffffffff816a9589&amp;gt;] schedule+0x29/0x70
[  834.853784]  [&amp;lt;ffffffffc06b34d5&amp;gt;] cv_wait_common+0x125/0x150 [spl]
[  834.875745]  [&amp;lt;ffffffff810b1920&amp;gt;] ? autoremove_wake_function+0x0/0x40
[  834.898451]  [&amp;lt;ffffffffc06b3515&amp;gt;] __cv_wait+0x15/0x20 [spl]
[  834.918158]  [&amp;lt;ffffffffc1f26bfb&amp;gt;] dmu_tx_wait+0x20b/0x3c0 [zfs]
[  834.938932]  [&amp;lt;ffffffffc1f26e41&amp;gt;] dmu_tx_assign+0x91/0x490 [zfs]
[  834.959865]  [&amp;lt;ffffffffc0b52f37&amp;gt;] osd_trans_start+0xa7/0x3a0 [osd_zfs]
[  834.982424]  [&amp;lt;ffffffffc117620b&amp;gt;] ofd_trans_start+0x6b/0xe0 [ofd]
[  835.003465]  [&amp;lt;ffffffffc117c62b&amp;gt;] ofd_commitrw_write+0x94b/0x1c50 [ofd]
[  835.026153]  [&amp;lt;ffffffffc11805e9&amp;gt;] ofd_commitrw+0x4b9/0xac0 [ofd]
[  835.046815]  [&amp;lt;ffffffffc0f74677&amp;gt;] obd_commitrw+0x2ed/0x330 [ptlrpc]
[  835.068248]  [&amp;lt;ffffffffc0f47c41&amp;gt;] tgt_brw_write+0xff1/0x17c0 [ptlrpc]
[  835.090232]  [&amp;lt;ffffffff810cd7a4&amp;gt;] ? update_curr+0x104/0x190
[  835.110138]  [&amp;lt;ffffffff810c8f28&amp;gt;] ? __enqueue_entity+0x78/0x80
[  835.130802]  [&amp;lt;ffffffff810cf91c&amp;gt;] ? enqueue_entity+0x26c/0xb60
[  835.151423]  [&amp;lt;ffffffffc0e99f80&amp;gt;] ? target_bulk_timeout+0x0/0xb0 [ptlrpc]
[  835.175126]  [&amp;lt;ffffffffc0f43c55&amp;gt;] tgt_request_handle+0x925/0x1370 [ptlrpc]
[  835.199009]  [&amp;lt;ffffffffc0eeca3e&amp;gt;] ptlrpc_server_handle_request+0x24e/0xab0 [ptlrpc]
[  835.225389]  [&amp;lt;ffffffffc0ee9908&amp;gt;] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
[  835.248817]  [&amp;lt;ffffffff810c4832&amp;gt;] ? default_wake_function+0x12/0x20
[  835.270461]  [&amp;lt;ffffffff810ba598&amp;gt;] ? __wake_up_common+0x58/0x90
[  835.290685]  [&amp;lt;ffffffffc0ef01e2&amp;gt;] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
[  835.312227]  [&amp;lt;ffffffff81029557&amp;gt;] ? __switch_to+0xd7/0x510
[  835.331138]  [&amp;lt;ffffffff816a9000&amp;gt;] ? __schedule+0x350/0x8b0
[  835.350071]  [&amp;lt;ffffffffc0eef750&amp;gt;] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
[  835.371526]  [&amp;lt;ffffffff810b099f&amp;gt;] kthread+0xcf/0xe0
[  835.388344]  [&amp;lt;ffffffff810b08d0&amp;gt;] ? kthread+0x0/0xe0
[  835.405400]  [&amp;lt;ffffffff816b4fd8&amp;gt;] ret_from_fork+0x58/0x90
[  835.423854]  [&amp;lt;ffffffff810b08d0&amp;gt;] ? kthread+0x0/0xe0

[  835.446443] Pid: 29220, comm: ll_ost_io01_004
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment>soak lustre-master-ib #16 DNE  without reformat; previous build was #12</environment>
        <key id="49229">LU-10223</key>
            <summary>OSS hang </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="sarah">Sarah Liu</reporter>
                        <labels>
                            <label>soak</label>
                    </labels>
                <created>Fri, 10 Nov 2017 00:37:16 +0000</created>
                <updated>Wed, 29 Aug 2018 21:37:00 +0000</updated>
                            <resolved>Wed, 29 Aug 2018 21:36:59 +0000</resolved>
                                    <version>Lustre 2.11.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                <comments>
                            <comment id="232769" author="adilger" created="Wed, 29 Aug 2018 21:37:00 +0000"  >&lt;p&gt;Close as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10250&quot; title=&quot;replay-single test_74:  hang and timed out&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10250&quot;&gt;LU-10250&lt;/a&gt;, not much information here.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="49351">LU-10250</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="28619" name="lustre-log.1510193654.35547" size="2030300" author="sarah" created="Fri, 10 Nov 2017 03:22:27 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzznlb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>