<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:36:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10578] request_in_callback() event type 2, status -103, service ost_io</title>
                <link>https://jira.whamcloud.com/browse/LU-10578</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We&apos;re seeing a recurrent crash on OSSes running 2.10.2 on Oak. Many of the following log messages can be seen before the crash:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[928518.391343] LustreError: 325230:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[928518.401925] LustreError: 325230:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[928518.410543] LustreError: 325230:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Finally, the server crashes after an NMI watchdog is triggered:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[928639.223483] NMI watchdog: Watchdog detected hard LOCKUP on cpu 24

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;See oak-io1-s1.vmcore-dmesg.txt for full log.&lt;/p&gt;

&lt;p&gt;After restarting the OSS, I can see the following in the logs:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Jan 29 12:45:21 oak-io1-s1 kernel: LNet: Using FMR &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; registration
Jan 29 12:45:21 oak-io1-s1 kernel: LustreError: 325230:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
Jan 29 12:45:21 oak-io1-s1 kernel: LustreError: 359716:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; magic/version check
Jan 29 12:45:21 oak-io1-s1 kernel: LustreError: 359716:0:(sec.c:2069:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.0.2.223@o2ib5 x1590957016747328
Jan 29 12:45:21 oak-io1-s1 kernel: LustreError: 359716:0:(sec.c:2069:sptlrpc_svc_unwrap_request()) Skipped 1386 previous similar messages
Jan 29 12:45:21 oak-io1-s1 kernel: Lustre: oak-OST001c: Connection restored to 7cfcbcde-275c-eb1e-9911-bb9f7ea0c616 (at 10.0.2.223@o2ib5)
Jan 29 12:45:21 oak-io1-s1 kernel: LustreError: 325230:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
Jan 29 12:45:21 oak-io1-s1 kernel: LustreError: 359716:0:(ldlm_lib.c:3247:target_bulk_io()) @@@ Reconnect on bulk WRITE  req@ffff881c45339850 x1590957016747344/t0(0) o4-&amp;gt;7f9fc76a-8204-2cde-2104-5efdf97586ca@10.0.2.223@o2ib5:162/0 lens 3440/1152 e 0 to 0 dl 1517258732 ref 1 fl Interpret:/0/0 rc 0/0
Jan 29 12:45:21 oak-io1-s1 kernel: LNet: 325230:0:(o2iblnd_cb.c:1350:kiblnd_reconnect_peer()) Abort reconnection of 10.0.2.223@o2ib5: connected

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
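When these messages flood the console, a quick tally by PID, status, and service helps confirm they all come from a single source. A minimal sketch (the regex is written against the dmesg format shown above; `tally` is a hypothetical helper, not part of Lustre):

```python
import re
from collections import Counter

# Matches the LustreError lines from events.c:request_in_callback() as they
# appear in the dmesg excerpts above.
PAT = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:\(events\.c:\d+:request_in_callback\(\)\)"
    r" event type (?P<ev>\d+), status (?P<status>-?\d+), service (?P<svc>\w+)"
)

def tally(lines):
    """Count request_in_callback errors per (pid, status, service)."""
    counts = Counter()
    for line in lines:
        m = PAT.search(line)
        if m:
            counts[(m["pid"], m["status"], m["svc"])] += 1
    return counts
```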
&lt;p&gt;&quot;message length 0 too small..&quot; can also be found in &lt;del&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9983&quot; title=&quot;LBUG llog_osd.c:327:llog_osd_declare_write_rec() - all DNE MDS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9983&quot;&gt;&lt;del&gt;LU-9983&lt;/del&gt;&lt;/a&gt;&lt;/del&gt; (fixed in 2.10.3) and &lt;del&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10068&quot; title=&quot;OST fails to mount:LustreError: 14558:0:(pack_generic.c:588:__lustre_unpack_msg()) message length 0 too small for magic/version check&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10068&quot;&gt;&lt;del&gt;LU-10068&lt;/del&gt;&lt;/a&gt;&lt;/del&gt; (not clear if fully resolved).&lt;/p&gt;

&lt;p&gt;10.0.2.223@o2ib5 is oak-gw04, a Lustre client running in a VM with SR-IOV on IB (mlx4) serving as an SMB gateway. It had been working fine for almost a year while we were running 2.9, but we&apos;re now hitting this recurrent issue after upgrading clients and servers to the 2.10.x branch. For a bit I suspected the change to map_on_demand, now set to 256 instead of 0, but after reading the changelogs I don&apos;t think it should have such an impact... We&apos;re using mlx4 on both clients and servers on Oak (but it&apos;s connected to several LNet routers with both mlx4 and mlx5 remotely).&lt;/p&gt;

&lt;p&gt;I tried upgrading oak-gw04 to 2.10.3 RC1 and the same thing happened. I&apos;m now in the process of upgrading all Lustre servers to 2.10.3 RC1, because it should also fix another issue that we have, &lt;del&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10267&quot; title=&quot;Wrong poll() returned revents for changelog device&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10267&quot;&gt;&lt;del&gt;LU-10267&lt;/del&gt;&lt;/a&gt;&lt;/del&gt; (not related to this issue). I will update this ticket if the issue happens again with all servers running 2.10.3 RC1...&lt;/p&gt;

&lt;p&gt;A crash usually occurs after a few hours or days when oak-gw04 is up. Note that I don&apos;t get any OSS crashes if oak-gw04 stays down. We have numerous other VMs using SR-IOV with no issues; only this one (serving SMB) is affected. This &quot;SMB gateway&quot; is experimental; our users love it, but it&apos;s not critical for production.&lt;/p&gt;

&lt;p&gt;Any ideas welcome...&lt;/p&gt;

&lt;p&gt;Thanks!&lt;br/&gt;
 Stephane&lt;/p&gt;</description>
                <environment>Lustre 2.10.2, Lustre 2.10.3 RC1, CentOS 7.4</environment>
        <key id="50438">LU-10578</key>
            <summary>request_in_callback() event type 2, status -103, service ost_io</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="sharmaso">Sonia Sharma</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Tue, 30 Jan 2018 00:21:34 +0000</created>
                <updated>Fri, 13 Apr 2018 19:17:50 +0000</updated>
                            <resolved>Fri, 13 Apr 2018 19:17:50 +0000</resolved>
                                    <version>Lustre 2.10.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="219455" author="pjones" created="Tue, 30 Jan 2018 18:08:07 +0000"  >&lt;p&gt;Let us know how you get on&lt;/p&gt;</comment>
                            <comment id="219462" author="sthiell" created="Tue, 30 Jan 2018 19:55:01 +0000"  >&lt;p&gt;Sure. Just restarted everything under 2.10.3 RC1 and started&#160;the&#160;SMB gateway. Will update as needed.&lt;/p&gt;</comment>
                            <comment id="219600" author="sthiell" created="Wed, 31 Jan 2018 22:43:40 +0000"  >&lt;p&gt;Unfortunately, an OSS crashed again, and its logs once more point to the SMB gateway (10.0.2.223@o2ib5):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[168210.670422] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168210.683576] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168210.695777] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168210.712729] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168210.719227] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168246.606295] INFO: rcu_sched detected stalls on CPUs/tasks: { 18} (detected by 7, t=60002 jiffies, g=7390585, c=7390584, q=494642)
[168246.606296] Task dump &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; CPU 18:
[168246.606300] ll_ost_io00_103 R  running task        0 284327      2 0x00000088
[168246.606304]  ffffffff810ea5fc ffff881cabe1bd28 ffffffff81099f95 ffff881c6214d458
[168246.606306]  ffff88015329dee0 ffff881c32960b40 ffff8801b6c00000 ffff8801b6c00000
[168246.606308]  ffff881c3247f280 ffff883dcc6b4e00 ffff881d53a0fc00 ffff881cabe1bd90
[168246.606309] Call Trace:
[168246.606318]  [&amp;lt;ffffffff810ea5fc&amp;gt;] ? ktime_get+0x4c/0xd0
[168246.606324]  [&amp;lt;ffffffff81099f95&amp;gt;] ? mod_timer+0x185/0x220
[168246.606348]  [&amp;lt;ffffffffc0842bc7&amp;gt;] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[168246.606419]  [&amp;lt;ffffffffc0bbd9e8&amp;gt;] ? __lustre_unpack_msg+0x88/0x430 [ptlrpc]
[168246.606482]  [&amp;lt;ffffffffc0bf0273&amp;gt;] ? sptlrpc_svc_unwrap_request+0x73/0x600 [ptlrpc]
[168246.606536]  [&amp;lt;ffffffffc0bd1115&amp;gt;] ? ptlrpc_main+0x955/0x1e40 [ptlrpc]
[168246.606588]  [&amp;lt;ffffffffc0bd07c0&amp;gt;] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
[168246.606592]  [&amp;lt;ffffffff810b098f&amp;gt;] ? kthread+0xcf/0xe0
[168246.606595]  [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
[168246.606600]  [&amp;lt;ffffffff816b4f58&amp;gt;] ? ret_from_fork+0x58/0x90
[168246.606602]  [&amp;lt;ffffffff810b08c0&amp;gt;] ? insert_kthread_work+0x40/0x40
[168260.719089] Lustre: oak-OST0002: Client b12d2aa6-e7f0-54da-6947-a475d16c533a (at 10.0.2.223@o2ib5) reconnecting
[168260.719091] Lustre: Skipped 2387 previous similar messages
[168260.720910] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168260.720939] LustreError: 284292:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; magic/version check
[168260.720941] LustreError: 284292:0:(pack_generic.c:590:__lustre_unpack_msg()) Skipped 2660 previous similar messages
[168260.720944] LustreError: 284292:0:(sec.c:2069:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.0.2.223@o2ib5 x1591039248701024
[168260.720945] LustreError: 284292:0:(sec.c:2069:sptlrpc_svc_unwrap_request()) Skipped 2416 previous similar messages
[168285.720074] Lustre: oak-OST0002: Connection restored to  (at 10.0.2.223@o2ib5)
[168285.720076] Lustre: Skipped 2662 previous similar messages
[168285.722111] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168285.729113] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168285.735719] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168285.742268] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168285.749380] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;crash:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[168411.523150] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.533367] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.543574] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.554742] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.565872] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.576389] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.587405] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.598215] LustreError: 271224:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[168411.605427] NMI watchdog: Watchdog detected hard LOCKUP on cpu 26
[168411.605458] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor async_tx ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) vfat fat uas usb_storage mpt2sas mptctl mptbase dell_rbu rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr lpc_ich mei_me mei dm_service_time ses enclosure sg ipmi_si ipmi_devintf ipmi_msghandler
[168411.605474]  shpchp wmi acpi_power_meter nfsd dm_multipath dm_mod auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mlx4_en i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlx4_core drm tg3 ahci crct10dif_pclmul crct10dif_common mpt3sas libahci crc32c_intel ptp raid_class libata megaraid_sas i2c_core devlink scsi_transport_sas pps_core
[168411.605476] CPU: 26 PID: 280365 Comm: lc_watchdogd Tainted: G           OE  ------------   3.10.0-693.2.2.el7_lustre.pl1.x86_64 #1


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After the OSTs failover, the OSS pair crashed too:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[101277.229184] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101277.240162] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101277.240175] LustreError: 181754:0:(pack_generic.c:590:__lustre_unpack_msg()) message length 0 too small &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; magic/version check
[101277.240178] LustreError: 181754:0:(pack_generic.c:590:__lustre_unpack_msg()) Skipped 10158 previous similar messages
[101277.240182] LustreError: 181754:0:(sec.c:2069:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.0.2.223@o2ib5 x1591039248701024
[101277.240183] LustreError: 181754:0:(sec.c:2069:sptlrpc_svc_unwrap_request()) Skipped 10158 previous similar messages
[101277.255135] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101277.269872] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;crash:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;...
[101298.937807] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101298.952543] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101298.966895] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101298.980745] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101298.994897] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101299.005048] LustreError: 144304:0:(events.c:304:request_in_callback()) event type 2, status -103, service ost_io
[101299.005530] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7
[101299.005574] Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) raid456 async_raid6_recov async_memcpy async_pq raid6_pq libcrc32c async_xor xor async_tx ldiskfs(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ptlrpc(OE) ko2iblnd(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm uas vfat fat iTCO_wdt iTCO_vendor_support usb_storage mpt2sas mptctl mptbase rpcsec_gss_krb5 nfsv4 dell_rbu dns_resolver nfs fscache mxm_wmi dcdbas sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr mlx4_en mlx4_ib ib_core mlx4_core devlink lpc_ich mei_me mei dm_service_time ses enclosure sg ipmi_si
[101299.005593]  ipmi_devintf ipmi_msghandler shpchp wmi acpi_power_meter dm_multipath dm_mod nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops tg3 ttm ahci crct10dif_pclmul crct10dif_common drm mpt3sas libahci crc32c_intel ptp libata megaraid_sas i2c_core raid_class pps_core scsi_transport_sas
[101299.005596] CPU: 7 PID: 148188 Comm: lc_watchdogd Tainted: G           OE  ------------   3.10.0-693.2.2.el7_lustre.pl1.x86_64 #1
...


&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;No logs could be seen on the SMB gateway itself (the client), just OSTs disconnecting when the OSSes crashed. We&apos;ve now shut down the SMB gateway to avoid further service interruptions. This is super weird. All of the above are running 2.10.3 RC1.&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;</comment>
                            <comment id="220028" author="sthiell" created="Mon, 5 Feb 2018 22:53:06 +0000"  >&lt;p&gt;The server crash itself could be caused by an issue with the serial console, similar to the one described in&#160;&lt;del&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7886&quot; title=&quot;Hard lockup from debug logging&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7886&quot;&gt;&lt;del&gt;LU-7886&lt;/del&gt;&lt;/a&gt;&lt;/del&gt;. We&apos;re investigating this possibility. Although, this wouldn&apos;t explain&#160;the flood of&#160;&lt;tt&gt;request_in_callback&lt;/tt&gt; error messages and also &lt;tt&gt;sptlrpc_svc_unwrap_request() error unpacking request from 12345-10.0.2.223@o2ib5&lt;/tt&gt; seen in the first place.&lt;/p&gt;</comment>
                            <comment id="223846" author="sthiell" created="Fri, 16 Mar 2018 14:54:24 +0000"  >&lt;p&gt;Just to note that this issue occurred on another Lustre client, also in a VM using SR-IOV for IB, but not exporting the filesystem via Samba, so it&apos;s likely not related to SMB. Could the issue be related to IB/SR-IOV?&lt;/p&gt;</comment>
                            <comment id="223957" author="pjones" created="Mon, 19 Mar 2018 17:14:39 +0000"  >&lt;p&gt;Sonia&lt;/p&gt;

&lt;p&gt;Can you please investigate?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="223958" author="jhammond" created="Mon, 19 Mar 2018 17:15:16 +0000"  >&lt;p&gt;Sonia,&lt;/p&gt;

&lt;p&gt;It looks like errno 103 (ECONNABORTED) could only be coming from o2iblnd in this case. Could you take a look?&lt;/p&gt;</comment>
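As a quick sanity check on the decoding, Python's errno tables confirm that 103 is ECONNABORTED on Linux (the "status -103" in the request_in_callback() lines is the negated errno):

```python
import errno
import os

# "status -103" in the request_in_callback() messages is a negated Linux
# errno: 103 is ECONNABORTED ("Software caused connection abort").
print(errno.errorcode[103], "-", os.strerror(103))
```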
                            <comment id="224067" author="sharmaso" created="Tue, 20 Mar 2018 19:33:06 +0000"  >&lt;p&gt;Hi Stephane&lt;/p&gt;

&lt;p&gt;Looks like neterror logging is not turned on. Can you enable it and provide the logs from /var/log/messages and the dmesg output when the issue occurs again? That would show the network-related errors occurring when we see ECONNABORTED.&lt;/p&gt;

&lt;p&gt;What does the configuration look like on the servers, clients and routers, specifically on oak-gw04?&lt;/p&gt;

&lt;p&gt;Can you run the command below on the nodes and share the output?&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;lnetctl net show -v
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
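Enabling the neterror console messages would look something like the following (a sketch; the `+neterror` printk-mask flag and the procfs path are taken from common Lustre debugging conventions and may vary by release, so verify against your version's docs):

```shell
# Add the neterror flag to the console printk mask so D_NETERROR messages
# reach dmesg/syslog (assumes lctl from the lustre utilities is installed).
lctl set_param printk=+neterror

# Equivalent via procfs on older releases:
echo +neterror > /proc/sys/lnet/printk

# Confirm the mask now includes neterror:
lctl get_param printk
```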
&lt;p&gt;Thanks&lt;/p&gt;</comment>
                            <comment id="224070" author="sthiell" created="Tue, 20 Mar 2018 20:32:35 +0000"  >&lt;p&gt;Hi Sonia,&lt;/p&gt;

&lt;p&gt;Thanks for looking at this! I have enabled neterror logging on oak-gw04 and the servers. I&apos;ll try to reproduce the error, but we are trying to reduce the impact on production, so we have temporarily limited access to the SMB gateway. I&apos;ll see what I can do to make it happen again...&lt;/p&gt;

&lt;p&gt;Note: oak-gw04 is on the same Lustre network as the servers (o2ib5), so there are no routers between them.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@oak-gw04 ~]# lnetctl net show -v
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          lnd tunables:
          tcp bonding: 0
          dev cpt: 0
          CPT: &quot;[0]&quot;
    - net type: o2ib5
      local NI(s):
        - nid: 10.0.2.223@o2ib5
          status: up
          interfaces:
              0: ib0
          statistics:
              send_count: 203
              recv_count: 203
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 256
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          tcp bonding: 0
          dev cpt: -1
          CPT: &quot;[0]&quot;

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Servers (I checked all of them and they all look the same, just the NIDs and counters change):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@oak-io1-s1 ~]# lnetctl net show -v
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
          statistics:
              send_count: 0
              recv_count: 0
              drop_count: 0
          tunables:
              peer_timeout: 0
              peer_credits: 0
              peer_buffer_credits: 0
              credits: 0
          lnd tunables:
          tcp bonding: 0
          dev cpt: 0
          CPT: &quot;[0,1]&quot;
    - net type: o2ib5
      local NI(s):
        - nid: 10.0.2.101@o2ib5
          status: up
          interfaces:
              0: ib0
          statistics:
              send_count: 361208357
              recv_count: 310772624
              drop_count: 0
          tunables:
              peer_timeout: 180
              peer_credits: 8
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
              peercredits_hiw: 4
              map_on_demand: 256
              concurrent_sends: 8
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          tcp bonding: 0
          dev cpt: 1
          CPT: &quot;[0,1]&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
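Since the client and server tunables above are supposed to match, a small script can pull the interesting LND tunables out of `lnetctl net show -v` text and diff them across nodes. A sketch (`lnd_tunables` and `diff_tunables` are hypothetical helpers keyed to the YAML layout shown above):

```python
import re

# Tunables worth comparing across nodes, per the lnetctl output above.
KEYS = ("map_on_demand", "peer_credits", "concurrent_sends",
        "fmr_pool_size", "ntx")

def lnd_tunables(show_output):
    """Extract selected integer tunables from `lnetctl net show -v` text."""
    found = {}
    for key in KEYS:
        m = re.search(rf"^\s*{key}:\s*(\d+)\s*$", show_output, re.M)
        if m:
            found[key] = int(m.group(1))
    return found

def diff_tunables(a, b):
    """Return {key: (a_value, b_value)} for tunables that differ."""
    return {k: (a.get(k), b.get(k)) for k in KEYS if a.get(k) != b.get(k)}
```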

&lt;p&gt;Also, the IB HCA firmware is recent (2.42.5000). The cards are Mellanox mlx4 FDR MT_1100120019.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Stephane&#160;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="224340" author="sthiell" created="Thu, 22 Mar 2018 22:13:33 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Ah, we just got another occurrence of the issue from another client!&lt;/p&gt;

&lt;p&gt;Attaching:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;logs from the client oak-gw22 (10.0.2.242@o2ib5) as&#160;oak-gw22_lustre.log.gz&lt;/li&gt;
	&lt;li&gt;logs from the impacted OSS oak-io1-s2 (10.0.2.102@o2ib5) as oak-io1-s2_lustre.log.gz&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Does that help?&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;</comment>
                            <comment id="224342" author="sthiell" created="Thu, 22 Mar 2018 22:25:47 +0000"  >&lt;p&gt;Note that the OSS didn&apos;t crash this time probably because we completely disabled the serial console.&lt;/p&gt;</comment>
                            <comment id="224363" author="sthiell" created="Fri, 23 Mar 2018 02:56:36 +0000"  >&lt;p&gt;Happened again. Attached&#160;oak-gw22_lustre_neterror.log this time with neterror:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:02000000:0.0F:1521773263.363016:0:1734:0:(import.c:1542:ptlrpc_import_recovery_state_machine()) oak-OST0013-osc-ffff8800b8462000: Connection restored to 10.0.2.102@o2ib5 (at 10.0.2.102@o2ib5)
00000800:00000100:3.0F:1521773263.363190:0:1675:0:(o2iblnd_cb.c:3470:kiblnd_complete()) FastReg failed: 6
00000800:00000100:3.0:1521773263.363193:0:1675:0:(o2iblnd_cb.c:3481:kiblnd_complete()) RDMA (tx: ffffc90001d095d8) failed: 5
00000800:00000100:3.0:1521773263.363196:0:1675:0:(o2iblnd_cb.c:967:kiblnd_tx_complete()) Tx -&amp;gt; 10.0.2.102@o2ib5 cookie 0x2671f sending 1 waiting 0: failed 5
00000800:00000100:3.0:1521773263.363199:0:1675:0:(o2iblnd_cb.c:1914:kiblnd_close_conn_locked()) Closing conn to 10.0.2.102@o2ib5: error -5(waiting)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
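The numeric codes in "FastReg failed: 6" and "RDMA ... failed: 5" are InfiniBand work-completion statuses. By my reading of the `ibv_wc_status` enum in libibverbs' verbs.h they decode as below, but the ordering is assumed from the upstream header, so verify against your local one:

```python
# First entries of the ibv_wc_status enum (libibverbs <infiniband/verbs.h>);
# ordering assumed from the upstream header, so double-check locally.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",   # "RDMA ... failed: 5" above
    6: "IBV_WC_MW_BIND_ERR",    # "FastReg failed: 6" above
}

for code in (5, 6):
    print(code, IBV_WC_STATUS[code])
```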
&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="224364" author="sthiell" created="Fri, 23 Mar 2018 03:40:37 +0000"  >&lt;p&gt;Apparently this issue is already known... reported by James Simmons in&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9810&quot; title=&quot;Melanox OFED 4.1 support&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9810&quot;&gt;&lt;del&gt;LU-9810&lt;/del&gt;&lt;/a&gt; last September. Is there a patch? I see that&#160;&lt;a href=&quot;https://review.whamcloud.com/28279/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28279/&lt;/a&gt;&#160;has been merged on master, but I can&apos;t find it in 2.10. Could you please advise, as this has a big impact on our users? Thanks!!&lt;/p&gt;</comment>
                            <comment id="224406" author="sthiell" created="Fri, 23 Mar 2018 15:11:58 +0000"  >&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;I applied &lt;a href=&quot;https://review.whamcloud.com/#/c/28279/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/28279/&lt;/a&gt;&#160;on the client last night just in case, but this &quot;FastReg failed: 6&quot; error is still reproducible under load. Is there a workaround to avoid FastReg entirely? This error first appeared for us in 2.10; we never saw it with Lustre 2.9. Thanks much.&lt;/p&gt;
                            <comment id="224410" author="sharmaso" created="Fri, 23 Mar 2018 15:31:01 +0000"  >&lt;p&gt;Hi Stephane,&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9983&quot; title=&quot;LBUG llog_osd.c:327:llog_osd_declare_write_rec() - all DNE MDS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9983&quot;&gt;&lt;del&gt;LU-9983&lt;/del&gt;&lt;/a&gt; has three patches attached. Of those, only two were applied to 2.10; &lt;a href=&quot;https://review.whamcloud.com/29290/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/29290/&lt;/a&gt;&#160;was not applied to 2.10 because it was thought that the other two patches could take care of the issue observed with the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9810&quot; title=&quot;Melanox OFED 4.1 support&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9810&quot;&gt;&lt;del&gt;LU-9810&lt;/del&gt;&lt;/a&gt;&#160;&quot;lnet: prefer Fast Reg&quot; patch.&lt;/p&gt;

&lt;p&gt;Basically, the issues were seen with this patch:&#160;&lt;a href=&quot;https://review.whamcloud.com/#/c/28278/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/28278/&lt;/a&gt;. You can try to revert it and see if it helps.&lt;/p&gt;

&lt;p&gt;Or you can try to apply&#160;&lt;a href=&quot;https://review.whamcloud.com/29290/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/29290/&lt;/a&gt;&#160; as well from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9983&quot; title=&quot;LBUG llog_osd.c:327:llog_osd_declare_write_rec() - all DNE MDS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9983&quot;&gt;&lt;del&gt;LU-9983&lt;/del&gt;&lt;/a&gt;. But this patch triggered another issue that is taken care of in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10129&quot; title=&quot;map-on-demand set to 32 doesn&amp;#39;t work on OPA&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10129&quot;&gt;&lt;del&gt;LU-10129&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="224412" author="sharmaso" created="Fri, 23 Mar 2018 15:39:21 +0000"  >&lt;p&gt;My bad. I see&#160;&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9810&quot; title=&quot;Melanox OFED 4.1 support&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9810&quot;&gt;&lt;del&gt;LU-9810&lt;/del&gt;&lt;/a&gt;&#160;lnet: prefer Fast Reg patch is already reverted and is not there in 2.10.&lt;/p&gt;</comment>
                            <comment id="224415" author="sthiell" created="Fri, 23 Mar 2018 15:56:35 +0000"  >&lt;p&gt;Thanks much Sonia! I&apos;ll check. One thing sounds weird because my 2.10.3 clients (using lustre-client RPMs) start with&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; kernel: LNet: Using FastReg for registration
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;whereas my Lustre 2.10.3 servers start with:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;kernel: LNet: Using FMR for registration
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Both are using mlx4... hmm...&lt;/p&gt;</comment>
                            <comment id="224418" author="sharmaso" created="Fri, 23 Mar 2018 16:05:14 +0000"  >&lt;p&gt;Yes, this shouldn&apos;t be the case for clients. How are you getting the lustre-client RPMs, building from source?&lt;/p&gt;</comment>
                            <comment id="224421" author="sthiell" created="Fri, 23 Mar 2018 16:37:01 +0000"  >&lt;p&gt;I tried both ways actually: building from source (kmod) and with the Intel-provided RPMs (with lustre-client-dkms).&lt;br/&gt;
I wonder if this is related to SR-IOV? On other clients using mlx4 but not VFs/slaves, I can see &quot;LNet: Using FMR for registration&quot;.&lt;/p&gt;

&lt;p&gt;According to this old thread, FMR doesn&apos;t seem to be supported on slaves:&lt;br/&gt;
&lt;a href=&quot;https://www.spinics.net/lists/linux-rdma/msg22014.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://www.spinics.net/lists/linux-rdma/msg22014.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Any idea? Thanks!&lt;/p&gt;</comment>
                            <comment id="224422" author="sthiell" created="Fri, 23 Mar 2018 16:42:22 +0000"  >&lt;p&gt;I bet there was a change in the way FastReg is handled in 2.10 that is breaking the use of FastReg on mlx4 slaves...&lt;/p&gt;</comment>
                            <comment id="224433" author="sthiell" created="Fri, 23 Mar 2018 17:45:19 +0000"  >&lt;p&gt;I applied &lt;a href=&quot;https://review.whamcloud.com/29290/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/29290/&lt;/a&gt;  on top of 2.10.3, and I am still able to reproduce the issue.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;00000100:02000000:1.0:1521826954.348730:0:1622:0:(import.c:1542:ptlrpc_import_recovery_state_machine()) oak-OST0012-osc-ffff8800bb5c9000: Connection restored to 10.0.2.101@o2ib5 (at 10.0.2.101@o2ib5)
00000800:00000100:0.0:1521826954.348944:0:1558:0:(o2iblnd_cb.c:3476:kiblnd_complete()) FastReg failed: 6
00000800:00000100:0.0:1521826954.348946:0:1558:0:(o2iblnd_cb.c:3487:kiblnd_complete()) RDMA (tx: ffffc90001d60940) failed: 5
00000800:00000100:0.0:1521826954.348949:0:1558:0:(o2iblnd_cb.c:973:kiblnd_tx_complete()) Tx -&amp;gt; 10.0.2.101@o2ib5 cookie 0x25ef sending 1 waiting 0: failed 5
00000800:00000100:0.0:1521826954.348951:0:1558:0:(o2iblnd_cb.c:1920:kiblnd_close_conn_locked()) Closing conn to 10.0.2.101@o2ib5: error -5(waiting)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="224434" author="sharmaso" created="Fri, 23 Mar 2018 18:02:34 +0000"  >&lt;p&gt;Yes, that wouldn&apos;t resolve it. I was mistaken earlier and saw that &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9810&quot; title=&quot;Melanox OFED 4.1 support&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9810&quot;&gt;&lt;del&gt;LU-9810&lt;/del&gt;&lt;/a&gt; has already been reverted in 2.10.&#160;&lt;br/&gt;
Now the question is why, in the absence of the &quot;Prefer Fast-Reg&quot; patch, mlx4 clients are still using Fast-Reg while the servers are using FMR.&#160;&lt;/p&gt;

&lt;p&gt;And if it is because of SR-IOV, then I recall you said the issues did not appear with Lustre 2.9 on the same SR-IOV mlx4 clients.&lt;/p&gt;

&lt;p&gt;I am looking into the changes from 2.9 to 2.10 that might be related to this.&lt;/p&gt;</comment>
                            <comment id="224436" author="sharmaso" created="Fri, 23 Mar 2018 18:21:56 +0000"  >&lt;p&gt;Stephane&#160;&lt;/p&gt;

&lt;p&gt;Can you confirm that you are seeing this issue in Lustre 2.10 but not in 2.9 with the same kernel and MOFED versions?&lt;/p&gt;</comment>
                            <comment id="224437" author="sthiell" created="Fri, 23 Mar 2018 18:26:30 +0000"  >&lt;p&gt;Hi Sonia,&lt;/p&gt;

&lt;p&gt;We&apos;re using the in-kernel OFED stack on clients and servers.&lt;/p&gt;

&lt;p&gt;I cannot confirm this, as another major change between our Lustre 2.9 and 2.10 client setups is actually the kernel version (CentOS 7.3 to 7.4) and the removal of&#160;ib_get_dma_mr from the in-kernel OFED stack in 7.4. For example, I&apos;m not able to build Lustre 2.9 on CentOS 7.4 due to this issue. Now, I could try with MOFED maybe...&lt;/p&gt;

</comment>
                            <comment id="224438" author="sthiell" created="Fri, 23 Mar 2018 18:32:18 +0000"  >&lt;p&gt;Also, I was checking the documentation of our Infiniband FDR HBAs (PSID&#160;MT_1100120019), at &lt;a href=&quot;http://www.mellanox.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.mellanox.com/pdf/firmware/ConnectX3-FW-2_42_5000-release_notes.pdf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and there is mention of &quot;FMR for SRIOV&quot; as Beta support in quite old firmware (2.11.0500; we have the latest firmware, 2.42.5000). So I do have the feeling it should be supported, and in that case I&apos;m wondering why Lustre isn&apos;t picking up FMR.&lt;/p&gt;</comment>
                            <comment id="224460" author="sthiell" created="Fri, 23 Mar 2018 22:38:03 +0000"  >&lt;p&gt;Hi Sonia,&lt;/p&gt;

&lt;p&gt;I&apos;m testing right now with Lustre 2.10.3 + MOFED&#160;4.2-1.2.0.0 (el7.4) in the VM (guest OS). The reproducer doesn&apos;t seem to trigger the issue so far. Fingers crossed!&lt;/p&gt;

&lt;p&gt;Note: in that case, when loading LNet / Lustre, there is no FMR or FastReg message at all, just this:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[   24.306279] LNet: HW NUMA nodes: 1, HW CPU cores: 4, npartitions: 1
[   24.310169] alg: No test for adler32 (adler32-zlib)
[   24.310935] alg: No test for crc32 (crc32-table)
[   25.141828] Lustre: Lustre: Build Version: 2.10.3_MOFED
[   25.282150] LNet: Added LNI 10.0.2.242@o2ib5 [8/256/0/180]
[   50.563366] Lustre: Mounted oak-client
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="224751" author="sthiell" created="Wed, 28 Mar 2018 18:56:15 +0000"  >&lt;p&gt;Hi Sonia,&lt;/p&gt;

&lt;p&gt;The VMs using MOFED are now working fine; I wasn&apos;t able to reproduce the issue. We&apos;re still running some tests, but that sounds good. We now believe the issue was introduced in RHEL/CentOS 7.4. We didn&apos;t see this with Lustre 2.9 because we were still using CentOS 7.3 or earlier, where it worked well. I don&apos;t know the details of what has changed in OFA/EL7.4; I&apos;ll try to open a Red Hat ticket when I get some time, but our solution going forward is to use MOFED 4.2 on the VMs that use SR-IOV over mlx4.&lt;/p&gt;

&lt;p&gt;Thanks much.&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;</comment>
                            <comment id="224763" author="sharmaso" created="Wed, 28 Mar 2018 20:22:12 +0000"  >&lt;p&gt;Thanks Stephane for the update. I will keep a note of this as well.&lt;/p&gt;</comment>
                            <comment id="225957" author="pjones" created="Fri, 13 Apr 2018 03:53:21 +0000"  >&lt;p&gt;Stephane&lt;/p&gt;

&lt;p&gt;Do you need anything further on this ticket?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="226017" author="sthiell" created="Fri, 13 Apr 2018 18:49:48 +0000"  >&lt;p&gt;Hi Peter &#8211; no thanks, we&apos;re good; using MOFED 4.2 solved our issue, and everything is working well on our Oak client VMs. It&apos;s up to you (Intel) if you want to investigate why CentOS 7.4 with in-kernel OFA + SR-IOV + Lustre 2.10 doesn&apos;t work under I/O load. It might not be worth it, especially since RHEL 7.5 is now out and probably comes with an updated OFA.&lt;/p&gt;

&lt;p&gt;Thanks for your help in finding the root cause!&lt;/p&gt;

&lt;p&gt;Stephane&lt;/p&gt;

</comment>
                            <comment id="226022" author="pjones" created="Fri, 13 Apr 2018 19:17:50 +0000"  >&lt;p&gt;ok - thanks Stephane. I think upwards and onwards &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="29294" name="oak-gw04.kernel.log" size="53622" author="sthiell" created="Tue, 30 Jan 2018 00:21:30 +0000"/>
                            <attachment id="29889" name="oak-gw22_lustre.log.gz" size="163996" author="sthiell" created="Thu, 22 Mar 2018 22:13:43 +0000"/>
                            <attachment id="29892" name="oak-gw22_lustre_neterror.log" size="5115539" author="sthiell" created="Fri, 23 Mar 2018 02:54:36 +0000"/>
                            <attachment id="29295" name="oak-io1-s1.vmcore-dmesg.txt" size="1056979" author="sthiell" created="Tue, 30 Jan 2018 00:21:15 +0000"/>
                            <attachment id="29890" name="oak-io1-s2_lustre.log.gz" size="5429466" author="sthiell" created="Thu, 22 Mar 2018 22:13:45 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzrv3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>