<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:38:38 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10839] obdfilter-survey test 1a hangs with soft lockup on a client - socknal_cd</title>
                <link>https://jira.whamcloud.com/browse/LU-10839</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;obdfilter-survey test_1a hangs. In the failure at &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/2a6d8a68-2d45-11e8-b74b-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/2a6d8a68-2d45-11e8-b74b-52540065bddc&lt;/a&gt;, the last output in the client test_log is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== obdfilter-survey test 1a: Object Storage Targets survey =========================================== 17:00:36 (1521651636)
Unable to detect ip address for host: &apos;&apos;
+ NETTYPE=tcp thrlo=8 nobjhi=1 thrhi=16 size=1024 case=disk rslt_loc=/tmp targets=&quot;10.9.6.3:lustre-OST0000 10.9.6.3:lustre-OST0001 10.9.6.3:lustre-OST0002 10.9.6.3:lustre-OST0003 10.9.6.3:lustre-OST0004 10.9.6.3:lustre-OST0005 10.9.6.3:lustre-OST0006 10.9.6.3:lustre-OST0007&quot; /usr/bin/obdfilter-survey
Warning: Permanently added &apos;10.9.6.3&apos; (ECDSA) to the list of known hosts.
Wed Mar 21 17:00:46 UTC 2018 Obdfilter-survey for case=disk from trevis-43vm1.trevis.hpdd.intel.com
ost&#160; 8 sz&#160; 8388608K rsz 1024K obj&#160;&#160;&#160; 8 thr&#160;&#160; 64 write&#160;&#160; 39.97 [&#160;&#160; 0.00,&#160;&#160; 11.00] rewrite&#160;&#160; 48.09 [&#160;&#160; 0.00,&#160;&#160; 17.00] read&#160;&#160; 91.69 [&#160;&#160; 0.00,&#160; 102.99]
ost&#160; 8 sz&#160; 8388608K rsz 1024K obj&#160;&#160;&#160; 8 thr&#160; 128 write&#160;&#160; 33.90 [&#160;&#160; 0.00,&#160;&#160; 27.00] rewrite&#160;&#160; 46.32 [&#160;&#160; 0.00,&#160;&#160; 22.00] read
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On the client console we see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 4793.948275] LustreError: 166-1: MGC10.9.6.4@tcp: Connection to MGS (at 10.9.6.4@tcp) was lost; in progress operations using this service will fail
[ 4997.543363] INFO: rcu_sched self-detected stall on CPU { 0
[ 4997.544362] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 1, t=213158 jiffies, g=449683, c=449682, q=159)
[ 4997.544362] Task dump for CPU 0:
[ 4997.544362] kworker/0:2&#160;&#160;&#160;&#160; R&#160; running task&#160;&#160;&#160;&#160;&#160;&#160;&#160; 0&#160;&#160;&#160; 46&#160;&#160;&#160;&#160; &#160;2 0x00000008
[ 4997.544339] }&#160; (t=213158 jiffies g=449683 c=449682 q=159)
[ 4997.544362] Call Trace:
[ 4997.544362]&#160; [&amp;lt;ffffffff816b3b0c&amp;gt;] ? __schedule+0x47c/0xa30
[ 4997.544362]&#160; [&amp;lt;ffffffff810abeca&amp;gt;] ? process_one_work+0x21a/0x440
[ 4997.544362]&#160; [&amp;lt;ffffffff816b40e9&amp;gt;] schedule+0x29/0x70
[ 4997.544362]&#160; [&amp;lt;ffffffff810acba9&amp;gt;] worker_thread+0x1d9/0x3c0
[ 4997.544362]&#160; [&amp;lt;ffffffff810ac9d0&amp;gt;] ? manage_workers.isra.24+0x2a0/0x2a0
[ 4997.544362]&#160; [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
[ 4997.544362]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 4997.544362]&#160; [&amp;lt;ffffffff816c0577&amp;gt;] ret_from_fork+0x77/0xb0
[ 4997.544362]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 4997.544339] Task dump for CPU 0:
[ 4997.544339] kworker/0:2&#160;&#160;&#160;&#160; R&#160; running task&#160;&#160;&#160;&#160;&#160;&#160;&#160; 0&#160;&#160;&#160; 46&#160;&#160;&#160;&#160;&#160; 2 0x00000008
[ 4997.544339] Call Trace:
[ 4997.544339]&#160; &amp;lt;IRQ&amp;gt;&#160; [&amp;lt;ffffffff810c72c8&amp;gt;] sched_show_task+0xa8/0x110
[ 4997.544339]&#160; [&amp;lt;ffffffff810cade9&amp;gt;] dump_cpu_task+0x39/0x70
[ 4997.544339]&#160; [&amp;lt;ffffffff8113baf0&amp;gt;] rcu_dump_cpu_stacks+0x90/0xd0
[ 4997.544339]&#160; [&amp;lt;ffffffff8113f192&amp;gt;] rcu_check_callbacks+0x452/0x740
[ 4997.544339]&#160; [&amp;lt;ffffffff810eef7c&amp;gt;] ? update_wall_time+0x26c/0x6c0
[ 4997.544339]&#160; [&amp;lt;ffffffff810f6c00&amp;gt;] ? tick_sched_do_timer+0x50/0x50
[ 4997.544339]&#160; [&amp;lt;ffffffff8109d886&amp;gt;] update_process_times+0x46/0x80
[ 4997.544339]&#160; [&amp;lt;ffffffff810f6a00&amp;gt;] tick_sched_handle+0x30/0x70
[ 4997.544339]&#160; [&amp;lt;ffffffff810f6c39&amp;gt;] tick_sched_timer+0x39/0x80
[ 4997.544339]&#160; [&amp;lt;ffffffff810b8196&amp;gt;] __hrtimer_run_queues+0xd6/0x260
[ 4997.544339]&#160; [&amp;lt;ffffffff810b872f&amp;gt;] hrtimer_interrupt+0xaf/0x1d0
[ 4997.544339]&#160; [&amp;lt;ffffffff8105467b&amp;gt;] local_apic_timer_interrupt+0x3b/0x60
[ 4997.544339]&#160; [&amp;lt;ffffffff816c4e73&amp;gt;] smp_apic_timer_interrupt+0x43/0x60
[ 4997.544339]&#160; [&amp;lt;ffffffff816c1732&amp;gt;] apic_timer_interrupt+0x162/0x170
[ 4997.544339]&#160; &amp;lt;EOI&amp;gt;&#160; [&amp;lt;ffffffff810335b9&amp;gt;] ? sched_clock+0x9/0x10
[ 4997.544339]&#160; [&amp;lt;ffffffff810c28a4&amp;gt;] ? finish_task_switch+0x54/0x170
[ 4997.544339]&#160; [&amp;lt;ffffffff816b3b0c&amp;gt;] __schedule+0x47c/0xa30
[ 4997.544339]&#160; [&amp;lt;ffffffff810abeca&amp;gt;] ? process_one_work+0x21a/0x440
[ 4997.544339]&#160; [&amp;lt;ffffffff816b40e9&amp;gt;] schedule+0x29/0x70
[ 4997.544339]&#160; [&amp;lt;ffffffff810acba9&amp;gt;] worker_thread+0x1d9/0x3c0
[ 4997.544339]&#160; [&amp;lt;ffffffff810ac9d0&amp;gt;] ? manage_workers.isra.24+0x2a0/0x2a0
[ 4997.544339]&#160; [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
[ 4997.544339]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 4997.544339]&#160; [&amp;lt;ffffffff816c0577&amp;gt;] ret_from_fork+0x77/0xb0
[ 4997.544339]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 5061.726259] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 25s! [socknal_cd00:4048]
[ 5061.727248] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_modib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr joydev virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus drm_kms_helper virtio_blk syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm libata crct10dif_pclmul crct10dif_common crc32c_intel 8139too serio_raw virtio_pci 8139cp virtio_ring virtio mii i2c_core floppy [last unloaded: lnet_selftest]

[ 5076.551703] CPU: 0 PID: 4048 Comm: socknal_cd00 Tainted: G&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; OE&#160; ------------&#160;&#160; 3.10.0-693.21.1.el7.x86_64 #1
[ 5076.551703] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 5076.551703] task: ffff8800359b0000 ti: ffff88007af5c000 task.ti: ffff88007af5c000
[ 5076.551703] RIP: 0010:[&amp;lt;ffffffff816b6545&amp;gt;]&#160; [&amp;lt;ffffffff816b6545&amp;gt;] _raw_spin_unlock_irqrestore+0x15/0x20
[ 5076.551703] RSP: 0018:ffff88007af5fde0&#160; EFLAGS: 00000246
[ 5076.551703] RAX: dead000000000200 RBX: ffff88007af5fe08 RCX: dead000000000200
[ 5076.551703] RDX: ffff88007a8dbe70 RSI: 0000000000000246 RDI: 0000000000000246
[ 5076.551703] RBP: ffff88007af5fde0 R08: ffff88007af5fe70 R09: 0000000000000000
[ 5076.551703] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88007af5fd60
[ 5076.551703] R13: 0000000000000000 R14: 00000000000001bf R15: 00000000000007ca
[ 5076.551703] FS:&#160; 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 5076.551703] CS:&#160; 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5076.551703] CR2: 00000000000000b8 CR3: 000000007afd4000 CR4: 00000000000606f0
[ 5076.551703] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5076.551703] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5076.551703] Call Trace:
[ 5076.551703]&#160; [&amp;lt;ffffffff810b4d21&amp;gt;] remove_wait_queue+0x31/0x40
[ 5076.551703]&#160; [&amp;lt;ffffffffc0798cd2&amp;gt;] ksocknal_connd+0x332/0xd60 [ksocklnd]
[ 5076.551703]&#160; [&amp;lt;ffffffff810c7c70&amp;gt;] ? wake_up_state+0x20/0x20
[ 5076.551703]&#160; [&amp;lt;ffffffffc07989a0&amp;gt;] ? ksocknal_thread_fini+0x30/0x30 [ksocklnd]
[ 5076.551703]&#160; [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
[ 5076.551703]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 5076.551703]&#160; [&amp;lt;ffffffff816c0577&amp;gt;] ret_from_fork+0x77/0xb0
[ 5076.551703]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 5076.551703] Code: 07 00 66 66 66 90 5d c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 c6 07 00 66 66 66 90 48 89 f7 57 9d &amp;lt;66&amp;gt; 66 90 66 90 5d c3 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 48
[ 5076.551703] Kernel panic - not syncing: softlockup: hung tasks
[ 5076.551703] CPU: 0 PID: 4048 Comm: socknal_cd00 Tainted: G&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; OEL ------------&#160;&#160; 3.10.0-693.21.1.el7.x86_64 #1
[ 5076.551703] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 5076.551703] Call Trace:
[ 5076.551703]&#160; &amp;lt;IRQ&amp;gt;&#160; [&amp;lt;ffffffff816ae7c8&amp;gt;] dump_stack+0x19/0x1b
[ 5076.551703]&#160; [&amp;lt;ffffffff816a8634&amp;gt;] panic+0xe8/0x21f
[ 5076.551703]&#160; [&amp;lt;ffffffff8102d7cf&amp;gt;] ? show_regs+0x5f/0x210
[ 5076.551703]&#160; [&amp;lt;ffffffff811334e1&amp;gt;] watchdog_timer_fn+0x231/0x240
[ 5076.551703]&#160; [&amp;lt;ffffffff811332b0&amp;gt;] ? watchdog+0x40/0x40
[ 5076.551703]&#160; [&amp;lt;ffffffff810b8196&amp;gt;] __hrtimer_run_queues+0xd6/0x260
[ 5076.551703]&#160; [&amp;lt;ffffffff810b872f&amp;gt;] hrtimer_interrupt+0xaf/0x1d0
[ 5076.551703]&#160; [&amp;lt;ffffffff8105467b&amp;gt;] local_apic_timer_interrupt+0x3b/0x60
[ 5076.551703]&#160; [&amp;lt;ffffffff816c4e73&amp;gt;] smp_apic_timer_interrupt+0x43/0x60
[ 5076.551703]&#160; [&amp;lt;ffffffff816c1732&amp;gt;] apic_timer_interrupt+0x162/0x170
[ 5076.551703]&#160; &amp;lt;EOI&amp;gt;&#160; [&amp;lt;ffffffff816b6545&amp;gt;] ? _raw_spin_unlock_irqrestore+0x15/0x20
[ 5076.551703]&#160; [&amp;lt;ffffffff810b4d21&amp;gt;] remove_wait_queue+0x31/0x40
[ 5076.551703]&#160; [&amp;lt;ffffffffc0798cd2&amp;gt;] ksocknal_connd+0x332/0xd60 [ksocklnd]
[ 5076.551703]&#160; [&amp;lt;ffffffff810c7c70&amp;gt;] ? wake_up_state+0x20/0x20
[ 5076.551703]&#160; [&amp;lt;ffffffffc07989a0&amp;gt;] ? ksocknal_thread_fini+0x30/0x30 [ksocklnd]
[ 5076.551703]&#160; [&amp;lt;ffffffff810b4031&amp;gt;] kthread+0xd1/0xe0
[ 5076.551703]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 5076.551703]&#160; [&amp;lt;ffffffff816c0577&amp;gt;] ret_from_fork+0x77/0xb0
[ 5076.551703]&#160; [&amp;lt;ffffffff810b3f60&amp;gt;] ? insert_kthread_work+0x40/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the OSS, the console log shows the following, which may be related:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[ 3746.842987] Lustre: DEBUG MARKER: == obdfilter-survey test 1a: Object Storage Targets survey =========================================== 17:00:36 (1521651636) 
[ 3747.568718] Lustre: Echo OBD driver; http://www.lustre.org/ 
[ 3750.063791] LustreError: 0-0: lustre-OST0000: can&apos;t enable quota enforcement since space accounting isn&apos;t functional. Please run tunefs.lustre --quota on an unmounted filesystem if not done already 
[ 3750.069266] LustreError: Skipped 15 previous similar messages 
[ 4697.745115] LNetError: 28867:0:(socklnd.c:1679:ksocknal_destroy_conn()) Completing partial receive from 12345-10.9.6.1@tcp[1], ip 10.9.6.1:1023, with error, wanted: 136, left: 136, last alive is 18 secs ago 
[ 4697.750855] LustreError: 28867:0:(events.c:304:request_in_callback()) event type 2, status -5, service ost 
[ 4697.754178] LustreError: 12877:0:(pack_generic.c:591:__lustre_unpack_msg()) message length 0 too small for magic/version check 
[ 4697.759336] LustreError: 12877:0:(sec.c:2070:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.9.6.1@tcp x1595563597557120

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I&apos;ve found only one instance of this hang so far.&lt;/p&gt;</description>
                <environment></environment>
        <key id="51475">LU-10839</key>
            <summary>obdfilter-survey test 1a hangs with soft lockup on a client - socknal_cd</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Thu, 22 Mar 2018 13:58:35 +0000</created>
                <updated>Thu, 22 Mar 2018 18:05:07 +0000</updated>
                                            <version>Lustre 2.11.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzup3:</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>