<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:58:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13168] Client panic &quot;Freechain corrupt&quot;/&quot;Redzone Overwritten&quot;</title>
                <link>https://jira.whamcloud.com/browse/LU-13168</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;We are using 2.12.3_93_gb75f04d-1 on clients to fix a panic when deleting files and using data on MDT (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12462&quot; title=&quot;osc_cache_writeback_range(): ASSERTION( ext-&amp;gt;oe_start &amp;gt;= start &amp;amp;&amp;amp; ext-&amp;gt;oe_end &amp;lt;= end ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12462&quot;&gt;&lt;del&gt;LU-12462&lt;/del&gt;&lt;/a&gt;). This has resolved the panic on deleting files; however, we are now experiencing 2-3 kernel panics a day across our 6 cluster login machines.&lt;/p&gt;

&lt;p&gt;We do not yet know what is triggering these; however, they all start with either a kmalloc-8 &quot;Freechain corrupt&quot; or a kmalloc-8 &quot;Redzone overwritten&quot; message. I&apos;ve reproduced samples of both below, taken from the vmcore-dmesg files generated by kdump. This looks similar to me to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12581&quot; title=&quot;Slab object &amp;quot;Freechain corrupt&amp;quot; on client&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12581&quot;&gt;&lt;del&gt;LU-12581&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Typical dmesg from crashed client:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[171210.346747] =============================================================================
[171210.346754] BUG kmalloc-8 (Tainted: G OE ------------ ): Freechain corrupt
[171210.346756] -----------------------------------------------------------------------------
[171210.346759] Disabling lock debugging due to kernel taint
[171210.346763] INFO: Slab 0xffffeb5450defb40 objects=102 used=6 fp=0xffff8eb6b7bedfa8 flags=0x6fffff00000081
[171210.346765] INFO: Object 0xffff8eb6b7bedf30 @offset=3888 fp=0x7fff8eb6b7bedf08
[171210.346770] Redzone ffff8eb6b7bedf28: bb bb bb bb bb bb bb bb ........
[171210.346773] Object ffff8eb6b7bedf30: 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkk.
[171210.346775] Redzone ffff8eb6b7bedf38: bb bb bb bb bb bb bb bb ........
[171210.346778] Padding ffff8eb6b7bedf48: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
[171210.346783] CPU: 21 PID: 8721 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[171210.346785] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[171210.346787] Call Trace:
[171210.346799] [&amp;lt;ffffffffa777ac23&amp;gt;] dump_stack+0x19/0x1b
[171210.346805] [&amp;lt;ffffffffa7221561&amp;gt;] print_trailer+0x161/0x280
[171210.346808] [&amp;lt;ffffffffa7221ebf&amp;gt;] on_freelist+0xff/0x270
[171210.346813] [&amp;lt;ffffffffa77774cc&amp;gt;] free_debug_processing+0x18d/0x270
[171210.346818] [&amp;lt;ffffffffa71ddcb5&amp;gt;] ? kvfree+0x35/0x40
[171210.346822] [&amp;lt;ffffffffa7223bee&amp;gt;] __slab_free+0x1ce/0x290
[171210.346829] [&amp;lt;ffffffffa7272e58&amp;gt;] ? generic_setxattr+0x68/0x80
[171210.346834] [&amp;lt;ffffffffa7273635&amp;gt;] ? __vfs_setxattr_noperm+0x65/0x1b0
[171210.346840] [&amp;lt;ffffffffa732b7ae&amp;gt;] ? evm_inode_setxattr+0xe/0x10
[171210.346844] [&amp;lt;ffffffffa71ddcb5&amp;gt;] ? kvfree+0x35/0x40
[171210.346847] [&amp;lt;ffffffffa7223db6&amp;gt;] kfree+0x106/0x140
[171210.346851] [&amp;lt;ffffffffa71ddcb5&amp;gt;] kvfree+0x35/0x40
[171210.346855] [&amp;lt;ffffffffa727399b&amp;gt;] setxattr+0x15b/0x1e0
[171210.346861] [&amp;lt;ffffffffa725c3ed&amp;gt;] ? putname+0x3d/0x60
[171210.346865] [&amp;lt;ffffffffa725d602&amp;gt;] ? user_path_at_empty+0x72/0xc0
[171210.346871] [&amp;lt;ffffffffa724d828&amp;gt;] ? __sb_start_write+0x58/0x120
[171210.346876] [&amp;lt;ffffffffa7273c87&amp;gt;] SyS_setxattr+0xb7/0x100
[171210.346882] [&amp;lt;ffffffffa778dede&amp;gt;] system_call_fastpath+0x25/0x2a
[171210.346885] =============================================================================
[171210.346888] BUG kmalloc-8 (Tainted: G B OE ------------ ): Wrong object count. Counter is 6 but counted were 98
[171210.346889] -----------------------------------------------------------------------------
[171210.346893] INFO: Slab 0xffffeb5450defb40 objects=102 used=6 fp=0xffff8eb6b7bedfa8 flags=0x6fffff00000081
[171210.346897] CPU: 21 PID: 8721 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[171210.346899] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[171210.346901] Call Trace:
[171210.346905] [&amp;lt;ffffffffa777ac23&amp;gt;] dump_stack+0x19/0x1b
[171210.346908] [&amp;lt;ffffffffa7221b54&amp;gt;] slab_err+0xb4/0xe0
[171210.346915] [&amp;lt;ffffffffa7030a1e&amp;gt;] ? show_stack+0x4e/0x60
[171210.346918] [&amp;lt;ffffffffa7221561&amp;gt;] ? print_trailer+0x161/0x280
[171210.346921] [&amp;lt;ffffffffa7221f85&amp;gt;] on_freelist+0x1c5/0x270
[171210.346925] [&amp;lt;ffffffffa77774cc&amp;gt;] free_debug_processing+0x18d/0x270
[171210.346929] [&amp;lt;ffffffffa71ddcb5&amp;gt;] ? kvfree+0x35/0x40
[171210.346932] [&amp;lt;ffffffffa7223bee&amp;gt;] __slab_free+0x1ce/0x290
[171210.346937] [&amp;lt;ffffffffa7272e58&amp;gt;] ? generic_setxattr+0x68/0x80
[171210.346941] [&amp;lt;ffffffffa7273635&amp;gt;] ? __vfs_setxattr_noperm+0x65/0x1b0
[171210.346944] [&amp;lt;ffffffffa732b7ae&amp;gt;] ? evm_inode_setxattr+0xe/0x10
[171210.346948] [&amp;lt;ffffffffa71ddcb5&amp;gt;] ? kvfree+0x35/0x40
[171210.346951] [&amp;lt;ffffffffa7223db6&amp;gt;] kfree+0x106/0x140
[171210.346955] [&amp;lt;ffffffffa71ddcb5&amp;gt;] kvfree+0x35/0x40
[171210.346959] [&amp;lt;ffffffffa727399b&amp;gt;] setxattr+0x15b/0x1e0
[171210.346963] [&amp;lt;ffffffffa725c3ed&amp;gt;] ? putname+0x3d/0x60
[171210.346967] [&amp;lt;ffffffffa725d602&amp;gt;] ? user_path_at_empty+0x72/0xc0
[171210.346971] [&amp;lt;ffffffffa724d828&amp;gt;] ? __sb_start_write+0x58/0x120
[171210.346976] [&amp;lt;ffffffffa7273c87&amp;gt;] SyS_setxattr+0xb7/0x100
[171210.346980] [&amp;lt;ffffffffa778dede&amp;gt;] system_call_fastpath+0x25/0x2a
[171210.346983] FIX kmalloc-8: Object count adjusted.
[171210.346985] =============================================================================
[171210.346988] BUG kmalloc-8 (Tainted: G B OE ------------ ): Redzone overwritten
[171210.346989] -----------------------------------------------------------------------------
[171210.346993] INFO: 0xffff8eb6b7bed0b0-0xffff8eb6b7bed0b7. First byte 0x4c instead of 0xcc
[171210.346996] INFO: Slab 0xffffeb5450defb40 objects=102 used=98 fp=0xffff8eb6b7bedfa8 flags=0x6fffff00000081
[171210.346998] INFO: Object 0xffff8eb6b7bed0a8 @offset=168 fp=0x7f7f0e36373e5050
[171210.347001] Redzone ffff8eb6b7bed0a0: cc cc cc cc cc cc cc cc ........
[171210.347004] Object ffff8eb6b7bed0a8: d0 0b d6 0b 88 01 00 25 .......%
[171210.347006] Redzone ffff8eb6b7bed0b0: 4c 4c 4c 4c 4c 4c 4c 4c LLLLLLLL
[171210.347009] Padding ffff8eb6b7bed0c0: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
[171210.347012] CPU: 21 PID: 8721 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[171210.347014] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[171210.347016] Call Trace:
[171210.347020] [&amp;lt;ffffffffa777ac23&amp;gt;] dump_stack+0x19/0x1b
[171210.347023] [&amp;lt;ffffffffa7221561&amp;gt;] print_trailer+0x161/0x280
[171210.347026] [&amp;lt;ffffffffa72217ef&amp;gt;] check_bytes_and_report+0xcf/0x110
[171210.347030] [&amp;lt;ffffffffa722237d&amp;gt;] check_object+0x1dd/0x2a0
[171210.347033] [&amp;lt;ffffffffa77773cc&amp;gt;] free_debug_processing+0x8d/0x270
[171210.347037] [&amp;lt;ffffffffa71ddcb5&amp;gt;] ? kvfree+0x35/0x40
[171210.347040] [&amp;lt;ffffffffa7223bee&amp;gt;] __slab_free+0x1ce/0x290
[171210.347045] [&amp;lt;ffffffffa7272e58&amp;gt;] ? generic_setxattr+0x68/0x80
[171210.347049] [&amp;lt;ffffffffa7273635&amp;gt;] ? __vfs_setxattr_noperm+0x65/0x1b0
[171210.347258] [&amp;lt;ffffffffa732b7ae&amp;gt;] ? evm_inode_setxattr+0xe/0x10
[171210.347262] [&amp;lt;ffffffffa71ddcb5&amp;gt;] ? kvfree+0x35/0x40
[171210.347265] [&amp;lt;ffffffffa7223db6&amp;gt;] kfree+0x106/0x140
[171210.347449] [&amp;lt;ffffffffa71ddcb5&amp;gt;] kvfree+0x35/0x40
[171210.347627] [&amp;lt;ffffffffa727399b&amp;gt;] setxattr+0x15b/0x1e0
[171210.347823] [&amp;lt;ffffffffa725c3ed&amp;gt;] ? putname+0x3d/0x60
[171210.348010] [&amp;lt;ffffffffa725d602&amp;gt;] ? user_path_at_empty+0x72/0xc0
[171210.348204] [&amp;lt;ffffffffa724d828&amp;gt;] ? __sb_start_write+0x58/0x120
[171210.348209] [&amp;lt;ffffffffa7273c87&amp;gt;] SyS_setxattr+0xb7/0x100
[171210.348392] [&amp;lt;ffffffffa778dede&amp;gt;] system_call_fastpath+0x25/0x2a
[171210.348578] FIX kmalloc-8: Restoring 0xffff8eb6b7bed0b0-0xffff8eb6b7bed0b7=0xcc
[171210.349139] FIX kmalloc-8: Object at 0xffff8eb6b7bed0a8 not freed
[171210.462694] general protection fault: 0000 [#1] SMP 
[171210.488281] Modules linked in: fuse can_bcm sctp can_raw can nfsd mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) cts lnet(OE) rpcsec_gss_krb5 nfsv4 dns_resolver libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_recent xt_conntrack nf_conntrack iptable_filter dm_mirror dm_region_hash dm_log dm_mod mlx4_ib(OE) ib_uverbs(OE) ib_core(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm mlx4_core(OE) mgag200 ttm irqbypass drm_kms_helper crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel
[171210.866605] syscopyarea lrw sysfillrect sysimgblt fb_sys_fops gf128mul glue_helper drm ablk_helper cryptd ses enclosure drm_panel_orientation_quirks ipmi_si mlx_compat(OE) pcspkr ipmi_devintf devlink ioatdma ipmi_msghandler pcc_cpufreq wmi i2c_i801 hpwdt lpc_ich acpi_power_meter binfmt_misc knem(OE) auth_rpcgss ip_tables smartpqi bridge stp llc xfs isci libsas qla3xxx e1000e igb i2c_algo_bit megaraid_sas aacraid aic79xx ata_piix mpt2sas raid_class mptspi scsi_transport_spi mptsas mptscsih mptbase arcmsr ahci libahci sata_nv sata_svw bnx2x libcrc32c bnx2 ext4 mbcache jbd2 sata_sil libata tg3 e1000 nfsv3 nfs_acl nfs lockd grace sunrpc fscache tun sd_mod crc_t10dif crct10dif_generic sg ixgbe crct10dif_pclmul crct10dif_common hpsa dca mdio ptp hpilo scsi_transport_sas pps_core [last unloaded: ipmi_msghandler]
[171211.240594] 
[171211.242206] CPU: 21 PID: 8721 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[171211.300030] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[171211.336094] task: ffff8e972f73b150 ti: ffff8e9cdf368000 task.ti: ffff8e9cdf368000
[171211.375938] RIP: 0010:[&amp;lt;ffffffffc112fcdc&amp;gt;] [&amp;lt;ffffffffc112fcdc&amp;gt;] cl_page_delete0+0x6c/0x220 [obdclass]
[171211.423586] RSP: 0018:ffff8e9cdf36bb98 EFLAGS: 00010287
[171211.451848] RAX: 7fffffffc1439900 RBX: ffff8eb76f7e4a90 RCX: 000000000000001c
[171211.490610] RDX: ffff8e88df67bb50 RSI: ffff8ea253a26b58 RDI: ffff8e8fd9ff89a4
[171211.530173] RBP: ffff8e9cdf36bbb0 R08: ffff8ea253a26b58 R09: 0000000000000046
[171211.568509] R10: 0000000000000230 R11: ffff8eae63eebc00 R12: ffff8eb76f7e4a28
[171211.606963] R13: ffffffffc118d878 R14: ffff8e9cdf36bcd0 R15: ffff8e9cdf36bc60
[171211.645074] FS: 00002aaabe446700(0000) GS:ffff8ec27f3c0000(0000) knlGS:0000000000000000
[171211.689665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[171211.722223] CR2: 0000000001440b30 CR3: 0000003bd6a22000 CR4: 00000000003607e0
[171211.762135] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[171211.801929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[171211.842015] Call Trace:
[171211.855677] [&amp;lt;ffffffffc112fec3&amp;gt;] cl_page_delete+0x33/0x110 [obdclass]
[171211.892125] [&amp;lt;ffffffffc158a9ff&amp;gt;] ll_invalidatepage+0x7f/0x170 [lustre]
[171211.931786] [&amp;lt;ffffffffa71ce22d&amp;gt;] do_invalidatepage_range+0x7d/0x90
[171211.965295] [&amp;lt;ffffffffa71ce2d7&amp;gt;] truncate_inode_page+0x77/0x80
[171211.996796] [&amp;lt;ffffffffa71ce50a&amp;gt;] truncate_inode_pages_range+0x1ea/0x750
[171212.033187] [&amp;lt;ffffffffa71ceadf&amp;gt;] truncate_inode_pages_final+0x4f/0x60
[171212.071886] [&amp;lt;ffffffffc1570acf&amp;gt;] ll_delete_inode+0x4f/0x230 [lustre]
[171212.106401] [&amp;lt;ffffffffa7268544&amp;gt;] evict+0xb4/0x180
[171212.133368] [&amp;lt;ffffffffa726896c&amp;gt;] iput+0xfc/0x190
[171212.162024] [&amp;lt;ffffffffa725cbde&amp;gt;] do_unlinkat+0x1ae/0x2d0
[171212.191747] [&amp;lt;ffffffffa725dc96&amp;gt;] SyS_unlink+0x16/0x20
[171212.221730] [&amp;lt;ffffffffa778dede&amp;gt;] system_call_fastpath+0x25/0x2a
[171212.253488] Code: 89 e6 ba 04 00 00 00 4c 89 ef e8 80 fb ff ff 49 8b 44 24 30 49 83 c4 28 49 39 c4 48 8d 58 e0 74 2b 66 0f 1f 44 00 00 48 8b 43 18 &amp;lt;48&amp;gt; 8b 40 40 48 85 c0 74 0b 48 89 de 4c 89 ef e8 20 2e 26 e6 48 
[171212.353283] RIP [&amp;lt;ffffffffc112fcdc&amp;gt;] cl_page_delete0+0x6c/0x220 [obdclass]
[171212.391650] RSP &amp;lt;ffff8e9cdf36bb98&amp;gt;
&#160;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Second dmesg (Redzone overwritten message):&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[588058.598892] =============================================================================
[588058.598898] BUG kmalloc-8 (Tainted: G OE ------------ ): Redzone overwritten
[588058.598900] -----------------------------------------------------------------------------
[588058.598903] Disabling lock debugging due to kernel taint
[588058.598906] INFO: 0xffff8b70ccbdde48-0xffff8b70ccbdde4f. First byte 0x4c instead of 0xcc
[588058.598908] INFO: Slab 0xffffd60ee632f740 objects=102 used=93 fp=0xffff8b70ccbddcd8 flags=0x6fffff00000081
[588058.598910] INFO: Object 0xffff8b70ccbdde40 @offset=3648 fp=0x7f7f0b704c3d5d48
[588058.598914] Redzone ffff8b70ccbdde38: cc cc cc cc cc cc cc cc ........
[588058.598916] Object ffff8b70ccbdde40: d0 0b d6 0b 88 01 00 25 .......%
[588058.598918] Redzone ffff8b70ccbdde48: 4c 4c 4c 4c 4c 4c 4c 4c LLLLLLLL
[588058.598920] Padding ffff8b70ccbdde58: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
[588058.598924] CPU: 27 PID: 12194 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[588058.598926] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[588058.598927] Call Trace:
[588058.598939] [&amp;lt;ffffffffb477ac23&amp;gt;] dump_stack+0x19/0x1b
[588058.598944] [&amp;lt;ffffffffb4221561&amp;gt;] print_trailer+0x161/0x280
[588058.598947] [&amp;lt;ffffffffb42217ef&amp;gt;] check_bytes_and_report+0xcf/0x110
[588058.598950] [&amp;lt;ffffffffb422237d&amp;gt;] check_object+0x1dd/0x2a0
[588058.598953] [&amp;lt;ffffffffb47773cc&amp;gt;] free_debug_processing+0x8d/0x270
[588058.598958] [&amp;lt;ffffffffb41ddcb5&amp;gt;] ? kvfree+0x35/0x40
[588058.598962] [&amp;lt;ffffffffb4223bee&amp;gt;] __slab_free+0x1ce/0x290
[588058.598968] [&amp;lt;ffffffffb4272e58&amp;gt;] ? generic_setxattr+0x68/0x80
[588058.598972] [&amp;lt;ffffffffb4273635&amp;gt;] ? __vfs_setxattr_noperm+0x65/0x1b0
[588058.598977] [&amp;lt;ffffffffb432b7ae&amp;gt;] ? evm_inode_setxattr+0xe/0x10
[588058.598980] [&amp;lt;ffffffffb41ddcb5&amp;gt;] ? kvfree+0x35/0x40
[588058.598982] [&amp;lt;ffffffffb4223db6&amp;gt;] kfree+0x106/0x140
[588058.598985] [&amp;lt;ffffffffb41ddcb5&amp;gt;] kvfree+0x35/0x40
[588058.598989] [&amp;lt;ffffffffb427399b&amp;gt;] setxattr+0x15b/0x1e0
[588058.598994] [&amp;lt;ffffffffb425c3ed&amp;gt;] ? putname+0x3d/0x60
[588058.598998] [&amp;lt;ffffffffb425d602&amp;gt;] ? user_path_at_empty+0x72/0xc0
[588058.599003] [&amp;lt;ffffffffb424d828&amp;gt;] ? __sb_start_write+0x58/0x120
[588058.599008] [&amp;lt;ffffffffb42802f1&amp;gt;] ? do_utimes+0xf1/0x180
[588058.599011] [&amp;lt;ffffffffb4273c87&amp;gt;] SyS_setxattr+0xb7/0x100
[588058.599016] [&amp;lt;ffffffffb478dede&amp;gt;] system_call_fastpath+0x25/0x2a
[588058.599019] FIX kmalloc-8: Restoring 0xffff8b70ccbdde48-0xffff8b70ccbdde4f=0xcc
[588058.599022] FIX kmalloc-8: Object at 0xffff8b70ccbdde40 not freed
[588060.269020] WebExtensions[13188]: segfault at 1fff8 ip 00001f17cb0e5fbb sp 00007fffffffb998 error 4
[588076.827561] atom[21965]: segfault at 21ea75682310 ip 00002aaaab0d6550 sp 00007fffffffc3a8 error 4 in libnode.so[2aaaaaccf000+12ba000]
[588128.154889] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) page@ffff8b70fd832600[0 ffff8b5e72b18270 4 1 (null)]
[588128.154903] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) vvp-page@ffff8b70fd832650(0:0) vm@ffffd60ec75c3dc0 6fffff00000009 3:0 0 449438 lru
[588128.154910] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) lov-page@ffff8b70fd832690, comp index: 30002, gen: 8
[588128.154924] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) osc-page@ffff8b70fd8326c8 112542: 1&amp;lt; 0x845fed 2 0 - - &amp;gt; 2&amp;lt; 460972032 0 4096 0x0 0x420 | (null) ffff8b872b233648 ffff8b49c2ff9540 &amp;gt; 3&amp;lt; 0 0 0 &amp;gt; 4&amp;lt; 0 0 8 1879965695 - | - - - - &amp;gt; 5&amp;lt; - - - - | 0 - | 0 - -&amp;gt;
[588128.154930] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) end page@ffff8b70fd832600
[588128.154935] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) list_empty(&amp;amp;page-&amp;gt;cp_batch)
[588128.154939] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) ASSERTION( 0 ) failed: 
[588128.196374] LustreError: 32046:0:(cl_page.c:394:cl_pagevec_put()) LBUG
[588128.230830] Pid: 32046, comm: wget 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019
[588128.230832] Call Trace:
[588128.230848] [&amp;lt;ffffffffc10d97cc&amp;gt;] libcfs_call_trace+0x8c/0xc0 [libcfs]
[588128.230867] [&amp;lt;ffffffffc10d987c&amp;gt;] lbug_with_loc+0x4c/0xa0 [libcfs]
[588128.230876] [&amp;lt;ffffffffc12fd1f3&amp;gt;] cl_pagevec_put+0x3a3/0x3e0 [obdclass]
[588128.230922] [&amp;lt;ffffffffc12fd240&amp;gt;] cl_page_put+0x10/0x20 [obdclass]
[588128.230944] [&amp;lt;ffffffffc175d895&amp;gt;] ll_releasepage+0xb5/0x1a0 [lustre]
[588128.230971] [&amp;lt;ffffffffb41bd565&amp;gt;] try_to_release_page+0x35/0x50
[588128.230979] [&amp;lt;ffffffffb41d2a19&amp;gt;] shrink_page_list+0xa09/0xc30
[588128.230985] [&amp;lt;ffffffffb41d3266&amp;gt;] shrink_inactive_list+0x1c6/0x5d0
[588128.230989] [&amp;lt;ffffffffb41d3d65&amp;gt;] shrink_lruvec+0x385/0x740
[588128.230993] [&amp;lt;ffffffffb41d4196&amp;gt;] shrink_zone+0x76/0x1a0
[588128.230997] [&amp;lt;ffffffffb41d4680&amp;gt;] do_try_to_free_pages+0xf0/0x520
[588128.231002] [&amp;lt;ffffffffb41d4d0a&amp;gt;] try_to_free_mem_cgroup_pages+0xda/0x190
[588128.231006] [&amp;lt;ffffffffb423c7ce&amp;gt;] mem_cgroup_reclaim+0x4e/0x120
[588128.231011] [&amp;lt;ffffffffb423d19c&amp;gt;] __mem_cgroup_try_charge+0x4ec/0x670
[588128.231014] [&amp;lt;ffffffffb423db09&amp;gt;] mem_cgroup_charge_common+0x59/0xc0
[588128.231018] [&amp;lt;ffffffffb423f4ca&amp;gt;] mem_cgroup_cache_charge+0x8a/0xb0
[588128.231022] [&amp;lt;ffffffffb41be1ee&amp;gt;] __add_to_page_cache_locked+0x4e/0x190
[588128.231026] [&amp;lt;ffffffffb41be387&amp;gt;] add_to_page_cache_lru+0x37/0xb0
[588128.231030] [&amp;lt;ffffffffb41be449&amp;gt;] grab_cache_page_nowait+0x49/0xa0
[588128.231033] [&amp;lt;ffffffffc175e1d5&amp;gt;] ll_write_begin+0xd5/0xc00 [lustre]
[588128.231048] [&amp;lt;ffffffffb41bd28f&amp;gt;] generic_file_buffered_write+0x10f/0x270
[588128.231052] [&amp;lt;ffffffffb41bfaf2&amp;gt;] __generic_file_aio_write+0x1e2/0x400
[588128.231056] [&amp;lt;ffffffffc176c51b&amp;gt;] __generic_file_write_iter+0xcb/0x340 [lustre]
[588128.231072] [&amp;lt;ffffffffc1770704&amp;gt;] vvp_io_write_start+0x4c4/0x970 [lustre]
[588128.231088] [&amp;lt;ffffffffc13011a8&amp;gt;] cl_io_start+0x68/0x130 [obdclass]
[588128.231112] [&amp;lt;ffffffffc130338c&amp;gt;] cl_io_loop+0xcc/0x1c0 [obdclass]
[588128.231134] [&amp;lt;ffffffffc1725f4b&amp;gt;] ll_file_io_generic+0x63b/0xc90 [lustre]
[588128.231147] [&amp;lt;ffffffffc1726a39&amp;gt;] ll_file_aio_write+0x289/0x660 [lustre]
[588128.231158] [&amp;lt;ffffffffc1726f10&amp;gt;] ll_file_write+0x100/0x1c0 [lustre]
[588128.231170] [&amp;lt;ffffffffb424a7f0&amp;gt;] vfs_write+0xc0/0x1f0
[588128.231175] [&amp;lt;ffffffffb424b60f&amp;gt;] SyS_write+0x7f/0xf0
[588128.231179] [&amp;lt;ffffffffb478dede&amp;gt;] system_call_fastpath+0x25/0x2a
[588128.231185] [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[588128.231226] Kernel panic - not syncing: LBUG
[588128.255989] CPU: 14 PID: 32046 Comm: wget Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
[588128.315914] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
[588128.353586] Call Trace:
[588128.367273] [&amp;lt;ffffffffb477ac23&amp;gt;] dump_stack+0x19/0x1b
[588128.395924] [&amp;lt;ffffffffb4774967&amp;gt;] panic+0xe8/0x21f
[588128.423537] [&amp;lt;ffffffffc10d98cb&amp;gt;] lbug_with_loc+0x9b/0xa0 [libcfs]
[588128.458128] [&amp;lt;ffffffffc12fd1f3&amp;gt;] cl_pagevec_put+0x3a3/0x3e0 [obdclass]
[588128.496480] [&amp;lt;ffffffffc12fbcf0&amp;gt;] ? cl_page_delete0+0x80/0x220 [obdclass]
[588128.536173] [&amp;lt;ffffffffc12fd240&amp;gt;] cl_page_put+0x10/0x20 [obdclass]
[588128.569680] [&amp;lt;ffffffffc175d895&amp;gt;] ll_releasepage+0xb5/0x1a0 [lustre]
[588128.606245] [&amp;lt;ffffffffb41bd565&amp;gt;] try_to_release_page+0x35/0x50
[588128.637150] [&amp;lt;ffffffffb41d2a19&amp;gt;] shrink_page_list+0xa09/0xc30
[588128.668853] [&amp;lt;ffffffffb41d3266&amp;gt;] shrink_inactive_list+0x1c6/0x5d0
[588128.701180] [&amp;lt;ffffffffb41d3d65&amp;gt;] shrink_lruvec+0x385/0x740
[588128.729829] [&amp;lt;ffffffffb41d4196&amp;gt;] shrink_zone+0x76/0x1a0
[588128.757029] [&amp;lt;ffffffffb41d4680&amp;gt;] do_try_to_free_pages+0xf0/0x520
[588128.788834] [&amp;lt;ffffffffb41d4d0a&amp;gt;] try_to_free_mem_cgroup_pages+0xda/0x190
[588128.825660] [&amp;lt;ffffffffb423c7ce&amp;gt;] mem_cgroup_reclaim+0x4e/0x120
[588128.857225] [&amp;lt;ffffffffb423d19c&amp;gt;] __mem_cgroup_try_charge+0x4ec/0x670
[588128.890392] [&amp;lt;ffffffffb423db09&amp;gt;] mem_cgroup_charge_common+0x59/0xc0
[588128.926221] [&amp;lt;ffffffffb423f4ca&amp;gt;] mem_cgroup_cache_charge+0x8a/0xb0
[588128.959155] [&amp;lt;ffffffffb41be1ee&amp;gt;] __add_to_page_cache_locked+0x4e/0x190
[588128.997956] [&amp;lt;ffffffffb41be387&amp;gt;] add_to_page_cache_lru+0x37/0xb0
[588129.030741] [&amp;lt;ffffffffb41be449&amp;gt;] grab_cache_page_nowait+0x49/0xa0
[588129.068075] [&amp;lt;ffffffffc175e1d5&amp;gt;] ll_write_begin+0xd5/0xc00 [lustre]
[588129.108147] [&amp;lt;ffffffffb41bd28f&amp;gt;] generic_file_buffered_write+0x10f/0x270
[588129.147957] [&amp;lt;ffffffffb41bfaf2&amp;gt;] __generic_file_aio_write+0x1e2/0x400
[588129.187721] [&amp;lt;ffffffffc176c51b&amp;gt;] __generic_file_write_iter+0xcb/0x340 [lustre]
[588129.227843] [&amp;lt;ffffffffc1770704&amp;gt;] vvp_io_write_start+0x4c4/0x970 [lustre]
[588129.267789] [&amp;lt;ffffffffc13011a8&amp;gt;] cl_io_start+0x68/0x130 [obdclass]
[588129.300695] [&amp;lt;ffffffffc130338c&amp;gt;] cl_io_loop+0xcc/0x1c0 [obdclass]
[588129.337964] [&amp;lt;ffffffffc1725f4b&amp;gt;] ll_file_io_generic+0x63b/0xc90 [lustre]
[588129.377915] [&amp;lt;ffffffffc1726a39&amp;gt;] ll_file_aio_write+0x289/0x660 [lustre]
[588129.418350] [&amp;lt;ffffffffc1726f10&amp;gt;] ll_file_write+0x100/0x1c0 [lustre]
[588129.458398] [&amp;lt;ffffffffb424a7f0&amp;gt;] vfs_write+0xc0/0x1f0
[588129.488082] [&amp;lt;ffffffffb424b60f&amp;gt;] SyS_write+0x7f/0xf0
[588129.518291] [&amp;lt;ffffffffb478dede&amp;gt;] system_call_fastpath+0x25/0x2a
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</description>
                <environment>Centos 7.7&lt;br/&gt;
MDS/OSS Lustre version: 2.12.3&lt;br/&gt;
clients Lustre: 2.12.3_93_gb75f04d-1</environment>
        <key id="57865">LU-13168</key>
            <summary>Client panic &quot;Freechain corrupt&quot;/&quot;Redzone Overwritten&quot;</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="adilger">Andreas Dilger</assignee>
                                    <reporter username="cjm14">Christopher Mountford</reporter>
                        <labels>
                    </labels>
                <created>Wed, 22 Jan 2020 16:10:32 +0000</created>
                <updated>Wed, 27 May 2020 13:56:00 +0000</updated>
                            <resolved>Thu, 14 May 2020 14:51:45 +0000</resolved>
                                                    <fixVersion>Lustre 2.14.0</fixVersion>
                    <fixVersion>Lustre 2.12.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="261695" author="adilger" created="Thu, 23 Jan 2020 05:32:46 +0000"  >&lt;p&gt;I assume that the redzone/slab debugging is because your clients have &quot;&lt;tt&gt;CONFIG_DEBUG_SLAB&lt;/tt&gt;&quot; enabled or similar.&lt;/p&gt;

&lt;p&gt;Do you have any idea if there is a particular workload that triggers this problem, or just general usage?&lt;/p&gt;

&lt;p&gt;If you are able to reproduce the problem easily, would you be able to try &quot;&lt;tt&gt;git bisect&lt;/tt&gt;&quot; to isolate the source of this problem to a specific patch?  You would need to apply the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12462&quot; title=&quot;osc_cache_writeback_range(): ASSERTION( ext-&amp;gt;oe_start &amp;gt;= start &amp;amp;&amp;amp; ext-&amp;gt;oe_end &amp;lt;= end ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12462&quot;&gt;&lt;del&gt;LU-12462&lt;/del&gt;&lt;/a&gt; patch after each bisect step so that you don&apos;t hit the other problem.&lt;/p&gt;

&lt;p&gt;The only thing in the slab dump that looks familiar is the data &quot;&lt;tt&gt;d0 0b d6 0b 88 01 00 25&lt;/tt&gt;&quot; which looks to be &quot;&lt;tt&gt;LOV_USER_MAGIC_COMP_V1=0x0BD60BD0&lt;/tt&gt;&quot; but the following bytes do not look valid.  One possibility is that a &quot;&lt;tt&gt;struct lov_comp_md_v1&lt;/tt&gt;&quot; was allocated from one slab, but incorrectly freed from a second slab?  It definitely shouldn&apos;t be allocated from the &lt;tt&gt;kmalloc-8&lt;/tt&gt; slab.&lt;/p&gt;</comment>
                            <comment id="261699" author="cjm14" created="Thu, 23 Jan 2020 09:18:31 +0000"  >&lt;p&gt;We are trying to isolate the workload causing this - it is only occurring on login nodes. If we are able to reproduce it on a test node, we will try to identify whether a specific patch causes this.&lt;/p&gt;</comment>
                            <comment id="264252" author="cjm14" created="Fri, 28 Feb 2020 14:10:06 +0000"  >&lt;p&gt;We&apos;ve updated the clients to 2.12.4 and are still seeing the same issue.&lt;/p&gt;

&lt;p&gt;We have not yet identified a way of reproducing the problem; it seems to strike randomly.&lt;/p&gt;

&lt;p&gt;We can confirm we only see these crashes on our login nodes which support virtual desktops (using NoMachine): 22 crashes across 4 desktop login nodes, and no similar problems on 2 ssh-only login nodes and 170 compute nodes. All login nodes (NoMachine and ssh) have identical login images and hardware.&lt;/p&gt;

&lt;p&gt;The MATE desktop we provide does seem to open (and keep open) a lot of files - particularly small memory-mapped files. We are not sure whether this might contribute to the problem.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="268770" author="cjm14" created="Tue, 28 Apr 2020 15:02:29 +0000"  >&lt;p&gt;I think we&apos;ve finally tracked down a trigger for this crash: it occurs when files are copied (intermittently) or moved (almost always triggers a panic) using the GNOME I/O libraries (for example, using ctrl-x/ctrl-v in the Caja or Nautilus file browsers). Critically, it only occurs when the move is between two Lustre file systems. GNU utils and kio do not seem to cause the same problem.&lt;/p&gt;

&lt;p&gt;We&apos;ve taken a look at the strace output from a gio move command; it ends (when the kernel panics) with:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;.
.
lstat(&quot;/lustre/old_data/admin/cjm14/pr_E1hr_CNRM-CM6-1_amip_r1i1p1f2_gr_197901010030-198312312330.nc&quot;, {st_mode=S_IFREG|0644,st_size=4899999744, ...}) = 0
chmod(&quot;/lustre/old_data/admin/cjm14/pr_E1hr_CNRM-CM6-1_amip_r1i1p1f2_gr_197901010030-198312312330.nc&quot;, 0100644) = 0
utimes(&quot;/lustre/old_data/admin/cjm14/pr_E1hr_CNRM-CM6-1_amip_r1i1p1f2_gr_197901010030-198312312330.nc&quot;, [{1588072285, 0}, {1587982928, 0}]) = 0
setxattr(&quot;/lustre/old_data/admin/cjm14/pr_E1hr_CNRM-CM6-1_amip_r1i1p1f2_gr_197901010030-198312312330.nc&quot;, &quot;lustre.lov&quot;, &quot;\320\v\326\v\10\3&quot;, 7, 0packet_write_wait: Connection to 10.141.32.19 port 22: Broken pipe
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I believe the &lt;tt&gt;lustre.lov&lt;/tt&gt; attribute is related to the file layout?&lt;/p&gt;

&lt;p&gt;Critically, gio writes only 7 bytes to this attribute. Compare this with the strace output from GNU mv:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;fsetxattr(4, &quot;lustre.lov&quot;, &quot;\320\v\326\v\10\3\0\0\6\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\340\0\0\0008\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\20\0\0\0\0\0\0\10\0\0\0\0\0\0\0\0\1\0\0\0\30\1\0\0P\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\20\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\10\0\0\0h\1\0\0\200\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\377\377\377\377\377\377\377\377\350\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\320\v\321\v\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\1\0\0\0lb\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\320\v\321\v\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\2\0\0\0\207\302\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0Aj\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\320\v\321\v\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0@\0\4\0\0\0Gg\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0mb\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\210\302\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0Bj\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\320\v\321\v\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0@\0\10\0\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&quot;, 776, 0) = 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The attribute value is 776 bytes long - it looks to me like gio truncates it at the first null character.&lt;/p&gt;

&lt;p&gt;The following code reproduces this crash via a truncated call to setxattr (and confirms that the xattr for the above file is indeed 776 bytes). So far we&apos;ve confirmed this with a 2.12.4 client and server; I&apos;m going to check it again on a system with a 2.12.4 client and a 2.10.8 server:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;sys/xattr.h&amp;gt;

&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; main(&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; argc, &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *argv[]) {
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (argc &amp;lt; 2)
        {
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;usage: %s FILE\n&quot;&lt;/span&gt;, argv[0]);
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
        }
        &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *file = argv[1];
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; trunc_to = 7;
        &lt;span class=&quot;code-comment&quot;&gt;// call with a zero size buffer, should return the buffer size required&lt;/span&gt;
        ssize_t asz = getxattr(file, &lt;span class=&quot;code-quote&quot;&gt;&quot;lustre.lov&quot;&lt;/span&gt;, NULL, 0);
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (asz &amp;lt;= 0)
        {
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;getxattr size query returned %zd (errno %i)\n&quot;&lt;/span&gt;, asz, errno);
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
        }
        &lt;span class=&quot;code-comment&quot;&gt;// call with a suitable buffer:&lt;/span&gt;
        unsigned &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; xattr_buf[asz];
        ssize_t asz2 = getxattr(file, &lt;span class=&quot;code-quote&quot;&gt;&quot;lustre.lov&quot;&lt;/span&gt;, xattr_buf, asz);
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (asz2 != asz)
        {
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;asz2 (%zd) does not match asz (%zd)\n&quot;&lt;/span&gt;, asz2, asz);
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
        }

        &lt;span class=&quot;code-comment&quot;&gt;// if we get this far we have the attr - let&apos;s take a look at it:&lt;/span&gt;
        printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;xattr - %zd characters\n&quot;&lt;/span&gt;, asz2);

        ssize_t ii;
        &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; (ii = 0; ii &amp;lt; asz2; ii++)
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;0x%02x:&quot;&lt;/span&gt;, xattr_buf[ii]);
        printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;\n&quot;&lt;/span&gt;);
        &lt;span class=&quot;code-comment&quot;&gt;// Now try writing it back to the file - this should be ok.&lt;/span&gt;
        &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; rc = setxattr(file, &lt;span class=&quot;code-quote&quot;&gt;&quot;lustre.lov&quot;&lt;/span&gt;, xattr_buf, asz, 0);
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc)
        {
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;call to setxattr with full attribute value failed with errno %i\n&quot;&lt;/span&gt;, errno);
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
        }
        printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;wrote xattr with %zd bytes\n&quot;&lt;/span&gt;, asz);

        &lt;span class=&quot;code-comment&quot;&gt;// Finally, let&apos;s deliberately truncate it and write it (this appears to&lt;/span&gt;
        &lt;span class=&quot;code-comment&quot;&gt;// be what happens at the end of glib io move) - this crashes our&lt;/span&gt;
        &lt;span class=&quot;code-comment&quot;&gt;// clients.&lt;/span&gt;
        rc = setxattr(file, &lt;span class=&quot;code-quote&quot;&gt;&quot;lustre.lov&quot;&lt;/span&gt;, xattr_buf, trunc_to, 0);
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc)
        {
                printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;call to setxattr with partial attribute value failed with errno %i\n&quot;&lt;/span&gt;, errno);
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
        }
        printf(&lt;span class=&quot;code-quote&quot;&gt;&quot;wrote truncated xattr with %i bytes\n&quot;&lt;/span&gt;, trunc_to);
        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It often crashes immediately, sometimes on the second or third run, and occasionally a minute or two after the code is run. If it doesn&apos;t crash immediately, the truncated setxattr call fails with errno 34 (ERANGE).&lt;/p&gt;

&lt;p&gt;Both client and server run lustre 2.12.4, MLNX 4.7-1.0.0.1.&lt;/p&gt;

&lt;p&gt;kernels 3.10.0-1062.4.3.el7.x86_64 and&#160;3.10.0-1062.12.1.el7.x86_64 on clients, 3.10.0-1062.9.1.el7_lustre.x86_64 on servers.&lt;/p&gt;</comment>
                            <comment id="268867" author="cjm14" created="Wed, 29 Apr 2020 14:13:24 +0000"  >&lt;p&gt;Testing on a second system with 2.12.4 client and 2.10.8 server (which has a much simpler file layout), the truncated setxattr returns error code 34 and the client does not crash.&lt;/p&gt;</comment>
                            <comment id="269029" author="adilger" created="Thu, 30 Apr 2020 19:48:55 +0000"  >&lt;blockquote&gt;
&lt;p&gt;The attribute value is 776 bytes long - it looks to me like gio truncates it at the first null character.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Could you please file a bug with gio about this? There definitely isn&apos;t a valid reason to assume that all xattrs are ASCII strings.&lt;/p&gt;

&lt;p&gt;It seems that this gio bug is particularly evil, in that it results in &lt;tt&gt;LOV_MAGIC_V1&lt;/tt&gt; and &lt;tt&gt;LOV_PATTERN_RAID0&lt;/tt&gt; being passed to the kernel, so the buffer passes the basic validity checks, but the rest of the structure is probably zero or uninitialized garbage.&lt;/p&gt;</comment>
                            <comment id="269043" author="adilger" created="Thu, 30 Apr 2020 21:20:17 +0000"  >&lt;p&gt;I was trying to reproduce this locally on master with your test program, but only ever got error 34 and no crash. It looks like this was fixed on master via patch &lt;a href=&quot;https://review.whamcloud.com/36589&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36589&lt;/a&gt; &quot;&lt;tt&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12911&quot; title=&quot;Setting a LOV EA can access or change outside allocated buffer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12911&quot;&gt;&lt;del&gt;LU-12911&lt;/del&gt;&lt;/a&gt; llite: Don&apos;t access lov_md fields before size check&lt;/tt&gt;&quot;, but that only landed as commit&#160;&lt;tt&gt;v2_13_52-17-gf2d06d3c76&lt;/tt&gt; and was not backported to the b2_12 LTS release.&lt;/p&gt;

&lt;p&gt;I&apos;ve cherry-picked that patch as patch &lt;a href=&quot;https://review.whamcloud.com/38433&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38433&lt;/a&gt; so you can download packages from build.whamcloud.com when the build finishes, or apply it locally and rebuild to verify it fixes your problem.&lt;/p&gt;

&lt;p&gt;Thanks for doing the legwork on this; it made it straightforward to identify the problem once you had a good reproducer.  Even though it was coincidentally fixed (due to static code analysis), your efforts at least identified that this was a real issue for the 2.12 release that needed to be fixed.&lt;/p&gt;</comment>
                            <comment id="269046" author="gerrit" created="Thu, 30 Apr 2020 22:28:03 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/38434&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38434&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13168&quot; title=&quot;Client panic &amp;quot;Freechain corrupt&amp;quot;/&amp;quot;Redzone Overwritten&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13168&quot;&gt;&lt;del&gt;LU-13168&lt;/del&gt;&lt;/a&gt; tests: verify truncated xattr is handled&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: e0e7f0e4ef0aa868c0d9d38bb517357ef50cea25&lt;/p&gt;</comment>
                            <comment id="269092" author="adilger" created="Fri, 1 May 2020 06:03:47 +0000"  >&lt;p&gt;Christopher, it looks like there was still a problem with PFL layouts, even after the 36589 patch.  I&apos;ve updated the 38434 patch to also fix the PFL layout problem.&lt;/p&gt;</comment>
                            <comment id="269106" author="cjm14" created="Fri, 1 May 2020 13:25:38 +0000"  >&lt;p&gt;I&apos;ve tested the 38434 patch: I downloaded the tarball and built the client rpms from there. This seems to have fixed the problem; the system call now returns an error without causing a panic. I&apos;ll try applying the diff to the 2.12.4 source and check that it works as well.&lt;/p&gt;

&lt;p&gt;I&apos;ll grab the latest gio as well and check whether the handling of extended attributes is fixed; if not, I&apos;ll file a bug report with GNOME.&lt;/p&gt;</comment>
                            <comment id="269414" author="cjm14" created="Wed, 6 May 2020 13:39:45 +0000"  >&lt;p&gt;I&apos;ve applied the patch to our 2.12.4 clients. This appears to build fine. Our test code now returns an error as expected rather than triggering a panic. Will this be included in the next 2.12 LTS release?&lt;/p&gt;

</comment>
                            <comment id="269449" author="pjones" created="Wed, 6 May 2020 18:38:08 +0000"  >&lt;p&gt;&amp;gt; Will this be included in the next 2.12 LTS release?&lt;/p&gt;

&lt;p&gt;Very likely but it needs to land to master first.&lt;/p&gt;</comment>
                            <comment id="270156" author="gerrit" created="Thu, 14 May 2020 05:39:32 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/38434/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38434/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13168&quot; title=&quot;Client panic &amp;quot;Freechain corrupt&amp;quot;/&amp;quot;Redzone Overwritten&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13168&quot;&gt;&lt;del&gt;LU-13168&lt;/del&gt;&lt;/a&gt; tests: verify truncated xattr is handled&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: cb74546354201434a6fd3d53acd1a0808fbfcb1c&lt;/p&gt;</comment>
                            <comment id="270201" author="pjones" created="Thu, 14 May 2020 14:51:45 +0000"  >&lt;p&gt;Landed for 2.14&lt;/p&gt;</comment>
                            <comment id="270247" author="gerrit" created="Thu, 14 May 2020 19:02:48 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/38604&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38604&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13168&quot; title=&quot;Client panic &amp;quot;Freechain corrupt&amp;quot;/&amp;quot;Redzone Overwritten&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13168&quot;&gt;&lt;del&gt;LU-13168&lt;/del&gt;&lt;/a&gt; tests: verify truncated xattr is handled&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7d506d0989c0467e5a5fce7f6295eae578aef82e&lt;/p&gt;</comment>
                            <comment id="271209" author="gerrit" created="Wed, 27 May 2020 02:39:51 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/38604/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/38604/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13168&quot; title=&quot;Client panic &amp;quot;Freechain corrupt&amp;quot;/&amp;quot;Redzone Overwritten&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13168&quot;&gt;&lt;del&gt;LU-13168&lt;/del&gt;&lt;/a&gt; tests: verify truncated xattr is handled&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 75c0eb51332639a09c720fb41f3a2cdb5b029afb&lt;/p&gt;</comment>
                            <comment id="271285" author="pjones" created="Wed, 27 May 2020 13:52:37 +0000"  >&lt;p&gt;Fix confirmed in 2.12.5&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="57255">LU-12911</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="59266">LU-13589</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00sen:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>