[LU-1823] sanity/103: slab corruption Created: 04/Sep/12  Updated: 29/May/17  Resolved: 13/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
Fix Version/s: Lustre 2.3.0, Lustre 2.4.0

Type: Bug Priority: Blocker
Reporter: Alex Zhuravlev Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None
Environment:

CONFIG_DEBUG_SLAB=y


Attachments: File mdt-serial.log    
Issue Links:
Duplicate
is duplicated by LU-1235 timeout in sanity subtest 103, unable ... Resolved
Related
is related to LU-1877 BUG: spinlock bad magic on CPU#0, mdt... Resolved
is related to LU-1844 sanityn, subtest test_16: list_del co... Closed
Severity: 3
Rank (Obsolete): 4237

 Description   

Lustre: DEBUG MARKER: == sanity test 103: acl test ========================================================================= 19:57:07 (1346774227)
/work/lustre/head/clean/lustre/utils/l_getidentity
Slab corruption (Tainted: P --------------- ): size-2048 start=dac6c470, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<dff39e58>](cfs_free+0x8/0x10 [libcfs])
310: 02 00 00 00 01 00 07 00 ff ff ff ff 02 00 05 00
320: 01 00 00 00 02 00 07 00 02 00 00 00 04 00 07 00
330: ff ff ff ff 10 00 07 00 ff ff ff ff 20 00 05 00
340: ff ff ff ff 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Next obj: start=dac6cc88, len=2048
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<dff39e58>](cfs_free+0x8/0x10 [libcfs])
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

02000000:00000010:1.0:1346774231.327841:1804:3373:0:(sec_null.c:217:null_alloc_repbuf()) kmalloced 'req->rq_repbuf': 2048 at dac6c470.
...

02000000:00000010:1.0:1346774231.328361:836:3373:0:(sec_null.c:231:null_free_repbuf()) kfreed 'req->rq_repbuf': 2048 at dac6c470.



 Comments   
Comment by Peter Jones [ 04/Sep/12 ]

Keith is going to try and reproduce this with a debug kernel

Comment by Keith Mannthey (Inactive) [ 04/Sep/12 ]

Are there more logs from the rest of the systems? Is there anything special needed to reproduce this?

Comment by Alex Zhuravlev [ 04/Sep/12 ]

I'm able to reproduce this almost 100% with REFORMAT=y ONLY=103 sh sanity.sh, within a single vbox instance.

Comment by Andreas Dilger [ 04/Sep/12 ]

I know that in the past Oleg, Johann, and I have wanted to run early development kernels with various debug options enabled for all kernel builds, so that this kind of problem is flushed out when patches land, instead of only being hit by users who run such debug kernels, or surfacing as silent corruption. This was previously suggested by Johann in TT-359, but I think it could be done with a patch to the kernel config options during the development cycle instead of via the test environment (which would need more effort/complexity).

I think that, since we are early in the 2.4 release cycle, it makes sense to enable these config options for all our server kernels (so problems will be seen both on servers and on clients running the server kernel). We can leave this as a blocker bug for the 2.4 release as a reminder to revert the debug kernel config changes.

Given the relatively small number of patches that have landed on master compared to 2.3, it probably also makes sense to submit a patch to b2_3 to enable CONFIG_DEBUG_SLAB, CONFIG_DEBUG_SPINLOCK and possibly some others, with:

Test-Parameters: fortestonly testgroup=full

to see if there is a similar failure for b2_3.
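
For concreteness, a sketch of the kind of config fragment meant here (the exact option list would be decided in the patch; CONFIG_DEBUG_LIST is my guess at one of the "others", since it would also catch the LU-1844 list_del corruption):

    CONFIG_DEBUG_SLAB=y
    CONFIG_DEBUG_SPINLOCK=y
    # possibly others, e.g.:
    CONFIG_DEBUG_LIST=y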

Comment by Alex Zhuravlev [ 05/Sep/12 ]

It'd probably make sense to add a trivial check (dmesg | grep 'Slab corruption' && error) to t-f (test-framework.sh) so that we don't miss it.
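
A minimal sketch of such a check (hypothetical helper name; a real change would have to follow test-framework.sh conventions and check every node, not just the local one):

    # fail the test if kernel slab debugging has reported corruption
    check_slab_corruption() {
        if dmesg | grep -q 'Slab corruption'; then
            error "slab corruption detected"
        fi
    }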

Comment by Keith Mannthey (Inactive) [ 05/Sep/12 ]

I submitted a few config changes for b2_3 as suggested: http://review.whamcloud.com/3875

I have been able to get some local testing done today. I tried an older 2.2.59 code base (I had it set up on one of my build servers) and I don't seem to see the problem there, but I do see it with master. I will work to narrow down the window of possible changes. Sorting out whether b2_3 is affected is my next step.

Comment by Keith Mannthey (Inactive) [ 05/Sep/12 ]

My config test didn't make it through the build on the first pass, but Yu has a very nice patch/test that I am watching: http://review.whamcloud.com/#change,3876

Comment by Jian Yu [ 05/Sep/12 ]

Hi Keith,

By using the build http://build.whamcloud.com/job/lustre-reviews/8904/ in http://review.whamcloud.com/#change,3876, I can manually reproduce the slab corruption issue on the RHEL6 distro by running only sanity test 103:
https://maloo.whamcloud.com/test_sets/2c479ade-f7d3-11e1-8b95-52540035b04c

The autotest run for the above build skipped sanity test 103 because it's in the EXCEPT_SLOW list. I'm updating the commit message to add SLOW=yes to the test parameters.

Comment by Jian Yu [ 06/Sep/12 ]

Hi Keith,

FYI, with the build for patch set 5 of http://review.whamcloud.com/#change,3876, I reproduced the issue with PTLDEBUG=-1 manually:
https://maloo.whamcloud.com/test_sets/59a5ca46-f832-11e1-b114-52540035b04c
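
That is, presumably the same reproducer with full debugging enabled, along the lines of:

    PTLDEBUG=-1 REFORMAT=y ONLY=103 sh sanity.sh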

Comment by Andreas Dilger [ 06/Sep/12 ]

If there are no obvious sources of this corruption, it probably makes sense to submit this test patch as several separate changes, each based on one of the recent 2.2.* tags, to see if we can isolate when this corruption started. After that, it is hopefully possible to do a (manual?) git-bisect to find which patch is the culprit, or at least narrow down the range of patches that need to be examined manually. It is also important to check in each of the failure cases what node type the corruption is seen on (MDS, OSS, client), since that will also reduce the number of changes which might have introduced the problem.

It would make sense to include a check for the LU-1844 list_add/list_del corruption messages as well, since I suspect that is also a sign of random memory corruption.
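
For the manual bisect, something along these lines (a sketch; the refs are placeholders for the actual tags, with 2.2.59 taken as a good baseline per Keith's earlier comment):

    git bisect start
    git bisect bad master     # corruption reproduces on master (2.3.50)
    git bisect good 2.2.59    # not reproduced on the older 2.2.59 build
    # at each step: rebuild, rerun the reproducer, check dmesg for
    # 'Slab corruption', then mark the commit good or bad accordingly
    REFORMAT=y ONLY=103 sh sanity.sh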

Comment by Keith Mannthey (Inactive) [ 06/Sep/12 ]

I have started a git bisect to narrow down the code change, but I fear the data is not reliable. I am not sure what has happened on my local VMs (I shuffled some VMs around yesterday) but I am no longer able to reproduce the core issue. I am running Lustre 2.3.50 (from master) with kernel-2.6.32-279.5.2 and not triggering the issue. I am moving back to kernel-2.6.32-279.1.1 (confirmed failed with Yu's test run) to see if the issue reappears.

I will update when I know more.

Comment by Jian Yu [ 06/Sep/12 ]

Per the above test report, the slab corruption issue occurred only on the MDS (fat-intel-2):

fat-intel-2: Slab corruption (Not tainted): size-2048 start=ffff8802e1b534f8, len=2048
fat-intel-2: Slab corruption (Not tainted): size-2048 start=ffff8802e1d776f8, len=2048
fat-intel-2: Slab corruption (Not tainted): size-2048 start=ffff8802e13ca4c8, len=2048
 sanity test_103: @@@@@@ FAIL: slab corruption detected 
Comment by Keith Mannthey (Inactive) [ 07/Sep/12 ]

Moving kernels does not seem to reproduce the issue, so it is not a lead. I am going to try some client nodes tomorrow. I saw the error on the MDS as well on my initial master run but have not seen it since.

Comment by Keith Mannthey (Inactive) [ 09/Sep/12 ]

I acquired some Toro nodes today and am starting to set them up. My MDS VM crashed while running "REFORMAT=y ONLY=103 sh sanity.sh" (it took about 30 hours to trigger). This could be the bad cfs_free path that is corrupting the slab.

I will try and attach the whole dmesg.

This was master + kernel-2.6.32-279 on the MDS vm node.

 
 Lustre: DEBUG MARKER: == sanity test 103: acl test =========================================== 06:06:43 (1347109603)
kfree_debugcheck: out of range ptr 6000100000002h.
------------[ cut here ]------------
kernel BUG at mm/slab.c:2911!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/PNP0C0A:00/power_supply/BAT0/energy_full
CPU 0
Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) exportfs mgs(U) mgc(U) ldiskfs(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 sunrpc ipv6 ppdev parport_pc parport microcode i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 24218, comm: jbd2/dm-2-8 Not tainted 2.6.32.masterDEBUG11A #1 innotek GmbH VirtualBox
RIP: 0010:[<ffffffff81162530>]  [<ffffffff81162530>] kfree_debugcheck+0x30/0x40
RSP: 0018:ffff88002733dba0  EFLAGS: 00010082
RAX: 0000000000000039 RBX: 0006000100000002 RCX: 0000000000007a74
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff88002733dbb0 R08: 0000000000000000 R09: ffffffff8163acc0
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000202
R13: 0006000100000002 R14: ffff880024d9d298 R15: ffff880024d9d298
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003ac2ef5170 CR3: 000000003d0e0000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process jbd2/dm-2-8 (pid: 24218, threadinfo ffff88002733c000, task ffff88003d640ae0)
Stack:
 ffff880000000020 ffffffffa035ebae ffff88002733dc00 ffffffff8116594b
<d> ffff88002851f720 ffff88003f810080 ffff88002733dc20 0006000100000002
<d> ffff880024d9d240 0000000000000000 ffff880024d9d298 ffff880024d9d298
Call Trace:
 [<ffffffffa035ebae>] ? cfs_free+0xe/0x10 [libcfs]
 [<ffffffff8116594b>] kfree+0x5b/0x2a0
 [<ffffffffa035ebae>] cfs_free+0xe/0x10 [libcfs]
 [<ffffffffa04ceb73>] lu_global_key_fini+0xa3/0xf0 [obdclass]
 [<ffffffffa04cf380>] key_fini+0x60/0x190 [obdclass]
 [<ffffffffa04cf4df>] keys_fini+0x2f/0x120 [obdclass]
 [<ffffffffa04cf5fd>] lu_context_fini+0x2d/0xc0 [obdclass]
 [<ffffffffa0b86aa2>] osd_trans_commit_cb+0xe2/0x2b0 [osd_ldiskfs]
 [<ffffffffa0a3f21a>] ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
 [<ffffffffa00a18af>] jbd2_journal_commit_transaction+0x110f/0x1530 [jbd2]
 [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
 [<ffffffff8107eabb>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa00a7128>] kjournald2+0xb8/0x220 [jbd2]
 [<ffffffff81091d66>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81091cd0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 7a 67 ee ff 84 c0 74 07 48 83 c4 08 5b c9 c3 48 89 de 48 c7 c7 c8 0b 7a 81 e8 ed cc 39 00 <0f> 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41
RIP  [<ffffffff81162530>] kfree_debugcheck+0x30/0x40
 RSP <ffff88002733dba0>
---[ end trace ff4011ce2a20c79c ]---
Kernel panic - not syncing: Fatal exception
Pid: 24218, comm: jbd2/dm-2-8 Tainted: G      D    ---------------    2.6.32.masterDEBUG11A #1
Call Trace:
 [<ffffffff814ff155>] ? panic+0xa0/0x168
 [<ffffffff815032e4>] ? oops_end+0xe4/0x100
 [<ffffffff8100f26b>] ? die+0x5b/0x90
 [<ffffffff81502bb4>] ? do_trap+0xc4/0x160
 [<ffffffff8100ce35>] ? do_invalid_op+0x95/0xb0
 [<ffffffff81162530>] ? kfree_debugcheck+0x30/0x40
 [<ffffffffa036def3>] ? libcfs_debug_vmsg2+0x4e3/0xb60 [libcfs]
 [<ffffffff8100bedb>] ? invalid_op+0x1b/0x20
 [<ffffffff81162530>] ? kfree_debugcheck+0x30/0x40
 [<ffffffffa035ebae>] ? cfs_free+0xe/0x10 [libcfs]
 [<ffffffff8116594b>] ? kfree+0x5b/0x2a0
 [<ffffffffa035ebae>] ? cfs_free+0xe/0x10 [libcfs]
 [<ffffffffa04ceb73>] ? lu_global_key_fini+0xa3/0xf0 [obdclass]
 [<ffffffffa04cf380>] ? key_fini+0x60/0x190 [obdclass]
 [<ffffffffa04cf4df>] ? keys_fini+0x2f/0x120 [obdclass]
 [<ffffffffa04cf5fd>] ? lu_context_fini+0x2d/0xc0 [obdclass]
 [<ffffffffa0b86aa2>] ? osd_trans_commit_cb+0xe2/0x2b0 [osd_ldiskfs]
 [<ffffffffa0a3f21a>] ? ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
 [<ffffffffa00a18af>] ? jbd2_journal_commit_transaction+0x110f/0x1530 [jbd2]
 [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
 [<ffffffff8107eabb>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffffa00a7128>] ? kjournald2+0xb8/0x220 [jbd2]
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa00a7070>] ? kjournald2+0x0/0x220 [jbd2]
 [<ffffffff81091d66>] ? kthread+0x96/0xa0
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81091cd0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Comment by Keith Mannthey (Inactive) [ 09/Sep/12 ]

Keith's local VM MDS panic - v1 dmesg

Comment by Jian Yu [ 10/Sep/12 ]

Hi Keith,

I created several test patches per the following comments from Andreas:

If there are no obvious sources of this corruption, it probably makes sense to submit this test patch as several separate changes, each based on one of the recent 2.2.* tags, to see if we can isolate when this corruption started.

Patch on tag 2.2.94: http://review.whamcloud.com/#change,3921
Patch on tag 2.3.50: http://review.whamcloud.com/#change,3918
Patch on tag 2.2.93: http://review.whamcloud.com/#change,3919
Patch on tag 2.2.92: http://review.whamcloud.com/#change,3920

Hope we can isolate the issue.

Comment by Andreas Dilger [ 10/Sep/12 ]

Keith, can you please fix Yu Jian's patches that hit build failures?

The 2.3.50 patch failed to build due to built-in version checks, so it needs to be rebased one patch later (git hash 388111848489ef99b1fa31ce8fef255ab9c08e84). I haven't investigated the other failure, but hopefully it is similarly trivial. Please get to this ASAP so that the testing can be started on these patches, and hopefully we can isolate this serious defect more quickly.

Comment by Oleg Drokin [ 10/Sep/12 ]

Ok, I enabled debug pagealloc and I think I caught the offender:

[ 3201.251050] Lustre: 9532:0:(mdt_lproc.c:418:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /home/green/git/lustre-release/lustre/utils/l_getidentity
[ 3204.127208] BUG: unable to handle kernel paging request at ffff88021851ab48
[ 3204.129164] IP: [<ffffffff811d9dd8>] posix_acl_to_xattr+0x28/0x80
[ 3204.130098] PGD 1a26063 PUD 2766067 PMD 2829067 PTE 800000021851a160
[ 3204.130673] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[ 3204.130673] last sysfs file: /sys/devices/system/cpu/possible
[ 3204.130673] CPU 1
[ 3204.130673] Modules linked in: lustre obdfilter ost cmm mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mds mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic sunrpc ipv6 microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
[ 3204.130673]
[ 3204.130673] Pid: 8603, comm: mdt00_001 Not tainted 2.6.32-debug #3 Bochs Bochs
[ 3204.130673] RIP: 0010:[<ffffffff811d9dd8>]  [<ffffffff811d9dd8>] posix_acl_to_xattr+0x28/0x80
[ 3204.130673] RSP: 0018:ffff88027a9459d0  EFLAGS: 00010287
[ 3204.130673] RAX: 0000000000000034 RBX: ffff88021851ab48 RCX: 0000000000000034
[ 3204.130673] RDX: 0000000000000060 RSI: ffff88021851ab48 RDI: ffff88023a694e50
[ 3204.130673] RBP: ffff88027a9459d0 R08: ffffffffa055f3cd R09: ffffffffa05f8e36
[ 3204.130673] R10: 0000000000000000 R11: ffffffffa05fee98 R12: ffff88023a694e50
[ 3204.130673] R13: 0000000000000060 R14: ffffffffa055f3cd R15: ffff8802470b5b98
[ 3204.130673] FS:  00007fcc8f724700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
[ 3204.130673] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 3204.130673] CR2: ffff88021851ab48 CR3: 00000002653cb000 CR4: 00000000000006e0
[ 3204.130673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3204.130673] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3204.130673] Process mdt00_001 (pid: 8603, threadinfo ffff88027a944000, task ffff88027a942540)
[ 3204.130673] Stack:
[ 3204.130673]  ffff88027a945a10 ffffffffa05ac598 ffff88027a9459f0 ffffffffa067b661
[ 3204.130673] <d> 0000000000000010 ffff880222a70ef0 ffff880202a25bb8 ffff88027848b928
[ 3204.130673] <d> ffff88027a945a20 ffffffffa05ac612 ffff88027a945a30 ffffffff811a0147
[ 3204.130673] Call Trace:
[ 3204.130673]  [<ffffffffa05ac598>] ldiskfs_xattr_get_acl+0x78/0xa0 [ldiskfs]
[ 3204.130673]  [<ffffffffa067b661>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[ 3204.130673]  [<ffffffffa05ac612>] ldiskfs_xattr_get_acl_access+0x22/0x30 [ldiskfs]
[ 3204.130673]  [<ffffffff811a0147>] generic_getxattr+0x87/0x90
[ 3204.130673]  [<ffffffffa074af64>] osd_xattr_get+0x284/0x360 [osd_ldiskfs]
[ 3204.130673]  [<ffffffffa05356d0>] mdd_attr_set+0x15b0/0x2250 [mdd]
[ 3204.130673]  [<ffffffffa067b661>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[ 3204.130673]  [<ffffffffa0888a26>] cml_attr_set+0x66/0x1a0 [cmm]
[ 3204.130673]  [<ffffffffa07bc768>] mdt_attr_set+0x2a8/0x590 [mdt]
[ 3204.130673]  [<ffffffffa07bd096>] mdt_reint_setattr+0x646/0x13f0 [mdt]
[ 3204.130673]  [<ffffffffa067b661>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[ 3204.130673]  [<ffffffffa07b6081>] mdt_reint_rec+0x41/0xe0 [mdt]
[ 3204.130673]  [<ffffffffa07af42a>] mdt_reint_internal+0x50a/0x810 [mdt]
[ 3204.130673]  [<ffffffffa07af774>] mdt_reint+0x44/0xe0 [mdt]
[ 3204.130673]  [<ffffffffa07a1d42>] mdt_handle_common+0x922/0x1740 [mdt]
[ 3204.130673]  [<ffffffffa07a2c35>] mdt_regular_handle+0x15/0x20 [mdt]
[ 3204.130673]  [<ffffffffa0f7486f>] ptlrpc_server_handle_request+0x44f/0xee0 [ptlrpc]
[ 3204.130673]  [<ffffffffa066b66e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[ 3204.130673]  [<ffffffffa0f6d711>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
[ 3204.130673]  [<ffffffffa067b661>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[ 3204.130673]  [<ffffffff81051f73>] ? __wake_up+0x53/0x70
[ 3204.130673]  [<ffffffffa0f773de>] ptlrpc_main+0xaee/0x1800 [ptlrpc]
[ 3204.130673]  [<ffffffffa0f768f0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
[ 3204.130673]  [<ffffffff8100c14a>] child_rip+0xa/0x20
[ 3204.130673]  [<ffffffffa0f768f0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
[ 3204.130673]  [<ffffffffa0f768f0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
[ 3204.130673]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
[ 3204.130673] Code: c3 90 90 55 48 89 e5 0f 1f 44 00 00 48 63 47 04 48 85 f6 8d 04 c5 04 00 00 00 75 07 c9 c3 0f 1f 44 00 00 48 63 c8 48 39 d1 77 48 <c7> 06 02 00 00 00 8b 57 04 85 d2 74 e4 48 83 c6 04 31 d2 0f 1f
[ 3204.130673] RIP  [<ffffffff811d9dd8>] posix_acl_to_xattr+0x28/0x80
[ 3204.130673]  RSP <ffff88027a9459d0>
[ 3204.130673] CR2: ffff88021851ab48
[ 3204.130673] ---[ end trace f9aa2ef75b2f20bf ]---
Comment by Oleg Drokin [ 10/Sep/12 ]

Ok, my current theory is that we have a problem coming from the patch in LU-998.

mdd_declare_attr_set:

        if (ma->ma_attr.la_valid & LA_MODE) {
                mdd_read_lock(env, obj, MOR_TGT_CHILD);
                rc = mdo_xattr_get(env, obj, buf, XATTR_NAME_ACL_ACCESS,
                                   BYPASS_CAPA);
                mdd_read_unlock(env, obj);

Here we'll just use whatever happens to be in buf, which sometimes ends up being garbage.

I am not exactly clear on how it's supposed to be used, but it looks like the buffer should be reinitialized every time it's about to be used afresh (frankly, I am not even sure why it is needed, because it seems to always be duplicated in some local vars that we pass around). Other cases in that same function do initialize the buffer address.

My reading of the code is that we are just trying to make sure the xattrs exist at all, but if buf happens to contain some pointer from before and the declared size is big enough to fit the xattr, we read the xattr into that buffer, overwriting whatever important data was there (and it seems the pointer frequently refers to freed and then possibly reallocated data, too).

The fix for this particular problem is probably to pass in &LU_BUF_NULL instead of the buf argument (patch at http://review.whamcloud.com/3928, which seems to be holding up well in my testing).
The wider question of whether buf is supposed to point to freed memory at any time remains open, and I need somebody's help to determine this.
If that is not supposed to happen, we can track down the offending user (and I imagine it's probably not valid, because occasionally on shutdown we do try to free the memory that the buffer points to, and if that pointer is invalid, that's a problem, even though for some reason nobody has ever seen this before).
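
A minimal sketch of the proposed change, based on the description above (the actual patch is http://review.whamcloud.com/3928 and may differ in detail):

        if (ma->ma_attr.la_valid & LA_MODE) {
                mdd_read_lock(env, obj, MOR_TGT_CHILD);
                /* pass &LU_BUF_NULL: we only need to learn whether the
                 * xattr exists, and must not read it through whatever
                 * stale pointer the env buffer happens to hold */
                rc = mdo_xattr_get(env, obj, &LU_BUF_NULL,
                                   XATTR_NAME_ACL_ACCESS, BYPASS_CAPA);
                mdd_read_unlock(env, obj);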

Comment by Keith Mannthey (Inactive) [ 10/Sep/12 ]

Oleg, are you testing with DEBUG_SLAB as well? child_rip is the only thing that seems common; there is no mention of the slab code path or cfs_free in your backtrace.

Can you confirm that the slab errors have gone away with your patch? I have put your patch in my test queue on the bisect setup as well.

Comment by Keith Mannthey (Inactive) [ 10/Sep/12 ]
Keith, can you please fix Yu Jian's patches that hit build failures.

The 2.3.50 patch failed to build due to built-in version checks, so it needs to be rebased one patch later (git hash 388111848489ef99b1fa31ce8fef255ab9c08e84).  I haven't investigated the other failure, but hopefully it is similarly trivial.  Please get to this ASAP so that the testing can be started on these patches, and hopefully we can isolate this serious defect more quickly.

I have started a fresh build/bisect setup with client-7, client-17, and client-18. I am able to reliably re-create the issue here. This is the exact build that does not trigger on my own system, with the same config changes as Yu's. I fear that running on random setups (via autotest) may lead to false negatives, since some setups produce the error and some don't. I will kick the builds along, but I would not trust any single passing run as "Not Broken".

Comment by Oleg Drokin [ 10/Sep/12 ]

Keith, PAGEALLOC debug is a more powerful (and slower) kernel feature for tracking random memory writes. When you free a page, instead of merely filling it with a poison value to check later (which is what you are hitting, and once hit there is no way to know who made the write), it actually unmaps the page from the address space, so when the offender comes to do the write the page is not there, and we get a nice oops pointing red-handed at the writer, at the exact place they are not supposed to be writing.
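
For reference, on these kernels this is a compile-time option; a kernel built with

    CONFIG_DEBUG_PAGEALLOC=y

reports it as DEBUG_PAGEALLOC in the oops banner, as seen above.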

Comment by Keith Mannthey (Inactive) [ 11/Sep/12 ]

Oleg, I am familiar with PAGEALLOC. I am just wondering whether or not it is a separate issue. I didn't get a chance to test your patch, but I will do it first thing tomorrow.

On my nodes that can produce the issue, I was able to test 2.2.59 and I don't see the problem there, but the LU-998 code should be present in that release.

I am building on the client nodes now to speed things up. I hope tomorrow I can get things greatly narrowed down, if Oleg's patch is not the root issue.

Sorry, I didn't get a chance to push fixes for Yu's build errors yet.

Comment by Jian Yu [ 11/Sep/12 ]

Hi Keith,
I submitted a patch which contains Oleg's patch and enables the kernel debug options, to verify whether or not the issue is resolved on the master branch: http://review.whamcloud.com/3936.

Comment by Oleg Drokin [ 11/Sep/12 ]

Keith, ok. Well, the original issue my patch fixes has been there since long ago; it might have been hidden because nobody was leaving incorrect pointers in the mti_info buf, and then later somebody started to do so, and that's why everything started to fall apart lately.
Granted, I did not try to go back all the way to see if I could reproduce it on just the original commit.

Comment by Keith Mannthey (Inactive) [ 11/Sep/12 ]

I've been running http://review.whamcloud.com/3928 with master for a few hours now, and I do not seem to be able to reproduce the errors. It looks like the fix to me.

If Yu's latest build agrees, I will cherry-pick it to b2_3 and b2_2.

Thanks for finding this, Oleg.

Comment by Oleg Drokin [ 11/Sep/12 ]

Please don't create separate patches for other branches; I'll cherry-pick this patch directly, since that part of the code has not changed since landing.

Comment by Jian Yu [ 12/Sep/12 ]

I submitted a patch which contains Oleg's patch and enables the kernel debug options, to verify whether or not the issue is resolved on the master branch: http://review.whamcloud.com/3936.

sanity test 103 passed without detecting the slab corruption issue.

Comment by Keith Mannthey (Inactive) [ 12/Sep/12 ]

I have almost 24 hours of testing and still no re-creation of the issue.

Oleg, I will leave the cherry-pick to you.

Comment by Peter Jones [ 13/Sep/12 ]

Landed for 2.3 and 2.4

Comment by Sarah Liu [ 06/Dec/13 ]

review-dne failed as follows:

12:47:05:[ 3488.673687] Lustre: 3067:0:(client.c:1903:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1386362637/real 1386362637]  req@ffff880036623bb8 x1453703367244336/t0(0) o38->lustre-MDT0002-osp-MDT0000@10.10.16.232@tcp:24/4 lens 400/544 e 0 to 1 dl 1386362662 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
12:47:06:[ 3488.676610] Lustre: 3067:0:(client.c:1903:ptlrpc_expire_one_request()) Skipped 305 previous similar messages
12:47:06:[ 3673.675126] BUG: unable to handle kernel paging request at ffff880022a6ae30
12:47:06:[ 3673.675968] IP: [<ffffffffa09ef395>] ksocknal_send+0x165/0x450 [ksocklnd]
12:47:06:[ 3673.676583] PGD 1a26063 PUD 1a2a063 PMD 215067 PTE 22a6a060
12:47:06:[ 3673.677103] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
12:47:06:[ 3673.677103] last sysfs file: /sys/devices/system/cpu/possible
12:47:07:[ 3673.677103] CPU 1 
12:47:07:[ 3673.677103] Modules linked in: osp(U) mdd(U) lfsck(U) lod(U) mdt(U) mgs(U) mgc(U) osd_ldiskfs(U) lquota(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfsd exportfs autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
12:47:07:[ 3673.677103] 
12:47:07:[ 3673.677103] Pid: 3069, comm: ptlrpcd_1 Not tainted 2.6.32-358.23.2.el6_lustre.g3ddc521.x86_64 #1 Red Hat KVM
12:47:07:[ 3673.677103] RIP: 0010:[<ffffffffa09ef395>]  [<ffffffffa09ef395>] ksocknal_send+0x165/0x450 [ksocklnd]
12:47:07:[ 3673.677103] RSP: 0018:ffff88005f7bf980  EFLAGS: 00010286
12:47:07:[ 3673.677103] RAX: 0000000000000000 RBX: ffff88000eb08ef0 RCX: 0000000000000009
12:47:07:[ 3673.677103] RDX: ffff880002300000 RSI: ffff880002311b20 RDI: ffffffffa09feb38
12:47:07:[ 3673.677103] RBP: ffff88005f7bf9f0 R08: 0000000000000000 R09: 0000000000000000
12:47:07:[ 3673.677103] R10: ffff8800497d37e0 R11: 0000000000000000 R12: ffff880022a6adf0
12:47:07:[ 3673.677103] R13: 0000000000000001 R14: ffff880064cd64e0 R15: 00000000000000e0
12:47:07:[ 3673.677103] FS:  0000000000000000(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
12:47:07:[ 3673.677103] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
12:47:07:[ 3673.677103] CR2: ffff880022a6ae30 CR3: 0000000001a25000 CR4: 00000000000006e0
12:47:07:[ 3673.677103] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
12:47:07:[ 3673.677103] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
12:47:07:[ 3673.677103] Process ptlrpcd_1 (pid: 3069, threadinfo ffff88005f7be000, task ffff88005c46c340)
12:47:07:[ 3673.677103] Stack:
12:47:07:[ 3673.677103]  ffff880015f79c01 0000000122a6adf0 0000000000000000 ffff880052054df0
12:47:07:[ 3673.677103] <d> 0000000000000000 000000e800000000 000200000a0a10ed 0000000000003039
12:47:07:[ 3673.677103] <d> ffff88005f7bfa10 ffff880052054df0 ffff880022a6adf0 ffff880022a6adf0
12:47:07:[ 3673.677103] Call Trace:
12:47:08:[ 3673.677103]  [<ffffffffa0532dfb>] lnet_ni_send+0x4b/0xf0 [lnet]
12:47:08:[ 3673.677103]  [<ffffffffa0537005>] lnet_send+0x655/0xb80 [lnet]
12:47:08:[ 3673.677103]  [<ffffffffa053806a>] LNetPut+0x31a/0x860 [lnet]
12:47:08:[ 3673.677103]  [<ffffffffa0808dc0>] ptl_send_buf+0x1e0/0x550 [ptlrpc]
12:47:08:[ 3673.677103]  [<ffffffff8103b82c>] ? kvm_clock_read+0x1c/0x20
12:47:08:[ 3673.677103]  [<ffffffffa080c2bd>] ptl_send_rpc+0x4dd/0xcc0 [ptlrpc]
12:47:08:[ 3673.677103]  [<ffffffffa0800994>] ptlrpc_send_new_req+0x454/0x7c0 [ptlrpc]
12:47:08:[ 3673.677103]  [<ffffffffa0804728>] ptlrpc_check_set+0x898/0x1da0 [ptlrpc]
12:47:08:[ 3673.677103]  [<ffffffffa083061b>] ptlrpcd_check+0x55b/0x590 [ptlrpc]
12:47:08:[ 3673.677103]  [<ffffffffa0830bdd>] ptlrpcd+0x2ad/0x3f0 [ptlrpc]
12:47:08:[ 3673.677103]  [<ffffffff8105bca0>] ? default_wake_function+0x0/0x20
12:47:08:[ 3673.677103]  [<ffffffffa0830930>] ? ptlrpcd+0x0/0x3f0 [ptlrpc]
12:47:08:[ 3673.677103]  [<ffffffff81095696>] kthread+0x96/0xa0
12:47:08:[ 3673.677103]  [<ffffffff8100c10a>] child_rip+0xa/0x20
12:47:08:[ 3673.677103]  [<ffffffff81095600>] ? kthread+0x0/0xa0
12:47:09:[ 3673.677103]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
12:47:09:[ 3673.677103] Code: 00 c7 43 60 c1 00 00 00 48 89 de 48 c7 43 70 00 00 00 00 48 c7 43 68 00 00 00 00 48 8b 55 c0 8b 4d c8 48 8b 7d a8 e8 7b fa ff ff <41> f6 44 24 40 08 74 1b 8b 4d b4 85 c9 0f 84 c8 00 00 00 65 48 
12:47:09:[ 3673.677103] RIP  [<ffffffffa09ef395>] ksocknal_send+0x165/0x450 [ksocklnd]
12:47:09:[ 3673.677103]  RSP <ffff88005f7bf980>
12:47:09:[ 3673.677103] CR2: ffff880022a6ae30
12:47:09:[    0.000000] Initializing cgroup subsys cpuset
12:47:09:[    0.000000] Initializing cgroup subsys cpu