Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Fix Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Labels: None
    • Environment: CONFIG_DEBUG_SLAB=y
    • Severity: 3
    • Rank (Obsolete): 4237

    Description

      Lustre: DEBUG MARKER: == sanity test 103: acl test ========================================================================= 19:57:07 (1346774227)
      /work/lustre/head/clean/lustre/utils/l_getidentity
      Slab corruption (Tainted: P --------------- ): size-2048 start=dac6c470, len=2048
      Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
      Last user: [<dff39e58>](cfs_free+0x8/0x10 [libcfs])
      310: 02 00 00 00 01 00 07 00 ff ff ff ff 02 00 05 00
      320: 01 00 00 00 02 00 07 00 02 00 00 00 04 00 07 00
      330: ff ff ff ff 10 00 07 00 ff ff ff ff 20 00 05 00
      340: ff ff ff ff 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
      Next obj: start=dac6cc88, len=2048
      Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
      Last user: [<dff39e58>](cfs_free+0x8/0x10 [libcfs])
      000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
      010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

      02000000:00000010:1.0:1346774231.327841:1804:3373:0:(sec_null.c:217:null_alloc_repbuf()) kmalloced 'req->rq_repbuf': 2048 at dac6c470.
      ...

      02000000:00000010:1.0:1346774231.328361:836:3373:0:(sec_null.c:231:null_free_repbuf()) kfreed 'req->rq_repbuf': 2048 at dac6c470.
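
      For readers unfamiliar with the CONFIG_DEBUG_SLAB output above: the allocator fills freed objects with the poison byte 0x6b and brackets them with redzones, so any later write through a stale pointer leaves non-poison bytes that are reported the next time the object is allocated or freed. The log shows req->rq_repbuf kmalloced and then kfreed at dac6c470, the same address as the corrupted object. A minimal, hypothetical kernel-module sketch of that access pattern (illustration only, not code from this ticket):

      /* Hypothetical sketch of a use-after-free write that produces
       * "Slab corruption ... Redzone ... Last user" under CONFIG_DEBUG_SLAB=y. */
      #include <linux/module.h>
      #include <linux/slab.h>
      #include <linux/string.h>

      static int __init uaf_demo_init(void)
      {
              char *buf = kmalloc(2048, GFP_KERNEL);  /* cf. req->rq_repbuf */

              if (buf == NULL)
                      return -ENOMEM;
              kfree(buf);             /* object is now poisoned with 0x6b bytes */
              memset(buf, 0xff, 16);  /* late write through the stale pointer */
              /* The next kmalloc()/kfree() cycle on this object finds the
               * non-0x6b bytes and reports the last legitimate freer,
               * here cfs_free(), exactly as in the log above. */
              return 0;
      }
      module_init(uaf_demo_init);
      MODULE_LICENSE("GPL");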

    Attachments

    Issue Links

    Activity

            [LU-1823] sanity/103: slab corruption
            adilger Andreas Dilger added a comment - - edited

             Keith, can you please fix Yu Jian's patches that hit build failures?

            The 2.3.50 patch failed to build due to built-in version checks, so it needs to be rebased one patch later (git hash 388111848489ef99b1fa31ce8fef255ab9c08e84). I haven't investigated the other failure, but hopefully it is similarly trivial. Please get to this ASAP so that the testing can be started on these patches, and hopefully we can isolate this serious defect more quickly.

            yujian Jian Yu added a comment -

            Hi Keith,

             I created several test patches per the following comment from Andreas:

            If there are no obvious sources of this corruption, it probably makes sense to submit this test patch as several separate changes, each based on one of the recent 2.2.* tags, to see if we can isolate when this corruption started.

            Patch on tag 2.2.94: http://review.whamcloud.com/#change,3921
            Patch on tag 2.3.50: http://review.whamcloud.com/#change,3918
            Patch on tag 2.2.93: http://review.whamcloud.com/#change,3919
            Patch on tag 2.2.92: http://review.whamcloud.com/#change,3920

            Hope we can isolate the issue.


             keith Keith Mannthey (Inactive) added a comment -

             Keith local vm MDS panic -v1 dmesg
            keith Keith Mannthey (Inactive) added a comment - - edited

             I acquired some Toro nodes today and am starting to set up. My MDS VM crashed while running "REFORMAT=y ONLY=103 sh sanity.sh" (it took about 30 hours to trigger). This could be the bad cfs_free path that is corrupting the slab.

            I will try and attach the whole dmesg.

            This was master + kernel-2.6.32-279 on the MDS vm node.

             
             Lustre: DEBUG MARKER: == sanity test 103: acl test =========================================== 06:06:43 (1347109603)
            kfree_debugcheck: out of range ptr 6000100000002h.
            ------------[ cut here ]------------
            kernel BUG at mm/slab.c:2911!
            invalid opcode: 0000 [#1] SMP
            last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/PNP0C0A:00/power_supply/BAT0/energy_full
            CPU 0
            Modules linked in: cmm(U) osd_ldiskfs(U) mdt(U) mdd(U) mds(U) fsfilt_ldiskfs(U) exportfs mgs(U) mgc(U) ldiskfs(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) autofs4 sunrpc ipv6 ppdev parport_pc parport microcode i2c_piix4 i2c_core snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 sg ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom ahci pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
            
            Pid: 24218, comm: jbd2/dm-2-8 Not tainted 2.6.32.masterDEBUG11A #1 innotek GmbH VirtualBox
            RIP: 0010:[<ffffffff81162530>]  [<ffffffff81162530>] kfree_debugcheck+0x30/0x40
            RSP: 0018:ffff88002733dba0  EFLAGS: 00010082
            RAX: 0000000000000039 RBX: 0006000100000002 RCX: 0000000000007a74
            RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
            RBP: ffff88002733dbb0 R08: 0000000000000000 R09: ffffffff8163acc0
            R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000202
            R13: 0006000100000002 R14: ffff880024d9d298 R15: ffff880024d9d298
            FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
            CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
            CR2: 0000003ac2ef5170 CR3: 000000003d0e0000 CR4: 00000000000006f0
            DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            Process jbd2/dm-2-8 (pid: 24218, threadinfo ffff88002733c000, task ffff88003d640ae0)
            Stack:
             ffff880000000020 ffffffffa035ebae ffff88002733dc00 ffffffff8116594b
            <d> ffff88002851f720 ffff88003f810080 ffff88002733dc20 0006000100000002
            <d> ffff880024d9d240 0000000000000000 ffff880024d9d298 ffff880024d9d298
            Call Trace:
             [<ffffffffa035ebae>] ? cfs_free+0xe/0x10 [libcfs]
             [<ffffffff8116594b>] kfree+0x5b/0x2a0
             [<ffffffffa035ebae>] cfs_free+0xe/0x10 [libcfs]
             [<ffffffffa04ceb73>] lu_global_key_fini+0xa3/0xf0 [obdclass]
             [<ffffffffa04cf380>] key_fini+0x60/0x190 [obdclass]
             [<ffffffffa04cf4df>] keys_fini+0x2f/0x120 [obdclass]
             [<ffffffffa04cf5fd>] lu_context_fini+0x2d/0xc0 [obdclass]
             [<ffffffffa0b86aa2>] osd_trans_commit_cb+0xe2/0x2b0 [osd_ldiskfs]
             [<ffffffffa0a3f21a>] ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
             [<ffffffffa00a18af>] jbd2_journal_commit_transaction+0x110f/0x1530 [jbd2]
             [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
             [<ffffffff8107eabb>] ? try_to_del_timer_sync+0x7b/0xe0
             [<ffffffffa00a7128>] kjournald2+0xb8/0x220 [jbd2]
             [<ffffffff81091d66>] kthread+0x96/0xa0
             [<ffffffff8100c14a>] child_rip+0xa/0x20
             [<ffffffff81091cd0>] ? kthread+0x0/0xa0
             [<ffffffff8100c140>] ? child_rip+0x0/0x20
            Code: 48 83 ec 08 0f 1f 44 00 00 48 89 fb e8 7a 67 ee ff 84 c0 74 07 48 83 c4 08 5b c9 c3 48 89 de 48 c7 c7 c8 0b 7a 81 e8 ed cc 39 00 <0f> 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41
            RIP  [<ffffffff81162530>] kfree_debugcheck+0x30/0x40
             RSP <ffff88002733dba0>
            ---[ end trace ff4011ce2a20c79c ]---
            Kernel panic - not syncing: Fatal exception
            Pid: 24218, comm: jbd2/dm-2-8 Tainted: G      D    ---------------    2.6.32.masterDEBUG11A #1
            Call Trace:
             [<ffffffff814ff155>] ? panic+0xa0/0x168
             [<ffffffff815032e4>] ? oops_end+0xe4/0x100
             [<ffffffff8100f26b>] ? die+0x5b/0x90
             [<ffffffff81502bb4>] ? do_trap+0xc4/0x160
             [<ffffffff8100ce35>] ? do_invalid_op+0x95/0xb0
             [<ffffffff81162530>] ? kfree_debugcheck+0x30/0x40
             [<ffffffffa036def3>] ? libcfs_debug_vmsg2+0x4e3/0xb60 [libcfs]
             [<ffffffff8100bedb>] ? invalid_op+0x1b/0x20
             [<ffffffff81162530>] ? kfree_debugcheck+0x30/0x40
             [<ffffffffa035ebae>] ? cfs_free+0xe/0x10 [libcfs]
             [<ffffffff8116594b>] ? kfree+0x5b/0x2a0
             [<ffffffffa035ebae>] ? cfs_free+0xe/0x10 [libcfs]
             [<ffffffffa04ceb73>] ? lu_global_key_fini+0xa3/0xf0 [obdclass]
             [<ffffffffa04cf380>] ? key_fini+0x60/0x190 [obdclass]
             [<ffffffffa04cf4df>] ? keys_fini+0x2f/0x120 [obdclass]
             [<ffffffffa04cf5fd>] ? lu_context_fini+0x2d/0xc0 [obdclass]
             [<ffffffffa0b86aa2>] ? osd_trans_commit_cb+0xe2/0x2b0 [osd_ldiskfs]
             [<ffffffffa0a3f21a>] ? ldiskfs_journal_commit_callback+0x8a/0xc0 [ldiskfs]
             [<ffffffffa00a18af>] ? jbd2_journal_commit_transaction+0x110f/0x1530 [jbd2]
             [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
             [<ffffffff8107eabb>] ? try_to_del_timer_sync+0x7b/0xe0
             [<ffffffffa00a7128>] ? kjournald2+0xb8/0x220 [jbd2]
             [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
             [<ffffffffa00a7070>] ? kjournald2+0x0/0x220 [jbd2]
             [<ffffffff81091d66>] ? kthread+0x96/0xa0
             [<ffffffff8100c14a>] ? child_rip+0xa/0x20
             [<ffffffff81091cd0>] ? kthread+0x0/0xa0
             [<ffffffff8100c140>] ? child_rip+0x0/0x20
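
             For context, the "kfree_debugcheck: out of range ptr" message and the BUG at mm/slab.c:2911 come from the slab debug check that validates every pointer handed to kfree(). A paraphrased, simplified sketch of the 2.6.32-era check in mm/slab.c:

             /* Paraphrased from mm/slab.c (2.6.32, CONFIG_DEBUG_SLAB); simplified. */
             static void kfree_debugcheck(const void *objp)
             {
                     if (!virt_addr_valid(objp)) {
                             printk(KERN_ERR "kfree_debugcheck: out of range ptr %lxh.\n",
                                    (unsigned long)objp);
                             BUG();  /* -> "kernel BUG at mm/slab.c" as in the trace */
                     }
             }

             Note that the bogus pointer 0x6000100000002 is built from small structured values rather than resembling a kernel address, which would be consistent with the lu_context key pointer itself having been overwritten before lu_global_key_fini() freed it.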
            

             keith Keith Mannthey (Inactive) added a comment -

             Moving kernels does not seem to reproduce the issue, so it is not a lead. I am going to try some client nodes tomorrow. I saw the error on the MDS as well on my initial master run but have not seen it since.
            yujian Jian Yu added a comment -

            Per the above test report, the slab corruption issue occurred only on the MDS (fat-intel-2):

            fat-intel-2: Slab corruption (Not tainted): size-2048 start=ffff8802e1b534f8, len=2048
            fat-intel-2: Slab corruption (Not tainted): size-2048 start=ffff8802e1d776f8, len=2048
            fat-intel-2: Slab corruption (Not tainted): size-2048 start=ffff8802e13ca4c8, len=2048
             sanity test_103: @@@@@@ FAIL: slab corruption detected 
            

             keith Keith Mannthey (Inactive) added a comment -

             I have started a git bisect to narrow down the code change, but I fear the data is not reliable. I am not sure what has happened on my local VMs (I shuffled some VMs around yesterday), but I am no longer able to reproduce the core issue. I am running Lustre 2.3.50 (from master) with kernel-2.6.32-279.5.2 and not triggering the issue. I am moving back to kernel-2.6.32-279.1.1 (confirmed failed with Yu's test run) to see if the issue reappears.

             I will update when I know more.

             adilger Andreas Dilger added a comment -

             If there are no obvious sources of this corruption, it probably makes sense to submit this test patch as several separate changes, each based on one of the recent 2.2.* tags, to see if we can isolate when this corruption started. After that, it is hopefully possible to do a (manual?) git bisect to find which patch is the culprit, or at least narrow down the range of patches that need to be examined manually. It is also important to check in each of the failure cases which node type the corruption is seen on (MDS, OSS, client), since that will also reduce the number of changes which might have introduced the problem.

            It would make sense to include a check for the LU-1844 list_add/list_del corruption messages as well, since I suspect that is also a sign of random memory corruption.
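
             For reference, the LU-1844 messages come from CONFIG_DEBUG_LIST, which verifies neighbour linkage on every list insert, so a list_head sitting in scribbled-on memory fails the check. A paraphrased, simplified sketch of that check from lib/list_debug.c:

             /* Paraphrased from lib/list_debug.c (CONFIG_DEBUG_LIST); simplified. */
             void __list_add(struct list_head *new,
                             struct list_head *prev, struct list_head *next)
             {
                     WARN(next->prev != prev,
                          "list_add corruption. next->prev should be prev (%p), "
                          "but was %p. (next=%p).\n", prev, next->prev, next);
                     WARN(prev->next != next,
                          "list_add corruption. prev->next should be next (%p), "
                          "but was %p. (prev=%p).\n", next, prev->next, prev);
                     next->prev = new;
                     new->next = next;
                     new->prev = prev;
                     prev->next = new;
             }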

            yujian Jian Yu added a comment -

            Hi Keith,

            FYI, with the build for patch set 5 of http://review.whamcloud.com/#change,3876, I reproduced the issue with PTLDEBUG=-1 manually:
            https://maloo.whamcloud.com/test_sets/59a5ca46-f832-11e1-b114-52540035b04c

            yujian Jian Yu added a comment -

            Hi Keith,

             By using the build http://build.whamcloud.com/job/lustre-reviews/8904/ in http://review.whamcloud.com/#change,3876, I can manually reproduce the slab corruption issue on the RHEL6 distro by running only sanity test 103:
            https://maloo.whamcloud.com/test_sets/2c479ade-f7d3-11e1-8b95-52540035b04c

             The autotest run for the above build skipped sanity test 103 because it's in the EXCEPT_SLOW list. I'm updating the commit message to add SLOW=yes to the test parameters.


             keith Keith Mannthey (Inactive) added a comment -

             My config test didn't make it through the build on the first pass, but Yu has a very nice patch/test here that I am watching: http://review.whamcloud.com/#change,3876

            People

              Assignee: green Oleg Drokin
              Reporter: bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated:
                Resolved: