[LU-1513] Test failure on test lustre-initialization-1, sanity, others: "kernel BUG at mm/slab.c:2833" Created: 12/Jun/12  Updated: 18/Jun/12  Resolved: 16/Jun/12

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-1529 kernel BUG at mm/slab.c:28333 Resolved
is duplicated by LU-1530 kernel panic at mm/slab.c:2833 Closed
Related
is related to LU-1402 cfs_alloc_flags_to_gfp should transla... Resolved
is related to LU-1510 Test failure on test suite sanity-quo... Resolved
Severity: 3
Rank (Obsolete): 3973

 Description   

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/9d879d58-b4de-11e1-bad8-52540035b04c.

The sub-test lustre-initialization_1 failed with the following error:

Test system failed to start single suite, so abandoning all hope and giving up

Info required for matching: lustre-initialization-1 lustre-initialization_1

From the OSS console log:

14:52:23:LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. Opts: 
14:52:23:LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. Opts: 
14:52:23:------------[ cut here ]------------
14:52:23:kernel BUG at mm/slab.c:2833!
14:52:23:invalid opcode: 0000 [#1] SMP 
14:52:23:last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/virtio1/block/vda/queue/max_sectors_kb
14:52:24:CPU 1 
14:52:24:Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) lustre(U) lquota(U) lov(U) osc(U) mdc(U) fid(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic sha256_generic libcfs(U) ldiskfs(U) jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode 8139too 8139cp mii i2c_piix4 i2c_core virtio_balloon ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
14:52:24:
14:52:25:Pid: 5368, comm: llog_process_th Not tainted 2.6.32-220.17.1.el6_lustre.x86_64 #1 Red Hat KVM
14:52:25:RIP: 0010:[<ffffffff8115e7b3>]  [<ffffffff8115e7b3>] cache_grow+0x313/0x320
14:52:25:RSP: 0018:ffff88007c1d1990  EFLAGS: 00010002
14:52:25:RAX: ffff88007f821800 RBX: ffff88007f910440 RCX: 0000000000000000
14:52:25:RDX: 0000000000000000 RSI: 0000000000041252 RDI: ffff88007f910440
14:52:25:RBP: ffff88007c1d19f0 R08: 0000000000000246 R09: 00000000fffffffe
14:52:25:R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000041252
14:52:25:R13: ffff88007f8217c0 R14: 000000000000000c R15: 0000000000000000
14:52:25:FS:  0000000000000000(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
14:52:25:CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
14:52:25:CR2: 00007f0a50fde450 CR3: 000000007c173000 CR4: 00000000000006e0
14:52:25:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
14:52:25:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
14:52:25:Process llog_process_th (pid: 5368, threadinfo ffff88007c1d0000, task ffff8800760ca080)
14:52:25:Stack:
14:52:25: 000000000000008e 0100000400000020 4fd7ba1600000001 0000000000062549
14:52:29:<0> 000014f800000000 000000af00000000 ffff88000002bb08 ffff88007f910440
14:52:29:<0> ffff88007fa7d480 ffff88007f8217c0 000000000000000c ffff88007f8217e0
14:52:29:Call Trace:
14:52:29: [<ffffffff8115e9c2>] cache_alloc_refill+0x202/0x240
14:52:29: [<ffffffffa043bbf0>] ? cfs_alloc+0x30/0x60 [libcfs]
14:52:29: [<ffffffff8115f6e9>] __kmalloc+0x1a9/0x220
14:52:29: [<ffffffffa043bbf0>] cfs_alloc+0x30/0x60 [libcfs]
14:52:29: [<ffffffffa0b6a94e>] filter_common_setup+0xde/0x13f0 [obdfilter]
14:52:29: [<ffffffffa0b6c280>] filter_setup+0x620/0xa20 [obdfilter]
14:52:29: [<ffffffffa059d974>] obd_setup+0x1b4/0x2f0 [obdclass]
14:52:29: [<ffffffffa058939b>] ? class_new_export+0x73b/0x970 [obdclass]
14:52:29: [<ffffffffa059dcb8>] class_setup+0x208/0x890 [obdclass]
14:52:29: [<ffffffffa05a4e1c>] class_process_config+0xbec/0x1c20 [obdclass]
14:52:29: [<ffffffffa043bbf0>] ? cfs_alloc+0x30/0x60 [libcfs]
14:52:29: [<ffffffffa059f623>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
14:52:29: [<ffffffffa05a6efb>] class_config_llog_handler+0x9bb/0x1610 [obdclass]
14:52:29: [<ffffffffa05765b0>] ? llog_lvfs_next_block+0x2d0/0x650 [obdclass]
14:52:29: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:31: [<ffffffffa05711c8>] llog_process_thread+0x888/0xd00 [obdclass]
14:52:31: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:31: [<ffffffff8100c14a>] child_rip+0xa/0x20
14:52:31: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:31: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:31: [<ffffffff8100c140>] ? child_rip+0x0/0x20
14:52:31:Code: 0f 1f 84 00 00 00 00 00 49 8d 54 24 30 48 c7 c0 fc ff ff ff 48 89 55 c8 e9 e1 fe ff ff 0f 0b eb fe ba 01 00 00 00 e9 2a fe ff ff <0f> 0b eb fe 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 
14:52:31:RIP  [<ffffffff8115e7b3>] cache_grow+0x313/0x320
14:52:31: RSP <ffff88007c1d1990>
14:52:31:---[ end trace edf3800813592bec ]---
14:52:31:Kernel panic - not syncing: Fatal exception
14:52:31:Pid: 5368, comm: llog_process_th Tainted: G      D    ----------------   2.6.32-220.17.1.el6_lustre.x86_64 #1
14:52:31:Call Trace:
14:52:31: [<ffffffff814eccea>] ? panic+0x78/0x143
14:52:31: [<ffffffff814f0e84>] ? oops_end+0xe4/0x100
14:52:31: [<ffffffff8100f26b>] ? die+0x5b/0x90
14:52:31: [<ffffffff814f0754>] ? do_trap+0xc4/0x160
14:52:32: [<ffffffff8100ce35>] ? do_invalid_op+0x95/0xb0
14:52:32: [<ffffffff8115e7b3>] ? cache_grow+0x313/0x320
14:52:32: [<ffffffff81275b56>] ? vsnprintf+0x2b6/0x5f0
14:52:32: [<ffffffff8100bedb>] ? invalid_op+0x1b/0x20
14:52:32: [<ffffffff8115e7b3>] ? cache_grow+0x313/0x320
14:52:32: [<ffffffff8115e9c2>] ? cache_alloc_refill+0x202/0x240
14:52:32: [<ffffffffa043bbf0>] ? cfs_alloc+0x30/0x60 [libcfs]
14:52:32: [<ffffffff8115f6e9>] ? __kmalloc+0x1a9/0x220
14:52:32: [<ffffffffa043bbf0>] ? cfs_alloc+0x30/0x60 [libcfs]
14:52:32: [<ffffffffa0b6a94e>] ? filter_common_setup+0xde/0x13f0 [obdfilter]
14:52:32: [<ffffffffa0b6c280>] ? filter_setup+0x620/0xa20 [obdfilter]
14:52:32: [<ffffffffa059d974>] ? obd_setup+0x1b4/0x2f0 [obdclass]
14:52:32: [<ffffffffa058939b>] ? class_new_export+0x73b/0x970 [obdclass]
14:52:32: [<ffffffffa059dcb8>] ? class_setup+0x208/0x890 [obdclass]
14:52:32: [<ffffffffa05a4e1c>] ? class_process_config+0xbec/0x1c20 [obdclass]
14:52:32: [<ffffffffa043bbf0>] ? cfs_alloc+0x30/0x60 [libcfs]
14:52:32: [<ffffffffa059f623>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
14:52:32: [<ffffffffa05a6efb>] ? class_config_llog_handler+0x9bb/0x1610 [obdclass]
14:52:33: [<ffffffffa05765b0>] ? llog_lvfs_next_block+0x2d0/0x650 [obdclass]
14:52:33: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:33: [<ffffffffa05711c8>] ? llog_process_thread+0x888/0xd00 [obdclass]
14:52:33: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:33: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
14:52:33: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:33: [<ffffffffa0570940>] ? llog_process_thread+0x0/0xd00 [obdclass]
14:52:33: [<ffffffff8100c140>] ? child_rip+0x0/0x20


 Comments   
Comment by Li Wei (Inactive) [ 12/Jun/12 ]

This is the same kernel assertion with the one in LU-1510, though the stacks differ.

Comment by Andreas Dilger [ 12/Jun/12 ]

This looks like kernel memory corruption, possibly due to double free, or use of freed memory. Are there any patches which have landed that appear suspicious? It might make sense to start submitting some of the recent builds back to autotest to see where the problem was first introduced.

Comment by Andreas Dilger [ 15/Jun/12 ]

This is randomly affecting a large fraction of tests, so I'm bumping this to a blocker.

All of the test failures appear to be on the OSS while doing initial configuration, so it is very likely some patch that has landed to that code recently.

Comment by Andreas Dilger [ 15/Jun/12 ]

It appears that the first test failure is seen in:

https://maloo.whamcloud.com/test_sets/5219062a-b3e1-11e1-a2dd-52540035b04c

This is running a "full" test on git commit 220c6d92200792c318f5248e8e44dbcfc712c788 "LU-1402 libcfs: CFS_ALLOC_HIGH is __GFP_HIGHMEM", but I don't see many users of CFS_ALLOC_HIGHMEM, so I'm not sure this is the culprit. That said, it would have to be one of the patches committed shortly before this one, or we would have seen the failure earlier (about 1-in-5 tests fail).

Comment by nasf (Inactive) [ 16/Jun/12 ]

The reason is found. The failure was triggered here:

lustre/obdfilter/filter.c
static int filter_iobuf_pool_init(struct filter_obd *filter)
{
        void **pool;

        ENTRY;


===>        OBD_ALLOC_GFP(filter->fo_iobuf_pool, OSS_THREADS_MAX * sizeof(*pool),
                      GFP_KERNEL);
        if (filter->fo_iobuf_pool == NULL)
                RETURN(-ENOMEM);

        filter->fo_iobuf_count = OSS_THREADS_MAX;

        RETURN(0);
}

mm/slab.c
static int cache_grow(struct kmem_cache *cachep,
                gfp_t flags, int nodeid, void *objp)
{

        struct slab *slabp;
        size_t offset;
        gfp_t local_flags;
        struct kmem_list3 *l3;

        /*
         * Be lazy and only check for valid flags here,  keeping it out of the
         * critical path in kmem_cache_alloc().
         */
===>        BUG_ON(flags & GFP_SLAB_BUG_MASK);
...
}

include/linux/gfp.h
===> #define __GFP_HIGHMEM   ((__force gfp_t)0x02u)
...
===> #define __GFP_WAIT      ((__force gfp_t)0x10u)  /* Can wait and reschedule? */
#define __GFP_HIGH      ((__force gfp_t)0x20u)  /* Should access emergency pools? */
===> #define __GFP_IO        ((__force gfp_t)0x40u)  /* Can start physical IO? */
===> #define __GFP_FS        ((__force gfp_t)0x80u)  /* Can call down to low-level FS? */
...
===> #define GFP_KERNEL      (__GFP_WAIT | __GFP_IO | __GFP_FS)
...
/* Do not use these with a slab allocator */
#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)

We expect "GFP_KERNEL" to be transferred into kernel slab allocator. Unfortunately, the flag "GFP_KERNEL" is converted by Lustre code as following:

libcfs/include/libcfs/libcfs.h
/*
 * Universal memory allocator API
 */
enum cfs_alloc_flags {
        /* allocation is not allowed to block */
        CFS_ALLOC_ATOMIC = 0x1,
        /* allocation is allowed to block */
        CFS_ALLOC_WAIT   = 0x2,
        /* allocation should return zeroed memory */
        CFS_ALLOC_ZERO   = 0x4,
        /* allocation is allowed to call file-system code to free/clean
         * memory */
        CFS_ALLOC_FS     = 0x8,
        /* allocation is allowed to do io to free/clean memory */
===>        CFS_ALLOC_IO     = 0x10,
        /* don't report allocation failure to the console */
        CFS_ALLOC_NOWARN = 0x20,
        /* standard allocator flag combination */
        CFS_ALLOC_STD    = CFS_ALLOC_FS | CFS_ALLOC_IO,
        CFS_ALLOC_USER   = CFS_ALLOC_WAIT | CFS_ALLOC_FS | CFS_ALLOC_IO,
};              
        
/* flags for cfs_page_alloc() in addition to enum cfs_alloc_flags */
enum cfs_alloc_page_flags {
        /* allow to return page beyond KVM. It has to be mapped into KVM by
         * cfs_kmap() and unmapped with cfs_kunmap(). */
===>        CFS_ALLOC_HIGHMEM  = 0x40,
        CFS_ALLOC_HIGHUSER = CFS_ALLOC_WAIT | CFS_ALLOC_FS | CFS_ALLOC_IO |
                             CFS_ALLOC_HIGHMEM,
};      

libcfs/libcfs/linux/linux-mem.c
static unsigned int cfs_alloc_flags_to_gfp(u_int32_t flags)
{       
        unsigned int mflags = 0;
        
        if (flags & CFS_ALLOC_ATOMIC)
                mflags |= __GFP_HIGH;
        else
                mflags |= __GFP_WAIT;
        if (flags & CFS_ALLOC_NOWARN)
                mflags |= __GFP_NOWARN;
===>        if (flags & CFS_ALLOC_IO)
                mflags |= __GFP_IO;
        if (flags & CFS_ALLOC_FS)
                mflags |= __GFP_FS;
===>        if (flags & CFS_ALLOC_HIGHMEM)
                mflags |= __GFP_HIGHMEM;
        return mflags;
}

Firstly, "GFP_KERNEL" was explained as "__GFP_WAIT | __GFP_IO | __GFP_FS", means "0x10 | 0x40 | 0x80".
Then they are regarded as Lustre flags "CFS_ALLOC_IO | CFS_ALLOC_HIGHMEM | 0x80".
And then, after the conversion of "cfs_alloc_flags_to_gfp()", they become "_GFP_IO | __GFP_HIGHMEM". Those are the real flags transferred to kernel slab allocator. But the "_GFP_HIGHMEM" is NOT what we expected.

So LU-1402 patch triggered the existing issues. I will make patch to fix it.

Comment by nasf (Inactive) [ 16/Jun/12 ]

This is the patch:

http://review.whamcloud.com/#change,3118

Comment by Andreas Dilger [ 16/Jun/12 ]

Patch landed.

Comment by Ian Colle (Inactive) [ 18/Jun/12 ]

Patch landed.

Generated at Sat Feb 10 01:17:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.