  Lustre / LU-8435

LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Environment: Bull Lustre distribution based on Lustre 2.7.2
    • Severity: 3

    Description

      In the last month, one of our customers has hit a crash more than 100 times with the following signature:

      [506626.555125] SLUB: Unable to allocate memory on node -1 (gfp=0x80c0)
      [506626.562216]   cache: kvm_mmu_page_header(22:step_batch), object size: 168, buffer size: 168, default order: 1, min order: 0
      [506626.574729]   node 0: slabs: 0, objs: 0, free: 0
      [506626.579974]   node 1: slabs: 0, objs: 0, free: 0
      [506626.585219]   node 2: slabs: 60, objs: 2880, free: 0
      [506626.590852]   node 3: slabs: 0, objs: 0, free: 0
      [506626.596112] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) failed: cp_state:0, cmd:1
      [506626.612512] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) LBUG
      [506626.620186] Pid: 41604, comm: cat
      [506626.623978] Call Trace:
      [506626.628573]  [<ffffffffa05eb853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [506626.636448]  [<ffffffffa05ebdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
      [506626.643456]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.651526]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.659108]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.666037]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.673531]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.680450]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.687961]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.694964]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.702078]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.709674]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.716785]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.723797]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.731477]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.738962]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.745962]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.751700]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.757345]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      [506626.764138]
      [506626.765990] Kernel panic - not syncing: LBUG
      [506626.770850] CPU: 53 PID: 41604 Comm: cat Tainted: G           OE  ------------   3.10.0-327.22.2.el7.x86_64 #1
      [506626.782104] Hardware name: BULL bullx blade/CHPU, BIOS BIOSX07.037.01.003 10/23/2015
      [506626.790838]  ffffffffa0610ced 000000000f6a3070 ffff8817799eb8c0 ffffffff816360f4
      [506626.799228]  ffff8817799eb940 ffffffff8162f96a ffffffff00000008 ffff8817799eb950
      [506626.807618]  ffff8817799eb8f0 000000000f6a3070 ffffffffa0e01466 0000000000000246
      [506626.816005] Call Trace:
      [506626.818839]  [<ffffffff816360f4>] dump_stack+0x19/0x1b
      [506626.824668]  [<ffffffff8162f96a>] panic+0xd8/0x1e7
      [506626.830128]  [<ffffffffa05ebe5b>] lbug_with_loc+0xab/0xc0 [libcfs]
      [506626.837129]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.845192]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.852766]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.859702]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.867184]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.874099]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.881611]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.888609]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.895721]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.903322]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.910418]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.917420]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.925091]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.932575]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.939569]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.945300]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.950938]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      

      As the customer is a black site, we cannot provide the crashdump, but we will happily provide any text output you would find useful.
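
      A note on the assertion itself: equi(a, b) is libcfs's logical-equivalence helper, so the LASSERT fires when exactly one of the two conditions holds. The following minimal userspace sketch reproduces the failing check; the constants are copied from the Lustre sources (cl_object.h, obd.h), but double-check them against the exact tree in use:

        #include <assert.h>

        /* cl_page_state values start at CPS_CACHED == 1 in the Lustre sources */
        enum cl_page_state {
                CPS_CACHED = 1, CPS_OWNED, CPS_PAGEOUT, CPS_PAGEIN, CPS_FREEING
        };
        #define OBD_BRW_READ    0x01

        /* libcfs's logical-equivalence helper: both sides normalized to 0/1 */
        #define equi(a, b) (!!(a) == !!(b))

        int main(void)
        {
                int cp_state = 0;               /* from the LBUG message: cp_state:0 */
                int cmd = OBD_BRW_READ;         /* cmd:1 */

                /* The check at osc_cache.c:1290: a READ command must be paired
                 * with a page in CPS_PAGEIN, and vice versa.  Here cmd says READ
                 * but cp_state is 0, which is not even a valid cl_page_state,
                 * hinting at an uninitialized or corrupted cl_page, so the
                 * equivalence fails and LBUG fires. */
                assert(equi(cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ));
                return 0;
        }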

      Attachments

        1. crash_output.txt
          24 kB
        2. foreach_bt_merge.txt
          152 kB
        3. struct_analyze1.txt
          50 kB


          Activity

            bfaccini Bruno Faccini (Inactive) added a comment - edited

            Aurelien, I know that you posted this new test to ensure memcg limits do not cause crashes in Lustre code, but given this new kind of crash in the kernel/memcg layer, it seems you should also propose it for the kernel regression test suite!!

            My first crash-dump analysis points to a possible race between the lazy memcg registration of existing kmem_caches and concurrent slab allocations, which triggers the unexpected situation, in __memcg_kmem_get_cache(), where memcg_params has still not been initialized in the "ptlrpc_cache" kmem_cache.

            It is also interesting to note that recent auto-test results for sanity/test_411 are all passes, while all of these crashes occurred during single-node sessions, and that the only kmem_caches in the system that do not have memcg_params initialized are those created in Lustre code.

            More to come.
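
            A simplified userspace sketch of the failure mode Bruno describes (not the actual RHEL 7 kernel source; the struct layouts are illustrative stand-ins). It shows why a still-NULL memcg_params in a module-created cache such as "ptlrpc_cache" would fault at a small offset from NULL, which matches the oops Oleg posted below (CR2: 0x8 with RAX = RDX = 0):

              #include <stddef.h>
              #include <stdio.h>

              struct kmem_cache;

              /* Stand-in for the 3.10-era memcg_cache_params layout */
              struct memcg_cache_params {
                      int is_root_cache;                       /* offset 0 */
                      struct kmem_cache *memcg_caches[];       /* offset 8 on x86_64 */
              };

              struct kmem_cache {
                      const char *name;
                      struct memcg_cache_params *memcg_params; /* NULL until memcg
                                                                * registration runs */
              };

              /* Mirrors the lookup done on the __memcg_kmem_get_cache() path:
               * if memcg_params is still NULL for a module-created cache, the
               * real code reads memcg_caches[idx] through a NULL pointer, i.e.
               * a load from address 8 + idx * sizeof(void *). */
              static struct kmem_cache *
              cache_from_memcg_idx(struct kmem_cache *s, int idx)
              {
                      if (s->memcg_params == NULL)    /* the check the race defeats */
                              return NULL;
                      return s->memcg_params->memcg_caches[idx];
              }

              int main(void)
              {
                      struct kmem_cache ptlrpc_cache = { "ptlrpc_cache", NULL };

                      printf("offset of memcg_caches: %zu\n",   /* prints 8 */
                             offsetof(struct memcg_cache_params, memcg_caches));
                      printf("safe lookup: %p\n",
                             (void *)cache_from_memcg_idx(&ptlrpc_cache, 0));
                      return 0;
              }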

            paf Patrick Farrell (Inactive) added a comment -

            On that note, Aurelien, I think we should add a write component to the test after the memory limit is set... Or perhaps a separate test. But either way, write under pressure would be good to have as well.

            adegremont Aurelien Degremont (Inactive) added a comment -

            Bruno, this was exactly the purpose of this test. It seems it discovers other memory-management issues in client code. I/O is not really expected to succeed under such constraints; it should just return EIO or ENOMEM, not crash.
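
            A sketch of the behavior the test expects from the client under a tight memcg kmem limit, in the same spirit as the bash test script (the mount point and file name here are illustrative, not taken from the actual test):

              #include <errno.h>
              #include <fcntl.h>
              #include <stdio.h>
              #include <string.h>
              #include <unistd.h>

              /* Run inside a memory cgroup with a very low kmem limit.  The only
               * acceptable outcomes are success or a clean errno; a client LBUG
               * or oops is the bug this ticket is about. */
              int main(void)
              {
                      char buf[4096];
                      int fd = open("/mnt/lustre/f411", O_RDONLY); /* illustrative path */
                      if (fd < 0) {
                              perror("open");
                              return 1;
                      }
                      for (;;) {
                              ssize_t rc = read(fd, buf, sizeof(buf));
                              if (rc > 0)
                                      continue;
                              if (rc == 0)                         /* EOF: fine */
                                      break;
                              if (errno == EIO || errno == ENOMEM) {
                                      /* acceptable failure under pressure */
                                      fprintf(stderr, "expected failure: %s\n",
                                              strerror(errno));
                                      break;
                              }
                              perror("read");                      /* anything else is odd */
                              break;
                      }
                      close(fd);
                      return 0;
              }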
            green Oleg Drokin added a comment -

            Ok, thanks.
            I had 4 more failures in the past 24 hours, btw.

            The crashdumps are on onyx-68 in /export/crashdumps.
            they are:
            192.168.123.199-2017-09-01-10:34:*
            192.168.123.111-2017-09-02-15:06:*
            192.168.123.195-2017-09-03-13:*
            192.168.123.151-2017-09-03-14:06:*
            192.168.123.135-2017-09-03-14:11:*

            build tree is currently in /export/centos7-nfsroot/home/green/git/lustre-release with all the modules (I'll update it on Tuesday, but it should be good for the next 30 or so hours).


            bfaccini Bruno Faccini (Inactive) added a comment -

            Oleg,
            my guess is that this new sub-test, sanity/test_411, introduced by change #21745, is setting a highly constraining kernel memory limit that is very likely to trigger some memcg/slab bug.
            But I am OK to have a look at the crash dumps to try to confirm.
            green Oleg Drokin added a comment -

            Hmm, I just had a failure in a test introduced by this patch:

            [38199.302263] Lustre: DEBUG MARKER: == sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 10:34:27 (1504276467)
            [38212.118675] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
            [38212.120795] IP: [<ffffffff811dbb04>] __memcg_kmem_get_cache+0xe4/0x220
            [38212.121489] PGD 310c0a067 PUD 28e92c067 PMD 0 
            [38212.122192] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
            [38212.122849] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect ata_generic sysimgblt pata_acpi ttm drm_kms_helper ata_piix drm i2c_piix4 libata serio_raw virtio_balloon pcspkr virtio_console i2c_core virtio_blk floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
            [38212.145920] CPU: 2 PID: 31539 Comm: dd Tainted: P        W  OE  ------------   3.10.0-debug #2
            [38212.147177] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [38212.147821] task: ffff8802f2bf4800 ti: ffff880294f20000 task.ti: ffff880294f20000
            [38212.152755] RIP: 0010:[<ffffffff811dbb04>]  [<ffffffff811dbb04>] __memcg_kmem_get_cache+0xe4/0x220
            [38212.153730] RSP: 0018:ffff880294f237f0  EFLAGS: 00010286
            [38212.154194] RAX: 0000000000000000 RBX: ffff8803232c5c40 RCX: 0000000000000002
            [38212.154672] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246
            [38212.155168] RBP: ffff880294f23810 R08: 0000000000000000 R09: 0000000000000000
            [38212.155647] R10: 0000000000000000 R11: 0000000200000007 R12: ffff8802f2bf4800
            [38212.156134] R13: ffff88031f6a6000 R14: ffff8803232c5c40 R15: ffff8803232c5c40
            [38212.156898] FS:  00007f1f35a4e740(0000) GS:ffff88033e440000(0000) knlGS:0000000000000000
            [38212.159271] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [38212.159923] CR2: 0000000000000008 CR3: 00000002f011d000 CR4: 00000000000006e0
            [38212.160625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            [38212.161320] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            [38212.163273] Stack:
            [38212.163852]  ffffffff811dba68 0000000000008050 ffff8802c59a5000 ffff8802a991ee00
            [38212.165119]  ffff880294f238a0 ffffffff811cca5c ffffffffa0570615 ffffc9000ab51000
            [38212.166468]  ffff880200000127 ffffffffa05a5547 ffff88028b683e80 ffff8803232c5c40
            [38212.168537] Call Trace:
            [38212.169340]  [<ffffffff811dba68>] ? __memcg_kmem_get_cache+0x48/0x220
            [38212.170547]  [<ffffffff811cca5c>] kmem_cache_alloc+0x1ec/0x640
            [38212.171879]  [<ffffffffa0570615>] ? ldlm_resource_putref+0x75/0x400 [ptlrpc]
            [38212.172659]  [<ffffffffa05a5547>] ? ptlrpc_request_cache_alloc+0x27/0x110 [ptlrpc]
            [38212.174145]  [<ffffffffa07c0f0d>] ? mdc_resource_get_unused+0x14d/0x2a0 [mdc]
            [38212.174871]  [<ffffffffa05a5547>] ptlrpc_request_cache_alloc+0x27/0x110 [ptlrpc]
            [38212.177273]  [<ffffffffa05a5655>] ptlrpc_request_alloc_internal+0x25/0x480 [ptlrpc]
            [38212.178618]  [<ffffffffa05a5ac3>] ptlrpc_request_alloc+0x13/0x20 [ptlrpc]
            [38212.179440]  [<ffffffffa07c6a60>] mdc_enqueue_base+0x6c0/0x18a0 [mdc]
            [38212.180168]  [<ffffffffa07c845b>] mdc_intent_lock+0x26b/0x520 [mdc]
            [38212.180869]  [<ffffffffa161dad0>] ? ll_invalidate_negative_children+0x1e0/0x1e0 [lustre]
            [38212.182291]  [<ffffffffa0584ab0>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc]
            [38212.183569]  [<ffffffffa079723d>] lmv_intent_lock+0xc0d/0x1b50 [lmv]
            [38212.184289]  [<ffffffff810ac3c1>] ? in_group_p+0x31/0x40
            [38212.184941]  [<ffffffffa161e5c5>] ? ll_i2suppgid+0x15/0x40 [lustre]
            [38212.185667]  [<ffffffffa161e614>] ? ll_i2gids+0x24/0xb0 [lustre]
            [38212.186372]  [<ffffffff811073d2>] ? from_kgid+0x12/0x20
            [38212.187062]  [<ffffffffa1609275>] ? ll_prep_md_op_data+0x235/0x520 [lustre]
            [38212.187754]  [<ffffffffa161dad0>] ? ll_invalidate_negative_children+0x1e0/0x1e0 [lustre]
            [38212.190244]  [<ffffffffa161fd34>] ll_lookup_it+0x2a4/0xef0 [lustre]
            [38212.190918]  [<ffffffffa1620ab7>] ll_atomic_open+0x137/0x12d0 [lustre]
            [38212.191636]  [<ffffffff817063d7>] ? _raw_spin_unlock+0x27/0x40
            [38212.192425]  [<ffffffff811f82fb>] ? lookup_dcache+0x8b/0xb0
            [38212.193270]  [<ffffffff811fd551>] do_last+0xa21/0x12b0
            [38212.194603]  [<ffffffff811fdea2>] path_openat+0xc2/0x4a0
            [38212.195481]  [<ffffffff811ff69b>] do_filp_open+0x4b/0xb0
            [38212.196351]  [<ffffffff817063d7>] ? _raw_spin_unlock+0x27/0x40
            [38212.197169]  [<ffffffff8120d137>] ? __alloc_fd+0xa7/0x130
            [38212.197815]  [<ffffffff811ec553>] do_sys_open+0xf3/0x1f0
            [38212.198506]  [<ffffffff811ec66e>] SyS_open+0x1e/0x20
            [38212.199225]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
            [38212.199896] Code: 01 00 00 41 f6 85 10 03 00 00 03 0f 84 f6 00 00 00 4d 85 ed 48 c7 c2 ff ff ff ff 74 07 49 63 95 98 06 00 00 48 8b 83 e0 00 00 00 <4c> 8b 64 d0 08 4d 85 e4 0f 85 d1 00 00 00 41 f6 45 10 01 0f 84 
            [38212.202617] RIP  [<ffffffff811dbb04>] __memcg_kmem_get_cache+0xe4/0x220
            [38212.203345]  RSP <ffff880294f237f0>
            

            I have a crashdump if anybody is interested.

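            For what it is worth, hand-decoding the faulting bytes in the Code: line above (worth re-checking with objdump) supports the NULL-memcg_params reading:

              /*
               *   <4c> 8b 64 d0 08        mov 0x8(%rax,%rdx,8),%r12
               *
               * With RAX = 0 and RDX = 0 in the register dump, the load address
               * is 0x8, exactly the reported CR2: an indexed read at offset 8
               * from a NULL base pointer, as expected for memcg_caches[idx]
               * through a NULL memcg_params.
               */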
            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21745/
            Subject: LU-8435 tests: slab alloc error does not LBUG
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 15dac618aabf2d5611a280bce13ca79c673f4f6d

            pjones Peter Jones added a comment -

            Yes I meant the testing patch

            simmonsja James A Simmons added a comment - edited

            Peter, the original fix https://review.whamcloud.com/#/c/13956 has already landed to master. I think this is safe to close. Or do you mean https://review.whamcloud.com/#/c/21745?
            pjones Peter Jones added a comment -

            I think that we need the ticket to remain open until the original patch has landed to master.

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: spiechurski Sebastien Piechurski
              Votes: 0
              Watchers: 16
