  Lustre / LU-8435

LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Environment: Bull Lustre distribution based on Lustre 2.7.2
    • Severity: 3

    Description

      In the last month, one of our customers has hit a crash more than 100 times with the following signature:

      [506626.555125] SLUB: Unable to allocate memory on node -1 (gfp=0x80c0)
      [506626.562216]   cache: kvm_mmu_page_header(22:step_batch), object size: 168, buffer size: 168, default order: 1, min order: 0
      [506626.574729]   node 0: slabs: 0, objs: 0, free: 0
      [506626.579974]   node 1: slabs: 0, objs: 0, free: 0
      [506626.585219]   node 2: slabs: 60, objs: 2880, free: 0
      [506626.590852]   node 3: slabs: 0, objs: 0, free: 0
      [506626.596112] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) failed: cp_state:0, cmd:1
      [506626.612512] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) LBUG
      [506626.620186] Pid: 41604, comm: cat
      [506626.623978] Call Trace:
      [506626.628573]  [<ffffffffa05eb853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [506626.636448]  [<ffffffffa05ebdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
      [506626.643456]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.651526]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.659108]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.666037]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.673531]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.680450]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.687961]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.694964]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.702078]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.709674]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.716785]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.723797]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.731477]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.738962]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.745962]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.751700]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.757345]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      [506626.764138]
      [506626.765990] Kernel panic - not syncing: LBUG
      [506626.770850] CPU: 53 PID: 41604 Comm: cat Tainted: G           OE  ------------   3.10.0-327.22.2.el7.x86_64 #1
      [506626.782104] Hardware name: BULL bullx blade/CHPU, BIOS BIOSX07.037.01.003 10/23/2015
      [506626.790838]  ffffffffa0610ced 000000000f6a3070 ffff8817799eb8c0 ffffffff816360f4
      [506626.799228]  ffff8817799eb940 ffffffff8162f96a ffffffff00000008 ffff8817799eb950
      [506626.807618]  ffff8817799eb8f0 000000000f6a3070 ffffffffa0e01466 0000000000000246
      [506626.816005] Call Trace:
      [506626.818839]  [<ffffffff816360f4>] dump_stack+0x19/0x1b
      [506626.824668]  [<ffffffff8162f96a>] panic+0xd8/0x1e7
      [506626.830128]  [<ffffffffa05ebe5b>] lbug_with_loc+0xab/0xc0 [libcfs]
      [506626.837129]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.845192]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.852766]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.859702]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.867184]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.874099]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.881611]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.888609]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.895721]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.903322]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.910418]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.917420]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.925091]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.932575]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.939569]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.945300]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.950938]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      

      As the customer is a black site, we cannot provide the crashdump, but we will happily provide any text output you would find useful.
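
      A note on the assertion itself: equi(a, b) is libcfs's logical-equivalence helper, so the LASSERT fires when exactly one of the two conditions holds. The following minimal userspace sketch reproduces the failing check; the constants are copied from the Lustre sources (cl_object.h, obd.h), but double-check them against the exact tree in use:

        #include <assert.h>

        /* cl_page_state values start at CPS_CACHED == 1 in the Lustre sources */
        enum cl_page_state {
                CPS_CACHED = 1, CPS_OWNED, CPS_PAGEOUT, CPS_PAGEIN, CPS_FREEING
        };
        #define OBD_BRW_READ    0x01

        /* libcfs's logical-equivalence helper: both sides normalized to 0/1 */
        #define equi(a, b) (!!(a) == !!(b))

        int main(void)
        {
                int cp_state = 0;               /* from the LBUG message: cp_state:0 */
                int cmd = OBD_BRW_READ;         /* cmd:1 */

                /* The check at osc_cache.c:1290: a READ command must be paired
                 * with a page in CPS_PAGEIN, and vice versa.  Here cmd says READ
                 * but cp_state is 0, which is not even a valid cl_page_state,
                 * hinting at an uninitialized or corrupted cl_page, so the
                 * equivalence fails and LBUG fires. */
                assert(equi(cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ));
                return 0;
        }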

      Attachments

        1. crash_output.txt
          24 kB
        2. foreach_bt_merge.txt
          152 kB
        3. struct_analyze1.txt
          50 kB


          Activity

            bfaccini Bruno Faccini (Inactive) added a comment - edited

            Aurelien, I know that you posted this new test to ensure memcg limits do not cause crashes in Lustre code, but given this new kind of crash in the kernel/memcg layer, it seems you should also propose it for the kernel regression test suite!!

            My first crash-dump analysis points to a possible race between the lazy memcg registration of existing kmem_caches and concurrent slab allocations, which triggers the unexpected situation, in __memcg_kmem_get_cache(), where memcg_params has still not been initialized in the "ptlrpc_cache" kmem_cache.

            It is also interesting to note that recent auto-test results for sanity/test_411 are all passes, while all of these crashes occurred during single-node sessions, and that the only kmem_caches in the system that do not have memcg_params initialized are those created in Lustre code.

            More to come.
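
            A simplified userspace sketch of the failure mode Bruno describes (not the actual RHEL 7 kernel source; the struct layouts are illustrative stand-ins). It shows why a still-NULL memcg_params in a module-created cache such as "ptlrpc_cache" would fault at a small offset from NULL, which matches the oops Oleg posted below (CR2: 0x8 with RAX = RDX = 0):

              #include <stddef.h>
              #include <stdio.h>

              struct kmem_cache;

              /* Stand-in for the 3.10-era memcg_cache_params layout */
              struct memcg_cache_params {
                      int is_root_cache;                       /* offset 0 */
                      struct kmem_cache *memcg_caches[];       /* offset 8 on x86_64 */
              };

              struct kmem_cache {
                      const char *name;
                      struct memcg_cache_params *memcg_params; /* NULL until memcg
                                                                * registration runs */
              };

              /* Mirrors the lookup done on the __memcg_kmem_get_cache() path:
               * if memcg_params is still NULL for a module-created cache, the
               * real code reads memcg_caches[idx] through a NULL pointer, i.e.
               * a load from address 8 + idx * sizeof(void *). */
              static struct kmem_cache *
              cache_from_memcg_idx(struct kmem_cache *s, int idx)
              {
                      if (s->memcg_params == NULL)    /* the check the race defeats */
                              return NULL;
                      return s->memcg_params->memcg_caches[idx];
              }

              int main(void)
              {
                      struct kmem_cache ptlrpc_cache = { "ptlrpc_cache", NULL };

                      printf("offset of memcg_caches: %zu\n",   /* prints 8 */
                             offsetof(struct memcg_cache_params, memcg_caches));
                      printf("safe lookup: %p\n",
                             (void *)cache_from_memcg_idx(&ptlrpc_cache, 0));
                      return 0;
              }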

            paf Patrick Farrell (Inactive) added a comment -

            On that note, Aurelien, I think we should add a write component to the test after the memory limit is set... Or perhaps a separate test. But either way, write under pressure would be good to have as well.

            adegremont Aurelien Degremont (Inactive) added a comment -

            Bruno, this was exactly the purpose of this test. It seems it discovers other memory-management issues in client code. I/O is not really expected to succeed under such constraints; it should just return EIO or ENOMEM, not crash.
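
            A sketch of the behavior the test expects from the client under a tight memcg kmem limit, in the same spirit as the bash test script (the mount point and file name here are illustrative, not taken from the actual test):

              #include <errno.h>
              #include <fcntl.h>
              #include <stdio.h>
              #include <string.h>
              #include <unistd.h>

              /* Run inside a memory cgroup with a very low kmem limit.  The only
               * acceptable outcomes are success or a clean errno; a client LBUG
               * or oops is the bug this ticket is about. */
              int main(void)
              {
                      char buf[4096];
                      int fd = open("/mnt/lustre/f411", O_RDONLY); /* illustrative path */
                      if (fd < 0) {
                              perror("open");
                              return 1;
                      }
                      for (;;) {
                              ssize_t rc = read(fd, buf, sizeof(buf));
                              if (rc > 0)
                                      continue;
                              if (rc == 0)                         /* EOF: fine */
                                      break;
                              if (errno == EIO || errno == ENOMEM) {
                                      /* acceptable failure under pressure */
                                      fprintf(stderr, "expected failure: %s\n",
                                              strerror(errno));
                                      break;
                              }
                              perror("read");                      /* anything else is odd */
                              break;
                      }
                      close(fd);
                      return 0;
              }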
            green Oleg Drokin added a comment -

            Ok, thanks.
            I had 4 more failures in the past 24 hours, btw.

            The crashdumps are on onyx-68 in /export/crashdumps.
            they are:
            192.168.123.199-2017-09-01-10:34:*
            192.168.123.111-2017-09-02-15:06:*
            192.168.123.195-2017-09-03-13:*
            192.168.123.151-2017-09-03-14:06:*
            192.168.123.135-2017-09-03-14:11:*

            build tree is currently in /export/centos7-nfsroot/home/green/git/lustre-release with all the modules (I'll update it on Tuesday, but it should be good for the next 30 or so hours).


            bfaccini Bruno Faccini (Inactive) added a comment -

            Oleg,
            my guess is that this new sub-test, sanity/test_411, introduced by change #21745, is setting a highly constraining kernel memory limit that is very likely to trigger some memcg/slab bug.
            But I am OK to have a look at the crash dumps to try to confirm.
            green Oleg Drokin added a comment -

            Hmm, I just had a failure in a test introduced by this patch:

            [38199.302263] Lustre: DEBUG MARKER: == sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 10:34:27 (1504276467)
            [38212.118675] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
            [38212.120795] IP: [<ffffffff811dbb04>] __memcg_kmem_get_cache+0xe4/0x220
            [38212.121489] PGD 310c0a067 PUD 28e92c067 PMD 0 
            [38212.122192] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
            [38212.122849] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_zfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) brd ext4 mbcache loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate jbd2 syscopyarea sysfillrect ata_generic sysimgblt pata_acpi ttm drm_kms_helper ata_piix drm i2c_piix4 libata serio_raw virtio_balloon pcspkr virtio_console i2c_core virtio_blk floppy nfsd ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
            [38212.145920] CPU: 2 PID: 31539 Comm: dd Tainted: P        W  OE  ------------   3.10.0-debug #2
            [38212.147177] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [38212.147821] task: ffff8802f2bf4800 ti: ffff880294f20000 task.ti: ffff880294f20000
            [38212.152755] RIP: 0010:[<ffffffff811dbb04>]  [<ffffffff811dbb04>] __memcg_kmem_get_cache+0xe4/0x220
            [38212.153730] RSP: 0018:ffff880294f237f0  EFLAGS: 00010286
            [38212.154194] RAX: 0000000000000000 RBX: ffff8803232c5c40 RCX: 0000000000000002
            [38212.154672] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000246
            [38212.155168] RBP: ffff880294f23810 R08: 0000000000000000 R09: 0000000000000000
            [38212.155647] R10: 0000000000000000 R11: 0000000200000007 R12: ffff8802f2bf4800
            [38212.156134] R13: ffff88031f6a6000 R14: ffff8803232c5c40 R15: ffff8803232c5c40
            [38212.156898] FS:  00007f1f35a4e740(0000) GS:ffff88033e440000(0000) knlGS:0000000000000000
            [38212.159271] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [38212.159923] CR2: 0000000000000008 CR3: 00000002f011d000 CR4: 00000000000006e0
            [38212.160625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            [38212.161320] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            [38212.163273] Stack:
            [38212.163852]  ffffffff811dba68 0000000000008050 ffff8802c59a5000 ffff8802a991ee00
            [38212.165119]  ffff880294f238a0 ffffffff811cca5c ffffffffa0570615 ffffc9000ab51000
            [38212.166468]  ffff880200000127 ffffffffa05a5547 ffff88028b683e80 ffff8803232c5c40
            [38212.168537] Call Trace:
            [38212.169340]  [<ffffffff811dba68>] ? __memcg_kmem_get_cache+0x48/0x220
            [38212.170547]  [<ffffffff811cca5c>] kmem_cache_alloc+0x1ec/0x640
            [38212.171879]  [<ffffffffa0570615>] ? ldlm_resource_putref+0x75/0x400 [ptlrpc]
            [38212.172659]  [<ffffffffa05a5547>] ? ptlrpc_request_cache_alloc+0x27/0x110 [ptlrpc]
            [38212.174145]  [<ffffffffa07c0f0d>] ? mdc_resource_get_unused+0x14d/0x2a0 [mdc]
            [38212.174871]  [<ffffffffa05a5547>] ptlrpc_request_cache_alloc+0x27/0x110 [ptlrpc]
            [38212.177273]  [<ffffffffa05a5655>] ptlrpc_request_alloc_internal+0x25/0x480 [ptlrpc]
            [38212.178618]  [<ffffffffa05a5ac3>] ptlrpc_request_alloc+0x13/0x20 [ptlrpc]
            [38212.179440]  [<ffffffffa07c6a60>] mdc_enqueue_base+0x6c0/0x18a0 [mdc]
            [38212.180168]  [<ffffffffa07c845b>] mdc_intent_lock+0x26b/0x520 [mdc]
            [38212.180869]  [<ffffffffa161dad0>] ? ll_invalidate_negative_children+0x1e0/0x1e0 [lustre]
            [38212.182291]  [<ffffffffa0584ab0>] ? ldlm_expired_completion_wait+0x240/0x240 [ptlrpc]
            [38212.183569]  [<ffffffffa079723d>] lmv_intent_lock+0xc0d/0x1b50 [lmv]
            [38212.184289]  [<ffffffff810ac3c1>] ? in_group_p+0x31/0x40
            [38212.184941]  [<ffffffffa161e5c5>] ? ll_i2suppgid+0x15/0x40 [lustre]
            [38212.185667]  [<ffffffffa161e614>] ? ll_i2gids+0x24/0xb0 [lustre]
            [38212.186372]  [<ffffffff811073d2>] ? from_kgid+0x12/0x20
            [38212.187062]  [<ffffffffa1609275>] ? ll_prep_md_op_data+0x235/0x520 [lustre]
            [38212.187754]  [<ffffffffa161dad0>] ? ll_invalidate_negative_children+0x1e0/0x1e0 [lustre]
            [38212.190244]  [<ffffffffa161fd34>] ll_lookup_it+0x2a4/0xef0 [lustre]
            [38212.190918]  [<ffffffffa1620ab7>] ll_atomic_open+0x137/0x12d0 [lustre]
            [38212.191636]  [<ffffffff817063d7>] ? _raw_spin_unlock+0x27/0x40
            [38212.192425]  [<ffffffff811f82fb>] ? lookup_dcache+0x8b/0xb0
            [38212.193270]  [<ffffffff811fd551>] do_last+0xa21/0x12b0
            [38212.194603]  [<ffffffff811fdea2>] path_openat+0xc2/0x4a0
            [38212.195481]  [<ffffffff811ff69b>] do_filp_open+0x4b/0xb0
            [38212.196351]  [<ffffffff817063d7>] ? _raw_spin_unlock+0x27/0x40
            [38212.197169]  [<ffffffff8120d137>] ? __alloc_fd+0xa7/0x130
            [38212.197815]  [<ffffffff811ec553>] do_sys_open+0xf3/0x1f0
            [38212.198506]  [<ffffffff811ec66e>] SyS_open+0x1e/0x20
            [38212.199225]  [<ffffffff8170fc49>] system_call_fastpath+0x16/0x1b
            [38212.199896] Code: 01 00 00 41 f6 85 10 03 00 00 03 0f 84 f6 00 00 00 4d 85 ed 48 c7 c2 ff ff ff ff 74 07 49 63 95 98 06 00 00 48 8b 83 e0 00 00 00 <4c> 8b 64 d0 08 4d 85 e4 0f 85 d1 00 00 00 41 f6 45 10 01 0f 84 
            [38212.202617] RIP  [<ffffffff811dbb04>] __memcg_kmem_get_cache+0xe4/0x220
            [38212.203345]  RSP <ffff880294f237f0>
            

            I have a crashdump if anybody is interested.

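            For what it is worth, hand-decoding the faulting bytes in the Code: line above (worth re-checking with objdump) supports the NULL-memcg_params reading:

              /*
               *   <4c> 8b 64 d0 08        mov 0x8(%rax,%rdx,8),%r12
               *
               * With RAX = 0 and RDX = 0 in the register dump, the load address
               * is 0x8, exactly the reported CR2: an indexed read at offset 8
               * from a NULL base pointer, as expected for memcg_caches[idx]
               * through a NULL memcg_params.
               */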
            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21745/
            Subject: LU-8435 tests: slab alloc error does not LBUG
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 15dac618aabf2d5611a280bce13ca79c673f4f6d

            pjones Peter Jones added a comment -

            Yes I meant the testing patch

            simmonsja James A Simmons added a comment - edited

            Peter, the original fix https://review.whamcloud.com/#/c/13956 has already landed to master. I think this is safe to close. Or do you mean https://review.whamcloud.com/#/c/21745?
            pjones Peter Jones added a comment -

            I think that we need the ticket to remain open until the original patch has landed to master.

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: spiechurski Sebastien Piechurski
              Votes: 0
              Watchers: 16
