LU-8435

LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Environment: Bull Lustre distribution based on Lustre 2.7.2
    • Severity: 3

    Description

      In the last month, one of our customers has hit a crash more than 100 times with the following signature:

      [506626.555125] SLUB: Unable to allocate memory on node -1 (gfp=0x80c0)
      [506626.562216]   cache: kvm_mmu_page_header(22:step_batch), object size: 168, buffer size: 168, default order: 1, min order: 0
      [506626.574729]   node 0: slabs: 0, objs: 0, free: 0
      [506626.579974]   node 1: slabs: 0, objs: 0, free: 0
      [506626.585219]   node 2: slabs: 60, objs: 2880, free: 0
      [506626.590852]   node 3: slabs: 0, objs: 0, free: 0
      [506626.596112] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) failed: cp_state:0, cmd:1
      [506626.612512] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) LBUG
      [506626.620186] Pid: 41604, comm: cat
      [506626.623978] Call Trace:
      [506626.628573]  [<ffffffffa05eb853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [506626.636448]  [<ffffffffa05ebdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
      [506626.643456]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.651526]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.659108]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.666037]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.673531]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.680450]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.687961]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.694964]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.702078]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.709674]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.716785]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.723797]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.731477]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.738962]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.745962]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.751700]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.757345]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      [506626.765990] Kernel panic - not syncing: LBUG
      [506626.770850] CPU: 53 PID: 41604 Comm: cat Tainted: G           OE ------------   3.10.0-327.22.2.el7.x86_64 #1
      [506626.782104] Hardware name: BULL bullx blade/CHPU, BIOS BIOSX07.037.01.003 10/23/2015
      [506626.790838]  ffffffffa0610ced 000000000f6a3070 ffff8817799eb8c0 ffffffff816360f4
      [506626.799228]  ffff8817799eb940 ffffffff8162f96a ffffffff00000008 ffff8817799eb950
      [506626.807618]  ffff8817799eb8f0 000000000f6a3070 ffffffffa0e01466 0000000000000246
      [506626.816005] Call Trace:
      [506626.818839]  [<ffffffff816360f4>] dump_stack+0x19/0x1b
      [506626.824668]  [<ffffffff8162f96a>] panic+0xd8/0x1e7
      [506626.830128]  [<ffffffffa05ebe5b>] lbug_with_loc+0xab/0xc0 [libcfs]
      [506626.837129]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.845192]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.852766]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.859702]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.867184]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.874099]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.881611]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.888609]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.895721]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.903322]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.910418]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.917420]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.925091]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.932575]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.939569]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.945300]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.950938]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b

      Since the customer runs a classified ("black") site, we cannot provide the crash dump, but we will happily provide any text output you would find useful.
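
      For reference, the failed check asserts a logical equivalence ("if and only if"): a page must be in CPS_PAGEIN state exactly when the command is a read. Below is a minimal user-space sketch of the failing condition; the equi() macro body and the CPS_PAGEIN value are assumptions for illustration, not taken from this report.

      #include <stdio.h>

      /* Assumed libcfs-style definition: both sides true, or both false. */
      #define equi(a, b) (!!(a) == !!(b))

      #define CPS_PAGEIN   4   /* hypothetical enum value, for illustration */
      #define OBD_BRW_READ 1

      int main(void)
      {
              int cp_state = 0;        /* "cp_state:0" from the LBUG message */
              int cmd = OBD_BRW_READ;  /* "cmd:1" */

              /* cmd is a read but the page is not in CPS_PAGEIN, so the
               * equivalence fails and the kernel code would LBUG here. */
              printf("assertion holds: %d\n",
                     equi(cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ));
              return 0;
      }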

      Attachments

        1. crash_output.txt
          24 kB
        2. foreach_bt_merge.txt
          152 kB
        3. struct_analyze1.txt
          50 kB

        Issue Links

          Activity

            [LU-8435] LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) )

            adilger Andreas Dilger added a comment

            I think it makes sense to disable cgroup testing if the client is running a kernel version < 4.4, until we know there is a fix in RHEL and which version it is in.


            bfaccini Bruno Faccini (Inactive) added a comment

            And here is the answer I got for BZ #197075:

            Kernel memory accounting is totally unusable in 3.10, because it lacks
            dcache reclaim and there are a lot of implementation bugs. The one you
            caught is just one of many that have been fixed since 3.10. That's why
            it stayed disabled by default until 4.x.
            
            So IMHO the best you can do if you really want to use kernel memory
            accounting is upgrade to 4.x or backport all related patches.
            
            bfaccini Bruno Faccini (Inactive) added a comment (edited)

            Hmm, still not much of an answer to my report at http://www.spinics.net/lists/cgroups/msg18187.html.

            So, I have submitted bug report #197075 to bugzilla.kernel.org: https://bugzilla.kernel.org/show_bug.cgi?id=197075.


            bfaccini Bruno Faccini (Inactive) added a comment

            A problem description and two patch proposals have been posted to cgroups@vger.kernel.org.


            bfaccini Bruno Faccini (Inactive) added a comment

            Hmm, in fact I think the kernel patch ("memcg: don't call memcg_update_all_caches if new cache id fits") that I indicated in my previous comment is unlikely to be the final fix.

            After browsing more recent kernel source code again, the problem appears to be fixed in 4.x kernels because memcg_params is no longer a pointer but an embedded struct memcg_cache_params inside the kmem_cache! But it looks like the problem is still present in 3.x.

            BTW, I have found a possible alternate way to fix it with the following patch:

            [root@onyx-68 linux-3.10.0-327.22.2.el7-debug]# diff -urN mm/memcontrol.c mm/memcontrol.c.bfi+
            --- mm/memcontrol.c     2016-06-09 06:31:12.000000000 -0700
            +++ mm/memcontrol.c.bfi+        2017-09-12 09:24:53.235452071 -0700
            @@ -3213,7 +3213,7 @@ int memcg_register_cache(struct mem_cgroup *memcg, struct kmem_cache *s,
             {
                    size_t size = sizeof(struct memcg_cache_params);
             
            -       if (!memcg_kmem_enabled())
            +       if (!memcg_kmem_enabled() && memcg_limited_groups_array_size == 0)
                            return 0;
             
                    if (!memcg)
            [root@onyx-68 linux-3.10.0-327.22.2.el7-debug]# 
            

            Note that the memcg_register_cache() routine has been replaced by memcg_alloc_cache_params() in some 3.x versions, but the same fix applies there.

            I will try to propose one of these patches on the kernel mailing list.
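
            To make the effect of the one-line condition change concrete, here is a toy user-space model of the failure mode under the scenario described above (a kmem cgroup existed before, so memcg_limited_groups_array_size != 0, but none is enabled now); toy_register_cache() and the two globals are hypothetical stand-ins, not the real 3.10 kernel code.

            #include <stdio.h>
            #include <stdlib.h>

            struct memcg_cache_params { int is_root_cache; };
            struct kmem_cache { struct memcg_cache_params *memcg_params; };

            static int memcg_kmem_enabled;                  /* no kmem-limited cgroup active now */
            static int memcg_limited_groups_array_size = 1; /* ...but one existed before */

            /* Mirrors the patched condition: also allocate memcg_params when a kmem
             * cgroup has existed previously, even if none is enabled right now. */
            static int toy_register_cache(struct kmem_cache *s)
            {
                    if (!memcg_kmem_enabled && memcg_limited_groups_array_size == 0)
                            return 0; /* the unpatched code would return here */
                    s->memcg_params = calloc(1, sizeof(*s->memcg_params));
                    return s->memcg_params ? 0 : -1;
            }

            int main(void)
            {
                    struct kmem_cache cache = { NULL };

                    toy_register_cache(&cache);
                    /* Without the patch, memcg_params stays NULL here and a later
                     * lookup in __memcg_kmem_get_cache() dereferences NULL. */
                    printf("memcg_params %s\n", cache.memcg_params ? "allocated" : "NULL");
                    return 0;
            }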


            bfaccini Bruno Faccini (Inactive) added a comment

            After browsing the kernel sources at kernel.org, I think this issue is fixed in 4.x and in more recent 3.x kernels, mainly by the changes introduced in patch/commit "memcg: don't call memcg_update_all_caches if new cache id fits", available at
            https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/patch/mm/memcontrol.c?id=f3bb3043a092368a255bca5d1c6f4352c96a3b2d, where ALL kmem_caches are systematically walked to resize/initialize a new memcg_params when a kmem memory cgroup is started.

            And again, I need to insist that this bug is very unlikely to be triggered if we keep running with SLUB configured instead of switching back to SLAB.


            bfaccini Bruno Faccini (Inactive) added a comment

            This problem should already have been addressed by the patch at https://patchwork.kernel.org/patch/2850152/.
            But since the RHEL7 kernel version we experienced these crashes with already includes it, after digging into the related kernel source code, I think the following patch is also required to fully fix the issue:

            [root@onyx-68 linux-3.10.0-327.22.2.el7-debug]# diff -urN mm/memcontrol.c mm/memcontrol.c.bfi
            --- mm/memcontrol.c     2016-06-09 06:31:12.000000000 -0700
            +++ mm/memcontrol.c.bfi 2017-09-08 07:37:18.647281366 -0700
            @@ -3163,7 +3163,16 @@ int memcg_update_cache_size(struct kmem_cache *s, int num_groups)
             
                    VM_BUG_ON(s->memcg_params && !s->memcg_params->is_root_cache);
             
            -       if (num_groups > memcg_limited_groups_array_size) {
            +       /* resize/grow existing memcg_params, or allocate it if this had not
            +        * already been done during kmem_cache creation because none of the
            +        * previously used kmem memcgs were present at that time (i.e.
            +        * memcg_limited_groups_array_size != 0 but memcg_kmem_enabled()
            +        * returned false). This would not be necessary if
            +        * memcg_caches_array_size() did not anticipate more slots than
            +        * required and if memcg_limited_groups_array_size were simply
            +        * incremented upon each new kmem memcg creation.
            +        */
            +       if (num_groups > memcg_limited_groups_array_size || !s->memcg_params) {
                            int i;
                            ssize_t size = memcg_caches_array_size(num_groups);
             
            @@ -3203,7 +3212,8 @@
                             * bigger than the others. And all updates will reset this
                             * anyway.
                             */
            -               kfree(cur_params);
            +               if (cur_params)
            +                       kfree(cur_params);
                    }
                    return 0;
             }
            [root@onyx-68 linux-3.10.0-327.22.2.el7-debug]# 
            

            Oleg, you may want to try it? On my side, I will check whether this problem is still present in recent kernels and report it upstream if it has not been fixed.

            bfaccini Bruno Faccini (Inactive) added a comment (edited)

            Well, I have been quite puzzled by these crash dumps. According to the kernel source code, the only case where memcg_params can be NULL is when a kmem_cache is created while !memcg_kmem_enabled() (tested in memcg_register_cache(); this is when no memory cgroup with a kmem limit is active yet), but then ALL kmem_caches should have memcg_params == NULL, not only the Lustre ones!! I have verified this on a fresh system after reboot with no memory cgroup configured: all kmem_caches, even the Lustre ones after a delayed Lustre start and module load, have memcg_params == NULL.

            Finally, I found a possible scenario where this can happen, and it is confirmed by the kernel logs in all the crash dumps: the sanity/test_411 test (likely the only one to create and use a memory cgroup) has to be run at least twice, with a Lustre module unload in between (causing the Lustre kmem_caches to be destroyed and re-created).
            Then, trying to reproduce this way on a test system, I was unable to trigger the bug, because !memcg_kmem_enabled() could never be satisfied: the "child/memcg" kmem_caches (created during the first sub-test run and memory cgroup creation) were still present, even though fully empty/freed. This could be where the SLAB vs. SLUB config makes the difference: the SLAB allocator performs a regular kmem_cache shrink/drain cleanup that could garbage-collect such leaked kmem_caches, whereas with SLUB this never happens (or perhaps only much later).

            But in any case, the fact that memcg_params can be NULL for a recently created kmem_cache when no kmem memory cgroup is currently active (but some existed before!) needs to be handled by the current code. I will check whether this is already the case in recent kernels, and push a fix if not.

            Oleg, this looks like one more argument confirming the need to test kernels with "unsupported" configs!

            green Oleg Drokin added a comment

            Note that I am running an unsupported Red Hat kernel config with a lot of debugging enabled, and I also enable SLAB, whereas Red Hat defaults to SLUB.

            This does not make the bugs any less real, of course, but at times it takes a while to convince the RH people that it's OK to run an unsupported config, because it helps to detect bugs earlier rather than as random crashes elsewhere later.

            bfaccini Bruno Faccini (Inactive) added a comment (edited)

            Aurelien, I know that you posted this new test to ensure that a memcg limit does not cause crashes in Lustre code, but given this new kind of crash in the kernel/memcg layer, it seems you should also propose it for the kernel regression test suite!!

            My first crash-dump analysis points to a possible race between the lazy registration of a memcg with the current kmem_caches and concurrent slab allocations, which triggers the unexpected situation in __memcg_kmem_get_cache() where memcg_params has still not been initialized in the "ptlrpc_cache" kmem_cache.

            It is also interesting to note that recent auto-test results for sanity/test_411 are all successes, while all these crashes occurred during single-node sessions, and that the only kmem_caches in the system without an initialized memcg_params are those created in Lustre code.

            More to come.
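
            To illustrate the crash site implied by this analysis, here is a small user-space sketch of how an unallocated memcg_params leads to a NULL dereference during the per-memcg cache lookup; the structures and toy_memcg_kmem_get_cache() are simplified, hypothetical stand-ins for the 3.10 code, not the actual kernel implementation.

            #include <stdio.h>

            struct kmem_cache;

            struct memcg_cache_params {
                    struct kmem_cache *memcg_caches[8]; /* per-memcg child caches */
            };

            struct kmem_cache {
                    const char *name;
                    struct memcg_cache_params *memcg_params;
            };

            static struct kmem_cache *
            toy_memcg_kmem_get_cache(struct kmem_cache *cachep, int memcg_id)
            {
                    /* The 3.10-era lookup assumed memcg_params was always allocated
                     * once kmem accounting had ever been enabled; in the scenario
                     * above it is still NULL, so this dereference oopses. */
                    return cachep->memcg_params->memcg_caches[memcg_id];
            }

            int main(void)
            {
                    /* Mirrors a Lustre cache created while no kmem cgroup was
                     * active: memcg_params was never allocated. */
                    struct kmem_cache ptlrpc_cache = { "ptlrpc_cache", NULL };

                    if (!ptlrpc_cache.memcg_params) {
                            fprintf(stderr, "memcg_params is NULL: the real code "
                                    "would dereference it here and crash\n");
                            return 1;
                    }
                    (void)toy_memcg_kmem_get_cache(&ptlrpc_cache, 0);
                    return 0;
            }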


            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: spiechurski Sebastien Piechurski
              Votes: 0
              Watchers: 16

              Dates

                Created:
                Updated:
                Resolved: