  Lustre / LU-8435

LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.7.0
    • Environment: Bull Lustre distribution based on Lustre 2.7.2
    • Severity: 3

    Description

      In the last month, one of our customers has hit a crash more than 100 times with the following signature:

      [506626.555125] SLUB: Unable to allocate memory on node -1 (gfp=0x80c0)
      [506626.562216]   cache: kvm_mmu_page_header(22:step_batch), object size: 168, buffer size: 168, default order: 1, min order: 0
      [506626.574729]   node 0: slabs: 0, objs: 0, free: 0
      [506626.579974]   node 1: slabs: 0, objs: 0, free: 0
      [506626.585219]   node 2: slabs: 60, objs: 2880, free: 0
      [506626.590852]   node 3: slabs: 0, objs: 0, free: 0
      [506626.596112] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) failed: cp_state:0, cmd:1
      [506626.612512] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) LBUG
      [506626.620186] Pid: 41604, comm: cat
      [506626.623978]
                      Call Trace:
      [506626.628573]  [<ffffffffa05eb853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [506626.636448]  [<ffffffffa05ebdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
      [506626.643456]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.651526]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.659108]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.666037]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.673531]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.680450]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.687961]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.694964]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.702078]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.709674]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.716785]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.723797]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.731477]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.738962]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.745962]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.751700]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.757345]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      [506626.764138]
      [506626.765990] Kernel panic - not syncing: LBUG
      [506626.770850] CPU: 53 PID: 41604 Comm: cat Tainted: G           OE  ------------   3.10.0-327.22.2.el7.x86_64 #1
      [506626.782104] Hardware name: BULL bullx blade/CHPU, BIOS BIOSX07.037.01.003 10/23/2015
      [506626.790838]  ffffffffa0610ced 000000000f6a3070 ffff8817799eb8c0 ffffffff816360f4
      [506626.799228]  ffff8817799eb940 ffffffff8162f96a ffffffff00000008 ffff8817799eb950
      [506626.807618]  ffff8817799eb8f0 000000000f6a3070 ffffffffa0e01466 0000000000000246
      [506626.816005] Call Trace:
      [506626.818839]  [<ffffffff816360f4>] dump_stack+0x19/0x1b
      [506626.824668]  [<ffffffff8162f96a>] panic+0xd8/0x1e7
      [506626.830128]  [<ffffffffa05ebe5b>] lbug_with_loc+0xab/0xc0 [libcfs]
      [506626.837129]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
      [506626.845192]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
      [506626.852766]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
      [506626.859702]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.867184]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
      [506626.874099]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
      [506626.881611]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
      [506626.888609]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
      [506626.895721]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
      [506626.903322]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
      [506626.910418]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
      [506626.917420]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
      [506626.925091]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
      [506626.932575]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
      [506626.939569]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
      [506626.945300]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
      [506626.950938]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
      

      Since the customer site is a black site, we can't provide the crashdump, but we will happily provide any text output you would find useful.
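
      For reference, here is a tiny standalone sketch of what the failed assertion expresses, assuming the usual libcfs definition of equi() as logical equivalence; the two flags below illustrate the crashed case (cmd still OBD_BRW_READ while the page is no longer in CPS_PAGEIN) and are not Lustre code:

      /* Sketch only -- assumes libcfs defines equi(p1, p2) as (!!(p1) == !!(p2)). */
      #include <stdio.h>

      #define equi(p1, p2) (!!(p1) == !!(p2))

      int main(void)
      {
              int page_in_pagein = 0;   /* cp_state is no longer CPS_PAGEIN       */
              int cmd_is_read    = 1;   /* cmd:1 in the log, i.e. OBD_BRW_READ    */

              /* equi() requires both conditions to be true together or false
               * together; here only one holds, which is exactly the situation
               * in which osc_completion() hits the LBUG. */
              printf("assertion would %s\n",
                     equi(page_in_pagein, cmd_is_read) ? "hold" : "fail (LBUG)");
              return 0;
      }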

      Attachments

        1. crash_output.txt
          24 kB
        2. foreach_bt_merge.txt
          152 kB
        3. struct_analyze1.txt
          50 kB

          Activity

            bfaccini Bruno Faccini (Inactive) added a comment:
            Hello Sebastien,
            The patch has been back-ported and has landed on the b2_7_fe branch. Do you agree to close?

            bevans Ben Evans (Inactive) added a comment:
            I'm running a test on this now. Seems good, no crashes ... yet.

            adegremont Aurelien Degremont (Inactive) added a comment:
            My bad, I missed your last comment. That's the problem with working too often with 2.7.2 only: you sometimes miss what was added in master.

            I've pushed a reproducer for this in sanity. Do whatever you want with it.

            gerrit Gerrit Updater added a comment:
            Aurelien Degremont (aurelien.degremont@cea.fr) uploaded a new patch: http://review.whamcloud.com/21745
            Subject: LU-8435 tests: slab alloc error does not LBUG
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0e8a912a7dfc65d14d1f252efea043637620f518

            bfaccini Bruno Faccini (Inactive) added a comment:
            Well, in fact, I just found that this has already been fixed in master by Andreas' fix for LU-6215 (http://review.whamcloud.com/13956, commit eaba382df06cd6f570b16c5f759f612b057e5324). Thus I will simply back-port it to b2_7!

            bfaccini Bruno Faccini (Inactive) added a comment:
            And using list_for_each_entry_safe() + list_del() looks like a better option.
            This will be my first patch attempt to fix this; I will then run it against your reproducer to see how it works.

            adegremont Aurelien Degremont (Inactive) added a comment:
            I agree with this analysis. cp_state is CPS_PAGEIN in almost all our crashdumps, except for 1 or 2, but this could mean that the page was freed between the 2 iterations, IMO.

            list_for_each_entry() + list_del_init() seems like a really bad idea.

            bfaccini Bruno Faccini (Inactive) added a comment:
            Well, having a better look at the code involved in the error path, my guess is that the problem may be caused by the following piece:

            2621 int osc_queue_sync_pages(const struct lu_env *env, struct osc_object *obj,
            2622                          struct list_head *list, int cmd, int brw_flags)
            2623 {
            2624         struct client_obd     *cli = osc_cli(obj);
            2625         struct osc_extent     *ext;
            2626         struct osc_async_page *oap;
            2627         int     page_count = 0;
            2628         int     mppr       = cli->cl_max_pages_per_rpc;
            2629         pgoff_t start      = CL_PAGE_EOF;
            2630         pgoff_t end        = 0;
            2631         ENTRY;
            2632 
            2633         list_for_each_entry(oap, list, oap_pending_item) {
            2634                 pgoff_t index = osc_index(oap2osc(oap));
            2635                 if (index > end)
            2636                         end = index;
            2637                 if (index < start)
            2638                         start = index;
            2639                 ++page_count;
            2640                 mppr <<= (page_count > mppr);
            2641         }
            2642 
            2643         ext = osc_extent_alloc(obj);   <<<<<<<<< Memory allocation failure can occur here
            2644         if (ext == NULL) {
            2645                 list_for_each_entry(oap, list, oap_pending_item) {
            2646                         list_del_init(&oap->oap_pending_item);     <<<<<<<<<<<<
            2647                         osc_ap_completion(env, cli, oap, 0, -ENOMEM);
            2648                 }
            2649                 RETURN(-ENOMEM);
            2650         }
            

            The list_del_init() causes the current oap's next/prev oap_pending_item list pointers to point to itself, so it is selected again by list_for_each_entry() and passed to a 2nd call to osc_ap_completion() even though it has already been processed by the 1st call. In particular, its associated cl_page->cp_state has already been set to CPS_CACHED in osc_ap_completion()->osc_completion()->cl_page_completion(), hence the assert, the state no longer being CPS_PAGEIN.

            I am working on a patch to fix this and to be pushed soon.
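
            For illustration, here is a self-contained userspace sketch (not the actual patch): it re-implements just enough of the kernel list helpers to show why the list_for_each_entry() + list_del_init() combination above keeps re-selecting the same entry, whereas list_for_each_entry_safe() drains the list with one visit per page. struct toy_oap and the helper fill() are hypothetical stand-ins, not Lustre code.

            /* Userspace sketch of the iteration bug in the error path of
             * osc_queue_sync_pages(); NOT the actual Lustre patch. */
            #include <stdio.h>
            #include <stddef.h>

            struct list_head { struct list_head *next, *prev; };

            static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

            static void list_add_tail(struct list_head *e, struct list_head *head)
            {
                e->prev = head->prev;
                e->next = head;
                head->prev->next = e;
                head->prev = e;
            }

            static void list_del_init(struct list_head *e)
            {
                e->prev->next = e->next;
                e->next->prev = e->prev;
                e->next = e->prev = e;   /* entry now points to itself, as in the kernel */
            }

            #define list_entry(ptr, type, member) \
                ((type *)((char *)(ptr) - offsetof(type, member)))

            /* Unsafe: re-reads pos->member.next after the body may have re-linked pos. */
            #define list_for_each_entry(pos, head, member)                            \
                for (pos = list_entry((head)->next, __typeof__(*pos), member);        \
                     &pos->member != (head);                                          \
                     pos = list_entry(pos->member.next, __typeof__(*pos), member))

            /* Safe: caches the next element before the body runs. */
            #define list_for_each_entry_safe(pos, n, head, member)                    \
                for (pos = list_entry((head)->next, __typeof__(*pos), member),        \
                     n = list_entry(pos->member.next, __typeof__(*pos), member);      \
                     &pos->member != (head);                                          \
                     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

            /* Hypothetical stand-in for struct osc_async_page. */
            struct toy_oap { int index; struct list_head oap_pending_item; };

            static void fill(struct list_head *head, struct toy_oap *pages, int nr)
            {
                INIT_LIST_HEAD(head);
                for (int i = 0; i < nr; i++) {
                    pages[i].index = i;
                    list_add_tail(&pages[i].oap_pending_item, head);
                }
            }

            int main(void)
            {
                struct list_head head;
                struct toy_oap pages[3], *oap, *next;
                int visits = 0;

                /* Pattern from the quoted error path: after list_del_init() the entry
                 * is self-linked, so the loop selects the same page again and again
                 * (capped at 5 here; in osc_cache.c the 2nd pass finds a page already
                 * moved to CPS_CACHED and the assertion fires). */
                fill(&head, pages, 3);
                list_for_each_entry(oap, &head, oap_pending_item) {
                    list_del_init(&oap->oap_pending_item);
                    printf("unsafe: completing page %d\n", oap->index);
                    if (++visits >= 5) { puts("unsafe: stuck on the same entry"); break; }
                }

                /* Safe variant proposed above: each page is completed exactly once. */
                fill(&head, pages, 3);
                list_for_each_entry_safe(oap, next, &head, oap_pending_item) {
                    list_del_init(&oap->oap_pending_item);
                    printf("safe: completing page %d\n", oap->index);
                }
                return 0;
            }

            Compiled with gcc or clang (C99, __typeof__ extension), the unsafe loop prints the same page until the cap, while the safe loop visits pages 0, 1 and 2 once each.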


            adegremont Aurelien Degremont (Inactive) added a comment:
            ConstrainRAMSpace=yes is the line. That's the same issue as us. If you can afford to turn it off, this should work around the crash.

            bevans Ben Evans (Inactive) added a comment:
            The cgroup.conf looks like this:

            CgroupAutomount=yes
            CgroupMountpoint="/dev"
            ConstrainCores=yes
            ConstrainRAMSpace=yes
            TaskAffinity=no


            adegremont Aurelien Degremont (Inactive) added a comment:
            Could you verify that the memory cgroup is enabled in your SLURM environment? See /etc/slurm/cgroup.conf. SLURM tunes these files.

            https://github.com/SchedMD/slurm/blob/slurm-16.05/src/plugins/task/cgroup/task_cgroup_memory.c#L305

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: spiechurski Sebastien Piechurski
              Votes: 0
              Watchers: 16
