[LU-8435] LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.11.0
Affects Version/s: Lustre 2.7.0
Labels:
- cea
- p4b
Environment:
Bull Lustre distribution based on Lustre 2.7.2

Severity:
3
Epic:
- client
Rank (Obsolete):
9223372036854775807

Description

In the last month one of our customer hit more than 100 times a crash with the following signature:

[506626.555125] SLUB: Unable to allocate memory on node -1 (gfp=0x80c0)
[506626.562216]   cache: kvm_mmu_page_header(22:step_batch), object size: 168,
buffer size: 168, default order: 1, min order: 0
[506626.574729]   node 0: slabs: 0, objs: 0, free: 0
[506626.579974]   node 1: slabs: 0, objs: 0, free: 0
[506626.585219]   node 2: slabs: 60, objs: 2880, free: 0
[506626.590852]   node 3: slabs: 0, objs: 0, free: 0
[506626.596112] LustreError: 41604:0:(osc_cache.c:1290:osc_completion())
ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) failed:
cp_state:0, cmd:1
[506626.612512] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) LBUG
[506626.620186] Pid: 41604, comm: cat
[506626.623978]
                Call Trace:
[506626.628573]  [<ffffffffa05eb853>] libcfs_debug_dumpstack+0x53/0x80
[libcfs]
[506626.636448]  [<ffffffffa05ebdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
[506626.643456]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0
[osc]
[506626.651526]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
[506626.659108]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
[506626.666037]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.673531]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
[506626.680450]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.687961]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
[506626.694964]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
[506626.702078]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
[506626.709674]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
[506626.716785]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
[506626.723797]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
[506626.731477]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
[506626.738962]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
[506626.745962]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
[506626.751700]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
[506626.757345]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
[506626.764138]
[506626.765990] Kernel panic - not syncing: LBUG
[506626.770850] CPU: 53 PID: 41604 Comm: cat Tainted: G           OE 
------------   3.10.0-327.22.2.el7.x86_64 #1
[506626.782104] Hardware name: BULL bullx blade/CHPU, BIOS BIOSX07.037.01.003
10/23/2015
[506626.790838]  ffffffffa0610ced 000000000f6a3070 ffff8817799eb8c0
ffffffff816360f4
[506626.799228]  ffff8817799eb940 ffffffff8162f96a ffffffff00000008
ffff8817799eb950
[506626.807618]  ffff8817799eb8f0 000000000f6a3070 ffffffffa0e01466
0000000000000246
[506626.816005] Call Trace:
[506626.818839]  [<ffffffff816360f4>] dump_stack+0x19/0x1b
[506626.824668]  [<ffffffff8162f96a>] panic+0xd8/0x1e7
[506626.830128]  [<ffffffffa05ebe5b>] lbug_with_loc+0xab/0xc0 [libcfs]
[506626.837129]  [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0
[osc]
[506626.845192]  [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
[506626.852766]  [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
[506626.859702]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.867184]  [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
[506626.874099]  [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.881611]  [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
[506626.888609]  [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
[506626.895721]  [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
[506626.903322]  [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
[506626.910418]  [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
[506626.917420]  [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
[506626.925091]  [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
[506626.932575]  [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
[506626.939569]  [<ffffffff811dea1c>] vfs_read+0x9c/0x170
[506626.945300]  [<ffffffff811df56f>] SyS_read+0x7f/0xe0
[506626.950938]  [<ffffffff81646889>] system_call_fastpath+0x16/0x1b

The customer being a black site, we can't provide the crashdump, but will happily provide any text output you would find useful.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

crash_output.txt
24 kB
25/Jul/16 2:59 PM
foreach_bt_merge.txt
152 kB
25/Jul/16 2:59 PM
struct_analyze1.txt
50 kB
25/Jul/16 2:59 PM

Issue Links

is related to

LU-9966 sanity test_411: fail to trigger a memory allocation error

Resolved

is related to

LU-6215 Sync Lustre external tree with lustre linux kernel client

Resolved

LBUG (osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) )

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates