Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.7.0
-
Bull Lustre distribution based on Lustre 2.7.2
-
3
Description
In the last month, one of our customers hit a crash with the following signature more than 100 times:
[506626.555125] SLUB: Unable to allocate memory on node -1 (gfp=0x80c0)
[506626.562216] cache: kvm_mmu_page_header(22:step_batch), object size: 168, buffer size: 168, default order: 1, min order: 0
[506626.574729] node 0: slabs: 0, objs: 0, free: 0
[506626.579974] node 1: slabs: 0, objs: 0, free: 0
[506626.585219] node 2: slabs: 60, objs: 2880, free: 0
[506626.590852] node 3: slabs: 0, objs: 0, free: 0
[506626.596112] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) ASSERTION( equi(page->cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ) ) failed: cp_state:0, cmd:1
[506626.612512] LustreError: 41604:0:(osc_cache.c:1290:osc_completion()) LBUG
[506626.620186] Pid: 41604, comm: cat
[506626.623978] Call Trace:
[506626.628573] [<ffffffffa05eb853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[506626.636448] [<ffffffffa05ebdf5>] lbug_with_loc+0x45/0xc0 [libcfs]
[506626.643456] [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
[506626.651526] [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
[506626.659108] [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
[506626.666037] [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.673531] [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
[506626.680450] [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.687961] [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
[506626.694964] [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
[506626.702078] [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
[506626.709674] [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
[506626.716785] [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
[506626.723797] [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
[506626.731477] [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
[506626.738962] [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
[506626.745962] [<ffffffff811dea1c>] vfs_read+0x9c/0x170
[506626.751700] [<ffffffff811df56f>] SyS_read+0x7f/0xe0
[506626.757345] [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
[506626.764138]
[506626.765990] Kernel panic - not syncing: LBUG
[506626.770850] CPU: 53 PID: 41604 Comm: cat Tainted: G OE ------------ 3.10.0-327.22.2.el7.x86_64 #1
[506626.782104] Hardware name: BULL bullx blade/CHPU, BIOS BIOSX07.037.01.003 10/23/2015
[506626.790838] ffffffffa0610ced 000000000f6a3070 ffff8817799eb8c0 ffffffff816360f4
[506626.799228] ffff8817799eb940 ffffffff8162f96a ffffffff00000008 ffff8817799eb950
[506626.807618] ffff8817799eb8f0 000000000f6a3070 ffffffffa0e01466 0000000000000246
[506626.816005] Call Trace:
[506626.818839] [<ffffffff816360f4>] dump_stack+0x19/0x1b
[506626.824668] [<ffffffff8162f96a>] panic+0xd8/0x1e7
[506626.830128] [<ffffffffa05ebe5b>] lbug_with_loc+0xab/0xc0 [libcfs]
[506626.837129] [<ffffffffa0dea859>] osc_ap_completion.isra.30+0x4d9/0x5b0 [osc]
[506626.845192] [<ffffffffa0df558d>] osc_queue_sync_pages+0x2dd/0x350 [osc]
[506626.852766] [<ffffffffa0de750f>] osc_io_submit+0x42f/0x530 [osc]
[506626.859702] [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.867184] [<ffffffffa0b8d257>] lov_io_submit+0x2a7/0x420 [lov]
[506626.874099] [<ffffffffa086fbd6>] cl_io_submit_rw+0x66/0x170 [obdclass]
[506626.881611] [<ffffffffa0c67f70>] ll_readpage+0x2d0/0x560 [lustre]
[506626.888609] [<ffffffff8116af87>] generic_file_aio_read+0x3b7/0x750
[506626.895721] [<ffffffffa0c98485>] vvp_io_read_start+0x3c5/0x470 [lustre]
[506626.903322] [<ffffffffa086f965>] cl_io_start+0x65/0x130 [obdclass]
[506626.910418] [<ffffffffa0872f85>] cl_io_loop+0xa5/0x190 [obdclass]
[506626.917420] [<ffffffffa0c34e8c>] ll_file_io_generic+0x5fc/0xae0 [lustre]
[506626.925091] [<ffffffffa0c35db2>] ll_file_aio_read+0x192/0x530 [lustre]
[506626.932575] [<ffffffffa0c3621b>] ll_file_read+0xcb/0x1e0 [lustre]
[506626.939569] [<ffffffff811dea1c>] vfs_read+0x9c/0x170
[506626.945300] [<ffffffff811df56f>] SyS_read+0x7f/0xe0
[506626.950938] [<ffffffff81646889>] system_call_fastpath+0x16/0x1b
The customer being a black site, we can't provide the crashdump, but will happily provide any text output you would find useful.
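For readers less familiar with the failed assertion, here is a minimal sketch of what it checks, evaluated with the values printed in the log (cp_state:0, cmd:1). The enum ordering and flag values are assumptions recalled from Lustre 2.7-era headers, not taken from this report, and the helper name completion_assertion_holds is hypothetical; please verify them against the actual source tree. equi() is modelled as logical equivalence, i.e. both conditions must agree.

```python
# ASSUMPTION: page state ordering as in Lustre 2.7-era cl_object.h
# (CPS_CACHED first, so 0). Verify against the tree in use.
CPS_CACHED, CPS_OWNED, CPS_PAGEOUT, CPS_PAGEIN, CPS_FREEING = range(5)

# ASSUMPTION: bulk read/write command flags as in Lustre headers.
OBD_BRW_READ = 0x01
OBD_BRW_WRITE = 0x02


def equi(a, b):
    """Logical equivalence: true when both conditions agree."""
    return bool(a) == bool(b)


def completion_assertion_holds(cp_state, cmd):
    """Hypothetical helper mirroring the condition printed in the LBUG."""
    return equi(cp_state == CPS_PAGEIN, cmd == OBD_BRW_READ)


if __name__ == "__main__":
    # Values reported in the crash signature: cp_state:0, cmd:1
    print(completion_assertion_holds(0, 1))
```

Under these assumptions the check fails for the logged values: the command is a read, but the page is in state 0 rather than CPS_PAGEIN when the completion runs.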