[LU-3521] LBUG in cl_page_cache_add() using '--enable-invariants' configure option Created: 27/Jun/13 Updated: 05/Sep/13 Resolved: 05/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 8856 |
| Description |
|
Building Lustre using the '--enable-invariants' configure option and running a simple single shared file FIO causes an LBUG. This is easily reproducible using a stock 2.4.0 tag:

    $ cat /tmp/test.fio
    [global]
    directory=/mnt/lustre
    filename=ssf
    size=1m
    blocksize=1m
    direct=0
    ioengine=sync
    numjobs=32
    group_reporting=1
    fallocate=none
    runtime=60
    stonewall

    $ git describe
    v2_4_50_0
    $ sh autogen.sh && ./configure --enable-invariants && make
    $ sudo FSTYPE=zfs sh ./lustre/tests/llmount.sh
    $ sudo fio /tmp/test.fio

Here's the LBUG triggered:

    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) page@ffff880027691800[3 ffff88001c1ede70:0 ^(null)_ffff880027691600 1 0 1 ffff880037eef608 (null) 0x0]
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) page@ffff880027691600[2 ffff880006e73148:0 ^ffff880027691800_(null) 1 0 1 ffff88003b2f2a10 (null) 0x0]
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) vvp-page@ffff8800276918c0(0:0:0) vm@ffffea000024cf50 20000000000805 4:0 ffff880027691800 0 lru
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) lov-page@ffff880027691910
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) osc-page@ffff8800276916e8: 1< 0x845fed 0 0 - - > 2< 0 0 0 0x0 0x100 | (null) ffff880010e8ca40 ffff88001c1de6c0 > 3< - (null) 0 0 0 > 4< 0 0 8 413696 - | - - - - > 5< - - - - | 0 - | 0 - ->
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) end page@ffff880027691600
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) cl_page_invariant(pg)
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) ASSERTION( 0 ) failed:
    LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) LBUG
    Kernel panic - not syncing: LBUG
    Pid: 2491, comm: fio Tainted: P --------------- 2.6.32-358.5chaos.ch5.1.x86_64 #1
    Call Trace:
    [<ffffffff8150d3ec>] ? panic+0xa7/0x16f
    [<ffffffffa0b69eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
    [<ffffffffa05bfb34>] ? cl_page_cache_add+0x144/0x5b0 [obdclass]
    [<ffffffffa0c172aa>] ? lov_page_cache_add+0x15a/0x2d0 [lov]
    [<ffffffffa05bfc4f>] ? cl_page_cache_add+0x25f/0x5b0 [obdclass]
    [<ffffffffa05bd6ac>] ? cl_page_find0+0x47c/0x8f0 [obdclass]
    [<ffffffffa0931f61>] ? cl2osc_io+0x21/0xa0 [osc]
    [<ffffffffa05b6219>] ? cl_env_hops_keycmp+0x19/0x70 [obdclass]
    [<ffffffffa0b7fa0d>] ? cfs_hash_bd_lookup_intent+0xdd/0x130 [libcfs]
    [<ffffffffa0f63c55>] ? vvp_io_commit_write+0x3f5/0x5f0 [lustre]
    [<ffffffffa05cc82c>] ? cl_io_commit_write+0xdc/0x2e0 [obdclass]
    [<ffffffffa0f3856e>] ? ll_commit_write+0xee/0x320 [lustre]
    [<ffffffffa0f50860>] ? ll_write_end+0x30/0x60 [lustre]
    [<ffffffff8111a21a>] ? generic_file_buffered_write+0x18a/0x2e0
    [<ffffffff8111bc20>] ? __generic_file_aio_write+0x260/0x490
    [<ffffffffa05c18fc>] ? cl_lock_trace0+0x11c/0x130 [obdclass]
    [<ffffffff8111bed8>] ? generic_file_aio_write+0x88/0x100
    [<ffffffffa0f6614b>] ? vvp_io_write_start+0xcb/0x2d0 [lustre]
    [<ffffffffa05c8d7a>] ? cl_io_start+0xea/0x1f0 [obdclass]
    [<ffffffffa05ce0a0>] ? cl_io_loop+0x180/0x200 [obdclass]
    [<ffffffffa0f0bb80>] ? ll_file_io_generic+0x450/0x600 [lustre]
    [<ffffffffa0f0cac2>] ? ll_file_aio_write+0x142/0x2c0 [lustre]
    [<ffffffffa0f0cdac>] ? ll_file_write+0x16c/0x2a0 [lustre]
    [<ffffffff81180b38>] ? vfs_write+0xb8/0x1a0
    [<ffffffff81181431>] ? sys_write+0x51/0x90
    [<ffffffff810dc0a5>] ? __audit_syscall_exit+0x265/0x290
    [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b |
| Comments |
| Comment by Peter Jones [ 28/Jun/13 ] |
|
Niu, could you please advise on this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 01/Jul/13 ] |
|
The assertion is triggered because the sub cl_page does not satisfy cl_page_in_use():

    page@ffff880027691600[2 ffff880006e73148:0 ^ffff880027691800_(null) 1 0 1 ffff88003b2f2a10 (null) 0x0]

In fact, this assertion is incorrect for the sub cl_page, because we don't take an extra ref on the sub cl_page beforehand (see cl_vmpage_page()). I suspect there are more wrong assertions like this, because we never test the '--enable-invariants' option. |
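To make the reasoning in this comment concrete, here is a toy C model of the check. It is not Lustre code: the struct, the function name, and the threshold (one baseline reference, plus one more expected when the page has a parent) are assumptions chosen to mirror the explanation. The counts 3 and 2 are taken from the first field of the two page@ dumps in the description, on the assumption that this field is the page reference count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cl_page; hypothetical, NOT the real Lustre structure. */
struct toy_page {
    int              ref;    /* reference count */
    struct toy_page *parent; /* non-NULL for a sub page, NULL for a top page */
};

/*
 * Assumed "in use" rule for illustration only: a page counts as in use
 * when its reference count exceeds a baseline of 1, and a sub page is
 * expected to carry one additional reference on top of that.
 */
static inline bool toy_page_in_use(const struct toy_page *pg)
{
    return pg->ref > 1 + (pg->parent != NULL ? 1 : 0);
}
```

Under these assumptions, a top page with ref 3 and no parent passes the check, while a sub page with ref 2 and a parent does not. That matches the failure described here: nobody took the extra reference on the sub page, so asserting "in use" on it fires even though nothing is actually wrong.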
| Comment by Niu Yawei (Inactive) [ 01/Jul/13 ] |
|
Fixed several wrong assumptions exposed by '--enable-invariants': http://review.whamcloud.com/6832 |
| Comment by Peter Jones [ 05/Sep/13 ] |
|
Landed for 2.5 |