[LU-3521] LBUG in cl_page_cache_add() using '--enable-invariants' configure option Created: 27/Jun/13  Updated: 05/Sep/13  Resolved: 05/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Minor
Reporter: Prakash Surya (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 8856

 Description   

Building Lustre with the '--enable-invariants' configure option and running a simple single-shared-file FIO job causes an LBUG. This is easily reproducible using a stock 2.4.0 tag:

$ cat /tmp/test.fio                                                                                                                    
[global]
directory=/mnt/lustre
filename=ssf
size=1m
blocksize=1m
direct=0
ioengine=sync
numjobs=32
group_reporting=1
fallocate=none
runtime=60
stonewall

$ git describe
v2_4_50_0

$ sh autogen.sh && ./configure --enable-invariants && make
$ sudo FSTYPE=zfs sh ./lustre/tests/llmount.sh
$ sudo fio /tmp/test.fio

Here's the LBUG triggered:

LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) page@ffff880027691800[3 ffff88001c1ede70:0 ^(null)_ffff880027691600 1 0 1 ffff880037eef608 (null) 0x0]
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) page@ffff880027691600[2 ffff880006e73148:0 ^ffff880027691800_(null) 1 0 1 ffff88003b2f2a10 (null) 0x0]
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) vvp-page@ffff8800276918c0(0:0:0) vm@ffffea000024cf50 20000000000805 4:0 ffff880027691800 0 lru 
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) lov-page@ffff880027691910
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) osc-page@ffff8800276916e8: 1< 0x845fed 0 0 - - > 2< 0 0 0 0x0 0x100 | (null) ffff880010e8ca40 ffff88001c1de6c0 > 3< - (null) 0 0 0 > 4< 0 0 8 413696 - | - - - - > 5< - - - - | 0 - | 0 - ->
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) end page@ffff880027691600
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) cl_page_invariant(pg)  
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) ASSERTION( 0 ) failed: 
LustreError: 2491:0:(cl_page.c:1392:cl_page_cache_add()) LBUG                      
Kernel panic - not syncing: LBUG                                                   
Pid: 2491, comm: fio Tainted: P           ---------------    2.6.32-358.5chaos.ch5.1.x86_64 #1
Call Trace:                                                                        
 [<ffffffff8150d3ec>] ? panic+0xa7/0x16f                                           
 [<ffffffffa0b69eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]                           
 [<ffffffffa05bfb34>] ? cl_page_cache_add+0x144/0x5b0 [obdclass]                   
 [<ffffffffa0c172aa>] ? lov_page_cache_add+0x15a/0x2d0 [lov]                       
 [<ffffffffa05bfc4f>] ? cl_page_cache_add+0x25f/0x5b0 [obdclass]                   
 [<ffffffffa05bd6ac>] ? cl_page_find0+0x47c/0x8f0 [obdclass]                       
 [<ffffffffa0931f61>] ? cl2osc_io+0x21/0xa0 [osc]                                  
 [<ffffffffa05b6219>] ? cl_env_hops_keycmp+0x19/0x70 [obdclass]                    
 [<ffffffffa0b7fa0d>] ? cfs_hash_bd_lookup_intent+0xdd/0x130 [libcfs]              
 [<ffffffffa0f63c55>] ? vvp_io_commit_write+0x3f5/0x5f0 [lustre]                   
 [<ffffffffa05cc82c>] ? cl_io_commit_write+0xdc/0x2e0 [obdclass]                   
 [<ffffffffa0f3856e>] ? ll_commit_write+0xee/0x320 [lustre]                        
 [<ffffffffa0f50860>] ? ll_write_end+0x30/0x60 [lustre]                            
 [<ffffffff8111a21a>] ? generic_file_buffered_write+0x18a/0x2e0                    
 [<ffffffff8111bc20>] ? __generic_file_aio_write+0x260/0x490                       
 [<ffffffffa05c18fc>] ? cl_lock_trace0+0x11c/0x130 [obdclass]                      
 [<ffffffff8111bed8>] ? generic_file_aio_write+0x88/0x100                          
 [<ffffffffa0f6614b>] ? vvp_io_write_start+0xcb/0x2d0 [lustre]                     
 [<ffffffffa05c8d7a>] ? cl_io_start+0xea/0x1f0 [obdclass]                          
 [<ffffffffa05ce0a0>] ? cl_io_loop+0x180/0x200 [obdclass]                          
 [<ffffffffa0f0bb80>] ? ll_file_io_generic+0x450/0x600 [lustre]                    
 [<ffffffffa0f0cac2>] ? ll_file_aio_write+0x142/0x2c0 [lustre]                     
 [<ffffffffa0f0cdac>] ? ll_file_write+0x16c/0x2a0 [lustre]                         
 [<ffffffff81180b38>] ? vfs_write+0xb8/0x1a0                                       
 [<ffffffff81181431>] ? sys_write+0x51/0x90                                        
 [<ffffffff810dc0a5>] ? __audit_syscall_exit+0x265/0x290                           
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b


 Comments   
Comment by Peter Jones [ 28/Jun/13 ]

Niu

Could you please advise on this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 01/Jul/13 ]

The assertion is triggered because the sub cl_page is not cl_page_in_use():

page@ffff880027691600[2 ffff880006e73148:0 ^ffff880027691800_(null) 1 0 1 ffff88003b2f2a10 (null) 0x0]

Actually, this assertion is incorrect for the sub cl_page, because we never held an extra ref for the sub cl_page (see cl_vmpage_page()). I think there are likely more wrong assertions like this, because we never test the '--enable-invariants' option.
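
To illustrate the reasoning above, here is a minimal toy model in C. It is not Lustre code: struct toy_page, toy_in_use(), toy_invariant_strict() and toy_invariant_relaxed() are hypothetical names, and the "in use" threshold is an assumption chosen only so the example lines up with the refcounts in the dump (3 for the top page, 2 for the sub page). It shows how an invariant that demands an extra reference on every layer fails for the sub page, while a check applied only to the top page passes. The relaxed variant is one possible shape of a fix, not necessarily what the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a layered page; NOT the real struct cl_page. */
struct toy_page {
	int refcount;             /* reference count, as in the first field of the dump */
	struct toy_page *parent;  /* NULL for the top page */
	struct toy_page *child;   /* NULL for the sub page */
};

/*
 * Hypothetical "in use" test. Assumed baseline: one structural
 * reference, plus one more when the page hangs off a parent.
 * Anything above the baseline counts as an extra holder.
 */
bool toy_in_use(const struct toy_page *pg)
{
	return pg->refcount > 1 + (pg->parent != NULL ? 1 : 0);
}

/* Strict check: every page, top or sub, must have an extra holder. */
bool toy_invariant_strict(const struct toy_page *pg)
{
	return toy_in_use(pg);
}

/*
 * Relaxed check: only the top page is required to have an extra
 * holder, since the extra reference is taken on the top page only.
 */
bool toy_invariant_relaxed(const struct toy_page *pg)
{
	return pg->parent != NULL || toy_in_use(pg);
}
```

With a top page at refcount 3 and a sub page at refcount 2 linked to it, the strict check holds for the top page and fails for the sub page, while the relaxed check holds for both.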

Comment by Niu Yawei (Inactive) [ 01/Jul/13 ]

Fixed several wrong assumptions that show up when building with '--enable-invariants': http://review.whamcloud.com/6832

Comment by Peter Jones [ 05/Sep/13 ]

Landed for 2.5
