[LU-2637] Invalid kernel paging request in osc_import_event Created: 17/Jan/13  Updated: 14/Dec/21  Resolved: 14/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Prakash Surya (Inactive) Assignee: Zhenyu Xu
Resolution: Cannot Reproduce Votes: 0
Labels: llnl

Severity: 3
Rank (Obsolete): 6169

 Description   

I hit this today on a few of the IONs for Sequoia. I was running a mount and umount of the filesystem in a loop; upon killing the script, I hit the crash below on some of the nodes:

LustreError: 8462:0:(llite_lib.c:543:client_common_fill_super()) cannot start close thread: rc -513
Unable to handle kernel paging request for data at address 0x00000010            
Faulting instruction address: 0x80000000046825cc                                
Oops: Kernel access of bad area, sig: 11 [#1]                                   
SMP NR_CPUS=68 Blue Gene/Q                                                      
Modules linked in: lmv(U) mgc(U) lustre(U) mdc(U) fid(U) fld(U) lov(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) bgvrnic bgmudm
NIP: 80000000046825cc LR: 8000000003bc3acc CTR: 8000000004682590                
REGS: c0000003c4a0eaa0 TRAP: 0300   Not tainted  (2.6.32-220.23.3.bgq.18llnl.V1R1M2.bgq62_16.ppc64)
MSR: 0000000080029000 <EE,ME,CE>  CR: 44088488  XER: 20000000                   
DEAR: 0000000000000010, ESR: 0000000000000000                                   
TASK = c0000003e4fdf360[8462] 'mount.lustre' THREAD: c0000003c4a0c000 CPU: 3     
GPR00: 0000000003060580 c0000003c4a0ed20 80000000046f07a0 c0000003ce1726f0       
GPR04: c000000360b9f800 00000000000012e0 c0000003c4a0f0b0 00000000640a0000       
GPR08: c000000360b9f954 0000000000000000 0000000000000001 8000000004684240       
GPR12: 8000000003bfebb0 c000000000764c00 c0000003c4a68000 00000000000005c2       
GPR16: 0000000000000001 0000000000002d20 0000000002000400 000000000000028a       
GPR20: 0000000000000295 0000000000020000 80000000025220e0 0000000000000001       
GPR24: 0000000000000000 c0000003ce16c138 8000000000b2424c 0000000040080000       
GPR28: c0000003ce1726f0 8000000000b24248 80000000046eda40 c0000003c4a0ed20       
NIP [80000000046825cc] .osc_import_event+0xa5c/0x26d0 [osc]                     
LR [8000000003bc3acc] .ptlrpc_deactivate_import+0x1fc/0x7d0 [ptlrpc]            
Call Trace:                                                                     
[c0000003c4a0ed20] [c000000360b9f8b0] 0xc000000360b9f8b0 (unreliable)           
[c0000003c4a0ee60] [8000000003bc3acc] .ptlrpc_deactivate_import+0x1fc/0x7d0 [ptlrpc]
[c0000003c4a0ef30] [8000000003bc47b8] .ptlrpc_invalidate_import+0x1d8/0xef0 [ptlrpc]
[c0000003c4a0f0d0] [8000000004690238] .osc_precleanup+0x2a8/0x720 [osc]         
[c0000003c4a0f190] [80000000024ac3a0] .class_cleanup+0x240/0x17a0 [obdclass]    
[c0000003c4a0f310] [80000000024b3208] .class_process_config+0x20f8/0x4b00 [obdclass]
[c0000003c4a0f460] [80000000024b614c] .class_manual_cleanup+0x53c/0x1760 [obdclass]
[c0000003c4a0f5c0] [800000000691d6e4] .ll_put_super+0x2c4/0x800 [lustre]        
[c0000003c4a0f750] [800000000691e3d8] .ll_fill_super+0x7b8/0xae20 [lustre]      
[c0000003c4a0f900] [80000000024e1144] .lustre_fill_super+0x4c4/0x8e0 [obdclass] 
[c0000003c4a0f9d0] [c0000000000d4508] .get_sb_nodev+0x84/0xe8                   
[c0000003c4a0fa80] [80000000024ba618] .lustre_get_sb+0x28/0x40 [obdclass]       
[c0000003c4a0fb10] [c0000000000d2f14] .vfs_kern_mount+0x80/0x114                
[c0000003c4a0fbc0] [c0000000000d3010] .do_kern_mount+0x58/0x130                 
[c0000003c4a0fc80] [c0000000000f12fc] .do_mount+0x84c/0x908                     
[c0000003c4a0fd70] [c0000000000f1470] .SyS_mount+0xb8/0x124                     
[c0000003c4a0fe30] [c000000000000580] syscall_exit+0x0/0x2c                     
Instruction dump:                                                               
419e00f4 801a0000 20b14000 7fa50040 41dd1868 801d0000 780907e1 40820108         
eb7900c0 777b4008 4182097c e9390000 <e9290010> eb6901c0 2fbb0000 419e0d10       
Kernel panic - not syncing: Fatal exception                                      
Call Trace:                                                                     
[c0000003c4a0e7d0] [c000000000008d1c] .show_stack+0x7c/0x184 (unreliable)       
[c0000003c4a0e880] [c000000000431ef4] .panic+0x80/0x1ac                         
[c0000003c4a0e910] [c000000000019d40] .die+0x1a4/0x1bc                          
[c0000003c4a0e9b0] [c00000000001f95c] .bad_page_fault+0xb8/0xd4                 
[c0000003c4a0ea30] [c000000000014e4c] storage_fault_common+0x48/0x4c            
--- Exception: 300 at .osc_import_event+0xa5c/0x26d0 [osc]                      
    LR = .ptlrpc_deactivate_import+0x1fc/0x7d0 [ptlrpc]                          
[c0000003c4a0ed20] [c000000360b9f8b0] 0xc000000360b9f8b0 (unreliable)           
[c0000003c4a0ee60] [8000000003bc3acc] .ptlrpc_deactivate_import+0x1fc/0x7d0 [ptlrpc]
[c0000003c4a0ef30] [8000000003bc47b8] .ptlrpc_invalidate_import+0x1d8/0xef0 [ptlrpc]
[c0000003c4a0f0d0] [8000000004690238] .osc_precleanup+0x2a8/0x720 [osc]         
[c0000003c4a0f190] [80000000024ac3a0] .class_cleanup+0x240/0x17a0 [obdclass]    
[c0000003c4a0f310] [80000000024b3208] .class_process_config+0x20f8/0x4b00 [obdclass]
[c0000003c4a0f460] [80000000024b614c] .class_manual_cleanup+0x53c/0x1760 [obdclass]
[c0000003c4a0f5c0] [800000000691d6e4] .ll_put_super+0x2c4/0x800 [lustre]        
[c0000003c4a0f750] [800000000691e3d8] .ll_fill_super+0x7b8/0xae20 [lustre]      
[c0000003c4a0f900] [80000000024e1144] .lustre_fill_super+0x4c4/0x8e0 [obdclass] 
[c0000003c4a0f9d0] [c0000000000d4508] .get_sb_nodev+0x84/0xe8                   
[c0000003c4a0fa80] [80000000024ba618] .lustre_get_sb+0x28/0x40 [obdclass]       
[c0000003c4a0fb10] [c0000000000d2f14] .vfs_kern_mount+0x80/0x114                
[c0000003c4a0fbc0] [c0000000000d3010] .do_kern_mount+0x58/0x130                 
[c0000003c4a0fc80] [c0000000000f12fc] .do_mount+0x84c/0x908                     
[c0000003c4a0fd70] [c0000000000f1470] .SyS_mount+0xb8/0x124                     
[c0000003c4a0fe30] [c000000000000580] syscall_exit+0x0/0x2c 


 Comments   
Comment by Peter Jones [ 18/Jan/13 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 21/Jan/13 ]

Prakash Surya,

For Lustre 2.4, I could not find a call path that reaches the osc_precleanup(osc, OBD_CLEANUP_EARLY) stage (ptlrpc_deactivate_import() is called only in that stage). Are you sure this is Lustre 2.4?

Also, could you work out which code osc_import_event+0xa5c refers to? The issue could be that something has not been set up completely while ll_put_super() presumes that setup has finished.
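
As a partial aid to the question about osc_import_event+0xa5c, the bracketed word in the Instruction dump (<e9290010>) and the word before it (e9390000) can be decoded by hand. The sketch below is Python and handles only PowerPC DS-form loads with primary opcode 58 (ld, ldu, lwa). The helper names decode_ds_load and format_ds_load are made up for illustration, and the sketch does not map the offset to a Lustre source line.

```python
# Minimal decoder for PowerPC DS-form loads (primary opcode 58).
# Helper names are hypothetical; only ld/ldu/lwa are handled.

MNEMONICS = {0: "ld", 1: "ldu", 2: "lwa"}


def decode_ds_load(word):
    """Return (mnemonic, rt, ra, displacement) for a 32-bit instruction word."""
    opcode = (word >> 26) & 0x3F      # top 6 bits: primary opcode
    if opcode != 58:
        raise ValueError("not a DS-form load with primary opcode 58")
    rt = (word >> 21) & 0x1F          # target register
    ra = (word >> 16) & 0x1F          # base register
    xo = word & 0x3                   # low 2 bits select ld/ldu/lwa
    if xo not in MNEMONICS:
        raise ValueError("reserved extended opcode")
    disp = word & 0xFFFC              # DS field shifted left by 2
    if disp & 0x8000:                 # sign-extend the 16-bit displacement
        disp -= 0x10000
    return MNEMONICS[xo], rt, ra, disp


def format_ds_load(word):
    """Render the decoded instruction in assembler-like syntax."""
    mnemonic, rt, ra, disp = decode_ds_load(word)
    return f"{mnemonic} r{rt}, {disp}(r{ra})"


if __name__ == "__main__":
    print(format_ds_load(0xE9390000))  # instruction just before the fault
    print(format_ds_load(0xE9290010))  # faulting instruction (bracketed in the dump)
```

Read together with the register dump, this suggests that the code loaded a pointer from 0(r25), got NULL (GPR09 is 0 in the dump), and then faulted reading offset 0x10 from it, which is consistent with DEAR = 0x10. Which structure and field this corresponds to in osc_import_event() is not determined here.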
